Scraping exercises

Most of the code has been changed, however, because web content is modified over time and old code no longer works.

Remember

Exercise 1

Extract all the information loaded in the table.
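One possible way to approach table extraction is sketched below in Python using only the standard library. The page URL is not given here, so SAMPLE_HTML is placeholder markup and the class name TableParser is hypothetical; a real solution would download the page first and would probably have to cope with nested or irregular tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder markup (an assumption) standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
table_rows = parser.rows
print(table_rows)
```

The first row holds the header cells, so the remaining rows can be turned into records by zipping each of them with table_rows[0].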

Extract all the paper names, from 001-30 to 268-30.

HINT: Use SelectorGadget to see that the selector cite is associated with the paper titles.
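As a rough illustration of the hint, the sketch below collects the text of every cite element from a piece of HTML. It uses Python's standard library; the class name TagTextParser, the sample markup and the titles in it are all placeholders, since the real page is not reproduced here.

```python
from html.parser import HTMLParser


class TagTextParser(HTMLParser):
    """Collect the text content of every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.texts = []
        self._buffer = None  # text fragments of the open element, if any

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            # Join the fragments and collapse runs of whitespace.
            self.texts.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


# Placeholder markup (an assumption); the titles are invented stand-ins.
SAMPLE_HTML = """
<p>001-30: <cite>First placeholder
   title</cite></p>
<p>002-30: <cite>Second <i>placeholder</i> title</cite></p>
"""

parser = TagTextParser("cite")
parser.feed(SAMPLE_HTML)
paper_titles = parser.texts
print(paper_titles)
```

Text inside nested tags such as the i element is kept, because the buffer stays open until the closing cite tag is reached.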

Extract all the options (countries) available in the select button.
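A minimal sketch of reading the options of a select element follows, again in standard-library Python. The markup, the country labels and the class name SelectOptionsParser are hypothetical, and the sketch assumes that an option with an empty value is a prompt rather than a country.

```python
from html.parser import HTMLParser


class SelectOptionsParser(HTMLParser):
    """Collect (value, label) pairs for <option> elements inside <select>."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_select = False
        self._value = None
        self._label = None  # text fragments of the open <option>, if any

    def _flush(self):
        # Store the option that is currently open, if there is one.
        if self._label is not None:
            self.options.append((self._value, "".join(self._label).strip()))
            self._label = None

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._flush()  # the closing </option> tag is optional in HTML
            self._value = dict(attrs).get("value")
            self._label = []

    def handle_data(self, data):
        if self._label is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select":
            self._flush()
            self._in_select = False


# Placeholder markup (an assumption); the labels are invented stand-ins.
SAMPLE_HTML = """
<select name="country">
  <option value="">Choose a country</option>
  <option value="AA">Country A
  <option value="BB">Country B</option>
</select>
"""

parser = SelectOptionsParser()
parser.feed(SAMPLE_HTML)
# Drop the prompt entry, which carries an empty value in this sample.
countries = [label for value, label in parser.options if value]
print(countries)
```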

Extract all the topics available at the URL.

Extract all the real estate agency names published on the first page.

Consider the url=‘http://www.dictionary.com/browse/’ and the words ‘handy’, ‘whisper’, ‘lovely’, ‘scrape’.

Build a data frame where the first variable is “Word” and the second variable is “definitions”. Scrape the definitions from the URL.
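The overall shape of this exercise can be sketched as follows: build one URL per word, fetch the definitions, and collect one record per word. This is only an outline in standard-library Python. The function names build_rows and fake_get_definitions are hypothetical, and the stand-in fetcher performs no network access, because the structure of the live page is not known here and has to be inspected before a real parser can be written.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.dictionary.com/browse/"
WORDS = ["handy", "whisper", "lovely", "scrape"]


def build_rows(words, get_definitions):
    """Return one {"Word", "definitions"} record per word.

    get_definitions is a caller-supplied function (hypothetical) that
    takes a URL and returns a list of definition strings; a real one
    would download and parse the page.
    """
    rows = []
    for word in words:
        url = urljoin(BASE_URL, word)  # base ends in "/", so the word is appended
        definitions = "; ".join(get_definitions(url))
        rows.append({"Word": word, "definitions": definitions})
    return rows


def fake_get_definitions(url):
    # Stand-in with no network access; it only echoes the URL it was given.
    return ["<definition scraped from %s>" % url]


rows = build_rows(WORDS, fake_get_definitions)
for row in rows:
    print(row)
```

A list of records like this can be passed directly to a data frame constructor such as pandas.DataFrame, which gives the two requested columns.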

Write a script to find out which actor appears in the highest number of Star Wars movies.
Hint: The idea is similar to the previous exercise, but with a little more work. You can:

Start on a page that gives you access to the list of Star Wars films (try googling “Star Wars IMDB”).
From here, write a function to extract the list of actors from an IMDB page whose URL is provided.
Apply the function to the list of URLs.
Tabulate or plot the results.
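The tabulation step could look roughly like the sketch below, which assumes the cast lists have already been scraped into a dictionary keyed by film. The film and actor names are made-up placeholders, not real data, and count_appearances is a hypothetical helper name.

```python
from collections import Counter


def count_appearances(casts_by_film):
    """Count in how many films each actor appears."""
    counts = Counter()
    for cast in casts_by_film.values():
        counts.update(set(cast))  # set() so an actor counts once per film
    return counts


# Made-up placeholder data; real cast lists would come from the scraper.
casts_by_film = {
    "Film 1": ["Actor A", "Actor B", "Actor C"],
    "Film 2": ["Actor A", "Actor C"],
    "Film 3": ["Actor A", "Actor D", "Actor D"],
}

counts = count_appearances(casts_by_film)
top_actor, top_count = counts.most_common(1)[0]
print(top_actor, top_count)
```

Converting each cast list to a set guards against a name that is listed twice on the same page, for example an actor credited in two roles.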