The life-cycle of any legal data science project starts with getting your data into R.
The great thing about R is that it lets you load data from your local machine as well as scrape the web for data (that is, process websites and download their content). Webscraping is an art and a skill of its own, so today’s lesson will only scratch the surface.
A word of warning
Some websites prohibit webscraping in their terms of service. As such, you should always double check to see if you are allowed to webscrape a page and, if in doubt, it is best to contact the website’s owner. Websites may also make their data available through other means, such as APIs.
Learning from your error (messages)
As we get into more complicated coding activities, you may encounter errors, i.e. your code simply does not work. There are many possible reasons, ranging from a typo in a function name to forgetting to load a package. Don’t be discouraged, though. Errors are often easy to fix. R will give you an error message indicating the source of the error. The most important thing is that you learn from your error messages: they help you identify the problem and fix it. As we discussed in the first lesson, there is also plenty of help online that will enable you to resolve the issue.
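To make this concrete, here is a minimal sketch of what reading an error message can look like. The function name `count_wrods` is a deliberate, made-up typo used only for illustration; `tryCatch()` and `conditionMessage()` are base R functions that let us capture the message as text instead of stopping the script.

```r
# Calling a function that R cannot find (for example because of a typo,
# or because its package was never loaded) raises an error.
# tryCatch() intercepts the error so that we can inspect its message.
error_text <- tryCatch(
  count_wrods("some text"),               # hypothetical, misspelled function
  error = function(e) conditionMessage(e) # return the message as a string
)

# Print the message R produced; it names the function it could not find.
print(error_text)
```

In everyday work you would not wrap your code like this. You would simply run the line, read the message in the console, and correct the typo or load the missing package.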
What we do in this lesson
There are many ways to load data into R. Today we will consider just four methods. We will also teach you how to interact with your working directory and how to install packages.
1. Setting a Working Directory
2. Installing Packages
3. Loading and saving CSV files
4. Uploading text files
5. Reading and uploading PDFs
7. Working with XML data
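As a preview of the first few items, the sketch below uses only base R. The file names (`cases.csv`, `note.txt`) and the small data frame are made up for illustration, and a temporary folder stands in for your own project folder. PDFs and XML usually require add-on packages (for instance `pdftools` and `xml2`, which is an assumption about tooling rather than something this outline prescribes), so those steps appear only as comments.

```r
# 1. Working directory: check where R is currently looking for files,
#    then switch to a temporary folder (a stand-in for your project folder).
getwd()
old_dir <- setwd(tempdir())   # setwd() returns the previous directory

# 2. Packages: install once, then load in every new session.
#    Commented out here because installing needs an internet connection.
# install.packages("xml2")
# library(xml2)

# 3. CSV files: save a small, made-up data frame and load it back.
cases <- data.frame(
  case_id = c(1, 2),
  court   = c("Superior Court", "Court of Appeal")
)
write.csv(cases, "cases.csv", row.names = FALSE)
cases_loaded <- read.csv("cases.csv")

# 4. Text files: write two lines to a file and read them back,
#    one vector element per line.
writeLines(c("First line.", "Second line."), "note.txt")
note <- readLines("note.txt")

# Restore the original working directory.
setwd(old_dir)
```

Printing `cases_loaded` and `note` afterwards is a quick way to confirm that what you read back matches what you saved.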