Lesson 1

Getting started

Introduction

Version March 2018

This script provides a concise introduction to the basic functionality of R.

R (and, for that matter, any programming language) is hard to grasp at first. But once you get going, you will make progress quickly. Moreover, a multitude of free resources available online can guide and support you in learning on your own. A disclaimer: I am a law professor, not a programmer, so other resources will be better at teaching you how to program. But if you are interested in leveraging data science for law in the programming environment of R, keep on reading.

Setting up R and RStudio

To get started, you will need to download two applications. First, download and install the program R itself. Second, download RStudio, a user-friendly interface for working with R.

A word on programming before we start. Programming is a bit like cooking from a recipe. The script is the recipe you follow. The console is the stove you cook on: it is where the code from your script is executed. Just as a recipe lets you cook your favorite dish again and again, the script ensures that you can execute the same code today, in a week or in a year. You don't have to remember how you did it last time; you just follow the steps in your script.

Two consequences flow from this. First, whenever you make changes or try out new alternatives, write them in the script (or create a new script), not in the console. Otherwise you risk losing track of how you did it. Second, comment your code abundantly so that several months from now you still know what you did and why, and can change your code or adapt it to new data. In R, you comment with the # symbol, as you will see below. Now, we are ready to go.

Basic calculations in R
# At its most basic, R is a calculator. You type in an expression and it returns the answer.

1+1
## [1] 2
16/4
## [1] 4
6^8.5
## [1] 4114202
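Like a pocket calculator, R follows the usual order of operations, and parentheses change it. The sketch below uses arbitrary example numbers (not from the lesson) and the standard R arithmetic operators, including %/% (whole-number division) and %% (remainder).

```r
# Multiplication and division are evaluated before addition and subtraction
2 + 3 * 4      # 14
# Parentheses force the addition to happen first
(2 + 3) * 4    # 20
# ^ binds more tightly than a leading minus sign
-2^2           # -4
(-2)^2         # 4
# %/% gives the whole-number part of a division, %% the remainder
17 %/% 5       # 3
17 %% 5        # 2
```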
# As discussed above, you can comment your code in a script using #.
# R treats everything after the # on a line as a comment and executes only the actual code.


# You can also store the information you create in variables.
# The Environment pane in the upper right-hand corner of RStudio lists all variables you have created in the current session.

# The arrow <- assigns the value on its right to the name on its left.
x <- 1
y <- 2 + 2


# To display a stored value, either type the name of the variable or call print().
y
## [1] 4
print(y)
## [1] 4
# You can perform operations on the variables you have created.
x+y
## [1] 5
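A point that often surprises beginners: the result of an operation can itself be stored, and that stored result does not change when you later reassign one of its inputs. A minimal sketch (z is a name chosen just for this example):

```r
x <- 1
y <- 2 + 2
# The result of an operation can be stored in a new object
z <- x + y
z        # 5
# Assigning a new value to an existing name overwrites the old value
x <- 10
x + y    # 14
# z does not update automatically: it still holds the value computed earlier
z        # 5
```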
# The variables you create are called OBJECTS. You have almost complete freedom in naming your objects.
silly_name <- 5 + 3
silly_name
## [1] 8
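The "almost" matters: names are case-sensitive, and a name cannot, for example, start with a digit or contain a space. The base function make.names() shows how R would turn such strings into valid names; Silly_Name and the example strings below are made up for illustration.

```r
silly_name <- 5 + 3
# Object names are case-sensitive: this creates a second, separate object
Silly_Name <- 100
silly_name   # 8
Silly_Name   # 100
# make.names() converts strings into syntactically valid object names
make.names(c("1st_value", "my name"))   # "X1st_value" "my.name"
```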
Object classes

What makes R powerful is that you can work not only with numbers but also with other types of data.

More specifically, there are three types of data that we will use:

  • numeric data, e.g. 1, 67, 5.56541
  • logical data, i.e. TRUE and FALSE
  • character data, e.g. "Hello World"

# To determine the type of data, simply ask class().
class(silly_name)
## [1] "numeric"
another_silly_name <- "Hello World"
class(another_silly_name)
## [1] "character"
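The list above names three types, but only numeric and character data appear in the code so far. This short sketch checks all three with class() and shows that comparisons produce logical values; the numbers are arbitrary examples.

```r
class(5.56541)        # "numeric"
class("Hello World")  # "character"
class(TRUE)           # "logical"
# Comparisons return logical values
8 > 5                 # TRUE
class(8 > 5)          # "logical"
```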
# Commands like class() or print() are FUNCTIONS. You apply functions to R objects.
# Whenever you do not know what a function does, open its help page with ?.
?class
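Most functions take one or more arguments inside the parentheses, and some arguments have names. As an illustration, the sketch below uses the base function round() with its digits argument; the number being rounded is just an example.

```r
# round() takes the number to round as its first argument and,
# optionally, the number of decimal places as the argument "digits"
round(5.56541)              # 6
round(5.56541, digits = 2)  # 5.57
# args() prints the arguments a function accepts
args(round)
# help("class") is an alternative to ?class that opens the same help page
```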
# Up to now, we have been dealing with single values: one number or one string. You can combine such values into VECTORS.
# To do so, you combine values with c().
numeric_vector <- c(1, 2, 3, 4, 5)
numeric_vector
## [1] 1 2 3 4 5
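Vectors are useful because R applies arithmetic to every element at once, and many functions summarise a whole vector in one call. A minimal sketch using the vector from above (sum(), mean(), length() and square-bracket indexing are base R):

```r
numeric_vector <- c(1, 2, 3, 4, 5)
# Arithmetic applies to every element at once
numeric_vector * 2        # 2 4 6 8 10
numeric_vector + 10       # 11 12 13 14 15
# Summary functions take the whole vector as input
sum(numeric_vector)       # 15
mean(numeric_vector)      # 3
length(numeric_vector)    # 5
# Square brackets select elements by position; R starts counting at 1
numeric_vector[1]         # 1
numeric_vector[2:4]       # 2 3 4
```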
# A shorter way to create a vector with the same values is the colon operator; wrapping it in c() is not needed.
numeric_vector <- 1:5
numeric_vector
## [1] 1 2 3 4 5
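The two vectors print identically, but they are not stored in exactly the same way: 1:5 produces whole numbers of class "integer", whereas c(1, 2, 3, 4, 5) produces class "numeric". This rarely matters in practice, but the sketch below makes the difference visible and shows seq() for steps other than 1.

```r
# Both versions print 1 2 3 4 5, but R stores them as different classes
class(c(1, 2, 3, 4, 5))   # "numeric"
class(1:5)                # "integer"
# The values themselves are equal element by element
all(c(1, 2, 3, 4, 5) == 1:5)      # TRUE
# identical() is stricter and also compares the storage type
identical(c(1, 2, 3, 4, 5), 1:5)  # FALSE
# For sequences with a step other than 1, seq() is the usual tool
seq(1, 9, by = 2)         # 1 3 5 7 9
```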
# You can create vectors with character strings, too.
character_vector <- c("Days", "Months", "Year")
character_vector
## [1] "Days"   "Months" "Year"
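Character vectors can be indexed and transformed just like numeric ones. The sketch also shows why a vector holds only one type: mixing a number and a string turns everything into text (mixed_vector is a name chosen for this example). The same rule explains why matrices require a single data type.

```r
character_vector <- c("Days", "Months", "Year")
character_vector[2]              # "Months"
length(character_vector)         # 3
nchar(character_vector)          # 4 6 4 (number of characters per element)
toupper(character_vector)        # "DAYS" "MONTHS" "YEAR"
# Mixing numbers and text in one vector converts everything to text
mixed_vector <- c(2015, "Obama")
mixed_vector                     # "2015" "Obama"
class(mixed_vector)              # "character"
```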
# In turn, vectors can be combined into MATRICES and DATAFRAMES.
# Both are essentially tables.
# All values in a matrix must be of the same data type, whereas the columns of a dataframe can be of different data types.
# We will work mostly with dataframes.
# For instance, we may want to create a dataframe that records the US President in office in each year.
# We can create two vectors and combine them into a dataframe.
Years <- c(2015, 2016, 2017, 2018)
US_Presidents <- c("Obama", "Obama", "Trump", "Trump")
# stringsAsFactors = FALSE keeps the names as plain character strings
Pres_data <- data.frame(Years, US_Presidents, stringsAsFactors = FALSE)
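To see what a dataframe offers, you can inspect and subset Pres_data with a few base R functions. The sketch recreates the objects from above so that it runs on its own; Pres_matrix is a name chosen for this illustration. It also demonstrates the matrix restriction: combining the same two vectors with cbind() yields a matrix in which the years have become text.

```r
Years <- c(2015, 2016, 2017, 2018)
US_Presidents <- c("Obama", "Obama", "Trump", "Trump")
Pres_data <- data.frame(Years, US_Presidents, stringsAsFactors = FALSE)

Pres_data                      # print the whole table
str(Pres_data)                 # compact overview of columns and their types
nrow(Pres_data)                # 4 rows
Pres_data$US_Presidents        # a single column, returned as a vector
Pres_data[Pres_data$Years >= 2017, ]   # only the rows where the condition is TRUE
sapply(Pres_data, class)       # each column keeps its own type

# The same vectors combined into a matrix must share one type,
# so R converts the years to character strings
Pres_matrix <- cbind(Years, US_Presidents)
Pres_matrix[1, "Years"]        # "2015"
class(Pres_matrix[1, "Years"]) # "character"
```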
                          

Exercises

R comes with a number of built-in sample datasets. As an exercise, we will work with some of them. We will also train your “help yourself” instincts: there is ample help on the web, and platforms like Stack Overflow provide great answers to most R-related questions.

Let’s give it a try.

Example 1)

# Load the dataset on arrest rates in the US states.
data("USArrests")

Answer the following questions (with the help of online resources).

  1. Sort the data by the Murder rate. Which state has the highest murder rate?
  2. What is the average murder rate across all states?
  3. What is the correlation between urban population and murder rates?
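If you get stuck, here is one possible route in base R; it is only a sketch, and add-on packages offer other ways to do the same. order() returns the row positions that would sort a column, mean() averages a column, and cor() computes the (by default Pearson) correlation between two columns. The name arrests_sorted is our own. Try the questions yourself before running it.

```r
data("USArrests")

# 1. Sort the states by murder rate, highest first
#    (order() returns row positions; decreasing = TRUE puts the largest first)
arrests_sorted <- USArrests[order(USArrests$Murder, decreasing = TRUE), ]
head(arrests_sorted)
rownames(arrests_sorted)[1]   # state with the highest murder rate

# 2. Average murder rate across all states
mean(USArrests$Murder)

# 3. Correlation between the share of urban population and the murder rate
cor(USArrests$UrbanPop, USArrests$Murder)
```

Note that mean() gives every state the same weight, so the result is the average of the state rates rather than the murder rate of the US population as a whole.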

Example 2)

# Let's look at some data on judges.
data("USJudgeRatings")

  1. Which judge has the highest overall rating?
  2. Which category is the highest rated overall?
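The exercise does not define "overall rating", so the sketch below makes an assumption: it averages each judge's scores across all columns except CONT, which counts the lawyers' contacts with the judge rather than rating anything. The object names ratings, judge_scores and category_scores are chosen for this example.

```r
data("USJudgeRatings")

# Assumption: "overall rating" = average of all rating columns.
# CONT counts lawyers' contacts with the judge, so it is left out.
ratings <- USJudgeRatings[, colnames(USJudgeRatings) != "CONT"]

# 1. Average rating per judge (row means), highest first
judge_scores <- rowMeans(ratings)
head(sort(judge_scores, decreasing = TRUE))
names(which.max(judge_scores))   # judge with the highest average rating

# 2. Average score per category (column means), highest first
category_scores <- colMeans(ratings)
sort(category_scores, decreasing = TRUE)
```

If you instead read "overall rating" as the RTEN column (whether the judge is worthy of retention), which.max(USJudgeRatings$RTEN) gives the row of the top-rated judge.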

Example 3)

# Finally, let's take a look at Canada and the Canadian lynx dataset.
data("lynx")

  1. Plot the number of lynx trapped every year.
  2. Try different plot types. Which visualization is most appropriate?
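lynx is stored as a time series object (class ts) of annual Canadian lynx trappings from 1821 to 1934, so plot() draws it as a line over time by default. The sketch below draws that default and a few alternatives via the type argument and hist(); the title and axis labels are our own choices.

```r
data("lynx")

# lynx is a time series: one value per year
class(lynx)
start(lynx)
end(lynx)

# plot() draws a time series as a line by default
plot(lynx, main = "Canadian lynx trappings", xlab = "Year", ylab = "Number trapped")

# Alternatives to compare: points only, vertical bars, and a histogram
plot(lynx, type = "p")
plot(lynx, type = "h")
hist(lynx)
```

One way to judge the alternatives: the line and the vertical bars (type = "h") keep the order of the years and make the recurring cycles visible, whereas the histogram discards the time order and only shows how often different counts occur.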
