Exercises - Data Science for Lawyers

1) Customizing your stop word list

You probably noticed that the most frequent words include terms that tell us little about a decision’s content. For instance, it is unsurprising that “court” or “canlii” appears often. Use the list of most frequent words as a basis for creating a legal stop word list.
1. Write your stop words into the first column of a csv file. (Hint: Use the word frequency table to help you identify unwanted words.)
2. Then upload the file and run your analysis again. What do you find?

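# This sample code will help you. It assumes that `corpus` is the tm corpus you built earlier.

library(tm)

# Read your custom stop words from the first column of the csv file.
custom_stopwords <- read.csv("stopwords.csv", header = FALSE)
custom_stopwords <- as.character(custom_stopwords$V1)

# Combine them with tm's built-in English stop word list.
all_stopwords <- c(custom_stopwords, stopwords("english"))

# Remove all stop words from the corpus.
corpus <- tm_map(corpus, removeWords, all_stopwords)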
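
A starter file for step 1 can be drafted directly from a word frequency table. The sketch below uses base R only and a made-up frequency vector; the name `word_freq`, the cut-off of five terms, and the temporary file path are illustrative assumptions, not part of the course code.

```r
# Hypothetical word frequencies; in the exercise these would come from
# your own word frequency table.
word_freq <- c(court = 120, appeal = 95, canlii = 80, judge = 60,
               refugee = 45, decision = 40, credibility = 30)

# Take the most frequent terms as stop word candidates (the cut-off is arbitrary).
candidates <- names(sort(word_freq, decreasing = TRUE))[1:5]

# Write one word per row, without a header, so that
# read.csv(..., header = FALSE) returns the words in column V1.
stopword_file <- file.path(tempdir(), "stopwords.csv")
write.table(data.frame(candidates), stopword_file, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back the same way the sample code does.
custom_stopwords <- as.character(read.csv(stopword_file, header = FALSE)$V1)
print(custom_stopwords)
```

A list generated this way is only a draft. Review it by hand and delete any word that carries legal meaning (here, perhaps "refugee") before using it to filter the corpus.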

2) Expanding the sentiment analysis

1. Go back to the sentiment counts for each judgment. We saw that all decisions contain more positive words than negative ones. But do some contain more positive or negative words than others? Specifically, you may want to add your sentiment counts to the table with the success/failure classifications. Do decisions use more negative words when they dismiss an appeal?
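
One possible way to set up this comparison is sketched below in base R. Both tables, their column names (`case_id`, `positive`, `negative`, `outcome`) and all numbers are made up for illustration; your own tables from the lesson will look different.

```r
# Hypothetical sentiment counts per judgment (made-up numbers).
sentiment <- data.frame(
  case_id  = c("A", "B", "C", "D"),
  positive = c(40, 55, 30, 62),
  negative = c(20, 45, 12, 50)
)

# Hypothetical outcome classifications for the same judgments.
outcomes <- data.frame(
  case_id = c("A", "B", "C", "D"),
  outcome = c("allowed", "dismissed", "allowed", "dismissed")
)

# Join the two tables on the shared identifier.
combined <- merge(sentiment, outcomes, by = "case_id")

# Share of negative words among all sentiment words, so that longer
# judgments do not look more negative simply because they are longer.
combined$negative_share <- combined$negative /
  (combined$positive + combined$negative)

# Average negative share by outcome.
share_by_outcome <- aggregate(negative_share ~ outcome, data = combined, FUN = mean)
print(share_by_outcome)
```

Comparing shares rather than raw counts is a design choice: raw counts mostly track the length of a judgment. With only a handful of decisions, any difference between the groups should be read as a hint, not as evidence.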

2. What else, apart from sentiment, could you look for in these judgments? For instance, you may want to check how much Latin legalese Canadian judges use. By adapting the sample code for stop words from above, you can create custom dictionaries to count the frequency of terms that you are interested in.
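
As a rough illustration of a custom dictionary count, the sketch below uses base R so that it runs without the course corpus. The term list, the sample texts and the helper `count_term` are hypothetical; in the exercise, the dictionary could be read from a csv file in the same way as the stop word list.

```r
# Hypothetical custom dictionary of Latin terms.
latin_terms <- c("prima facie", "inter alia", "obiter", "mutatis mutandis")

# Made-up judgment snippets standing in for the real corpus.
texts <- c(
  case_A = "The applicant has, inter alia, made out a prima facie case.",
  case_B = "The comments were obiter. The test applies mutatis mutandis. Obiter remarks do not bind.",
  case_C = "The appeal is dismissed."
)

# Count whole-word occurrences of one term in one lower-cased text.
count_term <- function(term, text) {
  pattern <- paste0("\\b", term, "\\b")
  hits <- gregexpr(pattern, tolower(text), perl = TRUE)[[1]]
  sum(hits > 0)  # gregexpr returns -1 when there is no match
}

# Rows are judgments, columns are dictionary terms.
term_counts <- sapply(latin_terms, function(term) {
  vapply(texts, function(text) count_term(term, text), integer(1))
})
print(term_counts)

# Total number of dictionary hits per judgment.
print(rowSums(term_counts))
```

Matching on the raw text rather than on single tokens lets the dictionary contain multi-word expressions such as "prima facie", which a word-by-word frequency table would split apart.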

Last update May 11, 2020.