Dictionary Approach I: Term Mapping - Data Science for Lawyers

Dictionary methods relate an outside list of terms to our corpus. This can be useful for a range of tasks. We may want to automatically check for the presence or absence of certain terms as part of a content analysis. Or we may want to select only those documents in which specific signalling terms from our dictionary appear.

In this lesson, we use dictionary methods for two tasks. First, we want to know whether appeals to the Supreme Court of Canada were successful or not. To do so, we will map signalling terms to automate this assignment. Second, we use a sentiment dictionary to determine whether the judges use a positive or a negative tone in their judgments.

Here, we tackle the first of these tasks: classifying the outcome of decisions. For that, we come up with two signalling terms, one for each outcome. Of course, rather than using a single signalling term per outcome, we could use several.

# Load library

library(stringr)

success_formular <- "should be allowed"

reject_formular <- "should be dismissed"
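To illustrate the point about using several signalling terms per outcome, the sketch below joins alternative phrasings into one regular expression. The additional phrases and the toy sentences are hypothetical and chosen only for illustration; they are not drawn from the SCC corpus.

```r
library(stringr)

# Hypothetical alternative phrasings for each outcome (assumptions, not from the corpus)
success_terms <- c("should be allowed", "would allow the appeal")
reject_terms <- c("should be dismissed", "would dismiss the appeal")

# Join the terms with "|" so the pattern matches any one of them
success_pattern <- str_c(success_terms, collapse = "|")
reject_pattern <- str_c(reject_terms, collapse = "|")

# Toy texts standing in for scc$text
toy_text <- c(
  "In my view the appeal should be allowed.",
  "I would dismiss the appeal with costs.",
  "The motion was adjourned."
)

# One TRUE/FALSE per text for each outcome
toy_success_hit <- str_detect(toy_text, success_pattern)
toy_reject_hit <- str_detect(toy_text, reject_pattern)
```

Because the joined pattern is a regular expression, any term containing special characters such as parentheses or full stops would need to be escaped first.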

We then count how often each word sequence appears in every case and store the results as success and reject.

success <- str_count(scc$text, success_formular)

reject <- str_count(scc$text, reject_formular)
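As a quick check of what this step produces, the sketch below runs str_count() on three made-up sentences. The toy texts are assumptions used in place of scc$text.

```r
library(stringr)

# Toy texts standing in for scc$text (hypothetical)
toy_text <- c(
  "The appeal should be dismissed. The cross-appeal should be dismissed.",
  "The appeal should be allowed.",
  "Leave to appeal was granted."
)

# str_count() returns one number per text: how many times the pattern matches
toy_reject <- str_count(toy_text, "should be dismissed")
toy_success <- str_count(toy_text, "should be allowed")
```

The first text yields a reject count of 2, the second a success count of 1, and the third a count of 0 for both terms. Matching is case-sensitive, so a sentence beginning "Should be allowed" would not be counted.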

Since each of these terms may appear more than once in a judgment, we recode any count above zero as “yes” and a count of zero as “no”.

# Recode counts: any number of occurrences becomes "yes", none becomes "no"

success <- ifelse(success > 0, "yes", "no")

reject <- ifelse(reject > 0, "yes", "no")
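When only presence or absence matters, the counting step can be skipped altogether. The following is a minimal sketch of that variant using str_detect(); it is an alternative to the steps above rather than part of the lesson's own method, and the toy texts are hypothetical.

```r
library(stringr)

# Toy texts standing in for scc$text (hypothetical)
toy_text <- c(
  "The appeal should be allowed in part.",
  "The appeal should be dismissed.",
  "The appeal should be allowed. The cross-appeal should be allowed. The motion should be allowed."
)

# str_detect() returns TRUE/FALSE, so any number of occurrences maps to "yes"
toy_success <- ifelse(str_detect(toy_text, "should be allowed"), "yes", "no")
```

The third text contains the term three times and is still labelled "yes", since the label no longer depends on the exact count.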

Finally, we add the success and reject columns to our dataset.

scc$success <- success

scc$reject <- reject

Take a look at the dataset using the View(scc) command. You will notice that some SCC cases are classified as both success and reject. How can that be? If you open the SCC judgment [2013] 1 S.C.R. 61.txt, you will see that the Supreme Court allowed part of the appeal and dismissed another part. So our assignment of success and reject turned out to be correct. In general, it is prudent to double-check your results for accuracy.
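One way to make this accuracy check systematic is to cross-tabulate the two labels and list the cases that carry both or neither. The sketch below uses a small made-up data frame in place of scc; the doc_id column and the file names are hypothetical.

```r
# Toy stand-in for the scc data frame (hypothetical file names and labels)
toy_scc <- data.frame(
  doc_id = c("case_a.txt", "case_b.txt", "case_c.txt", "case_d.txt"),
  success = c("yes", "no", "yes", "no"),
  reject = c("no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two labels to see how often they co-occur
outcome_table <- table(success = toy_scc$success, reject = toy_scc$reject)

# Cases flagged with both outcomes, or with neither, deserve a manual read
mixed <- toy_scc$doc_id[toy_scc$success == "yes" & toy_scc$reject == "yes"]
unlabelled <- toy_scc$doc_id[toy_scc$success == "no" & toy_scc$reject == "no"]
```

Cases in mixed may be partial successes like the one described above, while cases in unlabelled suggest that the judgment uses wording our two signalling terms do not cover.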

Because legal texts often follow specific drafting rules and conventions, such rule-based term mapping can work surprisingly well to map the contents of legal documents. But simple word mapping rules do not work for all tasks. Where no uniform signalling terms exist, researchers are better off resorting to the more flexible machine-learning approaches that we will discuss in the section on Classification and Prediction.

Last update May 11, 2020.