Now, let’s pick up where we left off in Lesson 3 and use regex to extract the citations from court decisions.
Here, we will once again use Supreme Court of Canada data.
# Activate package
library(readtext)
# Save the Supreme Court of Canada example data to a folder on your hard drive, then write the path to that folder below. The trailing * matches every file in the folder.
folder <- "~/Google Drive/Teaching/Canada/Legal Data Science/2019/Data/Supreme Court Cases/*"
# Read the texts from that target folder.
scc <- readtext(folder)
We now want to identify the citations of other Supreme Court decisions found in these decisions. How can we do that?
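Every Supreme Court of Canada decision has a unique reference based on its place in the Supreme Court Reports (S.C.R.). These references follow the format “[year] volume S.C.R. page”: a year in square brackets, a volume number, the abbreviation “S.C.R.”, and a page number, for example a reference ending in “1 S.C.R. 467.” We can use this unique case reference to build our regex.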
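The regex would thus be "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+". The square brackets and the periods are escaped so that they match those literal characters; an unescaped period would match any character, not just a period.
pattern <- "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+"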
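Before running the pattern on the full set of decisions, it can help to try it on a short test string. The following is a minimal sketch using base R only. The sentence and both citations in it are made-up test strings, not real cases, and the names sample_text and citation_pattern are hypothetical. The pattern is written with escaped periods.

```r
# Hypothetical sample text: the citations below are invented test strings,
# not references to real decisions.
sample_text <- paste(
  "As held in [1999] 2 S.C.R. 123 and later in [2001] 3 S.C.R. 45,",
  "the test applies. See also 2005 SCC 10."
)

# Bracketed year, volume, literal "S.C.R.", page number.
citation_pattern <- "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+"

# gregexpr() locates every match; regmatches() extracts the matched text.
# perl = TRUE asks R to use the PCRE engine for \\d and \\s.
citations <- regmatches(
  sample_text,
  gregexpr(citation_pattern, sample_text, perl = TRUE)
)[[1]]

print(citations)
```

The string "2005 SCC 10" is left out because it does not follow the bracketed S.C.R. format. Assuming the decisions were read in with readtext as above, the same two function calls could be applied to the text column of scc (scc$text) instead of sample_text, which would return one vector of citations per decision.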