We now want to extract every instance where one of our texts cites another Supreme Court decision. To do that, we loop through all our texts and extract the citations from each one.
As output, we want a dataframe with two columns. Column 1 will hold the citing case, and column 2 will hold the case that was cited.
Since we have multiple cases, we use a for-loop. We start with an empty dataframe that the loop will populate with citations.
# Here our dataframe has 2 columns for citing and cited cases.
all_citations <- data.frame(matrix(ncol = 2, nrow = 0))
Our loop will do two things for each decision. First, it will extract the cases cited in the text. Second, it will pair each cited case with the name of the decision that cites it.
for (row in seq_len(nrow(scc))) {
# we first focus on the text of each decision
case_text <- scc[row, "text"]
# and search for our pattern in it (brackets and dots are escaped so that they match literally)
citation_matcher <- gregexpr("\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+", case_text)
# we then save all the cases cited in that decision
citations <- regmatches(case_text, citation_matcher)[[1]]
# second we focus on the name of the citing case
case_name <- scc[row, "doc_id"]
# and repeat it, without its ".txt" extension, once per citation so that every cited case is paired with the decision citing it
case_name <- rep(sub("\\.txt$", "", case_name), length(citations))
# then we match cited and citing cases
citation_list <- cbind(case_name, citations)
# and save all these references in a dataframe
all_citations <- rbind(all_citations,citation_list)
}
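To see what a single pass of the loop produces, it can help to run the same steps on one short string. The sketch below is only an illustration: `toy_text`, `toy_case`, and the citations inside it are made up, and the pattern is the one used above with the brackets and dots escaped.

```r
# Made-up text for illustration only; these are not real citations.
toy_text <- "See [1990] 2 S.C.R. 123 and [1995] 1 S.C.R. 45; compare again [1990] 2 S.C.R. 123."

# Same pattern as in the loop: year in brackets, volume, "S.C.R.", page.
pattern <- "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+"

# gregexpr() finds the position of every match, regmatches() pulls out the matched text.
toy_matches <- gregexpr(pattern, toy_text)
toy_citations <- regmatches(toy_text, toy_matches)[[1]]

# Pair every cited case with the (hypothetical) name of the citing decision.
toy_edges <- data.frame(
  citing = rep("toy_case", length(toy_citations)),
  cited = toy_citations,
  stringsAsFactors = FALSE
)
print(toy_edges)
```

The repeated citation appears twice in `toy_edges`, which is why duplicates can show up in the full dataframe.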
Here, we see that a case will often cite another case multiple times. It is possible to incorporate this information into an analysis by using it to determine the weight of network ties: a tie would be weighted more heavily when one case cites another multiple times.
Today, however, we want to simplify our data so that cases are connected by a tie irrespective of the number of times a citation occurs.
# We do that by eliminating duplicates.
all_citations <- unique(all_citations)
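If the weighted version were wanted instead, the repeated citations would have to be counted before the duplicates are dropped. The sketch below shows one possible way to do that with base R's `aggregate()`. The edge list and its column names (`citing`, `cited`, `weight`) are made up for illustration and are not part of the data used in this tutorial.

```r
# Made-up edge list with one repeated citation (illustrative data only).
toy_edges <- data.frame(
  citing = c("case_a", "case_a", "case_a", "case_b"),
  cited = c("[1990] 2 S.C.R. 123", "[1990] 2 S.C.R. 123",
            "[1995] 1 S.C.R. 45", "[1990] 2 S.C.R. 123"),
  stringsAsFactors = FALSE
)

# Give every citation a count of 1, then sum the counts per citing/cited pair.
toy_edges$weight <- 1
weighted_edges <- aggregate(weight ~ citing + cited, data = toy_edges, FUN = sum)

# Dropping duplicates instead gives the unweighted ties.
unweighted_edges <- unique(toy_edges[, c("citing", "cited")])

print(weighted_edges)
print(unweighted_edges)
```

Both results contain one row per pair of cases. The weighted version additionally records how many times the citation occurs.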
Last update May 11, 2020.