# Lesson 4

### Introduction

Networks provide a powerful perspective for better understanding the law for several reasons.

First, the law itself is full of networks: statutes that refer to each other, cases that cite prior cases, contracts that connect contractors, lawyers who work together, and so forth. Network analysis helps us understand the law’s network structures and their effects.

Second, network analysis comes with an entire toolkit to investigate network structures. Many of these measures have a foundation in human intuition. Consider the simple network below.

Ask yourself a couple of questions about this network.

- Who is the most powerful actor in that network?
- Who would you like to be friends with?
- What new link is likely to be created next?
- What is the likely social role of A or B?
- What happens if A leaves the network?
- What real-life networks could this image represent?

Network analysis provides a framework to turn these intuitions into metrics. A node that has many connections, like node B, has a high “degree centrality” in the network. Node A, in contrast, has a high “betweenness” score, because it connects different groups of the same network.
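To make these two metrics concrete, the following minimal sketch builds a small network with igraph (the package used later in this lesson) and computes both scores. The node layout is an assumption made purely for illustration and is not the network discussed above: B is the hub of the largest group, while A is the only link between three groups.

```r
library(igraph)

# Hypothetical toy network (an assumption for illustration):
# B is the hub of the largest group; A is the only link between three groups.
edges <- matrix(c(
  "B", "C",
  "B", "D",
  "B", "E",
  "B", "J",
  "F", "G",
  "F", "H",
  "I", "K",
  "I", "L",
  "A", "B",
  "A", "F",
  "A", "I"
), ncol = 2, byrow = TRUE)

toy_network <- graph_from_edgelist(edges, directed = FALSE)

# Degree centrality: number of ties attached to each node
toy_degree <- degree(toy_network)

# Betweenness: how often a node lies on the shortest path between two other nodes
toy_betweenness <- betweenness(toy_network)

toy_degree[c("A", "B")]
toy_betweenness[c("A", "B")]
```

In this sketch B has more ties than A, but A lies on more shortest paths, because every route between the three groups runs through it.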

Third, network analysis is scalable. That means it allows researchers to quickly analyze large amounts of information. What is the most important decision of the Canadian Supreme Court in the last 20 years? A network analysis of citations to Canadian Supreme Court decisions can provide an answer in seconds without requiring a manual review of thousands of cases.

Finally, network analysis provides an appealing way to visualize legal information.

###### What we will do in this lesson

We will primarily work on one aspect of legal network analysis: the citation networks of courts.

1. Using regexes to find citations
2. Creating a citation list
3. Finding most cited cases
4. Visualizing networks
5. Network measures

### R Script

##### Using Regexes to Find Court Citations
Picking up where we left off in Lesson 3, let's use regex to extract citations of court decisions. As an example, we will again be using the Canadian Supreme Court data.
```
# Activate package
library(readtext)

# Load the example data of Canadian Supreme Court cases into a folder on your hard drive.
# Write the path to that target folder.
folder <- "~/Google Drive/Teaching/Canada/Legal Data Science/2019/Data/Supreme Court Cases/*"

# Upload the texts from that target folder.
scc <- readtext(folder)
```
We now want to identify citations in these decisions to other Supreme Court decisions. How can we do that? Every Canadian Supreme Court decision has a unique reference based on its Supreme Court Reports citation, which follows the format "[2013] 1 S.C.R. 467". We can use this unique case reference to build our regex: `\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+`. The dots are escaped so that they match literal periods rather than any character.
```
pattern <- "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+"
```
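A quick way to check the regex is to run it on a sentence with known citations. The sketch below does that with base R only. The sample sentence and the variable names are made up for illustration, and the dots in `S.C.R.` are escaped so that they match only literal periods.

```r
# Hypothetical sample sentence containing two citations in the S.C.R. format
sample_text <- paste(
  "As held in R. v. Oakes, [1986] 1 S.C.R. 103, and later in",
  "Haida Nation, [2004] 3 S.C.R. 511, the test applies."
)

# Citation regex with escaped dots (variable name chosen for this sketch)
citation_pattern <- "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+"

# gregexpr() locates every match; regmatches() returns the matched text
found <- regmatches(sample_text, gregexpr(citation_pattern, sample_text))[[1]]
found
# returns "[1986] 1 S.C.R. 103" "[2004] 3 S.C.R. 511"
```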
##### Creating a Citation List

We now want to extract all citations to other Supreme Court decisions from our texts. For that, we need to loop over all our texts and extract the citations.

As output, we want to have a dataframe with 2 columns. Column 1 will indicate the citing case. Column 2 will indicate the cited case.

Since we have multiple cases, we need to use a for-loop. We start with an empty shell that we will populate with citations.

```
# Here our dataframe has 2 columns for citing and cited cases.
all_citations <- data.frame(matrix(ncol = 2, nrow = 0))
```

Our loop will do two things. First, it will extract all cited cases. Second, it will add the corresponding citing case to each of them.

```
for (row in seq_len(nrow(scc))) {
  # we first focus on the text of each decision
  case_text <- scc[row, "text"]

  # and search for our pattern in it
  citation_matcher <- gregexpr(pattern, case_text)

  # we then save all the cases cited in that decision
  citations <- regmatches(case_text, citation_matcher)[[1]]

  # decisions without any citation add nothing to the list
  if (length(citations) == 0) next

  # second we focus on the name of the citing case, dropping the ".txt" file extension
  case_name <- sub("\\.txt$", "", scc[row, "doc_id"])

  # and repeat it so that every cited case is accompanied by the name of the decision citing it
  case_name <- rep(case_name, length(citations))

  # then we match cited and citing cases
  citation_list <- cbind(case_name, citations)

  # and save all these references in a dataframe
  all_citations <- rbind(all_citations, citation_list)
}
```
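The same steps can be tried out without any files. The sketch below runs the extraction on a small, made-up stand-in for `scc` (two rows with the `doc_id` and `text` columns that the loop relies on) and uses `lapply()` instead of a for-loop. The texts, the object names, and the escaped-dot pattern are assumptions for illustration only.

```r
# Hypothetical stand-in for the readtext output: one row per decision
toy_scc <- data.frame(
  doc_id = c("[2015] 1 S.C.R. 3.txt", "[2016] 2 S.C.R. 3.txt"),
  text = c(
    "See [1985] 1 S.C.R. 295 and [1986] 1 S.C.R. 103; see also [1985] 1 S.C.R. 295.",
    "No earlier decisions are cited here."
  ),
  stringsAsFactors = FALSE
)

citation_pattern <- "\\[\\d+\\]\\s\\d+\\sS\\.C\\.R\\.\\s\\d+"

# Build one small data frame per decision, then stack them
toy_citations <- do.call(rbind, lapply(seq_len(nrow(toy_scc)), function(row) {
  case_text <- toy_scc[row, "text"]
  cited <- regmatches(case_text, gregexpr(citation_pattern, case_text))[[1]]
  citing <- sub("\\.txt$", "", toy_scc[row, "doc_id"])
  data.frame(
    case_name = rep(citing, length(cited)),
    citations = cited,
    stringsAsFactors = FALSE
  )
}))

toy_citations
```

The first toy decision contributes three rows, one of them a repeated citation, and the second contributes none because its text contains no match.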

Currently, we retain the information that the same case can cite another case multiple times. This could, for instance, be used to weight network ties later on. Ties would then be stronger where one case cites another multiple times.

Today, however, we want to simplify our data so that cases are connected by a tie irrespective of the number of times a citation occurs.

```
# We do that by eliminating duplicates.
all_citations <- unique(all_citations)
```
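If tie strength were of interest instead, the repeated citations could be counted rather than dropped. The following is a minimal sketch of that alternative on a made-up citation list; the case names and object names are hypothetical.

```r
# Hypothetical citation list in which "Case A" cites "Case X" twice
toy_citations <- data.frame(
  case_name = c("Case A", "Case A", "Case A", "Case B"),
  citations = c("Case X", "Case X", "Case Y", "Case X"),
  stringsAsFactors = FALSE
)

# Count how often each citing/cited pair occurs
weighted_citations <- as.data.frame(table(toy_citations), stringsAsFactors = FALSE)

# Keep only the pairs that actually occur
weighted_citations <- weighted_citations[weighted_citations$Freq > 0, ]
weighted_citations
```

The `Freq` column then holds the number of times one case cites another, which could serve as a tie weight.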

##### Finding Most Citing and Cited Cases

We can now use our citation list to find the most citing and cited cases. To this end, we use the function table(). This function provides us with frequencies of the values in our dataframe. To make it easier to identify the most cited or citing case, we also sort our dataframe.

We start with the most citing of our 25 cases.

```
# Based on our citation list, we can find the most citing and most cited cases in our list.
most_citing <- as.data.frame(sort(table(all_citations$case_name), decreasing = TRUE))
most_citing
```
```
##                    Var1 Freq
## 1    [2013] 1 S.C.R. 61   53
## 2   [2013] 1 S.C.R. 623   42
## 3     [2015] 1 S.C.R. 3   37
## 4   [2015] 2 S.C.R. 398   35
## 5   [2013] 1 S.C.R. 467   31
## 6   [2015] 3 S.C.R. 511   31
## 7  [2013] 3 S.C.R. 1101   27
## 8   [2015] 1 S.C.R. 613   24
## 9   [2014] 2 S.C.R. 167   23
## 10  [2016] 1 S.C.R. 130   23
## 11 [2015] 3 S.C.R. 1089   22
## 12   [2016] 1 S.C.R. 99   20
## 13  [2013] 2 S.C.R. 227   19
## 14  [2015] 1 S.C.R. 161   17
## 15  [2014] 2 S.C.R. 256   15
## 16     [2014] 1 SCR 704   12
## 17  [2016] 1 S.C.R. 180   11
## 18 [2017] 1 S.C.R. 1069   11
## 19  [2016] 2 S.C.R. 720   10
## 20  [2014] 2 S.C.R. 447    9
## 21  [2015] 2 S.C.R. 548    9
## 22 [2017] 1 S.C.R. 1099    9
## 23 [2013] 3 S.C.R. 1053    6
## 24  [2014] 1 S.C.R. 575    5
## 25    [2016] 2 S.C.R. 3    3
```

Hence, the most citing case in our network is Quebec (Attorney General) v. A, 2013 SCC 5, [2013] 1 S.C.R. 61.

We can do the same thing with the most cited cases. This list will be considerably longer, so we restrict ourselves to the 50 most cited cases.

```
most_cited <- as.data.frame(sort(table(all_citations$citations), decreasing = TRUE))

# Show the first 50 rows (the comma selects rows rather than columns)
most_cited[1:50, ]
```
```
##                    Var1 Freq
## 1   [1985] 1 S.C.R. 295    8
## 2   [2004] 3 S.C.R. 511    8
## 3   [1986] 1 S.C.R. 103    6
## 4   [2009] 2 S.C.R. 567    5
## 5   [1995] 3 S.C.R. 199    5
## 6  [1990] 1 S.C.R. 1075    5
## 7  [1997] 3 S.C.R. 1010    5
## 8   [2008] 2 S.C.R. 483    4
## 9   [1996] 2 S.C.R. 507    4
## 10  [2005] 3 S.C.R. 388    4
## 11  [2010] 3 S.C.R. 103    4
## 12  [2013] 1 S.C.R. 623    4
## 13   [1998] 1 S.C.R. 27    4
## 14 [2013] 3 S.C.R. 1101    4
## 15  [2012] 1 S.C.R. 433    4
## 16  [2014] 2 S.C.R. 257    4
## 17  [1986] 2 S.C.R. 713    3
## 18  [2011] 1 S.C.R. 396    3
## 19  [2012] 3 S.C.R. 555    3
## 20  [1999] 2 S.C.R. 203    3
## 21  [2000] 2 S.C.R. 307    3
## 22  [2005] 3 S.C.R. 458    3
## 23   [2013] 1 S.C.R. 61    3
## 24  [1984] 2 S.C.R. 335    3
## 25  [1998] 2 S.C.R. 217    3
## 26  [2002] 2 S.C.R. 235    3
## 27  [2003] 2 S.C.R. 236    3
## 28  [2004] 3 S.C.R. 550    3
## 29  [2003] 3 S.C.R. 571    3
## 30   [2010] 1 S.C.R. 44    3
## 31  [2011] 3 S.C.R. 134    3
## 32  [1984] 2 S.C.R. 145    3
## 33  [1999] 1 S.C.R. 688    3
## 34  [2010] 2 S.C.R. 650    3
## 35  [2004] 3 S.C.R. 698    3
## 36  [1989] 1 S.C.R. 927    2
## 37  [1990] 3 S.C.R. 697    2
## 38  [1995] 2 S.C.R. 513    2
## 39  [1996] 1 S.C.R. 825    2
## 40  [2004] 2 S.C.R. 551    2
## 41  [2006] 1 S.C.R. 256    2
## 42  [2007] 2 S.C.R. 610    2
## 43  [2008] 1 S.C.R. 190    2
## 44  [2011] 1 S.C.R. 160    2
## 45  [2011] 3 S.C.R. 471    2
## 46  [2011] 3 S.C.R. 654    2
## 47  [2013] 1 S.C.R. 467    2
## 48   [1988] 1 S.C.R. 30    2
## 49 [1989] 1 S.C.R. 1296    2
## 50  [1989] 1 S.C.R. 143    2
```

Hence, the most cited cases in our network are R v Big M Drug Mart Ltd., [1985] 1 S.C.R. 295 and Haida Nation v British Columbia, [2004] 3 S.C.R. 511, which are tied at eight citations each. The former is a landmark Charter case and the latter is a leading case on the Crown's duty to consult Aboriginal groups.

Since we worked with a small sample of 25 decisions for this analysis, we should not make too much of these findings. But if we applied this analysis to all Canadian Supreme Court cases, we would find meaningful patterns as to which are the most cited and most citing decisions of all time.

##### Visualizing Networks

We can next proceed to visualizing this network in R.

A couple of packages in R deal with network analysis. The most popular of them is igraph.

```
library(igraph)

# The input is our 2-column matrix with the citation list.
citation_network <- graph_from_edgelist(as.matrix(all_citations), directed = TRUE)
```
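Before plotting, it can help to check that an edge list has been turned into the network one expects. The sketch below does this for a made-up three-row citation list; the case names and object names are hypothetical, and the same calls could be applied to `citation_network`.

```r
library(igraph)

# Hypothetical citation list: column 1 is the citing case, column 2 the cited case
toy_citations <- data.frame(
  case_name = c("Case A", "Case A", "Case B"),
  citations = c("Case X", "Case Y", "Case X"),
  stringsAsFactors = FALSE
)

toy_network <- graph_from_edgelist(as.matrix(toy_citations), directed = TRUE)

# Number of distinct cases (nodes) and of citation ties (edges)
n_cases <- vcount(toy_network)
n_ties <- ecount(toy_network)

# Inward ties per case, i.e. how often each case is cited
times_cited <- degree(toy_network, mode = "in")

n_cases
n_ties
times_cited
```

Each row of the edge list becomes one directed tie from the citing case to the cited case, so the toy network has four nodes and three ties.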

We can then use the plot() function to visualize the network.

```
plot(citation_network, layout=layout_with_fr, vertex.size=4,
vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.2, vertex.label.cex=0.4)

```

While igraph is highly customizable, I personally prefer to work with standalone network visualization tools. One of them is called "visone". It can be downloaded free of charge online: http://visone.info.

##### Network Measures
The igraph package also allows us to calculate network measures. Several network measures exist that correspond to different node characteristics. Here we want to focus on two of them that are particularly useful for legal citation analysis:

1. Authority scores measure the importance of a node in a network based on the inward ties it attracts. In a legal citation network, this would correspond to a particularly important precedent.
2. Hub scores measure the importance of a node in a network based on the outward ties it sends. In a legal citation network, this would correspond to a decision that describes the state of the law well by citing all relevant authorities.

We start by calculating the authority scores of our nodes to find the most important precedents.
```
# Edge weights and ARPACK options are left at their defaults
authority_score_network <- authority_score(citation_network, scale = TRUE)
citation_authority <- as.data.frame(sort(authority_score_network$vector, decreasing = TRUE))
```
Let's compare the raw scores based on the number of citations with the authority scores. You will notice that the ranking is a bit different, since authority scores weight inward citations by the importance of the citing node.
```
head(most_cited)
```
```
##                   Var1 Freq
## 1  [1985] 1 S.C.R. 295    8
## 2  [2004] 3 S.C.R. 511    8
## 3  [1986] 1 S.C.R. 103    6
## 4  [2009] 2 S.C.R. 567    5
## 5  [1995] 3 S.C.R. 199    5
## 6 [1990] 1 S.C.R. 1075    5
```
```
head(citation_authority)
```
```
##                     sort(authority_score_network$vector, decreasing = TRUE)
## [1985] 1 S.C.R. 295                                               1.0000000
## [1986] 1 S.C.R. 103                                               0.8237840
## [2009] 2 S.C.R. 567                                               0.7722282
## [1995] 3 S.C.R. 199                                               0.7282003
## [1986] 2 S.C.R. 713                                               0.5659069
## [2011] 1 S.C.R. 396                                               0.5063234
```
We then turn to hub scores to find decisions that describe the state of the law well by citing all relevant authorities.
```
# Edge weights and ARPACK options are left at their defaults
hub_score_network <- hub_score(citation_network, scale = TRUE)
citation_hub <- as.data.frame(sort(hub_score_network$vector, decreasing = TRUE))
```
Let's again compare the raw scores based on the number of outward citations with the hub scores. You will notice again that the ranking is a bit different, because hub scores weight outward citations by the importance of the cited node.
```
head(most_citing)
```
```
##                  Var1 Freq
## 1  [2013] 1 S.C.R. 61   53
## 2 [2013] 1 S.C.R. 623   42
## 3   [2015] 1 S.C.R. 3   37
## 4 [2015] 2 S.C.R. 398   35
## 5 [2013] 1 S.C.R. 467   31
## 6 [2015] 3 S.C.R. 511   31
```
```
head(citation_hub)
```
```
##                      sort(hub_score_network$vector, decreasing = TRUE)
## [2013] 1 S.C.R. 61                                           1.0000000
## [2015] 1 S.C.R. 3                                            0.4699375
## [2013] 1 S.C.R. 467                                          0.4162001
## [2015] 2 S.C.R. 398                                          0.3568186
## [2015] 1 S.C.R. 613                                          0.3111047
## [2013] 3 S.C.R. 1101                                         0.2818147
```
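Why raw counts and these scores can diverge is easiest to see on a very small network. The sketch below uses a made-up citation list (all case names are hypothetical) in which P1 and P4 are each cited twice, but only P1 is cited by the case that cites the most authorities. It relies on the same `authority_score()` and `hub_score()` calls as above; depending on the installed igraph version, these may print a deprecation notice.

```r
library(igraph)

# Hypothetical citation list (an assumption for illustration): citing case, cited case
toy_edges <- matrix(c(
  "Hub",    "P1",
  "Hub",    "P2",
  "Hub",    "P3",
  "Minor1", "P1",
  "Minor2", "P4",
  "Minor3", "P3",
  "Minor3", "P4"
), ncol = 2, byrow = TRUE)

toy_network <- graph_from_edgelist(toy_edges, directed = TRUE)

# Raw number of inward citations per case
toy_indegree <- degree(toy_network, mode = "in")

# Authority and hub scores, scaled so that the top node scores 1
toy_authority <- authority_score(toy_network, scale = TRUE)$vector
toy_hub <- hub_score(toy_network, scale = TRUE)$vector

toy_indegree[c("P1", "P4")]
round(toy_authority[c("P1", "P4")], 2)
round(sort(toy_hub, decreasing = TRUE), 2)
```

Although P1 and P4 attract the same number of citations, P1 receives the higher authority score in this sketch because one of its citations comes from the strongest hub.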
Which network metrics best describe the varying characteristics of a legal case is an area of ongoing research. Generally, however, different network measures correspond to different attributes of a case. Selecting an appropriate network measure thus depends on the legal characteristic one is trying to assess.

### Dataset

Sample of cases from the Canadian Supreme Court in txt format. [Download]