Creating a Term-document Matrix - Data Science for Lawyers

Now that we have a pre-processed corpus, we can represent our text as data. More precisely, we can represent it as a term-document matrix, in which each row is a term and each column is a document, i.e. one of our judgments. Each cell holds the number of times that term occurs in that document.

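# Create a term-document matrix from the pre-processed corpus (uses the tm package).

tdm <- TermDocumentMatrix(corpus)

# Convert the sparse result into an ordinary matrix so that base R functions such as rowSums() work on it.

tdm <- as.matrix(tdm)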
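To make the structure concrete, here is a minimal sketch that builds a tiny term-document matrix by hand in base R. The three "documents" and the names docs, tokens, terms and toy_tdm are invented for illustration and are not part of our corpus; the sketch only mimics the shape of what TermDocumentMatrix() returns and does not reproduce how tm computes it.

```r
# Toy corpus: three short, already pre-processed texts (invented for illustration).
docs <- c(
  doc1 = "court rights court",
  doc2 = "act court",
  doc3 = "act act act court court"
)

# Split each document into its terms.
tokens <- strsplit(docs, " ", fixed = TRUE)

# The vocabulary: every distinct term, sorted alphabetically.
terms <- sort(unique(unlist(tokens)))

# Count how often each term occurs in each document.
# sapply() produces one column per document, so the rows correspond to terms.
toy_tdm <- sapply(tokens, function(tok) {
  as.integer(table(factor(tok, levels = terms)))
})

# Label rows with the terms and columns with the document names.
rownames(toy_tdm) <- terms
colnames(toy_tdm) <- names(docs)

print(toy_tdm)
```

Reading across a row shows how one term is distributed over the documents; reading down a column shows which terms make up one document.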

We can then proceed to the corpus analysis. For instance, we may want to find the most frequent words.

# For example, we can sum each term's counts across all documents and sort the totals in decreasing order.

frequent_words <- sort(rowSums(tdm), decreasing=TRUE)

# Here are our most frequent words.

head(frequent_words, 8)

##  court canlii   para    act    scc    scr canada rights

##   1881   1812   1488   1415   1270   1246   1217   1216
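The frequency ranking can be checked by hand on a small example. The sketch below uses an invented 3 x 3 matrix (toy_tdm, toy_freq and doc_lengths are hypothetical names, and the counts are made up) to show what rowSums(), sort() and head() each contribute.

```r
# A small term-document matrix with invented counts:
# rows are terms, columns are documents.
toy_tdm <- matrix(
  c(0, 1, 3,
    2, 1, 2,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("act", "court", "rights"), c("doc1", "doc2", "doc3"))
)

# rowSums() adds up each row, i.e. each term's count over all documents.
# sort(..., decreasing = TRUE) puts the most frequent term first.
toy_freq <- sort(rowSums(toy_tdm), decreasing = TRUE)

# head() keeps only the first n entries, here the two most frequent terms.
print(head(toy_freq, 2))

# For comparison, colSums() adds up each column,
# giving the total number of counted terms per document.
doc_lengths <- colSums(toy_tdm)
print(doc_lengths)
```

Because the result of rowSums() is a named vector, the term labels travel with the counts through sort() and head(), which is why the output above lists each word over its frequency.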

Last update May 11, 2020.