Now that we have a pre-processed corpus, we can represent our text as data. More precisely, we can represent it as a term-document matrix, where each row is a term, each column is a document (i.e. one of our judgments), and each cell counts how often that term occurs in that document.
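# Create a term-document matrix from the pre-processed corpus.
tdm <- TermDocumentMatrix(corpus)
# Convert the sparse object to an ordinary matrix so that base functions such as rowSums() can be applied.
tdm <- as.matrix(tdm)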
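To make the layout concrete, here is a minimal sketch in base R that builds a tiny term-document matrix by hand. The three toy documents and the names `toy_docs`, `tokens`, `vocab` and `toy_tdm` are made up for illustration and are not part of our corpus; `TermDocumentMatrix()` does the equivalent work for a real corpus and stores the result in a sparse format.

```r
# Toy documents (hypothetical, for illustration only).
toy_docs <- c(doc1 = "court rights court",
              doc2 = "rights act",
              doc3 = "court act act")

# Split each document into words.
tokens <- strsplit(toy_docs, " ", fixed = TRUE)

# The vocabulary is the sorted set of distinct words.
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every document.
# Each document yields one column of counts.
toy_tdm <- sapply(tokens, function(words) {
  tabulate(match(words, vocab), nbins = length(vocab))
})
rownames(toy_tdm) <- vocab

# Rows are terms, columns are documents.
toy_tdm
```

Reading across the `court` row shows how often that word occurs in each document; reading down a column shows the word counts of a single document.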
We can then proceed to corpus analysis. For instance, we may want to find the most frequent words.
# Sum each row to get every term's total frequency across the corpus, then sort from most to least frequent.
frequent_words <- sort(rowSums(tdm), decreasing = TRUE)
# Here are our most frequent words.
head(frequent_words, 8)
##  court canlii   para    act    scc    scr canada rights
##   1881   1812   1488   1415   1270   1246   1217   1216
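A high total can come from a single document that repeats a word many times, so it can also be useful to count how many documents contain each term. The sketch below does both on a hand-made matrix; the counts and the names `toy_tdm`, `term_totals` and `doc_freq` are hypothetical and are not taken from our corpus.

```r
# Hand-made term-document matrix (hypothetical counts).
toy_tdm <- matrix(c(1, 1, 1,
                    5, 0, 0,
                    0, 2, 2),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("act", "court", "rights"),
                                  c("doc1", "doc2", "doc3")))

# Total occurrences of each term across all documents, highest first.
term_totals <- sort(rowSums(toy_tdm), decreasing = TRUE)

# Number of documents in which each term appears at least once.
doc_freq <- rowSums(toy_tdm > 0)

term_totals
doc_freq
```

In this toy example `court` has the highest total but appears in only one document, whereas `act` has the lowest total but appears in all three.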
Last update May 11, 2020.