Creating a Term-document Matrix

Now that we have a pre-processed corpus, we can represent our text as data. More precisely, we can represent it as a term-document matrix, in which each row is a term and each column is a document, i.e. one of our judgments. Each cell holds the number of times that term occurs in that document.

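# Create a term-document matrix from the pre-processed corpus.
tdm <- TermDocumentMatrix(corpus)

# Convert the sparse term-document matrix to an ordinary matrix so that base R functions such as rowSums() can be applied to it.
tdm <- as.matrix(tdm)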
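To make the shape of such a matrix concrete, here is a minimal sketch that builds one by hand in base R. It is only an illustration under stated assumptions, not how the tm package works internally: the three documents and the names docs, tokens, vocabulary and toy_tdm are hypothetical, and the texts are assumed to be already pre-processed (lower case, no punctuation, terms separated by single spaces).

```r
# Toy "corpus": three hypothetical, already pre-processed documents.
docs <- c(doc1 = "court rights court",
          doc2 = "court act",
          doc3 = "court rights rights")

# Split each document into its terms on single spaces.
tokens <- strsplit(docs, " ", fixed = TRUE)
names(tokens) <- names(docs)

# The vocabulary: every distinct term, sorted alphabetically.
vocabulary <- sort(unique(unlist(tokens)))

# Count how often each vocabulary term occurs in each document.
# Each document becomes one column; each term becomes one row.
toy_tdm <- sapply(tokens, function(doc_tokens) {
  as.vector(table(factor(doc_tokens, levels = vocabulary)))
})
rownames(toy_tdm) <- vocabulary

print(toy_tdm)
```

In the resulting matrix, the cell in row "court" and column "doc1" is 2, because that term occurs twice in the first toy document, while the cell in row "act" and column "doc1" is 0.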

We can then proceed to the corpus analysis. For instance, we may want to find the most frequent words.

# Sum each term's counts across all documents and sort the terms by total frequency.
frequent_words <- sort(rowSums(tdm), decreasing = TRUE)

# Here are our eight most frequent words.
head(frequent_words, 8)
##  court canlii   para    act    scc    scr canada rights
##   1881   1812   1488   1415   1270   1246   1217   1216
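The same row-wise logic can be sketched on a small matrix whose result is easy to check by hand. The counts below are hypothetical, and the names toy_tdm, term_totals, term_shares and doc_frequency are assumptions made for this illustration. Besides the total frequency used above, the sketch also derives each term's share of all counted terms and the number of documents containing it, two variations that may be useful but are not part of the analysis above.

```r
# Hypothetical counts: rows are terms, columns are documents.
toy_tdm <- matrix(
  c(0, 1, 0,
    2, 1, 1,
    1, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("act", "court", "rights"),
                  c("doc1", "doc2", "doc3"))
)

# Total occurrences of each term across all documents, most frequent first.
term_totals <- sort(rowSums(toy_tdm), decreasing = TRUE)

# Share of all counted terms that each term accounts for.
term_shares <- term_totals / sum(term_totals)

# Number of documents in which each term occurs at least once.
doc_frequency <- rowSums(toy_tdm > 0)

print(term_totals)
print(round(term_shares, 3))
print(doc_frequency)
```

Here "court" comes first with 4 occurrences out of 8 in total, and it is the only term that occurs in all three toy documents.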

Last updated May 11, 2020.
