Introduction

A relatively low-tech but highly effective information retrieval technique is the regular expression. Most of you already use keywords to find and retrieve information from full texts. Regular expressions, or regexes for short, are like keyword searches, but better. Rather than looking only for words, regexes look for patterns.
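To make the difference between a keyword and a pattern concrete, here is a minimal sketch in base R. The sample sentences and the date pattern are invented for illustration and are not taken from the lesson's data; the pattern only covers dates written like "12 March 2019".

```r
# Hypothetical sample text, invented for illustration.
text <- c(
  "The agreement was signed on 12 March 2019.",
  "See Article 5 for the termination clause.",
  "The parties met again on 3 June 2020."
)

# Keyword search: which lines contain the literal word "Article"?
keyword_hits <- grepl("Article", text, fixed = TRUE)

# Pattern search: which lines contain something shaped like a date,
# i.e. one or two digits, a capitalised word, then four digits?
date_pattern <- "[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}"
pattern_hits <- grepl(date_pattern, text)

print(text[keyword_hits])
print(text[pattern_hits])
```

The keyword search finds only lines containing one specific word, whereas the single pattern finds both dates even though they share no word in common.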

Fortunately for lawyers and legal researchers, much in law is based on patterns: consistent document identification, standardized citations, formalized text structures and so forth.

Uses of regular expressions

In legal data science, regexes serve two basic purposes.

  • 1) Document Segmentation: For some applications we want to work with parts of a document rather than the entire text. For example, it is often useful to segment contracts or treaties into their constituent articles. Regexes help with that.
  • 2) Information Retrieval: In other contexts we use regexes to identify and extract the information we are interested in. For example, we could extract all the dates, email addresses or citations in a document. Again, regexes help us accomplish this.
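
Both purposes can be sketched in a few lines of base R. The treaty excerpt, the email address and the patterns below are assumptions made up for this illustration; real documents usually need more careful patterns (for instance, this split assumes the word "Article" followed by a number only appears in headings).

```r
# Hypothetical treaty excerpt, invented for illustration.
treaty <- paste(
  "Article 1 This Agreement enters into force on 1 January 2021.",
  "Article 2 Notices to registry@example.org are deemed received.",
  "Article 3 Disputes are referred to arbitration by 30 June 2022."
)

# 1) Document segmentation: split at the space in front of each
# "Article <number>" heading. The lookahead (?=...) keeps the heading
# attached to its article; it requires perl = TRUE.
articles <- strsplit(treaty, " (?=Article [0-9]+)", perl = TRUE)[[1]]

# 2) Information retrieval: extract every date and email address.
date_pattern  <- "[0-9]{1,2} [A-Z][a-z]+ [0-9]{4}"
email_pattern <- "[[:alnum:]._-]+@[[:alnum:]-]+(\\.[[:alnum:]-]+)+"

dates  <- regmatches(treaty, gregexpr(date_pattern, treaty))[[1]]
emails <- regmatches(treaty, gregexpr(email_pattern, treaty))[[1]]

print(articles)
print(dates)
print(emails)
```

Here `articles` holds one string per article, while `dates` and `emails` hold only the extracted pieces of information.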

What we will do in this lesson

1. What is a Regex?
2. Integrating Regexes into R Code
3. Using Regexes for Text Segmentation
4. Using Regexes for Information Retrieval

Useful Resources

Last update May 8, 2020.
