Introduction - Data Science for Lawyers

On a technical level, prediction is not very different from the supervised machine learning classification we did in Lesson 7. On a conceptual level, however, it is a world apart.

Prediction is speculating about the future

When we classify a text, we essentially summarize its content under a label. There is nothing speculative or inter-temporal about a classification. When we predict, however, we use the past to speculate about the future. Since our understanding of what determines future events is imperfect, prediction is fraught with uncertainty. As a result, we must be extra careful when we interpret and rely on predictive results.

The danger of dumb predictions

Machine learning algorithms make it easy to generate predictions: anyone with access to some sample code can predict an outcome. This ease gives rise to what I call “dumb predictions”, that is, predictions that lack a causal theory and a deeper understanding of the input data.

Predictions are extremely useful in practice, but the researchers and lawyers who rely on them should make sure they are “smart predictions”. Smart predictions are rooted in a sound causal theory that connects causes to outcomes as well as we can. They also require studying the input data to detect missing variables, biases and other limitations that make predictions less reliable.

In short, prediction is easy. Smart predictions are hard.

What we do in this lesson

In this lesson, my main point is to show you how easy it is to predict. We will do what is common in practice: use an existing dataset and try different machine learning algorithms to see which one performs best. Keep in mind that the deeper challenge lies not in predicting, but in predicting well.

1. Loading WJBrennan Voting
2. Prediction Using Naive Bayes
3. Prediction Using Support Vector Machines
4. Prediction Using K-Nearest Neighbour
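The workflow described above can be sketched in a few lines: fit several algorithms on the same data, then compare how often each one predicts held-out outcomes correctly. The example below is a minimal Python sketch, not the lesson's own code. Several assumptions apply:

- The data is hypothetical toy data, not the WJBrennan voting dataset.
- The three 0/1 case attributes and the vote labels are invented for illustration.
- The classifiers are simplified hand-written versions: a Bernoulli Naive Bayes with add-one smoothing, and k-nearest neighbour with Hamming distance and k=3.
- A majority-class baseline is added as a yardstick.

Support vector machines are left out to keep the sketch dependency-free.

```python
from collections import Counter
import math

# Hypothetical toy data (NOT the WJBrennan voting dataset): each case is a
# tuple of three made-up 0/1 case attributes plus a made-up vote label.
TRAIN = [
    ((1, 0, 1), "liberal"),
    ((1, 1, 1), "liberal"),
    ((1, 0, 0), "liberal"),
    ((0, 1, 0), "conservative"),
    ((0, 0, 0), "conservative"),
    ((0, 1, 1), "conservative"),
    ((1, 1, 0), "liberal"),
]
# Held-out cases the predictors never see during fitting.
TEST = [
    ((1, 0, 1), "liberal"),
    ((0, 0, 1), "conservative"),
    ((0, 1, 0), "conservative"),
    ((1, 1, 1), "liberal"),
]


def majority_baseline(train):
    """Return a predictor that always outputs the most common training label."""
    top_label = Counter(label for _, label in train).most_common(1)[0][0]

    def predict(x):
        return top_label

    return predict


def naive_bayes(train):
    """Bernoulli Naive Bayes with add-one (Laplace) smoothing on 0/1 features."""
    labels = Counter(label for _, label in train)
    n_features = len(train[0][0])
    # Smoothed P(feature_j = 1 | label), so no probability is exactly 0 or 1.
    p_one = {}
    for label, n in labels.items():
        rows = [x for x, y in train if y == label]
        p_one[label] = [(sum(r[j] for r in rows) + 1) / (n + 2)
                        for j in range(n_features)]

    def predict(x):
        def log_score(label):
            score = math.log(labels[label] / len(train))  # log prior
            for xj, p in zip(x, p_one[label]):
                score += math.log(p if xj == 1 else 1 - p)
            return score

        return max(labels, key=log_score)

    return predict


def knn(train, k=3):
    """k-nearest neighbours using Hamming distance and a majority vote."""
    def predict(x):
        def hamming(row):
            return sum(a != b for a, b in zip(x, row[0]))

        nearest = sorted(train, key=hamming)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    return predict


def accuracy(predict, test):
    """Share of held-out cases whose label the predictor gets right."""
    return sum(predict(x) == y for x, y in test) / len(test)


results = {
    "majority baseline": accuracy(majority_baseline(TRAIN), TEST),
    "naive Bayes": accuracy(naive_bayes(TRAIN), TEST),
    "k-NN (k=3)": accuracy(knn(TRAIN), TEST),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

On these hypothetical cases, both classifiers get all four test cases right, while the baseline gets half of them. With so few cases, such scores say nothing about why the features should drive the outcome. That is exactly the gap between a dumb and a smart prediction.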

Last updated May 11, 2020.