Let's go back in time to 1980. Justice Brennan had served on the Supreme Court since 1956 and remained on the bench until his retirement in 1990. Let's try to predict his votes in the 1980s based on his voting history.
In the dataset, note that the voting code works as follows:
1 means that he voted with the majority.
2 means that he dissented.
We will use three different machine learning algorithms on the same data to predict his voting choices and see which algorithm performs best. We start again with Naive Bayes.
# Load packages.
library(e1071)
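If the e1071 package is not installed yet, it can be added once beforehand:
# Run once if e1071 is missing (optional).
# install.packages("e1071")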
We begin by dividing our dataset into two parts: one pre-1980 and one post-1980. We use the pre-1980 data to train our model and the post-1980 data to test our predictions.
# Create training and test sets.
voting_pre1980 <- voting[1:3368, 2:10]
voting_post1980 <- voting[3369:4746, 2:10]
We first train our model on the training data.
model <- naiveBayes(voting_pre1980[, -9], as.factor(voting_pre1980[, 9]))
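As an optional aside, the fitted naiveBayes object from e1071 should contain the class priors and the per-predictor conditional tables, which give a peek at what the model learned:
# Optional: inspect the fitted model.
model$apriori # how often Brennan joined the majority (1) vs. dissented (2) pre-1980
model$tables # conditional distribution of each predictor given the vote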
Next, we predict out of sample based on our test data.
prediction <- predict(model, newdata = voting_post1980[,-9])
prediction_Bayes <- as.data.frame(prediction)
Finally, we compare the actual votes to our predictions.
prediction_Bayes <- cbind(voting_post1980$vote,prediction_Bayes)
colnames(prediction_Bayes) <- c("vote","prediction")
head(prediction_Bayes)
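Before scoring, a quick optional check tabulates how the predictions are distributed across the two classes:
# Optional: how many majority votes vs. dissents does the model predict?
table(prediction_Bayes$prediction)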
So, how well did our prediction perform? To assess the quality of our prediction, we determine the share of correct predictions.
# We again calculate the number of correct assignments.
hits <- 0
for (row in 1:nrow(prediction_Bayes)) {
  if (prediction_Bayes$vote[row] == prediction_Bayes$prediction[row]) {
    hits <- hits + 1
  }
}
correctness_Bayes <- hits/length(prediction_Bayes$vote)
correctness_Bayes
## [1] 0.6748911
In about 67% of cases our prediction proved correct. This is far from perfect, but better than a 50:50 guess.
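As a side note, the same share can be computed without an explicit loop; R's vectorized comparison coerces the factor prediction to character, so it should match the loop above.
# Vectorized alternative to the loop above.
mean(prediction_Bayes$vote == prediction_Bayes$prediction)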
To further evaluate the performance of the algorithm, we can take a look at the confusion matrix. If the algorithm had predicted all values correctly, all actual decisions (rows) would match the predicted decisions (columns), and the off-diagonal cells (lower left and upper right) would be 0.
# Compare the results in a confusion matrix
table(prediction_Bayes$vote,prediction_Bayes$prediction)
We see that the algorithm got it wrong both ways. Some dissents were mistakenly predicted as majority votes and some majority votes were mistakenly predicted as dissents.
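To express these two error types as shares rather than raw counts, the confusion matrix can be normalized by row with prop.table(); each row then shows how the actual votes of that type were split across the predicted classes (the object name confusion below is just a local helper).
# Row-wise shares: how actual majority votes and dissents were predicted.
confusion <- table(prediction_Bayes$vote, prediction_Bayes$prediction)
prop.table(confusion, 1)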
Last update February 16, 2021.