We now repeat the same exercise but with another algorithm: Support Vector Machines.
Again, we start by training our model on the training data and then predict out of sample.
# Training the model (svm() comes from the e1071 package).
# The outcome column is converted to a factor so svm() performs
# classification rather than regression.
model <- svm(voting_pre1980[,-9], as.factor(voting_pre1980[,9]),
             kernel = "polynomial", degree = 18, cost = 3)
# Predicting out of sample.
prediction <- predict(model, voting_post1980[,-9])
prediction_SVM <- as.data.frame(prediction)
Finally, to evaluate the performance of our algorithm, we again compare our actual row assignment to our prediction and calculate the percentage of accurately predicted results.
prediction_SVM <- cbind(voting_post1980$vote,prediction_SVM)
colnames(prediction_SVM) <- c("vote","prediction")
head(prediction_SVM)
# We again calculate the number of correct assignments.
hits <- 0
for (row in 1:nrow(prediction_SVM)) {
  if (prediction_SVM$vote[row] == prediction_SVM$prediction[row]) {
    hits <- hits + 1
  }
}
correctness_SVM <- hits/length(prediction_SVM$vote)
correctness_SVM
## [1] 0.6669086
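The counting loop above can be written more compactly: taking mean() of a logical comparison gives the share of matches directly. A small sketch with made-up labels (not the actual voting data):

```r
# Hypothetical actual vs. predicted labels, for illustration only.
actual    <- c("majority", "majority", "dissent", "majority", "dissent")
predicted <- c("majority", "dissent",  "dissent", "majority", "majority")

# mean() of a logical vector is the proportion of TRUEs, i.e. the accuracy.
accuracy <- mean(actual == predicted)
accuracy  # 3 of 5 labels match
```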
At close to 67% accuracy, the SVM performs comparably to the Naive Bayes classifier. But take a look at the confusion matrix!
table(prediction_SVM$vote,prediction_SVM$prediction)
You notice that the SVM predicted ALL voting outcomes as majority vote and NONE as dissent. The reason is that Judge Brennan voted with the majority far more often than he dissented. This creates a class imbalance in the data, and some machine learning algorithms are sensitive to such imbalance and end up predicting the more common category exclusively.
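One common remedy, assuming the e1071 package used here, is to pass class.weights to svm() so that errors on the rarer class are penalized more heavily during training. The following sketch uses synthetic imbalanced data; the weights and simulated features are illustrative, not tuned for the voting set:

```r
library(e1071)

set.seed(1)
# Synthetic imbalanced data: 90% class "a", 10% class "b".
y <- factor(c(rep("a", 180), rep("b", 20)))
x <- data.frame(x1 = rnorm(200, mean = ifelse(y == "a", 0, 2)),
                x2 = rnorm(200, mean = ifelse(y == "a", 0, 2)))

# Weight the minority class more so it is not drowned out.
model_weighted <- svm(x, y, class.weights = c(a = 1, b = 9))
table(truth = y, prediction = predict(model_weighted, x))
```

Without the weights, a cheap way for the classifier to minimize error is to always predict "a"; the weighting removes that shortcut.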
Another lesson to take from this: always inspect the confusion matrix to see what the algorithm got wrong, rather than relying on overall accuracy alone.
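From the confusion matrix you can read off per-class error rates, which overall accuracy hides. A sketch with a hypothetical 2x2 table (invented counts, same majority/dissent layout as above):

```r
# Hypothetical confusion matrix: rows = actual, columns = predicted.
cm <- matrix(c(900,  50,   # actual majority: 900 correct, 50 wrong
               400,  30),  # actual dissent:  400 wrong, 30 correct
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("majority", "dissent"),
                             predicted = c("majority", "dissent")))

overall_accuracy <- sum(diag(cm)) / sum(cm)
recall_dissent   <- cm["dissent", "dissent"] / sum(cm["dissent", ])
overall_accuracy  # (900 + 30) / 1380, looks respectable
recall_dissent    # only 30 of 430 dissents are caught
```

Here the overall accuracy is around 67%, yet the model recovers fewer than 7% of the dissents, which is exactly the failure mode the confusion matrix exposes.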
Last update February 16, 2021.