We now repeat the same exercise but with another algorithm: Support Vector Machines.
Again, we start by training our model on the training data and then predict out of sample.
# Training the model (svm() comes from the e1071 package).
# The outcome column is converted to a factor so svm() performs
# classification rather than regression.
model <- svm(voting_pre1980[,-9], as.factor(voting_pre1980[,9]),
             kernel = "polynomial", degree = 18, cost = 3)
# Predicting out of sample.
prediction <- predict(model, voting_post1980[,-9])
prediction_SVM <- as.data.frame(prediction)
Finally, to evaluate the performance of our algorithm, we again compare our actual row assignment to our prediction and calculate the percentage of accurately predicted results.
prediction_SVM <- cbind(voting_post1980$vote,prediction_SVM)
colnames(prediction_SVM) <- c("vote","prediction")
head(prediction_SVM)
# We again calculate the number of correct assignments.
hits <- 0
for (row in 1:nrow(prediction_SVM)) {
  if (prediction_SVM$vote[row] == prediction_SVM$prediction[row]) {
    hits <- hits + 1
  }
}
correctness_SVM <- hits/length(prediction_SVM$vote)
correctness_SVM
## [1] 0.6669086
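The counting loop above can be written more compactly: taking mean() of a logical comparison gives the share of matches directly. A small sketch with made-up labels (not the actual voting data):

```r
# Hypothetical actual vs. predicted labels, for illustration only.
actual    <- c("majority", "majority", "dissent", "majority", "dissent")
predicted <- c("majority", "dissent",  "dissent", "majority", "majority")

# mean() of a logical vector is the proportion of TRUEs, i.e. the accuracy.
accuracy <- mean(actual == predicted)
accuracy  # 3 of 5 labels match
```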
At close to 67% accuracy, the SVM performs comparably to the Naive Bayes classifier. But take a look at the confusion matrix!
table(prediction_SVM$vote,prediction_SVM$prediction)
You notice that the SVM predicted ALL voting outcomes as majority vote and NONE as dissent. The reason is that Judge Brennan voted with the majority far more often than he dissented. This creates a class imbalance in the data, and some machine learning algorithms are sensitive to such imbalance and end up predicting the more common category exclusively.
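One common remedy, assuming the e1071 package used here, is to pass class.weights to svm() so that errors on the rarer class are penalized more heavily during training. The following sketch uses synthetic imbalanced data; the weights and simulated features are illustrative, not tuned for the voting set:

```r
library(e1071)

set.seed(1)
# Synthetic imbalanced data: 90% class "a", 10% class "b".
y <- factor(c(rep("a", 180), rep("b", 20)))
x <- data.frame(x1 = rnorm(200, mean = ifelse(y == "a", 0, 2)),
                x2 = rnorm(200, mean = ifelse(y == "a", 0, 2)))

# Weight the minority class more so it is not drowned out.
model_weighted <- svm(x, y, class.weights = c(a = 1, b = 9))
table(truth = y, prediction = predict(model_weighted, x))
```

Without the weights, a cheap way for the classifier to minimize error is to always predict "a"; the weighting removes that shortcut.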
Another lesson to take from this: always inspect the confusion matrix to see what the algorithm got wrong, rather than relying on overall accuracy alone.
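From the confusion matrix you can read off per-class error rates, which overall accuracy hides. A sketch with a hypothetical 2x2 table (invented counts, same majority/dissent layout as above):

```r
# Hypothetical confusion matrix: rows = actual, columns = predicted.
cm <- matrix(c(900,  50,   # actual majority: 900 correct, 50 wrong
               400,  30),  # actual dissent:  400 wrong, 30 correct
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("majority", "dissent"),
                             predicted = c("majority", "dissent")))

overall_accuracy <- sum(diag(cm)) / sum(cm)
recall_dissent   <- cm["dissent", "dissent"] / sum(cm["dissent", ])
overall_accuracy  # (900 + 30) / 1380, looks respectable
recall_dissent    # only 30 of 430 dissents are caught
```

Here the overall accuracy is around 67%, yet the model recovers fewer than 7% of the dissents, which is exactly the failure mode the confusion matrix exposes.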
Last update February 16, 2021.