Bag of Words Scene Classifier

Overview

This was an assignment for 16-720, Introduction to Computer Vision. Our task was to create a bag of words scene classifier and compare it with a neural network we wrote in PyTorch.

Process

The steps were:

Create a filter bank to apply to each scene, then extract features (I chose Harris corners).
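This first step might look roughly like the following sketch. It is a minimal pure-Python illustration, not the assignment code. The three-filter bank (a box blur plus horizontal and vertical Sobel derivatives), the 3x3 averaging window, and the Harris constant `k = 0.05` are all assumptions, and a real pipeline would typically use NumPy/SciPy and a larger multi-scale bank. The Harris response is R = det(M) - k * trace(M)^2, where M is the structure tensor of the image gradients averaged over a small window. Corners give large positive R, edges give negative R, and flat regions give values near zero.

```python
# Minimal pure-Python sketch: a toy filter bank plus Harris corner selection.
# Images are grayscale lists of lists of floats (an assumption for brevity).


def correlate(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating edge pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the border
                    xx = min(max(x + dx, 0), w - 1)
                    total += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = total
    return out


# Hypothetical 3-filter bank: box blur, Sobel x, Sobel y.
FILTER_BANK = [
    [[1 / 9] * 3 for _ in range(3)],
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
]


def harris_points(image, num_points, k=0.05):
    """Return the num_points pixels (y, x) with the largest Harris response."""
    h, w = len(image), len(image[0])
    ix = correlate(image, FILTER_BANK[1])
    iy = correlate(image, FILTER_BANK[2])
    box = FILTER_BANK[0]
    # Structure-tensor entries, averaged over a 3x3 window.
    sxx = correlate([[ix[y][x] ** 2 for x in range(w)] for y in range(h)], box)
    syy = correlate([[iy[y][x] ** 2 for x in range(w)] for y in range(h)], box)
    sxy = correlate([[ix[y][x] * iy[y][x] for x in range(w)] for y in range(h)], box)
    scores = []
    for y in range(h):
        for x in range(w):
            det = sxx[y][x] * syy[y][x] - sxy[y][x] ** 2
            trace = sxx[y][x] + syy[y][x]
            scores.append((det - k * trace ** 2, y, x))
    scores.sort(reverse=True)
    return [(y, x) for _, y, x in scores[:num_points]]


def features_at(image, points):
    """Stack every filter's response at each point into one feature vector."""
    responses = [correlate(image, f) for f in FILTER_BANK]
    return [[r[y][x] for r in responses] for y, x in points]
```

For a bright square on a dark background, the four strongest responses fall on the square's corner pixels, and each feature vector holds one response per filter in the bank.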

Then use K-means clustering on the features to produce high dimensional “words” that represent similar regions inside the images. Assigning each pixel to its nearest word gives a word map of each image, which could then be used to discriminate scenes in a scene classifier.
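The dictionary and word map step could be sketched as below, again in plain Python and under stated assumptions: K-means is Lloyd's algorithm with a fixed iteration count and squared Euclidean distance, each image is summarized by a normalized histogram of its words, and `classify` is a hypothetical 1-nearest-neighbor rule using histogram intersection. None of these function names come from the assignment.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(vector, centers):
    """Index of the center closest to vector (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: squared_distance(vector, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm; the returned k centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster empties
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def word_map(responses, dictionary):
    """Replace each pixel's filter-response vector with its nearest word index."""
    return [[nearest(v, dictionary) for v in row] for row in responses]


def word_histogram(wmap, k):
    """Normalized word counts: a fixed-length feature for one image."""
    counts = [0] * k
    for row in wmap:
        for word in row:
            counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts]


def classify(hist, train_hists, train_labels):
    """Hypothetical 1-nearest-neighbor rule using histogram intersection."""
    sims = [sum(min(a, b) for a, b in zip(hist, h)) for h in train_hists]
    return train_labels[max(range(len(sims)), key=lambda i: sims[i])]
```

Because every image becomes a histogram of the same length regardless of its size, images can be compared directly; histogram intersection is just one reasonable similarity choice.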

Result

The neural network performed better. The bag of words classifier reached an overall accuracy of 54.37%, while the neural network reached 80.625%. The network's accuracy could have been improved further by using a different activation function or by applying more dropout between layers.
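Overall accuracy is commonly computed as the fraction of test images classified correctly, which is the sum of the confusion matrix diagonal divided by the total count. A small sketch, assuming rows are true classes and columns are predicted classes, with a made-up matrix rather than the actual results:

```python
def overall_accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up two-class example: 7 of 8 images are on the diagonal.
print(overall_accuracy([[3, 1], [0, 4]]))  # 0.875
```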