Overview
This was an assignment for the class 16-720, Introduction to Computer Vision. Our task was to build a bag-of-words scene classifier and compare it with a neural network we wrote in PyTorch.
Process
The steps were:
Apply a filter bank to each scene image, then extract features at interest points (I chose Harris corners).
Cluster those features with K-means to produce high-dimensional visual “words” that represent similar image regions.
Map each pixel to its nearest word to produce a word map of each image, which can then be used to discriminate scenes in a scene classifier.
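The corner-detection part of the first step can be sketched in plain Python. This is a minimal illustration, not the assignment's code: it assumes a grayscale image stored as a list of lists of floats, 3x3 Sobel gradients, a 3x3 box window for the structure tensor, and the common constant k = 0.05. The function names (`sobel_gradients`, `harris_response`, `top_corners`) are hypothetical, and the filter bank itself is omitted; the filter responses at the returned points are the features passed on to the next step.

```python
# Hypothetical sketch of Harris corner detection in pure Python (no NumPy).
# Assumes a grayscale image given as a list of lists of floats.

def sobel_gradients(img):
    """Return (ix, iy) from 3x3 Sobel kernels; the one-pixel border is left at 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # right column minus left column, rows weighted 1, 2, 1
            ix[y][x] = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                        - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            # bottom row minus top row, columns weighted 1, 2, 1
            iy[y][x] = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                        - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
    return ix, iy

def harris_response(img, k=0.05, radius=1):
    """Harris response R = det(M) - k * trace(M)^2, with M summed over a box window."""
    ix, iy = sobel_gradients(img)
    h, w = len(img), len(img[0])
    r = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sxx = sxy = syy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        gx, gy = ix[yy][xx], iy[yy][xx]
                        sxx += gx * gx
                        sxy += gx * gy
                        syy += gy * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r

def top_corners(img, n, k=0.05):
    """Return the n (row, col) positions with the highest Harris response."""
    r = harris_response(img, k)
    coords = [(y, x) for y in range(len(r)) for x in range(len(r[0]))]
    coords.sort(key=lambda p: r[p[0]][p[1]], reverse=True)
    return coords[:n]
```

On a 10x10 image whose bottom-right quadrant is bright, the strongest response lands on the quadrant's corner pixel, while pixels along its straight edges get negative responses, which is why corners rather than edges are selected.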
[Figures: word maps for Park, Windmill, and Laundromat scenes]
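The clustering and word-map steps can be sketched the same way. Assumptions: each feature is a plain list of floats standing in for a pixel's filter-bank responses, distances are Euclidean, centers are initialized from randomly sampled data points, and the loop runs a fixed number of iterations. `kmeans` and `word_map` are hypothetical names; in practice a library implementation such as scikit-learn's `KMeans` would normally do the clustering.

```python
import random

# Hypothetical sketch: plain Lloyd's K-means plus nearest-word assignment.
# Feature vectors are lists of floats standing in for filter-bank responses.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centers):
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))

def kmeans(points, k, iters=20, seed=0):
    """Cluster points into k visual words; returns the list of k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # init from data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers

def word_map(responses, dictionary):
    """responses[y][x] is a pixel's feature vector; returns word indices per pixel."""
    return [[nearest(v, dictionary) for v in row] for row in responses]
```

The dictionary is built once from features sampled across the training images; `word_map` is then applied to every pixel of every image, so each image becomes a grid of word indices.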
Result
The neural network performed better: the bag-of-words classifier reached an overall accuracy of 54.37%, while the network reached 80.625%. The network might have done even better with a different activation function or more dropout between layers.
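Overall accuracy here is the fraction of test images whose predicted scene matches the true one, i.e. the trace of the confusion matrix divided by its total. The sketch below shows one common way a bag-of-words classifier can be scored, assuming L1-normalized word histograms compared with histogram intersection and a nearest-neighbor rule. The source does not say which classifier was used, and all function names are hypothetical.

```python
# Hypothetical sketch: histogram-intersection nearest neighbor plus overall accuracy.

def word_histogram(wmap, k):
    """L1-normalized histogram of word indices in a (non-empty) word map."""
    counts = [0] * k
    for row in wmap:
        for w in row:
            counts[w] += 1
    total = sum(counts)
    return [c / total for c in counts]

def intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def classify(hist, train_hists, train_labels):
    """Label of the most similar training histogram (nearest neighbor)."""
    best = max(range(len(train_hists)),
               key=lambda i: intersection(hist, train_hists[i]))
    return train_labels[best]

def confusion_matrix(true, pred, n_classes):
    """cm[t][p] counts images with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        cm[t][p] += 1
    return cm

def overall_accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)
```

For example, with true labels [0, 1, 1] and predictions [0, 1, 0], the confusion matrix is [[1, 0], [1, 1]] and the overall accuracy is 2/3.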