In this project, we finetune a pretrained ResNet to classify cars according to their production year and derive a modernity score from the predicted probabilities. We also investigate which pixels of an input image are most informative for the resulting prediction.
We use a pretrained ResNet-18 available from torchvision, freeze its parameters, and add a trainable nn.Linear layer on top for finetuning:
(see lines 5 to 33 at commit edefb96 in the repository)
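A minimal sketch of this setup, with `NUM_CLASSES` standing in for the (unspecified) number of production-year categories:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 16  # hypothetical number of production-year categories

# Load the pretrained ResNet-18 and freeze all of its parameters.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a freshly initialised,
# trainable linear head; only its parameters are updated during finetuning.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```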
We trained the model on the CPU of a MacBook Pro with 16 GB of RAM, which restricts the feasible size of the training set and the number of epochs. Below we show the result of a training run on 10,000 training datapoints for 10 epochs. Both train and test loss decrease and the accuracy increases, although predictive performance remains limited given the computational constraints.
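For reference, such a run could look roughly like the following sketch; `train_loader`, `test_loader`, the optimizer, and the learning rate are assumptions rather than details taken from the repository:

```python
import torch
from torch import nn, optim

device = torch.device("cpu")
model = model.to(device)  # the frozen ResNet-18 with the trainable head from above

criterion = nn.CrossEntropyLoss()
# Only the new linear head has trainable parameters; optimizer choice and
# learning rate are placeholders.
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:  # assumed DataLoader over the 10,000 training images
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Track test loss and accuracy after each epoch.
    model.eval()
    correct, total, test_loss = 0, 0, 0.0
    with torch.no_grad():
        for images, labels in test_loader:  # assumed DataLoader over the test set
            logits = model(images)
            test_loss += criterion(logits, labels).item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: test loss {test_loss / total:.3f}, accuracy {correct / total:.3f}")
```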
A car's modernity is then scored according to the predicted probabilities for the different production-year categories.
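The exact scoring rule is not spelled out here; one possible instantiation, shown purely as an illustration, is the probability-weighted average of hypothetical year-bin midpoints, rescaled to [0, 1]:

```python
import torch

# Hypothetical midpoints of the 16 production-year bins (matching NUM_CLASSES above).
YEAR_MIDPOINTS = torch.arange(1990.0, 2022.0, 2.0)

def modernity_score(logits: torch.Tensor) -> torch.Tensor:
    """Probability-weighted expected production year, rescaled to [0, 1]."""
    probs = torch.softmax(logits, dim=-1)
    expected_year = (probs * YEAR_MIDPOINTS).sum(dim=-1)
    lo, hi = YEAR_MIDPOINTS.min(), YEAR_MIDPOINTS.max()
    return (expected_year - lo) / (hi - lo)
```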
One reason why the test cross-entropy loss decreases while accuracy remains limited after an hour of training could be the stark class imbalance, since the dataset contains many more cars from recent production years:
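The imbalance can be checked by counting how many training images fall into each year category; a minimal sketch, assuming the labels are available as a plain list of production years:

```python
from collections import Counter

# `train_years` is an assumed list of production-year labels for the training set.
year_counts = Counter(train_years)
for year, count in sorted(year_counts.items()):
    print(f"{year}: {count}")
```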
A look at the confusion matrix shows that the decrease in cross-entropy loss on the test set corresponds to the model learning to predict rough year ranges for recently produced cars, even if the exact production year is not yet classified correctly.
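A sketch of how such a confusion matrix can be computed with the finetuned model, reusing the assumed `test_loader` and `NUM_CLASSES` from above:

```python
import torch

@torch.no_grad()
def confusion_matrix(model, loader, num_classes):
    """Count (true class, predicted class) pairs; rows are true year classes."""
    matrix = torch.zeros(num_classes, num_classes, dtype=torch.long)
    model.eval()
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        for t, p in zip(labels, preds):
            matrix[t, p] += 1
    return matrix

cm = confusion_matrix(model, test_loader, NUM_CLASSES)
```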
We can see which pixels in input space are most informative by looking at a saliency map, which highlights pixels according to the magnitude of the gradient of the predicted class score with respect to the input.
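A minimal sketch of such a vanilla gradient saliency map for a single preprocessed image, using the model from above:

```python
import torch

def saliency_map(model, image: torch.Tensor) -> torch.Tensor:
    """Per-pixel gradient magnitude of the predicted class score.

    `image` is a single preprocessed input of shape (3, H, W).
    """
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)  # add batch dim, track gradients w.r.t. the input
    logits = model(x)
    pred = logits.argmax(dim=1).item()           # predicted year class
    logits[0, pred].backward()                   # gradient of the predicted class score
    # Reduce over colour channels to get one value per pixel.
    return x.grad[0].abs().max(dim=0).values
```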