The aim of this project is to improve the quality of the go_emotions dataset and to try to establish an automated pipeline that can be used for other text classification datasets.
We made a simple UI to compare the results of SamLowe/roberta-base-go_emotions against a model trained on an early version of LEGO Emotions. We used the same base model, roberta-base, for both to provide a fair comparison.
- Early results show that the model trained on LEGO Emotions recognizes emotions better and with higher confidence.
- The model is better at recognizing subtle differences in the examples (e.g. "The stock is going up" vs. "The stock is going down") and changes its predictions accordingly.
- There are some issues with recognizing neutral examples: the model predicts them with low confidence.
- We have also been able to reduce the size of go_emotions and get very similar results by removing duplicate and similar examples from the dataset. More on that later.
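
To make the idea of removing duplicate and similar examples concrete, here is a minimal sketch in Python. It is not the project's actual pipeline, which is not described here. It assumes a simple token-overlap (Jaccard) similarity, and the function names `normalize`, `jaccard`, and `deduplicate` as well as the `threshold` value of 0.8 are illustrative choices only.

```python
import re
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation so trivial variants match."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(texts: Iterable[str], threshold: float = 0.8) -> List[str]:
    """Keep the first occurrence of each example and drop later ones whose
    token overlap with an already kept example is >= threshold.

    Assumption: token-set Jaccard similarity stands in for whatever
    similarity measure a real pipeline would use (e.g. embeddings).
    This is a quadratic scan, fine for a demo but slow on large datasets.
    """
    kept: List[str] = []
    kept_tokens: List[Set[str]] = []
    for text in texts:
        tokens = set(normalize(text).split())
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # exact or near duplicate of something already kept
        kept.append(text)
        kept_tokens.append(tokens)
    return kept


examples = [
    "I love this!",
    "i love this",              # duplicate after normalization, dropped
    "I really love this",       # similar but below the threshold, kept
    "The stock is going up",
    "The stock is going down",  # differs in the word that matters, kept
]
print(deduplicate(examples))
```

With this threshold, the two stock sentences both survive, which is desirable since they differ in exactly the word that changes the emotion. A lower threshold would remove more examples but risks merging such pairs, so the value should be tuned against the labels.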