
Targeted Sentiment Regression on Financial News Articles using DeBERTa + Entity-Focused Fine-Tuning

SemEval 2017 Task 5, Subtask 2

Fine-Grained Sentiment Analysis on Financial News

State-of-the-art (SOTA) model for fine-grained sentiment expressions in financial news articles: a detached CNN-BiLSTM regression head trained on fine-tuned DeBERTa entity embeddings [1]. Refer to the PDF for more detail.

Notebooks

Experiments and Results

The experiments showed that sentiment regression performance was improved by:

  • Feeding the classification model the final hidden states of both the [CLS] token and the masked target entity token
  • Detaching the classification model from the token-level fine-tuning process
    • In other words, placing a complex architecture inside the fine-tuning process performed worse than placing the same architecture after the standard pooling + dense layer (the boilerplate in transformers.BertForSequenceClassification)
    • Intuitively, error propagation backwards through DeBERTa during training seemed to benefit from a closer/simpler signal, resulting in better inputs for the detached CNN-BiLSTM
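As a rough illustration of the first point, the sketch below pulls out the [CLS] state and the state at the masked entity position and concatenates them into one feature vector per example. It uses NumPy arrays as stand-ins for the model's final hidden states; the function name, the `MASK_ID` value, and the toy shapes are assumptions made for this example and are not taken from the notebooks.

```python
import numpy as np

MASK_ID = 4  # hypothetical id of the token that replaces the target entity


def cls_and_entity_features(hidden_states, input_ids, mask_id=MASK_ID):
    """Concatenate the [CLS] state with the state at the masked entity token.

    hidden_states: (batch, seq_len, hidden) final hidden states
    input_ids:     (batch, seq_len) token ids
    returns:       (batch, 2 * hidden) feature matrix
    """
    is_mask = input_ids == mask_id
    if not is_mask.any(axis=1).all():
        raise ValueError("every sequence needs a masked entity token")
    # argmax on a boolean row returns the index of the first True value
    entity_pos = is_mask.argmax(axis=1)
    rows = np.arange(hidden_states.shape[0])
    cls_state = hidden_states[:, 0, :]  # [CLS] is assumed to sit at position 0
    entity_state = hidden_states[rows, entity_pos, :]
    return np.concatenate([cls_state, entity_state], axis=1)


# Toy data standing in for encoder output: 2 sequences, 6 tokens, hidden size 8
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2, 6, 8))
input_ids = np.array([[1, 10, MASK_ID, 11, 2, 0],
                      [1, MASK_ID, 12, 13, 14, 2]])

features = cls_and_entity_features(hidden_states, input_ids)
print(features.shape)  # (2, 16)
```

If a sequence contained several masked positions, this sketch would use only the first one; pooling over all of them would be an equally plausible choice.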

The tradeoff between inference time in production systems and model performance is an interesting area for further research.

Attached Classification/Regression Head example:

[Figure: attached classification/regression head]

Detached Classification/Regression Head (with entity token replacement) example:

[Figure: detached classification/regression head with entity token replacement]
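A minimal PyTorch sketch of what a detached head could look like is given here. It is not the implementation from the notebooks: the class name, layer sizes, the `tanh` output (chosen because SemEval 2017 Task 5 scores lie in [-1, 1]), and the random tensor standing in for fine-tuned DeBERTa hidden states are all assumptions. The point it illustrates is that the head trains on detached features, so its loss never reaches the encoder.

```python
import torch
from torch import nn


class DetachedCnnBiLstmHead(nn.Module):
    """Secondary regressor over precomputed hidden states (hypothetical sizes)."""

    def __init__(self, hidden_size=16, conv_channels=8, lstm_size=4):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, conv_channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_channels, lstm_size,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_size, 1)

    def forward(self, states):
        # states: (batch, seq_len, hidden_size); Conv1d expects channels first
        x = torch.relu(self.conv(states.transpose(1, 2)))
        # h_n holds the last state of each direction: (2, batch, lstm_size)
        _, (h_n, _) = self.lstm(x.transpose(1, 2))
        summary = torch.cat([h_n[0], h_n[1]], dim=1)
        # tanh keeps predictions in [-1, 1] (an assumption of this sketch)
        return torch.tanh(self.out(summary)).squeeze(-1)


torch.manual_seed(0)
# Stand-in for the primary network's final hidden states (assumed shapes)
encoder_states = torch.randn(4, 10, 16, requires_grad=True)
features = encoder_states.detach()  # cut the graph: no gradient reaches the encoder
targets = torch.tensor([0.3, -0.5, 0.0, 0.8])

head = DetachedCnnBiLstmHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)

# One training step for the head only
loss = nn.functional.mse_loss(head(features), targets)
loss.backward()
optimizer.step()

print(encoder_states.grad is None)  # True
```

In an attached setup the `.detach()` call would be absent and the same loss would also update the encoder weights.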

Experiments & Results:

[Figure: experiments and results]

Footnotes

  1. For BERT-based models, the final token-level embeddings that are output by the fine-tuned model are referred to as the "final hidden states".

  2. "Attached" classification/regression head -- a single network is used to simultaneously fine-tune DeBERTa and perform classification/regression. The loss from the "classification" phase directly affects "representation" (i.e. the production of fine-tuned final hidden states).

  3. "Detached" classification/regression head -- the production of fine-tuned final hidden states is performed using a simple primary network (pooling + dense), then a (completely separate) secondary network is utilized for classification/regression using the output of the primary network as input.