UMUTeam at EXIST 2021

Sexist Language Identification based on Linguistic Features and Transformers in Spanish and English

Sexism is harmful behaviour that can make women feel worthless promoting self-censorship and gender inequality. In the digital era, misogynists have found in social networks a place in which they can spread their oppressive discourse towards women. Although this particular form of oppressive speech is banned and punished on most social networks, its identification is quite challenging due to the large number of messages posted everyday. Moreover, sexist comments can be unnoticed as condescends or friendly statements which hinders its identification even for humans. With the aim of improving automatic sexist identification on social networks, we participate in EXIST-2021. This shared task involves the identification and categorisation of sexism language on Spanish and English documents compiled from micro-blogging platforms. Specifically, two tasks were proposed, one concerning a binary classification of sexism utterances and another regarding multi-class identification of sexist traits. Our proposal for solving both tasks is grounded on the combination of linguistic features and state-of-the-art transformers by means of ensembles and multi-input neural networks. To address the multi-language problem, we tackle the problem independently by language to put the results together at the end. Our best result was achieved in task 1 with an accuracy of 75.14% and an accuracy of 61.70% for task 2.

Links

http://nlp.uned.es/exist2021/

Details

The source code is stored in the code folder. For training, the embeddingsfolders there are symbolyc links to the pretrained word embeddings used. Due to size, however, you should download the glove.6b.300d.txt (https://nlp.stanford.edu/projects/glove/). The dataset is not submitted and you should download from codalab. If you need the trained model and feature sets you can request them by email [email protected].

Install

Create a virtual environment in Python 3
Install the dependencies that are stored at requirements.txt
Copy the datasets at assets/exist/2021-es/dataset folder and assets/exist/2021-en/dataset
Generate the dataset: python -W ignore compile.py --dataset=exist --corpus=2021-es python -W ignore compile.py --dataset=exist --corpus=2021-en
Finetune BERT. python -W ignore train.py --dataset=exist --corpus=2021-es --model=transformers python -W ignore train.py --dataset=exist --corpus=2021-en --model=transformers
Feature selection. python -W ignore feature-selection.py --dataset=exist --corpus=2021-es python -W ignore feature-selection.py --dataset=exist --corpus=2021-en
Generate BF features: python -W ignore generate-bf.py --dataset=exist --corpus=2021-es python -W ignore generate-bf.py --dataset=exist --corpus=2021-en
Feature selection for the BF features. python -W ignore feature-selection.py --dataset=exist --corpus=2021-es python -W ignore feature-selection.py --dataset=exist --corpus=2021-en
Train (available features: lf, se, bf, we and combinations. For example: lf-bf) python -W ignore train.py --dataset=exist --corpus=2021-es --model=deep-learning --features=lf python -W ignore train.py --dataset=exist --corpus=2021-en --model=deep-learning --features=lf
Evaluate. python -W ignore evaluate.py --dataset=exist --corpus=2021-es --model=deep-learning --features=lf --source=val

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
code		code
config		config
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UMUTeam at EXIST 2021

Sexist Language Identification based on Linguistic Features and Transformers in Spanish and English

Links

Details

Install

About

Releases

Packages

Languages

Smolky/exist-2021

Folders and files

Latest commit

History

Repository files navigation

UMUTeam at EXIST 2021

Sexist Language Identification based on Linguistic Features and Transformers in Spanish and English

Links

Details

Install

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages