Different label categories than expected in a spark-nlp NER model #13167
Hello everyone, I am a beginner with spark-nlp, and I want to train a NER model that recognises two types of entities in text, labelled SPECSKILL and HUMANSKILL. I'm using Python 3.7.12 and spark-nlp 4.2.3. The training and test datasets are in CoNLL 2003 format. I ran a first small training and got the following results:
Questions:
I look forward to your comments. Thank you.
Replies: 1 comment 1 reply
Hi,
The labels are counted with their `B-` and `I-` prefixes, plus `O`. If you just count them you will see there are 9 (8 + `O`).
As for the rest of your questions, here is a very complete tutorial explaining everything I said in detail (this will answer all your questions): https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/4.NERDL_Training.ipynb
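To make the counting concrete, here is a minimal sketch in plain Python (no Spark NLP needed). The helper name `bio_labels` is made up for illustration, and the four CoNLL 2003 entity types are written out as `PER`, `ORG`, `LOC` and `MISC`. It only shows how each entity type expands into a `B-` and an `I-` label, with a single shared `O`:

```python
def bio_labels(entity_types):
    """Expand entity types into the BIO labels a tagger has to learn."""
    labels = ["O"]  # one shared label for tokens outside any entity
    for entity in entity_types:
        labels.append(f"B-{entity}")  # first token of an entity
        labels.append(f"I-{entity}")  # any following token of the same entity
    return labels


conll2003_entities = ["PER", "ORG", "LOC", "MISC"]
skill_entities = ["SPECSKILL", "HUMANSKILL"]

print(len(bio_labels(conll2003_entities)))  # 4 entities * 2 + O
print(bio_labels(skill_entities))
```

By the same arithmetic, a dataset with only SPECSKILL and HUMANSKILL should yield 2 * 2 + 1 = 5 labels, provided every prefix actually appears in the training file.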
Hi,
There is no `LABELONE` or `LABELTWO` inside that file. Instead, you have the following labels (obviously, Spark NLP is not making these labels up, so either you are reading the wrong CoNLL index or this is what's actually inside that CoNLL file):
The labels are counted with their `B-` and `I-` prefixes, plus `O`. That's why, for the CoNLL 2003 file you tested with 4 entities (not labels), you have 9 labels: 8 different labels starting with `B-` and `I-`, plus `O`, which makes 9 unique labels to learn during training. You have mistaken the entities with labe…
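If you want to check what is actually inside a CoNLL file before training, a rough sketch like the one below can help. It is plain Python, not a Spark NLP API: the function `count_labels` and the sample sentence are hypothetical, and it assumes whitespace-separated columns with the NER tag in the last column, as in CoNLL 2003.

```python
from collections import Counter


def count_labels(conll_text, label_column=-1):
    """Count the distinct tags found in one column of CoNLL-style text."""
    counts = Counter()
    for line in conll_text.splitlines():
        fields = line.split()
        # Skip blank sentence separators and document markers.
        if not fields or fields[0] == "-DOCSTART-":
            continue
        counts[fields[label_column]] += 1
    return counts


# Hypothetical sample: token, POS tag, chunk tag, NER tag.
sample = """-DOCSTART- -X- -X- O

Python NNP B-NP B-SPECSKILL
programming NN I-NP I-SPECSKILL
and CC O O
teamwork NN B-NP B-HUMANSKILL
"""

print(sorted(count_labels(sample)))                  # NER tags (last column)
print(sorted(count_labels(sample, label_column=2)))  # chunk tags (wrong column)
```

Reading column 2 instead of the last column returns chunk tags rather than NER tags, which is one way to end up with label categories you did not expect.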