John Snow Labs Spark-NLP 3.4.4: New DeBERTa for Token Classification, new CamemBERT embeddings, speed improvements for Tokenizer and UniversalSentenceEncoder annotators, over 160 new state-of-the-art models, and other improvements! #8312
maziyarpanahi announced in Announcement
Overview
We are very excited to release Spark NLP 🚀 3.4.4! This release comes with a new DeBERTa for Token Classification annotator compatible with existing or fine-tuned models on HuggingFace 🤗, a new annotator for CamemBERT embeddings models, up to 18x faster UniversalSentenceEncoder on GPU devices, up to 400% faster Tokenizer when used with a list of exceptions, and new state-of-the-art NER, French embeddings, DistilBERT embeddings, and ALBERT embeddings models!
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
DeBertaForTokenClassification can load DeBERTa v2 and v3 models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all models trained or fine-tuned using DebertaV2ForTokenClassification for PyTorch or TFDebertaV2ForTokenClassification for TensorFlow in HuggingFace (Introducing DeBertaForTokenClassification annotator #8082).
Bug Fixes & Enhancements
The Tokenizer's exceptions list was optimized to be scalable to a large number of exceptions without impacting the overall performance (Tokenizer: Optimized tokenization with exceptions #7881).
Dependencies
2.4.8, 3.0.3, and 3.2.1
1.4.2
1.6.2
Models
Spark NLP 3.4.4 comes with over 160 state-of-the-art multilingual pretrained models. Some of the featured models:
New DeBERTa Token Classification Models
New fine-tuned DeBERTa v3 models for token classification over the CoNLL03 and OntoNotes datasets that reach state-of-the-art metrics.
en  0.97
en  0.96
en  0.95
en  0.93
en  0.89
en  0.88
en  0.87
en  0.86
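To make the "linear layer on top of the hidden-states output" idea concrete, here is a minimal toy sketch of what a token classification head computes. It is not Spark NLP or HuggingFace code: the label set, the tiny hidden size, and every weight value are made up for illustration, and the helper names `linear_head` and `predict_labels` are hypothetical.

```python
# Illustrative sketch only: a toy token classification head.
# Labels, sizes, and weights are assumptions made up for this example.
from typing import List

LABELS = ["O", "B-PER", "B-LOC"]  # hypothetical label set


def linear_head(hidden: List[List[float]],
                weights: List[List[float]],
                bias: List[float]) -> List[List[float]]:
    """Apply one linear layer per token: logits = h @ W + b."""
    logits = []
    for h in hidden:
        row = [
            sum(h_i * weights[i][j] for i, h_i in enumerate(h)) + bias[j]
            for j in range(len(bias))
        ]
        logits.append(row)
    return logits


def predict_labels(logits: List[List[float]], labels: List[str]) -> List[str]:
    """Pick the highest-scoring label for each token (argmax over logits)."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in logits]


# Three tokens with hidden size 2 (real models use hundreds of dimensions).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
weights = [[0.0, 2.0, 0.0],   # hidden dim 0 pushes toward B-PER
           [0.0, 0.0, 2.0]]   # hidden dim 1 pushes toward B-LOC
bias = [0.5, 0.0, 0.0]        # small default preference for "O"

logits = linear_head(hidden_states, weights, bias)
predicted = predict_labels(logits, LABELS)
print(predicted)
```

In a real model the hidden states come from the DeBERTa encoder and the weights are learned during fine-tuning; the per-token argmax step is the part this sketch is meant to show.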
New CamemBERT Models
Model languages: fr (six models)
New DistilBERT Embeddings Models
Model languages: fr, mr, id, jv, ms, ar
New ALBERT Embeddings Models
Model languages: fr, ar, mr, fa, ms, mr
The complete list of all 5000+ models & pipelines in 200+ languages is available on Models Hub.
Documentation
Installation
Python
# PyPI
pip install spark-nlp==3.4.4
Spark Packages
spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):
GPU
spark-nlp on Apache Spark 3.2.x (Scala 2.12 only):
GPU
spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):
GPU
spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):
GPU
Maven
spark-nlp on Apache Spark 3.0.x and 3.1.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 3.2.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 2.4.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 2.3.x:
spark-nlp-gpu:
FAT JARs
CPU on Apache Spark 3.0.x/3.1.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.4.4.jar
GPU on Apache Spark 3.0.x/3.1.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.4.4.jar
CPU on Apache Spark 3.2.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark32-assembly-3.4.4.jar
GPU on Apache Spark 3.2.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark32-assembly-3.4.4.jar
CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.4.4.jar
GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.4.4.jar
CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.4.4.jar
GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.4.4.jar
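The eight URLs above follow a regular naming pattern, so a small helper can pick the right FAT JAR for a given Apache Spark version. This is only a sketch: the function name `fat_jar_url` and the suffix table are illustrative and not part of Spark NLP, and the assumption that other releases follow the same pattern is untested here.

```python
# Sketch: rebuild the FAT JAR URLs listed above from their naming pattern.
# The helper name and lookup table are hypothetical, not a Spark NLP API.
BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"

# Apache Spark minor version -> artifact suffix, as seen in the URLs above.
SPARK_SUFFIX = {
    "3.0": "",
    "3.1": "",
    "3.2": "-spark32",
    "2.4": "-spark24",
    "2.3": "-spark23",
}


def fat_jar_url(spark_version: str, gpu: bool = False, release: str = "3.4.4") -> str:
    """Return the FAT JAR URL for a Spark version such as '3.2.1' or '2.4'."""
    minor = ".".join(spark_version.split(".")[:2])
    if minor not in SPARK_SUFFIX:
        raise ValueError(f"No FAT JAR listed for Apache Spark {spark_version}")
    gpu_part = "-gpu" if gpu else ""
    return f"{BASE_URL}/spark-nlp{gpu_part}{SPARK_SUFFIX[minor]}-assembly-{release}.jar"


print(fat_jar_url("3.2.1", gpu=True))
```

Passing an unlisted version (for example "3.3.0") raises a `ValueError` rather than guessing at an artifact name.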
What's Changed
New Contributors
Full Changelog: 3.4.3...3.4.4