Spark NLP 5.3.0: Introducing Llama-2 for CausalLM, M2M100 for Multilingual Translation, MPNet & DeBERTa Enhancements, New Document Similarity Features, Expanded ONNX & In-Memory Support, Updated Runtimes, Essential Bug Fixes, and More! #14185
maziyarpanahi announced in Announcement
🎉 Celebrating 91 Million Downloads on PyPI - A Spark NLP Milestone! 🚀
We're thrilled to announce the release of Spark NLP 5.3.0, a monumental update that brings cutting-edge advancements and enhancements to the forefront of Natural Language Processing (NLP). This release underscores our commitment to providing the NLP community with state-of-the-art tools and models, furthering our mission to democratize NLP technologies.
This release also includes critical bug fixes that improve the stability and reliability of Spark NLP. They cover Spark NLP configuration, score calculation, input validation, notebooks, and serialization.
We invite the community to explore these new features and enhancements, and we look forward to seeing the innovative applications that Spark NLP 5.3.0 will enable. 🌟
🔥 New Features & Enhancements
We have made the LLAMA2Transformer annotator compatible with ONNX exports and quantizations. As always, we made this feature super easy and scalable. We will continue improving this annotator and import more models in the future.
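The usage example that accompanied this announcement did not survive on this page, so here is a minimal sketch of what a LLAMA2Transformer pipeline might look like in Python. The column names, the sample sentence, the generation settings, and the use of the default pretrained model are assumptions made for illustration, not the release's own example.

```python
def build_llama2_pipeline():
    """Assemble a two-stage pipeline: raw text -> document -> generated text."""
    # Imported lazily so this module can be loaded without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import LLAMA2Transformer
    from sparknlp.base import DocumentAssembler

    # Wrap the raw "text" column into Spark NLP document annotations.
    document_assembler = (
        DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("documents")
    )

    # Assumption: pretrained() with no arguments loads the default model.
    # The output length and sampling settings are illustrative values.
    llama2 = (
        LLAMA2Transformer.pretrained()
        .setInputCols(["documents"])
        .setOutputCol("generation")
        .setMaxOutputLength(50)
        .setDoSample(False)
    )

    return Pipeline(stages=[document_assembler, llama2])


def run_example():
    """Start a session, run the pipeline on one sentence, and show the output."""
    import sparknlp

    spark = sparknlp.start()
    # Hypothetical input sentence.
    data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text")
    result = build_llama2_pipeline().fit(data).transform(data)
    result.select("generation.result").show(truncate=False)
```

Calling run_example() on a machine with Spark NLP 5.3.0 installed starts a local session and downloads the model on first use, so expect the first run to take a while.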
- The M2M100 model sets a new benchmark for multilingual translation, supporting direct translation across 9,900 language pairs from 100 languages. This feature represents a significant leap in breaking down language barriers in global communication.
- New features in the DocumentSimilarity annotator, offering an efficient and scalable solution for ranking documents based on similarity, ideal for retrieval-augmented generation (RAG) applications.
- MPNetForSequenceClassification annotator for sequence classification tasks. Based on the MPNet architecture, it offers more precise and context-aware processing.
- MPNetForQuestionAnswering annotator for question answering tasks. Based on the MPNet architecture, it offers more precise and context-aware processing.
- The DeBertaForZeroShotClassification annotator, leveraging the DeBERTa architecture, introduces zero-shot classification: classifying text into predefined classes without direct example training.
- In-memory support for the WordEmbeddingsModel annotator in serverless clusters. We initially introduced the in-memory feature for this annotator for users inside Kubernetes clusters without any HDFS. Today it also runs without any issue locally and on Google Colab, Kaggle, Databricks, AWS EMR, GCP, and AWS Glue.
- ONNX support for the BertForZeroShotClassification annotator.
- New Databricks Runtimes: 14.2, 14.3, 14.2 ML, 14.3 ML, 14.2 GPU, and 14.3 GPU.
- New EMR versions: 6.15.0 and 7.0.0.
- Updated EntityRuler documentation.
🐛 Bug Fixes
- Fix setting cluster_tmp_dir on Databricks' DBFS via spark.jsl.settings.storage.cluster_tmp_dir (Spark NLP Configuration's spark.jsl.settings.storage.cluster_tmp_dir: Databricks DBFS location does not work #14129)
- Fix score calculation in the RoBertaForQuestionAnswering annotator (SPARKNLP-942: MPNet Classifiers #14147)
ℹ️ Known Issues
💾 Models
The complete list of all 37,000+ models & pipelines in 230+ languages is available on Models Hub.
📓 New Notebooks
📖 Documentation
❤️ Community support
Show off how you use Spark NLP!
Installation
Python
# PyPI
pip install spark-nlp==5.3.0
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
GPU
Apple Silicon (M1 & M2)
AArch64
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
spark-nlp-gpu:
spark-nlp-silicon:
spark-nlp-aarch64:
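The package snippets for the four artifacts above did not survive on this page. As a rough sketch, the Maven coordinates can be assembled as shown below. The group ID com.johnsnowlabs.nlp and the _2.12 Scala suffix are assumptions based on how these artifacts are published on Maven Central, and the helper names are hypothetical.

```python
# Artifact IDs as listed in the Maven section above, keyed by a short label.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}

# Assumption: group ID used for Spark NLP artifacts on Maven Central.
GROUP_ID = "com.johnsnowlabs.nlp"


def maven_coordinate(flavor="cpu", version="5.3.0", scala="2.12"):
    """Return a group:artifact:version string for the chosen build."""
    if flavor not in ARTIFACTS:
        raise ValueError(f"unknown flavor {flavor!r}; expected one of {sorted(ARTIFACTS)}")
    return f"{GROUP_ID}:{ARTIFACTS[flavor]}_{scala}:{version}"


def packages_flag(flavor="cpu"):
    """Return the --packages argument for spark-shell, pyspark, or spark-submit."""
    return f"--packages {maven_coordinate(flavor)}"


for name in ARTIFACTS:
    print(maven_coordinate(name))
```

The same coordinate string can be passed as the value of the spark.jars.packages setting when building a Spark session by hand.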
FAT JARs
CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.3.0.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.3.0.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.3.0.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.3.0.jar
Pull Requests:
What's Changed
New Contributors
Full Changelog: 5.2.3...5.3.0
This discussion was created from the release Spark NLP 5.3.0: Introducing Llama-2 for CausalLM, M2M100 for Multilingual Translation, MPNet & DeBERTa Enhancements, New Document Similarity Features, Expanded ONNX & In-Memory Support, Updated Runtimes, Essential Bug Fixes, and More!.