🔥 Enhancements & Bug Fixes
BertForMultipleChoice
Transformer Added. Enhanced BERT’s capabilities to handle multiple-choice tasks such as standardized test questions and survey or quiz automation.PromptAssembler
Annotator Introduced. Introduced a new annotator that constructs prompts for LLMs using a chat template and a sequence of messages. Accepts an array of tuples with roles (“system”, “user”, “assistant”) and message texts. Utilizes llama.cpp as a backend for template parsing, supporting basic template applications.
Example Notebook
promptAssembler = (
PromptAssembler()
.setInputCol("messages")
.setOutputCol("prompt")
.setChatTemplate(template)
)
- Integrated New Tasks and Documentation. Added support and documentation for the following tasks:
- Automatic Speech Recognition
- Dependency Parsing
- Image Captioning
- Image Classification
- Landing Page
- Question Answering
- Summarization
- Table Question Answering
- Text Classification
- Text Generation
- Text Preprocessing
- Token Classification
- Translation
- Zero-Shot Classification
- Zero-Shot Image Classification
- Resolved Pretrained Model Loading Issue on
DBFS Systems
. - Fixed a bug where pretrained models were not found when running
AutoGGUFModel
pipelines onDatabricks
due to incorrect path handling of gguf files.
📖 Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
❤️ Community support
- Slack For live discussion with the Spark NLP community and the team
- GitHub Bug reports, feature requests, and contributions
- Discussions Engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium Spark NLP articles
- YouTube Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==5.5.1
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x: (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.1
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.5.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.5.1
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.5.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.5.1
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.5.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.5.1
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>5.5.1</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>5.5.1</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>5.5.1</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>5.5.1</version>
</dependency>
FAT JARs
-
CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.5.1.jar
-
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.5.1.jar
-
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.5.1.jar
-
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.5.1.jar
What's Changed
- Models hub by @maziyarpanahi in #14418
- Models hub by @maziyarpanahi in #14420
- Add a new llama_cpp engine by @maziyarpanahi in #14436
- tasks-docs-integration by @AbdullahMubeenAnwar in #14428
- Introducing BertForMultipleChoice transformer by @danilojsl in #14435
- Fix pretrained models not being found on dbfs systems by @DevinTDHa in #14438
- [SPARKNLP-1067] PromptAssembler by @DevinTDHa in #14439
- Release/551 release candidate by @maziyarpanahi in #14437
Full Changelog: 5.5.0...5.5.1