John Snow Labs Spark-NLP 4.2.1: Over 230 state-of-the-art Transformer Vision (ViT) pretrained pipelines, new multi-lingual support for Word Segmentation, add LightPipeline support to Automatic Speech Recognition pipelines, support for processed audio files in type Double for Wav2Vec2, and bug fixes #12922
maziyarpanahi
announced in
Announcement
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
📢 Overview
Spark NLP 4.2.1 🚀 comes with a new multi-lingual support for Word Segmentation mostly used for (but not limited to) Chinese, Japanese, Korean, and so on, adding Automatic Speech Recognition (ASR) pipelines to LightPipeline arsenal for faster computation of smaller datasets without Apache Spark (e.g. RESTful API use case), adding support for processed audio files in type of Double in addition to Float for
Wav2Vec2
, over 230+ state-of-the-art Transformer Vision (ViT) pretrained pipelines for 1-line Image Classification, and bug fixes.Do not forget to visit Models Hub with over 11400+ free and open-source models & pipelines. As always, we would like to thank our community for their feedback, questions, and feature requests. 🎉
⭐ New Features & improvements
enableRegexTokenizer
feature in WordSegmenter to support word segmentation within mixed and multi-lingual content Adding enableRegexTokenizer in WordSegmenter #12854SpanBertCoref
annotator to all docs Added Docs for SpanBertCoref #12889Bug Fixes
fullAnnotate
in Lightpipeline with a list that started to fail in 4.2.0 releaseChunker
annotator Silently skip to construct bad Chunks #12901📓 New Notebooks
Models
Spark NLP 4.2.1 comes with 230+ state-of-the-art pre-trained Transformer Vision (ViT) pipeline:
Featured Pipelines
en
en
en
en
en
en
en
en
en
Check 460+ Transformer Vision (ViT) models & pipelines for Models Hub - Image Classification
Spark NLP covers the following languages:
English
,Multilingual
,Afrikaans
,Afro-Asiatic languages
,Albanian
,Altaic languages
,American Sign Language
,Amharic
,Arabic
,Argentine Sign Language
,Armenian
,Artificial languages
,Atlantic-Congo languages
,Austro-Asiatic languages
,Austronesian languages
,Azerbaijani
,Baltic languages
,Bantu languages
,Basque
,Basque (family)
,Belarusian
,Bemba (Zambia)
,Bengali, Bangla
,Berber languages
,Bihari
,Bislama
,Bosnian
,Brazilian Sign Language
,Breton
,Bulgarian
,Catalan
,Caucasian languages
,Cebuano
,Celtic languages
,Central Bikol
,Chichewa, Chewa, Nyanja
,Chilean Sign Language
,Chinese
,Chuukese
,Colombian Sign Language
,Congo Swahili
,Croatian
,Cushitic languages
,Czech
,Danish
,Dholuo, Luo (Kenya and Tanzania)
,Dravidian languages
,Dutch
,East Slavic languages
,Eastern Malayo-Polynesian languages
,Efik
,Esperanto
,Estonian
,Ewe
,Fijian
,Finnish
,Finnish Sign Language
,Finno-Ugrian languages
,French
,French-based creoles and pidgins
,Ga
,Galician
,Ganda
,Georgian
,German
,Germanic languages
,Gilbertese
,Greek (modern)
,Greek languages
,Gujarati
,Gun
,Haitian, Haitian Creole
,Hausa
,Hebrew (modern)
,Hiligaynon
,Hindi
,Hiri Motu
,Hungarian
,Icelandic
,Igbo
,Iloko
,Indic languages
,Indo-European languages
,Indo-Iranian languages
,Indonesian
,Irish
,Isoko
,Isthmus Zapotec
,Italian
,Italic languages
,Japanese
,Japanese
,Kabyle
,Kalaallisut, Greenlandic
,Kannada
,Kaonde
,Kinyarwanda
,Kirundi
,Kongo
,Korean
,Kwangali
,Kwanyama, Kuanyama
,Latin
,Latvian
,Lingala
,Lithuanian
,Louisiana Creole
,Lozi
,Luba-Katanga
,Luba-Lulua
,Lunda
,Lushai
,Luvale
,Macedonian
,Malagasy
,Malay
,Malayalam
,Malayo-Polynesian languages
,Maltese
,Manx
,Marathi (Marāṭhī)
,Marshallese
,Mexican Sign Language
,Mon-Khmer languages
,Morisyen
,Mossi
,Multiple languages
,Ndonga
,Nepali
,Niger-Kordofanian languages
,Nigerian Pidgin
,Niuean
,North Germanic languages
,Northern Sotho, Pedi, Sepedi
,Norwegian
,Norwegian Bokmål
,Norwegian Nynorsk
,Nyaneka
,Oromo
,Pangasinan
,Papiamento
,Persian (Farsi)
,Peruvian Sign Language
,Philippine languages
,Pijin
,Pohnpeian
,Polish
,Portuguese
,Portuguese-based creoles and pidgins
,Punjabi (Eastern)
,Romance languages
,Romanian
,Rundi
,Russian
,Ruund
,Salishan languages
,Samoan
,San Salvador Kongo
,Sango
,Semitic languages
,Serbo-Croatian
,Seselwa Creole French
,Shona
,Sindhi
,Sino-Tibetan languages
,Slavic languages
,Slovak
,Slovene
,Somali
,South Caucasian languages
,South Slavic languages
,Southern Sotho
,Spanish
,Spanish Sign Language
,Sranan Tongo
,Swahili
,Swati
,Swedish
,Tagalog
,Tahitian
,Tai
,Tamil
,Telugu
,Tetela
,Tetun Dili
,Thai
,Tigrinya
,Tiv
,Tok Pisin
,Tonga (Tonga Islands)
,Tonga (Zambia)
,Tsonga
,Tswana
,Tumbuka
,Turkic languages
,Turkish
,Tuvalu
,Tzotzil
,Ukrainian
,Umbundu
,Uralic languages
,Urdu
,Venda
,Venezuelan Sign Language
,Vietnamese
,Wallisian
,Walloon
,Waray (Philippines)
,Welsh
,West Germanic languages
,West Slavic languages
,Western Malayo-Polynesian languages
,Wolaitta, Wolaytta
,Wolof
,Xhosa
,Yapese
,Yiddish
,Yoruba
,Yucatec Maya, Yucateco
,Zande (individual language)
,Zulu
The complete list of all 11000+ models & pipelines in 230+ languages is available on Models Hub
📖 Documentation
Installation
Python
#PyPI pip install spark-nlp==4.2.1
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):
GPU
M1
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
spark-nlp-gpu:
spark-nlp-m1:
FAT JARs
CPU on Apache Spark 3.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.2.1.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.2.1.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-m1-assembly-4.2.1.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-m1-assembly-4.2.1.jar
What's Changed
Contributors
@Meryem1425 @muhammetsnts @jsl-models @josejuanmartinez @DevinTDHa @ArshaanNazir @C-K-Loan @KshitizGIT @agsfer @diatrambitas @danilojsl @Damla-Gurbaz @maziyarpanahi @jsl-builder
New Contributors
Full Changelog: 4.2.0...4.2.1
This discussion was created from the release John Snow Labs Spark-NLP 4.2.1: Over 230 state-of-the-art Transformer Vision (ViT) pretrained pipelines, new multi-lingual support for Word Segmentation, add LightPipeline support to Automatic Speech Recognition pipelines, support for processed audio files in type Double for Wav2Vec2, and bug fixes.
Beta Was this translation helpful? Give feedback.
All reactions