An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: #1011
Open · behnazeslami opened this issue on Mar 7, 2024 · 1 comment
On Mar 7, 2024, behnazeslami changed the title from "py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:" to "An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:".
I transferred this issue because it uses licensed annotators, so we cannot reproduce it with the open-source library. I suspect the home directory does not have the right permissions to download/extract the models, or it is not reachable; checking the /home/beslami/cache_pretrained/ path and its permissions might help.
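One way to act on that suggestion is a quick script that checks the cache directory and deletes a partially extracted model so Spark NLP re-downloads it. This is a sketch: the model folder name is copied from the error message above, and everything else (paths, the `cache_entry_is_broken` helper) is generic, not part of the Spark NLP API.

```python
import os
import shutil

def cache_entry_is_broken(model_dir: str) -> bool:
    """A cached model folder is broken if it exists but lacks the
    'metadata' subfolder that Spark ML needs to load the model."""
    return os.path.isdir(model_dir) and not os.path.isdir(
        os.path.join(model_dir, "metadata"))

cache = os.path.expanduser("~/cache_pretrained")
print("cache exists  :", os.path.isdir(cache))
print("cache writable:", os.access(cache, os.W_OK))

# Folder name taken from the "Input path does not exist" message above.
stale = os.path.join(cache, "biobert_pubmed_base_cased_en_2.6.2_2.4_1600449177996")
if cache_entry_is_broken(stale):
    shutil.rmtree(stale)  # force a clean re-download on the next run
```

If the directory is not writable, fixing ownership/permissions (e.g. `chmod -R u+rwX ~/cache_pretrained`) before re-running should let the download complete.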
Hi,
In my CentOS Linux, I installed:
1. ! pip install --upgrade -q pyspark==3.4.1 spark-nlp==5.2.2
2. ! pip install --upgrade spark-nlp-jsl==5.2.1 --user --extra-index-url https://pypi.johnsnowlabs.com/[secret_code]
I checked the Java version with java -version:
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21)
OpenJDK 64-Bit Server VM JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21, mixed mode)
In ~/.bashrc, JAVA_HOME is set as:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-1.el7_9.x86_64
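(Side note: java -version above reports OpenJDK 11 while JAVA_HOME points at a Java 8 install. A small, generic sanity check that the two agree — this sketch is not Spark NLP-specific, and the `same_jvm` helper is illustrative only:)

```python
import os
import shutil

def same_jvm(java_on_path: str, java_home: str) -> bool:
    """True when the `java` binary on PATH lives under JAVA_HOME."""
    if not java_on_path or not java_home:
        return False
    home = os.path.realpath(java_home).rstrip(os.sep) + os.sep
    return os.path.realpath(java_on_path).startswith(home)

java_on_path = shutil.which("java") or ""
java_home = os.environ.get("JAVA_HOME", "")
print("java on PATH:", java_on_path or "(not found)")
print("JAVA_HOME   :", java_home or "(not set)")
if not same_jvm(java_on_path, java_home):
    print("note: PATH and JAVA_HOME appear to point at different JVMs")
```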
I am trying to run the following program:
```python
import sparknlp
import sparknlp_jsl
from sparknlp.base import *
from sparknlp.util import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql.functions import col
from pyspark.sql.functions import explode
from sparknlp.pretrained import PretrainedPipeline
import gc
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)
import string
import numpy as np

#%%
params = {"spark.driver.memory": "50G",
          "spark.kryoserializer.buffer.max": "2000M",
          "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
          "spark.driver.maxResultSize": "16G"}

spark = sparknlp_jsl.start(license_keys['SECRET'], params=params, gpu=True)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())
print(spark)
print("\n========================================================================")

document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDetector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

token = Tokenizer() \
    .setInputCols(['sentence']) \
    .setOutputCol('token')

embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_clinical_biobert", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

clinical_assertion = AssertionDLModel.pretrained("assertion_dl_biobert_scope_L10R10", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

chunk2doc = Chunk2Doc() \
    .setInputCols("ner_chunk") \
    .setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models") \
    .setInputCols(["ner_chunk_doc"]) \
    .setOutputCol("sbert_embeddings")

snomed_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_snomed_findings_aux_concepts", "en", "clinical/models") \
    .setInputCols(["sbert_embeddings"]) \
    .setOutputCol("snomed_code") \
    .setDistanceFunction("COSINE") \
    .setCaseSensitive(False) \
    .setUseAuxLabel(True) \
    .setNeighbours(10)

resolver = SentenceEntityResolverModel \
    .pretrained("sbiobertresolve_umls_findings", "en", "clinical/models") \
    .setInputCols(["ner_chunk", "sbert_embeddings"]) \
    .setOutputCol("resolution") \
    .setDistanceFunction("EUCLIDEAN")

nlpPipeline = Pipeline(stages=[document,
                               sentenceDetector,
                               token,
                               embeddings,
                               clinical_ner,
                               ner_converter,
                               clinical_assertion,
                               chunk2doc,
                               sbert_embedder,
                               snomed_resolver,
                               resolver])

data = spark.createDataFrame([[""]]).toDF("text")
assertion_model = nlpPipeline.fit(data)
```
However, I get the following error:
:: retrieving :: org.apache.spark#spark-submit-parent-b59223ac-26d8-44de-a4c3-d05a558c3faf
confs: [default]
0 artifacts copied, 72 already retrieved (0kB/31ms)
24/03/06 20:34:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark NLP Version : 5.2.2
Spark NLP_JSL Version : 5.2.1
<pyspark.sql.session.SparkSession object at 0x7f5597073190>
========================================================================
biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[ | ]biobert_pubmed_base_cased download started this may take some time.
Approximate size to download 386.4 MB
[ / ]Download done! Loading the resource.
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/beslami/cache_pretrained/biobert_pubmed_base_cased_en_2.6.2_2.4_1600449177996/metadata
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:304)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:244)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:291)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:291)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:287)
at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1441)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.take(RDD.scala:1435)
at org.apache.spark.rdd.RDD.$anonfun$first$1(RDD.scala:1476)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.first(RDD.scala:1476)
at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:587)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:465)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:31)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:513)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:505)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:705)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Input path does not exist: file:/home/beslami/cache_pretrained/biobert_pubmed_base_cased_en_2.6.2_2.4_1600449177996/metadata
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
... 40 more
[OK!]
Traceback (most recent call last):
File "/data/beslami/sample_loaded_models.py", line 75, in <module>
embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
File "/home/beslami/.local/lib/python3.9/site-packages/sparknlp/annotator/embeddings/bert_embeddings.py", line 206, in pretrained
return ResourceDownloader.downloadModel(BertEmbeddings, name, lang, remote_loc)
File "/home/beslami/.local/lib/python3.9/site-packages/sparknlp/pretrained/resource_downloader.py", line 99, in downloadModel
raise e
File "/home/beslami/.local/lib/python3.9/site-packages/sparknlp/pretrained/resource_downloader.py", line 96, in downloadModel
j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
File "/home/beslami/.local/lib/python3.9/site-packages/sparknlp/internal/__init__.py", line 352, in __init__
super(_DownloadModel, self).__init__("com.johnsnowlabs.nlp.pretrained." + validator + ".downloadModel", reader,
File "/home/beslami/.local/lib/python3.9/site-packages/sparknlp/internal/extended_java_wrapper.py", line 27, in __init__
self._java_obj = self.new_java_obj(java_obj, *args)
File "/home/beslami/.local/lib/python3.9/site-packages/sparknlp/internal/extended_java_wrapper.py", line 37, in new_java_obj
return self._new_java_obj(java_class, *args)
File "/home/beslami/.local/lib/python3.9/site-packages/pyspark/ml/wrapper.py", line 86, in _new_java_obj
return java_obj(*java_args)
File "/home/beslami/.local/lib/python3.9/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/home/beslami/.local/lib/python3.9/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
File "/home/beslami/.local/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/beslami/cache_pretrained/biobert_pubmed_base_cased_en_2.6.2_2.4_1600449177996/metadata
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:304)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:244)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:332)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:291)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:291)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:287)
at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1441)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.take(RDD.scala:1435)
at org.apache.spark.rdd.RDD.$anonfun$first$1(RDD.scala:1476)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.first(RDD.scala:1476)
at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:587)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:465)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:31)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:513)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:505)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:705)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Input path does not exist: file:/home/beslami/cache_pretrained/biobert_pubmed_base_cased_en_2.6.2_2.4_1600449177996/metadata
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:278)
... 40 more
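If permissions on the default ~/cache_pretrained cannot be fixed, another option is to relocate the pretrained cache when starting the session. This is a hedged sketch: the "spark.jsl.settings.pretrained.cache_folder" key is taken from Spark NLP's configuration documentation (verify it against your version), and the target path is a placeholder for any directory the user can write to.

```python
# Relocate Spark NLP's pretrained-model cache to a writable directory.
# The cache_folder key is assumed from Spark NLP's configuration docs;
# "/data/beslami/sparknlp_cache" is a placeholder path.
params = {
    "spark.driver.memory": "50G",
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "16G",
    "spark.jsl.settings.pretrained.cache_folder": "/data/beslami/sparknlp_cache",
}
# Then start the session as in the issue:
# spark = sparknlp_jsl.start(license_keys['SECRET'], params=params, gpu=True)
```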