I use the following pipeline with BioBERT Sentence Embeddings.
24/08/08 03:19:13.581 [task-result-getter-3] WARN o.a.spark.scheduler.TaskSetManager - Lost task 7.2 in stage 10.0 (TID 370) (10.0.0.12 executor 4): org.apache.spark.SparkException: Failed to execute user defined function (LSHModel$$Lambda$5263/1056329262: (struct<type:tinyint,size:int,indices:array,values:array>) => array<struct<type:tinyint,size:int,indices:array,values:array>>)
The exception is still raised even when I use sent_roberta_base.
I finally found the root cause. Some entries in the dataset contain a `.`, so such an entry is viewed as 2 sentences.
The output column (`sentence_embeddings`) of BertSentenceEmbeddings and RoBertaSentenceEmbeddings is then an array of size 2.
`DocumentSimilarityRankerApproach.train()` flattens `sentence_embeddings.embeddings`, which makes the dimension 1536 (768 * 2).
The solution in my case is to set custom bounds for SentenceDetector.
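To make the arithmetic concrete, here is a minimal pure-Python sketch of the effect. It does not use Spark NLP: `split_sentences`, `fake_embedding`, `flattened_dimension` and the sample text are hypothetical stand-ins that only mimic splitting on a boundary and flattening one 768-dimensional vector per detected sentence.

```python
import re

EMBEDDING_DIM = 768  # per-sentence embedding size reported in this thread


def split_sentences(text, bounds):
    """Split text on any of the given boundary strings, dropping empty pieces."""
    pattern = "|".join(re.escape(b) for b in bounds)
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def fake_embedding(sentence):
    """Stand-in for a sentence embedding: a constant vector of the right size."""
    return [float(len(sentence))] * EMBEDDING_DIM


def flattened_dimension(text, bounds):
    """Embed every detected sentence and flatten the vectors into one list."""
    vectors = [fake_embedding(s) for s in split_sentences(text, bounds)]
    flat = [value for vector in vectors for value in vector]
    return len(flat)


text = "First part. Second part"  # hypothetical row containing a '.'

print(flattened_dimension(text, ["."]))   # 1536: the '.' yields two sentences
print(flattened_dimension(text, ["\n"]))  # 768: custom bound keeps one sentence
```

In the real pipeline the second case would presumably correspond to configuring SentenceDetector with custom bounds (for example via `setCustomBounds`) so that a `.` inside a row no longer produces a second sentence.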