
The columns of A don't match the number of elements of x. A: 768, x: 1536 #14362

Answered by SidWeng
SidWeng asked this question in Q&A

Finally, I found the root cause: some documents in my dataset contain a "." in the middle of the text, like this:

First document, this is my first sentence. This is my second sentence.

SentenceDetector splits this into 2 sentences, so the output column (sentence_embeddings) of BertSentenceEmbeddings and RoBertaSentenceEmbeddings is an array of size 2 for that row.
DocumentSimilarityRankerApproach.train() flattens sentence_embeddings.embeddings, which makes the dimension of that row 1536 (768 * 2) instead of 768:

val similarityDataset: DataFrame = embeddingsDataset
  .withColumn(s"$LSH_INPUT_COL_NAME", array_to_vector(flatten(col(INPUT_EMBEDDINGS))))
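To make the dimension arithmetic concrete, here is a minimal pure-Scala sketch (no Spark involved). It assumes each row of sentence_embeddings.embeddings is a sequence of 768-dimensional vectors, one per detected sentence. The object and helper names (FlattenDimensionDemo, rowEmbeddings, flattenedDim, inconsistentRows) are hypothetical, and the collection flatten only imitates what the Spark flatten call does to a single row.

```scala
object FlattenDimensionDemo {
  // Hypothetical stand-in for one row of sentence_embeddings.embeddings:
  // one vector of size `dim` per detected sentence.
  def rowEmbeddings(numSentences: Int, dim: Int = 768): Seq[Array[Float]] =
    Seq.fill(numSentences)(Array.fill(dim)(0.0f))

  // Imitates flatten(col(INPUT_EMBEDDINGS)) on one row:
  // the per-sentence vectors are concatenated end to end.
  def flattenedDim(row: Seq[Array[Float]]): Int = row.flatten.length

  // Returns the indices of rows whose flattened size differs from `expected`,
  // i.e. the rows that would produce a 768 vs 1536 mismatch.
  def inconsistentRows(rows: Seq[Seq[Array[Float]]], expected: Int = 768): Seq[Int] =
    rows.zipWithIndex.collect { case (row, i) if flattenedDim(row) != expected => i }

  def main(args: Array[String]): Unit = {
    // Second row simulates the document that was split into 2 sentences.
    val dataset = Seq(rowEmbeddings(1), rowEmbeddings(2), rowEmbeddings(1))
    println(dataset.map(flattenedDim)) // List(768, 1536, 768)
    println(inconsistentRows(dataset)) // List(1)
  }
}
```

A single two-sentence document is enough to give one row twice the expected length, which matches the "A: 768, x: 1536" numbers in the error.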

The solution in my case was to set custom bounds for SentenceDetector so that it splits only on newlines:

.setCustomBounds(Array("\n"))
.setUseCustomBoundsOnly(true)
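The effect of these two settings can be illustrated with a toy splitter. This is a minimal sketch under stated assumptions: splitOnPunctuation is a hypothetical, much simpler approximation of default sentence detection (split after ".", "!" or "?" followed by whitespace), not Spark NLP's actual SentenceDetector rules, and splitOnNewlinesOnly only mimics setCustomBounds(Array("\n")) combined with setUseCustomBoundsOnly(true).

```scala
object SentenceBoundsDemo {
  // Toy approximation of punctuation-based splitting (NOT Spark NLP's rules):
  // split after '.', '!' or '?' when followed by whitespace.
  def splitOnPunctuation(text: String): Seq[String] =
    text.split("(?<=[.!?])\\s+").map(_.trim).filter(_.nonEmpty).toSeq

  // Mimics custom bound "\n" with custom bounds only: split on newlines, nothing else.
  def splitOnNewlinesOnly(text: String): Seq[String] =
    text.split("\n").map(_.trim).filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val doc = "First document, this is my first sentence. This is my second sentence."
    println(splitOnPunctuation(doc).length)  // 2 -> two sentence embeddings
    println(splitOnNewlinesOnly(doc).length) // 1 -> one sentence embedding
  }
}
```

Note that newline-only bounds keep one embedding per document only if each document is a single line; a document containing line breaks would still be split into several sentences.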

Replies: 2 comments 2 replies

1 reply
@SidWeng

1 reply
@danilojsl
Answer selected by SidWeng
Category
Q&A
2 participants