Because we don't store term vectors (due to size), this is terribly difficult to debug, but here is a review of what is known so far.
A search for `"nordic optical"` in body or abstract finds fewer documents than expected.
Now, hold your breath... tada, but only for the index built one week ago! An index that was built from scratch this Saturday is unaffected.
What is different? The index from last week has been compacted. The Solr release that built that index also had a bug (which should, however, only affect documents that had a synonym in the very first position of the indexed stream; it resulted in those docs being rejected, i.e. not indexed).
Everything else is the same, including the synonyms that are used for index-time tokenization.
The problem shows up at query time: the query `abstract:"nordic optical"` becomes `abstract:"nordic syn::optical"`
collection1: 1203 results
collection2: 1170
When searching with `=abstract:"nordic optical"`:
collection1: 1203 results
collection2: 1203
When searching `abstract:"nordic syn::optical"` (this one can only be done from inside Luke with the whitespace analyzer):
collection2: 1170 results
So for 33 documents, it looks like the position of the token `syn::optical` moved by 1. But I have no way to tell, because we can't reconstruct the documents without term vectors.
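Even without term vectors, the 33 affected documents could at least be identified by comparing the result sets of the two collections. Here is a minimal sketch in Python, assuming the matching document IDs can be exported from each collection as plain lists. The ID values and the helper name `missing_ids` are made up for illustration and are not part of the actual setup.

```python
def missing_ids(reference_ids, suspect_ids):
    """Return IDs found in the reference result set but absent from the suspect one."""
    return sorted(set(reference_ids) - set(suspect_ids))


# Hypothetical ID lists, as if exported from each collection for the same phrase query.
collection1_ids = ["doc-001", "doc-002", "doc-003", "doc-004"]
collection2_ids = ["doc-001", "doc-003"]

affected = missing_ids(collection1_ids, collection2_ids)
print(affected)
```

With the real result sets, the length of `affected` should be 33 if the counts above hold, and those IDs are the documents worth inspecting or reindexing first.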
This query: `abstract:nordic NEAR1 abstract:optical`
collection1: 1205
collection2: 1205
Which is totally confusing! Proximity search only considers tokens that are next to each other, so it is (almost) the same thing as a phrase search. I also tried `abstract:"syn::optical nordic"` to verify the tokens were not swapped; that produces 0 results.
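To make the position hypothesis concrete, here is a toy model in Python of how an ordered phrase match depends on token positions. It is a simplification, not Lucene's actual matcher: the `slop` parameter, the `phrase_match` helper and the sample positions are all assumptions for illustration. Whether `NEAR1` really tolerates a one-position gap is not established here either.

```python
def phrase_match(first_positions, second_positions, slop=0):
    """True if some occurrence of the second token follows the first
    by at most 1 + slop positions (ordered match only)."""
    return any(
        1 <= q - p <= 1 + slop
        for p in first_positions
        for q in second_positions
    )


# Toy documents: token -> list of positions in the field.
healthy = {"nordic": [4], "syn::optical": [5]}
shifted = {"nordic": [4], "syn::optical": [6]}  # synonym token moved by one

for name, doc in (("healthy", healthy), ("shifted", shifted)):
    exact = phrase_match(doc["nordic"], doc["syn::optical"])
    loose = phrase_match(doc["nordic"], doc["syn::optical"], slop=1)
    print(name, "exact:", exact, "slop=1:", loose)
```

In this model the shifted document fails the exact phrase match but still passes with one position of slack. That would be consistent with a phrase query losing documents while a more tolerant proximity query does not, but it does not prove that this is what happened in the compacted index.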
At this point, suspicion falls on core optimization. To verify this theory, we'll have to repeat the same action, but we need to wait until a new core is built, since we don't want to break production (which works and is producing correct results).