Users have asked us for the ability to search for chemical symbols such as CO2+, which is currently impossible because the tokenizer removes the trailing + sign. The same applies to the minus sign.
The other thing we'll have to deal with in this context is superscripts and subscripts, which are now marked up with the HTML tags <SUP> and <SUB>.
Here is an example of what the Classic tokenizer did for some of these cases:
INPUT: Index H-alpha and Hα+ and H<SUB>α</SUB> and as well
OUTPUT: INDEX HALPHA HALPHA+ HALPHA WELL H ALPHA HALPHA HALPHA
POSITION: 1 2 3 4 5 2 2 3 4
(Note: the &alpha entity is not translated in SOLR into the Unicode glyph α (U+03B1).)
Here is an example of how chemical formulae are handled:
INPUT: test formula H2O+ test CNO-SI+ test PSi+O3- end
OUTPUT: TEST FORMULA H2O+ TEST CNO-SI+ TEST PSI+O3- END CNO- SI+ PSI+ O3- H2O CNO SI PSI O3
POSITION: 1 2 3 4 5 6 7 8 5 5 7 7 3 5 5 7 7
Essentially, these rules apply to sequences matching ([A-Z][A-Za-z0-9]*[+-])*.
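The expansion shown in the formula example above can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not the actual tokenizer code: the function name `expand_formula` is hypothetical, and the sketch requires at least one charged component (`+` instead of the outer `*`) so that empty strings do not match. It returns the variants that would be indexed at the same position as the original token.

```python
import re

# One charged component: a capital letter, then letters/digits, then a trailing + or -.
COMPONENT = re.compile(r"[A-Z][A-Za-z0-9]*[+-]")
# A whole token made up only of such components (one or more).
FORMULA = re.compile(r"(?:[A-Z][A-Za-z0-9]*[+-])+")


def expand_formula(token):
    """Return the uppercase variants to index at the position of `token`."""
    if not FORMULA.fullmatch(token):
        # Not a chemical formula: index the token unchanged (uppercased).
        return [token.upper()]

    parts = COMPONENT.findall(token)
    variants = [token.upper()]  # the full formula, signs preserved
    if len(parts) > 1:
        # Each charged component on its own, e.g. CNO- and SI+.
        variants += [p.upper() for p in parts]
    # Each component with its sign stripped, e.g. CNO and SI.
    variants += [p[:-1].upper() for p in parts]
    return variants


if __name__ == "__main__":
    for tok in "test formula H2O+ test CNO-SI+ test PSi+O3- end".split():
        print(tok, "->", expand_formula(tok))
```

With this sketch, a query for either H2O or H2O+ would find the same position, while a query for the full H2O+ string stays exact.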