Implement citations() score as a function of cited paper frequencies #208

aaccomazzi · 2023-12-04T14:02:36Z

This request comes up regularly. When we call the citations() and references() operators, the returned scores are not useful: they reflect neither the original scores from the inner query nor the number of times the documents in the inner query were cited by the returned documents. We would like to enable the latter.

For example, if I search for author:"accomazzi, a" in the astronomy collection, I find 200+ documents. If I ask for their citations via citations(author:"accomazzi, a"), the resulting list has a ranking that is essentially meaningless. Instead, we would like to see at the top the papers that most frequently cite the documents matched by the inner query, which in this case would be:

bibcode              | citations
-------------------- | ---------
2010ARIST..44....3K  | 13.000
2002Ap&SS.282..299E  | 10.000
2011ASSP...24...23K  | 8.000
2007BASI...35..717E  | 8.000
2003lisa.conf..145E  | 8.000
2018ApJS..236....3H  | 7.000
...

Ideally we should go one step further and consider implementing a hybrid score controlled by an optional parameter, as we have done for the reviews() operator:

montysolr/montysolr/src/main/java/org/apache/lucene/queryparser/flexible/aqp/builders/AqpAdsabsSubQueryProvider.java

Line 802 in 811eee1

* def reviews(query):

The optional parameter (let's call it textWeightRatio) would control how much weight is given to the scores coming from the documents retrieved by the inner query, so that we can compute a final score for each citing paper j this way:

final_score(j) = SUM (1 + textWeightRatio * innerScore(i) / maxInnerScore)

where innerScore(i) is the relevance score computed for document i, which matches the inner query, and SUM is computed over all citations from paper j to documents in the inner set. maxInnerScore is the highest score from the inner query. When textWeightRatio is 0 (the default), the final score is simply the number of citations document j has to the documents selected by the inner query.
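
To make the proposed formula concrete, here is a minimal Python sketch of the scoring, not the montysolr implementation. The function name `citation_scores`, its arguments, and the representation of citations as `(citing, cited)` pairs are all assumptions made for illustration; the real operator would work on Lucene collectors and the citation cache instead.

```python
from collections import defaultdict


def citation_scores(inner_scores, citation_pairs, text_weight_ratio=0.0):
    """Score each citing paper j per the proposed formula (illustrative only).

    inner_scores: dict mapping document id -> relevance score for every
        document matched by the inner query (hypothetical input shape).
    citation_pairs: iterable of (citing_id, cited_id) tuples.
    text_weight_ratio: weight given to the normalized inner score;
        0.0 reduces the result to a plain citation count.
    """
    if not inner_scores:
        return {}

    max_inner = max(inner_scores.values())
    final = defaultdict(float)

    for citing, cited in citation_pairs:
        # Only citations that point into the inner set contribute.
        if cited not in inner_scores:
            continue
        # Guard against a zero maximum (assumption: treat it as no text weight).
        normalized = inner_scores[cited] / max_inner if max_inner > 0 else 0.0
        final[citing] += 1.0 + text_weight_ratio * normalized

    return dict(final)


# Toy data: documents A and B match the inner query; C does not.
inner = {"A": 2.0, "B": 1.0}
pairs = [("X", "A"), ("X", "B"), ("Y", "B"), ("Z", "C")]

counts_only = citation_scores(inner, pairs)
hybrid = citation_scores(inner, pairs, text_weight_ratio=1.0)
```

With the default ratio, `counts_only` ranks X above Y purely by how many inner-set documents each one cites. With `text_weight_ratio=1.0`, each citation additionally contributes the cited document's score normalized by the maximum inner score, so citing the best-matching documents counts for more.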

aaccomazzi added query enhancement operators score feature-request labels Dec 4, 2023
