Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML search: represent indices using JavaScript Map instead of object literals #13097

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
47f6547
search: check that query terms exist as properties in term indices be…
jayaddison Oct 24, 2024
713d6bb
Tests: fixtures: add document title for ECMAScript testroot
jayaddison Nov 2, 2024
0fff170
Tests: add JavaScript test coverage for prototype-property-name query
jayaddison Nov 2, 2024
64d6e09
Add CHANGES.rst entry
jayaddison Nov 2, 2024
9bab17c
Tests: add negative-test case
jayaddison Nov 5, 2024
9b1edac
search: add explanatory comment for document title/text term lookup
jayaddison Nov 11, 2024
52b9de3
search: retain `undefined` result instead of `[]` for non-matching terms
jayaddison Nov 11, 2024
bcb02fe
search: add explanatory comment regarding `hasOwnProperty` usage
jayaddison Nov 11, 2024
7267990
Merge branch 'master' into issue-13096/html-search-handle-proto-query
AA-Turner Nov 13, 2024
462c859
HTML search: encapsulate the search index content within a JSON string
jayaddison Nov 14, 2024
443fd21
HTML search: use the `JSON.parse` `reviver` argument to freeze search…
jayaddison Nov 14, 2024
295d230
Merge branch 'master' into issue-13096/html-search-handle-proto-query
jayaddison Nov 14, 2024
3498755
Tests: fixup: accommodate adjusted `setIndex` prefix/suffix lengths
jayaddison Nov 14, 2024
f38ce20
HTML search: replace `repr` with simple embedded single-quotes
jayaddison Nov 14, 2024
0d9a26c
Revert "HTML search: use the `JSON.parse` `reviver` argument to freez…
jayaddison Nov 15, 2024
d4bba1e
WIP: HTML search: use JS `Map`s in serialized index
jayaddison Nov 15, 2024
d9dee52
HTML search: fixup for object search
jayaddison Nov 15, 2024
d056a8a
HTML search: ensure array-format index entries are sorted
jayaddison Nov 16, 2024
15a7dfa
HTML search: refactor-out redundant code
jayaddison Nov 16, 2024
6e670ea
HTML search: rectify variable names
jayaddison Nov 16, 2024
f2848ed
HTML search: compress index representation
jayaddison Nov 16, 2024
798cb48
HTML search: enclose index names in double-quotes
jayaddison Nov 16, 2024
af61293
Tests: HTML search: use `ast.literal_eval` to parse index contents
jayaddison Nov 16, 2024
2bcf7a6
Tests: HTML search: update test expectation
jayaddison Nov 16, 2024
381c26f
HTML search: cleanup: remove redundant `assert` statement
jayaddison Nov 16, 2024
ac281dc
HTML search: typing/lint fixups for `ruff` and `mypy`
jayaddison Nov 16, 2024
fae81a0
Edit CHANGES.rst entry
jayaddison Nov 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,9 @@ Bugs fixed

* #13060: HTML Search: use ``Map`` to store per-file term scores.
Patch by James Addison
* #13097: HTML Search: add a precautionary check for query term
presence in index properties before accessing them.
Patch by James Addison

Testing
-------
4 changes: 2 additions & 2 deletions sphinx/search/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -164,8 +164,8 @@ class _JavaScriptIndex:
on the documentation search object to register the index.
"""

PREFIX = 'Search.setIndex('
SUFFIX = ')'
PREFIX = "Search.setIndex('"
SUFFIX = "')"

def dumps(self, data: Any) -> str:
data_json = json.dumps(data, separators=(',', ':'), sort_keys=True)
Expand Down
9 changes: 6 additions & 3 deletions sphinx/themes/basic/static/searchtools.js
Original file line number Diff line number Diff line change
Expand Up @@ -219,7 +219,8 @@ const Search = {
(document.body.appendChild(document.createElement("script")).src = url),

setIndex: (index) => {
Search._index = index;
const reviver = (k, v) => (typeof v === "object" && v !== null) ? Object.freeze(v) : v;
Search._index = JSON.parse(index, reviver);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wlach is right that immutability is orthoganal here. I think 'simply' using JSON.parse rather than an object literal would solve the __proto__ case. Would we then need to keep the hasOwnProperty checks?

An alternative would be to use a reviver function that constructs Map objects/serialise the index with Map literals, and use .get(), but that may also be expensive.

Suggested change
const reviver = (k, v) => (typeof v === "object" && v !== null) ? Object.freeze(v) : v;
Search._index = JSON.parse(index, reviver);
Search._index = JSON.parse(index);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wlach is right that immutability is orthoganal here. I think 'simply' using JSON.parse rather than an object literal would solve the __proto__ case.

Ok, yep - I can revert 443fd21 to remove the use of Object.freeze / the reviver function.

Would we then need to keep the hasOwnProperty checks?

We should keep those, I think. Without them, queries for __proto__ may continue to inadvertently retrieve object prototypes instead of getting an undefined result.

An alternative would be to use a reviver function that constructs Map objects/serialise the index with Map literals, and use .get(), but that may also be expensive.

What do you think about constructing those Map objects directly in the searchindex.js file format, instead of using JSON.parse and a reviver?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about constructing those Map objects directly in the searchindex.js file format, instead of using JSON.parse and a reviver?

IMO this is definitely worth a shot, I suspect it'll be quite fast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm giving it a try - it doesn't seem too tricky; the Python serialization of the search index becomes a little more complex. One challenge that I have yet to work out is how the Python unit tests should parse/load the resulting index (ref:

return json.loads(searchindex[16:-1])
).

if (Search._queued_query !== null) {
const query = Search._queued_query;
Search._queued_query = null;
Expand Down Expand Up @@ -513,9 +514,11 @@ const Search = {
// perform the search on the required terms
searchTerms.forEach((word) => {
const files = [];
// find documents, if any, containing the query word in their text/title term indices
// use Object.hasOwnProperty to avoid mismatching against prototype properties
const arr = [
{ files: terms[word], score: Scorer.term },
{ files: titleTerms[word], score: Scorer.title },
{ files: terms.hasOwnProperty(word) ? terms[word] : undefined, score: Scorer.term },
{ files: titleTerms.hasOwnProperty(word) ? titleTerms[word] : undefined, score: Scorer.title },
];
// add support for partial matches
if (word.length > 2) {
Expand Down
2 changes: 1 addition & 1 deletion tests/js/fixtures/cpp/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions tests/js/fixtures/ecmascript/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/js/fixtures/multiterm/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/js/fixtures/partial/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/js/fixtures/titles/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Empty file.
6 changes: 6 additions & 0 deletions tests/js/roots/ecmascript/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
ECMAScript
----------

This is a sample JavaScript (aka ``ECMAScript``) project used to generate a search engine index fixture.

Use the `__proto__` property to access the `prototype <https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/Object_prototypes>`_ (if any) of an object instance.
32 changes: 32 additions & 0 deletions tests/js/searchtools.spec.js
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,38 @@ describe('Basic html theme search', function() {

});

describe('can handle edge-case search queries', function() {

it('can search for the javascript prototype property', function() {
eval(loadFixture("ecmascript/searchindex.js"));

searchParameters = Search._parseQuery('__proto__');

hits = [
[
'index',
'ECMAScript',
'',
null,
5,
'index.rst',
'text'
]
];
expect(Search._performSearch(...searchParameters)).toEqual(hits);
});

it('does not find the javascript prototype property in unrelated documents', function() {
eval(loadFixture("partial/searchindex.js"));

searchParameters = Search._parseQuery('__proto__');

hits = [];
expect(Search._performSearch(...searchParameters)).toEqual(hits);
});

});

});

describe("htmlToText", function() {
Expand Down
2 changes: 1 addition & 1 deletion tests/test_search.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ def load_searchindex(path: Path) -> Any:
assert searchindex.startswith('Search.setIndex(')
assert searchindex.endswith(')')

return json.loads(searchindex[16:-1])
return json.loads(searchindex[17:-2])


def is_registered_term(index: Any, keyword: str) -> bool:
Expand Down
Loading