Incorrect grouping of search results between "Extensions" and "Other Matches" #1088

Open
lhsazevedo opened this issue Oct 8, 2024 · 1 comment

Comments

@lhsazevedo
Contributor

lhsazevedo commented Oct 8, 2024

Background

In php/phd#154, we resolved the issue of missing pages in the search index. However, now that these pages are visible in search results, a long-standing bug in result grouping has become apparent.

Issue

Some search results are incorrectly categorized between the "Extensions" and "Other Matches" groups.

Example:

[Screenshot: search results for the query "security"]

As shown:

  1. "Security (PHP Manual)" appears in the "Extensions" group, although it is not a PHP extension.
  2. "Security consideration" (from the win32service extension) is incorrectly placed in the "Other Matches" group.

Cause

The client-side search code groups results based on types, including Function, Variable, Class, Exception, Extension, and Other Matches (general). These types are assigned according to the XML element tags in the manual's source.

Issue 1: Incorrect grouping in "Extensions"

The first issue occurs in this section of the code:

web-php/js/search.js, lines 130 to 134 at commit 27fbef1:

case "set":
case "book":
case "reference":
type = "extension";
break;

The code assumes that any entry with the element tag <book>, <set>, or <reference> is an extension, which is inaccurate: many entries use these elements without belonging to an extension.
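To make the first problem concrete, here is a minimal standalone sketch of the `switch` quoted above, wrapped in a hypothetical function `typeForElement` and run on rows from the example data that follows. The real search.js also assigns other types (functions, classes, and so on); those cases are omitted here.

```javascript
// Standalone copy of the element-to-type cases quoted from web-php/js/search.js.
// Hypothetical wrapper: the real code assigns to a `type` variable instead of
// returning, and handles further result types that are omitted here.
function typeForElement(element) {
  switch (element) {
    case "set":
    case "book":
    case "reference":
      return "extension";
    default:
      // "section", "chapter", "appendix", "article" and anything else
      return "general";
  }
}

// Rows from the example data in this issue: [id, ldesc, element].
const rows = [
  ["getting-started", "Getting Started", "book"],
  ["install", "Installation and Configuration", "book"],
  ["reserved.variables", "Predefined Variables", "reference"],
  ["wrappers", "Supported Protocols and Wrappers", "reference"],
];

for (const [id, ldesc, element] of rows) {
  // Each of these rows is classified as "extension", although none is one.
  console.log(`${id} (${ldesc}): ${typeForElement(element)}`);
}
```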

Example data:

| id | ldesc | element |
|---|---|---|
| getting-started | Getting Started | book |
| install | Installation and Configuration | book |
| ... | ... | ... |
| reserved.variables | Predefined Variables | reference |
| wrappers | Supported Protocols and Wrappers | reference |
| ... | ... | ... |

Query used to extract these rows:

SELECT "docbook_id", "ldesc", "element"
FROM "ids"
WHERE "element" IN ('book','set','reference')
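One possible direction for the first problem is to stop relying on the element tag alone. The sketch below treats a <book> as an extension only when its id follows a `book.<name>` pattern. That naming convention is an assumption about the PHP Manual, not something established in this issue, and would need checking against the full index dump; the function name `isExtensionEntry` and the `book.apcu` row are hypothetical.

```javascript
// Hypothetical heuristic: only a <book> whose id starts with "book." counts
// as an extension. ASSUMPTION: extension books use ids like "book.<name>".
function isExtensionEntry(entry) {
  return entry.element === "book" && entry.id.startsWith("book.");
}

const entries = [
  { id: "getting-started", element: "book" },
  { id: "install", element: "book" },
  { id: "reserved.variables", element: "reference" },
  { id: "wrappers", element: "reference" },
  { id: "book.apcu", element: "book" }, // hypothetical extension row
];

for (const entry of entries) {
  console.log(entry.id, isExtensionEntry(entry) ? "extension" : "general");
}
```

A prefix check keeps the client-side logic cheap, but an explicit marker emitted by the index builder would be more robust if such a convention turns out to have exceptions.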

Issue 2: Incorrect grouping in "Other Matches"

The second issue is due to an assumption in the following code:

web-php/js/search.js, lines 136 to 141 at commit 27fbef1:

case "section":
case "chapter":
case "appendix":
case "article":
default:
type = "general";

The code assumes that entries with the tags <section>, <chapter>, <appendix>, or <article> never belong to an extension. Although this is less severe, many pages that are part of an extension are currently placed in the "Other Matches" group:

| id | ldesc | element |
|---|---|---|
| ... | ... | ... |
| apcu.installation | Installation | section |
| apcu.configuration | Runtime Configuration | section |
| ... | ... | ... |
| pdo.setup | Installing/Configuring | chapter |
| pdo.constants | Predefined Constants | appendix |
| pdo.connections | Connections and Connection management | chapter |
| ... | ... | ... |

Query used to extract these rows:

SELECT "docbook_id", "ldesc", "element"
FROM "ids"
WHERE "element" IN ('section','chapter','appendix','article')
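For the second problem, the ids in the example data suggest that pages inside an extension share a common prefix (apcu.installation, pdo.setup). A minimal sketch, assuming that extension books have ids of the form `book.<name>` and that their child pages use `<name>.` as a prefix, could attribute such pages to their extension. Both conventions are assumptions, and the helper names `extensionNames` and `owningExtension` are hypothetical.

```javascript
// Collect extension names from ids of the (assumed) form "book.<name>".
function extensionNames(allIds) {
  const names = new Set();
  for (const id of allIds) {
    if (id.startsWith("book.")) names.add(id.slice("book.".length));
  }
  return names;
}

// Return the extension a page belongs to, or null if its id prefix
// (the part before the first ".") is not a known extension name.
function owningExtension(id, names) {
  const dot = id.indexOf(".");
  if (dot === -1) return null;
  const prefix = id.slice(0, dot);
  return names.has(prefix) ? prefix : null;
}

// Ids from the example data, plus two hypothetical book rows.
const ids = [
  "book.apcu", "book.pdo", // hypothetical, not shown in the table
  "apcu.installation", "apcu.configuration",
  "pdo.setup", "pdo.constants", "pdo.connections",
  "reserved.variables",
];
const names = extensionNames(ids);

for (const id of ids.filter((i) => !i.startsWith("book."))) {
  console.log(id, "->", owningExtension(id, names) ?? "general");
}
```

Pages whose prefix does not name an extension, such as reserved.variables, fall back to "general", which keeps the current behaviour for non-extension content.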

PHP Manual index dump

For convenience, here is the dump from the PHD SQLite index for the PHP Manual: php-manual-index_2024-10-08.sql.gz

Notes

@Girgias
Member

Girgias commented Oct 8, 2024

Stream wrapper pages should now have the role="stream_wrapper" attribute on the <refentry> tag, so they should be easy to filter out.
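A minimal sketch of that filtering, assuming each index entry exposes the <refentry> role attribute as a `role` field. The columns quoted in this issue (docbook_id, ldesc, element) do not include it, so the index would first need to export it. The function name and sample entries are hypothetical.

```javascript
// ASSUMPTION: entries carry the <refentry> role attribute as `role`.
// Drops every entry marked role="stream_wrapper"; others pass through.
function withoutStreamWrappers(entries) {
  return entries.filter((entry) => entry.role !== "stream_wrapper");
}

// Hypothetical entries for illustration only.
const sample = [
  { id: "wrappers.http", element: "refentry", role: "stream_wrapper" },
  { id: "apcu.installation", element: "section" },
];

console.log(withoutStreamWrappers(sample).map((entry) => entry.id));
```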

I don't know why chapters and sections are not considered part of an extension, as this markup has existed for decades.

Same for the which has always listed constants.
