[RFC] Hide Search Response Processors #16836

Open
mingshl opened this issue Dec 12, 2024 · 4 comments
Labels
enhancement: Enhancement or improvement to existing feature or request
Search: Search query, autocomplete ...etc

Comments

@mingshl
Contributor

mingshl commented Dec 12, 2024

Is your feature request related to a problem? Please describe

Overview:
The Hide Search Response Processor is a new type of search response processor proposed to enhance the flexibility and control of search results. This processor allows users to selectively hide one or more fields, or even all fields, from the search hits in the final response. This capability is particularly useful when chaining multiple processors together and when certain fields are used for intermediate processing but are not required in the final output.

Key Features:

Selective Field Hiding: Users can specify which fields to hide from the search response.
Complete Hit Hiding: Option to hide all fields in a hit, leaving only metadata.
Chain-friendly: Designed to work seamlessly with other response processors.

Use Cases:

Data Privacy: Hide sensitive fields before returning results to the client.
Response Optimization: Reduce hit size by removing unnecessary fields.
Intermediate Processing: Hide fields used for internal processing but not needed in the final output.
Custom View Creation: Create tailored views of documents for different use cases or user roles.

Describe the solution you'd like

Example Scenario:

Consider a document with a field "long_text" containing a large string. The search process might involve:

  1. Apply a split response processor to create a new field "long_text_array" from "long_text".
  2. Perform operations on "long_text_array".
  3. Use the Hide Search Response Processor to remove "long_text" from the final response.

This workflow allows for efficient processing while keeping the final response clean and focused on relevant data.
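
The scenario above can be sketched as a chain of plain functions applied to each hit. This is only an illustration of the intended data flow, not OpenSearch code: `split_processor`, `hide_processor`, and `run_pipeline` are hypothetical names, and the separator and sample hit are made up for the example.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]
Processor = Callable[[Hit], Hit]


def split_processor(field: str, target: str, separator: str) -> Processor:
    """Hypothetical stand-in for a split response processor."""
    def run(hit: Hit) -> Hit:
        source = dict(hit["_source"])
        # Derive the array field; the original field is kept for now.
        source[target] = source[field].split(separator)
        return {**hit, "_source": source}
    return run


def hide_processor(fields: List[str]) -> Processor:
    """Hypothetical stand-in for the proposed hide processor."""
    def run(hit: Hit) -> Hit:
        # Drop the listed top-level fields from the returned _source only.
        source = {k: v for k, v in hit["_source"].items() if k not in fields}
        return {**hit, "_source": source}
    return run


def run_pipeline(hit: Hit, processors: List[Processor]) -> Hit:
    """Apply the processors in order, as a response pipeline would."""
    for processor in processors:
        hit = processor(hit)
    return hit


hit = {"_id": "1", "_source": {"title": "doc", "long_text": "a. b. c"}}
result = run_pipeline(
    hit,
    [
        split_processor("long_text", "long_text_array", ". "),
        hide_processor(["long_text"]),
    ],
)
```

After the pipeline runs, `result["_source"]` contains "title" and "long_text_array" but no longer "long_text", while the input `hit` is left unchanged.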

Configuration Example:

{
  "processors" : [
    {
      "hide_fields" : {
        "fields" : ["long_text", "internal_field"],
        "ignore_missing": true
      }
    }
  ]
}
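
The RFC does not spell out what "ignore_missing" should do. A minimal sketch, assuming it behaves like the same-named option on the ingest remove processor (skip absent fields quietly when true, fail when false), might look like this. The function name `hide_fields` and the error type are assumptions made for the example.

```python
import copy
from typing import Any, Dict, List


def hide_fields(hit: Dict[str, Any], fields: List[str],
                ignore_missing: bool = False) -> Dict[str, Any]:
    """Return a copy of the hit with the given top-level _source fields removed.

    Assumed semantics: a field that is absent from the hit is skipped when
    ignore_missing is true and raises an error otherwise.
    """
    result = copy.deepcopy(hit)
    source = result.get("_source", {})
    for field in fields:
        if field in source:
            del source[field]
        elif not ignore_missing:
            raise KeyError(f"field [{field}] not found in hit [{hit.get('_id')}]")
    return result


hit = {"_id": "1", "_source": {"title": "doc", "long_text": "some large string"}}

# "internal_field" is absent from this hit, so ignore_missing decides the outcome.
hidden = hide_fields(hit, ["long_text", "internal_field"], ignore_missing=True)
```

With `ignore_missing=False`, the same call would raise, because "internal_field" is not present in the hit.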

Benefits:

Improved Response Clarity: Cleaner, more focused search results.
Reduced Data Transfer: Smaller response payloads when unnecessary fields are hidden.
Enhanced Processing Flexibility: Facilitates more complex search and processing pipelines.
Better Control Over Data Exposure: Helps in managing what data is returned to different clients or applications.

Related component

Search

Describe alternatives you've considered

Detailed Discussion:

  1. Nested maps and nested arrays: can we remove part of a nested map or nested array from the response?
  2. Should we use a shortcut configuration field to hide all fields? For example, when "hide_all" is true, hide all the fields in "_source".
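
To make both open questions concrete, here is one possible sketch. It assumes dotted paths such as "user.email" address nested maps, that a path crossing an array applies to every element, and that "hide_all" empties "_source" while leaving hit metadata in place. None of this is decided in the RFC; `remove_path` and `hide` are hypothetical helpers.

```python
import copy
from typing import Any, Dict, List


def remove_path(node: Any, path: str) -> bool:
    """Remove a dotted path from nested dicts and lists; return True if anything was removed."""
    head, _, rest = path.partition(".")
    if isinstance(node, list):
        # Apply the same path to every element of the array.
        removed = False
        for item in node:
            if remove_path(item, path):
                removed = True
        return removed
    if not isinstance(node, dict) or head not in node:
        return False
    if not rest:
        del node[head]
        return True
    return remove_path(node[head], rest)


def hide(hit: Dict[str, Any], fields: List[str], hide_all: bool = False) -> Dict[str, Any]:
    """Return a copy of the hit with the given paths (or all of _source) hidden."""
    result = copy.deepcopy(hit)
    if hide_all:
        # Assumption: keep metadata such as _id and _score, empty the document body.
        result["_source"] = {}
        return result
    for path in fields:
        remove_path(result.get("_source", {}), path)
    return result


hit = {
    "_id": "1",
    "_score": 1.0,
    "_source": {
        "user": {"name": "a", "email": "e"},
        "chunks": [{"text": "t1", "vec": [1]}, {"text": "t2", "vec": [2]}],
    },
}

partial = hide(hit, ["user.email", "chunks.vec"])
metadata_only = hide(hit, [], hide_all=True)
```

Here `partial` keeps "user.name" and each chunk's "text", and `metadata_only` keeps "_id" and "_score" with an empty "_source".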

Additional context

There is a remove ingest processor that provides similar functionality during ingestion: https://github.com/opensearch-project/OpenSearch/blob/main/modules/ingest-common/src/main/java/org/opensearch/ingest/common/RemoveProcessor.java

@mingshl mingshl added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 12, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Dec 12, 2024
@ylwu-amzn
Contributor

ylwu-amzn commented Dec 12, 2024

Is this similar to remove ingest processor https://opensearch.org/docs/latest/ingest-pipelines/processors/remove/ ?
Are we going to "hide" or "remove" some field from response?

@mingshl
Contributor Author

mingshl commented Dec 12, 2024

Is this similar to remove ingest processor https://opensearch.org/docs/latest/ingest-pipelines/processors/remove/ ? Are we going to "hide" or "remove" some field from response?

Yes, it's similar to the remove processor.

During ingest, the remove processor removes the field from the ingest document, so that field is not indexed.

During search, however, this hide processor removes the field from the search response while the field still persists in the indexed document. I hope this makes sense. I personally prefer naming it the hide processor because the field still exists in the index.

If it's better to keep the name consistent with the ingest processor, we can name it the remove processor as well. Opinions wanted! @martin-gaievski @owaiskazi19 @msfroh

@msfroh
Collaborator

msfroh commented Dec 12, 2024

I'm somewhat inclined to go with remove for consistency with the Ingest processor. If we ever implement the DocumentProcessor idea, the implementation could be shared.

Note that a more efficient alternative involves the user specifying what fields they want returned in their request, using the "_source":[ ... fields ...] parameter in their search request. That limits the fields returned from shards to the coordinator. (Using a processor at the coordinator would reduce bytes transferred from the coordinator back to the client, but does nothing to reduce node-to-node transfer.) That said, I can imagine a scenario where you want to return a field from the shards to the coordinator so it can be consumed/used by another response processor, but then we want to remove it from the final result.
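
For reference, the request-level alternative described above could be built like this. The "_source" list form and the "includes"/"excludes" object form are existing search request options; the query and field names below are made up for illustration.

```python
import json
from typing import Any, Dict, List


def source_filtered_request(query: Dict[str, Any],
                            include_fields: List[str]) -> Dict[str, Any]:
    """Build a search body that asks the shards to return only the listed fields."""
    return {"query": query, "_source": include_fields}


def source_excluding_request(query: Dict[str, Any],
                             exclude_fields: List[str]) -> Dict[str, Any]:
    """Build a search body that returns everything except the listed fields."""
    return {"query": query, "_source": {"excludes": exclude_fields}}


query = {"match": {"title": "opensearch"}}
include_body = source_filtered_request(query, ["title", "long_text_array"])
exclude_body = source_excluding_request(query, ["long_text"])

print(json.dumps(include_body, indent=2))
print(json.dumps(exclude_body, indent=2))
```

Because this filtering happens on the shards, a field excluded this way never reaches the coordinator, so it cannot feed a later response processor. That is the case where a coordinator-side hide or remove processor would still be needed.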

@brianf-aws

brianf-aws commented Dec 12, 2024

consider the nested map, nested array, can we remove part of the nested map or nested array in the response?

The ByField rerank type has this functionality of removing the field, even within a nested map. I haven't thought about a nested array, but I think a similar approach could work.
