Incorrect dissection behavior after consecutive delimiters #119264

Open
nielsbauman opened this issue Dec 24, 2024 · 2 comments
Labels
>bug, :Data Management/Ingest Node, Team:Data Management

Comments

@nielsbauman
Contributor

When we dissect a field, we behave incorrectly if all of the following conditions apply:

  • The pattern repeats the same delimiter several times in a row, e.g. the | in %{}|%{}|.
  • After the last of those delimiters, the pattern contains one or more literal characters that are not part of an extracted field, e.g. foo=.
  • The value directly before the field that breaks the delimiter repetition is empty, e.g. the input starts with baz|| or ||.

If we use the pattern %{}|%{}|foo=%{field} to dissect the input string ||foo=bar, the field named field is extracted as foo=bar, whereas the expected value is bar.
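
To make the expected result concrete, here is a minimal Python sketch of how a dissect pattern is matched when every delimiter is treated as a plain literal. This is not the Elasticsearch implementation: the function name dissect is hypothetical, key modifiers such as -> and + are not modeled, and keys are assumed to be separated by non-empty delimiters.

```python
import re


def dissect(pattern: str, text: str) -> dict[str, str]:
    """Simplified dissect: literal delimiters only, no key modifiers (illustrative)."""
    # Splitting on %{key} yields [literal, key, literal, key, ..., literal].
    parts = re.split(r"%\{([^}]*)\}", pattern)
    literals, keys = parts[0::2], parts[1::2]

    if not text.startswith(literals[0]):
        raise ValueError("input does not start with the pattern's leading literal")
    pos = len(literals[0])

    result: dict[str, str] = {}
    for key, delimiter in zip(keys, literals[1:]):
        if delimiter == "":
            # No literal after the key: the key takes the rest of the input.
            end = len(text)
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                raise ValueError(f"delimiter {delimiter!r} not found in input")
        if key:
            # Keys with an empty name (%{}) are matched but not stored.
            result[key] = text[pos:end]
        pos = end + len(delimiter)
    return result


print(dissect("%{}|%{}|foo=%{field}", "||foo=bar"))
```

Under these assumptions, the second delimiter is the whole literal |foo=, so it is consumed in full and only bar remains for field, both for the input ||foo=bar and for baz||foo=bar.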

There are two possible workarounds: either remove the foo= part from the dissect pattern and strip it from the extracted value in an additional processor, or leave the dissect pattern as-is and strip foo= in an additional processor that is conditional or ignores failures.

Reproduction example
PUT _ingest/pipeline/my-pipeline
{
  "processors":
  [
    {
      "dissect":
      {
        "field": "message",
        "pattern": "%{}|%{}|foo=%{field}"
      }
    }
  ]
}


POST _ingest/pipeline/my-pipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "message": "||foo=bar"
            }
        }
    ]
}
@nielsbauman nielsbauman added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >bug Team:Data Management Meta label for data/management team labels Dec 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@nielsbauman
Contributor Author

My guess is that this is caused somewhere around the code where we handle consecutive delimiters:

// look for consecutive delimiters (e.g. a,,,,d,e)
