
[BUG] dynamic method [java.lang.String, sha256/0] not available in _reindex Painless script #16423

Open
dkvasnicka opened this issue Oct 22, 2024 · 8 comments
Labels
enhancement, Indexing, untriaged


@dkvasnicka

dkvasnicka commented Oct 22, 2024

Describe the bug

When executing a _reindex call during which I want to use a script to change the ID of each document I'm getting:

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "ctx._id = ctx._source.url.sha256()",
          "                         ^---- HERE"
        ],
        "script": "ctx._id = ctx._source.url.sha256()",
        "lang": "painless",
        "position": {
          "offset": 25,
          "start": 0,
          "end": 34
        }
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [
      "ctx._id = ctx._source.url.sha256()",
      "                         ^---- HERE"
    ],
    "script": "ctx._id = ctx._source.url.sha256()",
    "lang": "painless",
    "position": {
      "offset": 25,
      "start": 0,
      "end": 34
    },
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "dynamic method [java.lang.String, sha256/0] not found"
    }
  },
  "status": 400
}

This was supposedly fixed in ES this April (no associated PR as of now): elastic/elasticsearch#107556

My AWS console says an upgrade to 2.15 is available so if this is no longer an issue in this version please let me know 🙏

Related component

Indexing

To Reproduce

POST /_reindex
{
   "source":{
      "index":"src"
   },
   "dest":{
      "index":"dest"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._id = ctx._source.url.sha256()"
   }
}

Expected behavior

The source index is fully reindexed into the dest index, with the ID of each document equal to the SHA-256 hash of the selected String field.
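To make the expected result concrete: assuming Painless's String.sha256() returns the lowercase hex SHA-256 digest of the string's UTF-8 bytes (which is what the augmentation helper in the Painless module appears to do; not verified here), the expected ID for a given url value can be computed locally and compared against documents in the dest index. The helper name sha256_id is hypothetical, and sha256sum comes from GNU coreutils (on macOS, shasum -a 256 is the usual substitute).

```shell
# Hypothetical helper: print the lowercase hex SHA-256 of the UTF-8 bytes of $1.
# Assumption: this matches what ctx._source.url.sha256() would produce in Painless.
sha256_id() {
  # printf '%s' avoids the trailing newline that echo would add to the hashed input
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Example: the _id that a document with "url": "https://example.com/" would be expected to get
sha256_id 'https://example.com/'
```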

Additional Details

Host/Environment:
Amazon OpenSearch Service
OpenSearch 2.13
OpenSearch_2_13_R20240520-P5

@dkvasnicka dkvasnicka added bug Something isn't working untriaged labels Oct 22, 2024
@github-actions github-actions bot added the Indexing Indexing, Bulk Indexing and anything related to indexing label Oct 22, 2024
@dblock
Member

dblock commented Oct 23, 2024

Elasticsearch is a different product. Try reproducing on the latest open source version of OpenSearch (2.17)?

A failing YAML REST test would be helpful; see https://github.com/opensearch-project/OpenSearch/blob/main/TESTING.md#testing-the-rest-layer.

@dkvasnicka
Author

dkvasnicka commented Oct 23, 2024

Elasticsearch is a different product.

I don't think we need to go deeper down this rabbit hole. If this bug was present in ES until recently, it's very likely that it is (or was) also present in OS, unless you found and fixed it yourselves (I have no way of knowing, hence this report/question) or someone else reported it and you fixed it (which does not seem to be the case).

Try reproducing on the latest open source version of OpenSearch (2.17)?

Sorry, I lack the motivation to spend time on that, because I couldn't use 2.17 even if I wanted to: AWS does not offer it, only 2.15. This issue would not make me host the entire cluster myself just to get 2.17, and upgrading a cluster with millions of documents to 2.15 just to find out whether you might have fixed this does not sound like a reasonable course of action to me either.

Would it be better to raise this issue directly with AWS support?

@dblock
Member

dblock commented Oct 23, 2024

I meant to try it locally with docker, which is easy.

I started OpenSearch with docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=..." opensearchproject/opensearch:latest.

Inserted a document into an index.

curl -k -X POST https://localhost:9200/movies/_doc --json '{"name":"Solaris"}' -u admin:$OPENSEARCH_PASSWORD

Tried to reindex.

curl -k https://localhost:9200/_reindex -u admin:$OPENSEARCH_PASSWORD --json '
{
   "source":{
      "index":"movies"
   },
   "dest":{
      "index":"movies2"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._id = ctx._source.url.sha256()"
   }
}' | jq

It failed with the same problem.

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "ctx._id = ctx._source.url.sha256()",
          "                         ^---- HERE"
        ],
        "script": "ctx._id = ctx._source.url.sha256()",
        "lang": "painless",
        "position": {
          "offset": 25,
          "start": 0,
          "end": 34
        }
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [
      "ctx._id = ctx._source.url.sha256()",
      "                         ^---- HERE"
    ],
    "script": "ctx._id = ctx._source.url.sha256()",
    "lang": "painless",
    "position": {
      "offset": 25,
      "start": 0,
      "end": 34
    },
    "caused_by": {
      "type": "null_pointer_exception",
      "reason": "Cannot invoke \"Object.getClass()\" because \"callArgs[0]\" is null"
    }
  },
  "status": 400
}

So it's not fixed.

The Amazon managed service gets updated often. I don't have an ETA for 2.17, but I imagine it's not far out; either way, it won't help here. We need to fix the bug.

If someone wants to PR a fix, please be mindful that you cannot take non-APLv2-compatible code from Elasticsearch. I'd write a REST YAML test to begin with and then debug the actual issue.

@dkvasnicka
Author

Thanks. You called sha256() on url instead of name, so you got a different (expected) error, but I tried it locally and the issue is still there.

@dblock
Member

dblock commented Oct 24, 2024

Yes, that's right: I don't have url in the document; it should have been name. With name, the error is:

    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "dynamic method [java.lang.String, sha256/0] not found"
    }

@dblock
Member

dblock commented Oct 24, 2024

Looking at this closer @dkvasnicka, why would you expect String to have a sha256 method? Is this in some documentation? (I do see it here, so maybe this was intended to work, but I don't see any tests for it.) Does sha256() work in other contexts?

Updating the ID works fine (e.g. "source":"ctx._id = ctx._id + \"_updated\"").

@dkvasnicka
Author

why would you expect String to have a sha256 method?

Unfortunately, the OS documentation has practically no content on the Painless language, its available functions, contexts, etc. So I went by the fact that it's defined in the Painless module, in a class documented as "Additional methods added to classes. These must be static methods with receiver as first argument."

However, as you pointed out, the methods seem to be whitelisted only for ingest pipeline scripts and nothing else, and I now see that the commit that introduced them clearly mentions this. So, technically speaking, this is not a bug. ES eventually decided to whitelist these methods for all contexts, but that's obviously their decision; what you do is up to you.

Also, since ingest pipelines can be executed during the _reindex operation it might be possible to actually achieve what I wanted by using a pipeline instead of using a script directly 💡
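A sketch of that pipeline-based workaround, assuming (per the discussion above) that String.sha256() is allow-listed in the ingest script context. The pipeline name sha256-id, the index names src/dest, the OPENSEARCH_URL variable, and the function name are illustrative, and this has not been run against a cluster. In an ingest pipeline, source fields sit directly on ctx (ctx.url rather than ctx._source.url), while the document ID is still ctx._id; the reindex request routes documents through the pipeline via dest.pipeline.

```shell
# Assumption: cluster URL and admin credentials follow the docker example above.
OPENSEARCH_URL="${OPENSEARCH_URL:-https://localhost:9200}"

# Ingest pipeline with a single script processor that sets the document ID.
# Assumption: String.sha256() is available to ingest scripts.
PIPELINE_BODY='{
  "processors": [
    { "script": { "lang": "painless", "source": "ctx._id = ctx.url.sha256()" } }
  ]
}'

# Reindex request that sends every document through the (hypothetical) sha256-id pipeline.
REINDEX_BODY='{
  "source": { "index": "src" },
  "dest":   { "index": "dest", "pipeline": "sha256-id" }
}'

# Hypothetical helper: create the pipeline, then run the reindex through it.
reindex_with_sha256_ids() {
  curl -k -X PUT "$OPENSEARCH_URL/_ingest/pipeline/sha256-id" \
    -H 'Content-Type: application/json' -u "admin:$OPENSEARCH_PASSWORD" \
    -d "$PIPELINE_BODY"
  curl -k -X POST "$OPENSEARCH_URL/_reindex" \
    -H 'Content-Type: application/json' -u "admin:$OPENSEARCH_PASSWORD" \
    -d "$REINDEX_BODY"
}
```

Call reindex_with_sha256_ids after exporting OPENSEARCH_PASSWORD. A document without a url field would make ctx.url null, and calling sha256() on it would fail with a null pointer error like the one shown earlier in the thread, so a null check in the script may be needed on real data.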

@dblock dblock added enhancement Enhancement or improvement to existing feature or request and removed bug Something isn't working labels Oct 24, 2024
@dblock dblock changed the title [BUG] dynamic method [java.lang.String, sha256/0] not found in _reindex Painless script [BUG] dynamic method [java.lang.String, sha256/0] not available in _reindex Painless script Oct 24, 2024
@dblock
Member

dblock commented Oct 24, 2024

Thanks @dkvasnicka, I think we got to the bottom of this. I don't see why we wouldn't want to allow-list this method for _reindex. If you or someone wants to contribute this seems relatively straightforward: I'd write some YAML REST tests for this (and other) methods, then register the set for all contexts.

Projects
None yet
Development

No branches or pull requests

2 participants