Replayer Transformation support #937
Conversation
…via a base64 encoded string + a test sample script. Add a test/sample script to excise specific elements, when present, from a json tree and also drop a component of the URI path. Signed-off-by: Greg Schohn <[email protected]>
…er transforms for docs w/out bodies or with malformed bodies. Signed-off-by: Greg Schohn <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #937 +/- ##
============================================
+ Coverage 80.69% 80.81% +0.11%
- Complexity 2691 2726 +35
============================================
Files 398 400 +2
Lines 15317 15404 +87
Branches 956 967 +11
============================================
+ Hits 12360 12448 +88
+ Misses 2385 2382 -3
- Partials 572 574 +2
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
These are deserialized into a list of json objects and are accessed via a new top-level payload key so that they're distinguishable from a single json object. Any leftover bytes are now appended after json objects. Both the single-json and ndjson-list (mutually exclusive) and the leftover bytes are all accessible for transformations. I've also added a transformation script to the replayer's command for docker-compose to exercise json body transformations. Signed-off-by: Greg Schohn <[email protected]>
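The ndjson handling described above can be sketched as follows. This is an illustrative re-implementation, not the replayer's actual parser: it splits a body into a list of complete json lines plus any leftover (partial) trailing bytes, mirroring the mutually exclusive single-json vs. ndjson-list distinction.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (class and field names are hypothetical):
// split an ndjson body into complete json lines plus leftover trailing bytes.
public class NdjsonSplitSketch {
    public static List<String> jsonLines = new ArrayList<>();
    public static String leftover = "";

    public static void split(String body) {
        jsonLines.clear();
        int start = 0;
        int nl;
        while ((nl = body.indexOf('\n', start)) >= 0) {
            String line = body.substring(start, nl).trim();
            if (!line.isEmpty()) {
                jsonLines.add(line); // one complete json document per line
            }
            start = nl + 1;
        }
        // bytes after the last newline are kept separately, as the commit
        // describes leftover bytes being appended after the json objects
        leftover = body.substring(start);
    }

    public static void main(String[] args) {
        split("{\"index\":{\"_id\":\"1\"}}\n{\"field\":\"value\"}\npartial");
        System.out.println(jsonLines.size()); // 2
        System.out.println(leftover);         // partial
    }
}
```

In the real pipeline the parsed objects land under a dedicated payload key so transformers can tell a json list apart from a single json object.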
…ents its own EntrySet... Accesses to the TreeMap-derived object weren't causing faults from transformations, so it never looked like anybody was trying to access payload contents, causing the transforming pipeline to NOT parse the json or pass the bodies to the transformers. Also, when there is no payload at all, we'll pass that through as an empty byte array. It might be better to pass no payload at all, but that would be a slightly greater change. At the moment, this change allows the OpenSearch Benchmark test run traffic to completely move over to the target. Signed-off-by: Greg Schohn <[email protected]>
…rmations from docker-compose as the default. On the last point, I'll continue to look into differences that could be causing e2e tests to fail. Signed-off-by: Greg Schohn <[email protected]>
@@ -78,7 +78,7 @@ services:
condition: service_started
opensearchtarget:
condition: service_started
command: /bin/sh -c "/runJavaWithClasspath.sh org.opensearch.migrations.replay.TrafficReplayer --speedup-factor 2 https://opensearchtarget:9200 --auth-header-value Basic\\ YWRtaW46bXlTdHJvbmdQYXNzd29yZDEyMyE= --insecure --kafka-traffic-brokers kafka:9092 --kafka-traffic-topic logging-traffic-topic --kafka-traffic-group-id logging-group-default --otelCollectorEndpoint http://otel-collector:4317"
command: /bin/sh -c "/runJavaWithClasspath.sh org.opensearch.migrations.replay.TrafficReplayer --speedup-factor 2 https://opensearchtarget:9200 --auth-header-value Basic\\ YWRtaW46bXlTdHJvbmdQYXNzd29yZDEyMyE= --insecure --kafka-traffic-brokers kafka:9092 --kafka-traffic-topic logging-traffic-topic --kafka-traffic-group-id logging-group-default --otelCollectorEndpoint http://otel-collector:4317" #--transformer-config-base64 W3sgIkpzb25Kb2x0VHJhbnNmb3JtZXJQcm92aWRlciI6ClsKICB7CiAgICAic2NyaXB0IjogewogICAgICAib3BlcmF0aW9uIjogInNoaWZ0IiwKICAgICAgInNwZWMiOiB7CiAgICAgICAgIm1ldGhvZCI6ICJtZXRob2QiLAogICAgICAgICJVUkkiOiAiVVJJIiwKICAgICAgICAiaGVhZGVycyI6ICJoZWFkZXJzIiwKICAgICAgICAicGF5bG9hZCI6IHsKICAgICAgICAgICJpbmxpbmVkSnNvbkJvZHkiOiB7CiAgICAgICAgICAgICJ0b3AiOiB7CiAgICAgICAgICAgICAgInRhZ1RvRXhjaXNlIjogewogICAgICAgICAgICAgICAgIioiOiAicGF5bG9hZC5pbmxpbmVkSnNvbkJvZHkudG9wLiYiIAogICAgICAgICAgICAgIH0sCiAgICAgICAgICAgICAgIioiOiAicGF5bG9hZC5pbmxpbmVkSnNvbkJvZHkudG9wLiYiCiAgICAgICAgICAgIH0sCiAgICAgICAgICAiKiI6ICJwYXlsb2FkLmlubGluZWRKc29uQm9keS4mIgogICAgICAgICAgfQogICAgICAgIH0KICAgICAgfQogICAgfQogIH0sIAogewogICAic2NyaXB0IjogewogICAgICJvcGVyYXRpb24iOiAibW9kaWZ5LW92ZXJ3cml0ZS1iZXRhIiwKICAgICAic3BlYyI6IHsKICAgICAgICJVUkkiOiAiPXNwbGl0KCcvZXh0cmFUaGluZ1RvUmVtb3ZlJyxAKDEsJikpIgogICAgIH0KICB9CiB9LAogewogICAic2NyaXB0IjogewogICAgICJvcGVyYXRpb24iOiAibW9kaWZ5LW92ZXJ3cml0ZS1iZXRhIiwKICAgICAic3BlYyI6IHsKICAgICAgICJVUkkiOiAiPWpvaW4oJycsQCgxLCYpKSIKICAgICB9CiAgfQogfQpdCn1dCg=="
This command becomes unreadable with the base64 encoded here - can we instead load it from a file path to improve comprehension?
I'm open to accepting base64 encoding directly over the command line, but what I'm struggling with is understanding the escaping/quotes on this line. You have this section:
#--transformer-config-base64
which seems to imply the entire base64 block is part of a non-actionable comment, and the final double quote at the end of the line seems to confirm that (i.e. the line has an odd number of quotes).
If that's the case, should this either be deleted or added to the actual command?
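Since passing the config over the command line hinges on getting base64 encoding (and then shell quoting) right, a minimal round-trip sketch may help. The class name and the trimmed-down Jolt spec below are illustrative, not the exact config embedded in the compose file:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper for packing a transformer config into the
// --transformer-config-base64 argument. Only stdlib Base64 is used.
public class EncodeTransformerConfig {
    public static String encode(String configJson) {
        return Base64.getEncoder()
                .encodeToString(configJson.getBytes(StandardCharsets.UTF_8));
    }

    public static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // hypothetical, trimmed-down Jolt config for demonstration
        String config = "[{\"JsonJoltTransformerProvider\":[{\"script\":"
                + "{\"operation\":\"shift\",\"spec\":{\"URI\":\"URI\"}}}]}]";
        String encoded = encode(config);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(config)); // round-trips
    }
}
```

A base64 string contains no shell metacharacters, which is precisely why it sidesteps the quoting questions raised above; the only quoting risk left is where the argument sits relative to the surrounding double quotes in the compose command.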
@peternied - I don't expect customers to be loading this from a file. They could, but that would require them to get a file onto a replayer container. For our CDK deployment, it's much easier to add an extra argument to pass to the replayer than to add an additional file. As such, I was hoping to test something closer to what more customers would be doing.
@chelma - It's a comment for now. I'm debugging through why its inclusion caused some of the E2E tests to fail.
I don't expect customers to be loading this from a file.
How do you expect customers to build up the configuration? I would expect them to be iterating in a text editor; does your workflow expect them to process it and then copy/paste it into a CDK variable and then redeploy?
If we are using a file on a shared EFS volume, that text editor could be vim on the migration console, and the file could be reloaded on boot of the container. This means no deploy is required, just a stop/start on the TC instance... or at least that is where my head is at.
Please file a new feature for setting up the EFS volume for that. We can tackle that separately.
Thanks for the PR. This is my first time looking through this code (really should have investigated it earlier), and I'm struggling a bit to see how we'd take something like the HttpJsonTransformingConsumer
and generalize its mechanisms to work for, say, the transformations we need for RFS. Curious what your thoughts are on that?
@@ -140,8 +148,9 @@ To run only one transformer without any configuration, the `--transformer-config`
be set to the name of the transformer (e.g. 'JsonTransformerForOpenSearch23PlusTargetTransformerProvider',
without quotes or any json surrounding it).

The user can also specify a file to read the transformations from using the `--transformer-config-file`, but can't use
both transformer options.
The user can also specify a file to read the transformations from using the `--transformer-config-file`. Users can
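The documented constraint (only one way of supplying the transformer config at a time) could be enforced with a simple guard like the following. Method and flag handling here are a hypothetical sketch, not the replayer's actual argument-validation code:

```java
// Illustrative sketch: reject configurations where more than one of the
// transformer-config options is supplied. Names mirror the documented flags.
public class TransformerOptionCheck {
    public static String pick(String inlineConfig, String configFile, String configBase64) {
        int set = 0;
        if (inlineConfig != null) set++;
        if (configFile != null) set++;
        if (configBase64 != null) set++;
        if (set > 1) {
            throw new IllegalArgumentException(
                "Only one of --transformer-config, --transformer-config-file, "
                + "or --transformer-config-base64 may be specified");
        }
        // tag the source so the caller knows how to interpret the value
        if (inlineConfig != null) return inlineConfig;
        if (configFile != null) return "file:" + configFile;
        if (configBase64 != null) return "base64:" + configBase64;
        return null; // no transformation configured
    }
}
```
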
Curious if you have a plan for how to get the user-supplied --transformer-config-file
onto the same host/container as the transformer code in a cloud-based (or non-laptop) deployment?
I'm not planning for that w/ the CDK deployment today, which is why I put a Base64 option in there too. Further out, it might make sense to pull it via a parameter store and write it as part of an init task. I'm going to try not to bother with changing things like container initialization until I'm done w/ the K8s investigation.
Makes sense, thanks.
"--insecure" }, arity = 0, description = "Do not check the server's certificate")
@Parameter(
required = false,
names = {"--insecure" },
Non-blocking nitpick - unbalanced spacing inside the curly braces
REMOVE_AUTH_HEADER_VALUE_ARG }, arity = 0, description = "Remove the authorization header if present and do not replace it with anything. "
@Parameter(
required = false,
names = {REMOVE_AUTH_HEADER_VALUE_ARG },
Non-blocking nitpick - unbalanced spacing inside the curly braces
AWS_AUTH_HEADER_USER_AND_SECRET_ARG }, arity = 2, description = "<USERNAME> <SECRET_ARN> pair to specify "
@Parameter(
required = false, names = {
AWS_AUTH_HEADER_USER_AND_SECRET_ARG },
Non-blocking nitpick - unbalanced spacing inside the curly braces
@Parameter(required = false, names = {
AWS_AUTH_HEADER_USER_AND_SECRET_ARG }, arity = 2, description = "<USERNAME> <SECRET_ARN> pair to specify "
@Parameter(
required = false, names = {
Non-blocking nitpick - inconsistent new lines
As a developer and a user, I would expect to see extensibility in a similar way for metadata migration and document migration. That means having a simple, dependency-free API from the metadata/document migration processes to call into hot-loadable classes (via ServiceProvider). Beyond that, I'd like to see the same translation environments supported for each of our 3 migration applications.

Having a shared CompositeTransformer for all of these could be a start. That composite transformer may eventually handle conditionals and maybe is represented as a graph or tree. At that point, it should be shared by all of the applications that need to support transformations. On the other side (what is being transformed), I'm not sure what we'll want to share. Ideally, a very consistent environment develops. For example, the CompositeTransformer and TransformationLoader (which configures transformations) should be consistently shared. Admittedly, both of these are super simple/lightweight today, but complexity and value aren't necessarily correlated.

Further into the future, I'd suspect that we'll want to develop "document" migrations for metadata and backfill document migrations. These document migrations should be able to be pulled into the replayer, since the replayer will need to do all of the transformations of the other two systems, given that RESTful calls can go into the source w/ those document types. I suspect that we'll build up a library of transforms, each of which may be configured. That library will be usable from all of the applications and may also allow transformations to be embedded/extended, etc.

So, things to care about for today: what are the document types? Right now, for the replayer there are 3 kinds of documents: json, ndjson, and binary. Is that enough? I suspect that document and metadata migration will likely just need to care about json transformations. Ideally, the replayer will be able to consume those shared transformations in the future.
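The shared-CompositeTransformer idea discussed above could look something like this. The interface and class names are an illustrative sketch over a json-like Map, not the project's actual API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Sketch: a dependency-free transform interface plus a composite that
// chains transforms, the kind of piece all three migration apps could share.
public class CompositeTransformerSketch {
    // functional interface: one transform step over a json-like document
    interface JsonTransformer extends UnaryOperator<Map<String, Object>> {}

    static class CompositeTransformer implements JsonTransformer {
        private final List<JsonTransformer> chain;

        CompositeTransformer(JsonTransformer... transformers) {
            this.chain = Arrays.asList(transformers);
        }

        @Override
        public Map<String, Object> apply(Map<String, Object> doc) {
            Map<String, Object> current = doc;
            for (JsonTransformer t : chain) {
                current = t.apply(current); // each stage sees the prior output
            }
            return current;
        }
    }
}
```

A later version could replace the flat list with a graph or tree and add conditional routing, as the comment above anticipates, without changing the single-method interface the applications program against.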
Signed-off-by: Greg Schohn <[email protected]>

# Conflicts:
# TrafficCapture/trafficReplayer/src/main/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumer.java
# TrafficCapture/trafficReplayer/src/test/java/org/opensearch/migrations/replay/datahandlers/http/HttpJsonTransformingConsumerTest.java
There were some refcount bugfixes for partial json streams and command line parsing.

More importantly, there was a test (testMalformedPayload_andTypeMappingUri_IsPassedThrough) that sent malformed JSON in a payload. The test didn't do much, and a comment admitted that we weren't likely doing the right thing. Moving a buffer to Unpooled caused the test to fail w/ a leak. Upon closer inspection, errors while parsing JSON didn't cause the payload contents to be dumped into the inlinedBinaryBody key of the payload. Now that content is captured and passed forward (plus, refCounted appropriately, though still as an Unpooled to minimize risk from transformations [see code comment]). In the case of this existing test, which has been moved to HttpJsonTransformingConsumerTest, the JsonTransformerForOpenSearch23PlusTargetTransformerProvider that the test uses throws an exception when the json payload isn't present. That was also not being handled properly. Now it is handled by marking the transformation as an error, w/ the exception wrapped in a "TransformationException". In that case, the transformation status is marked as an error and the contents of the entire message will be empty. If a transformation threw, it feels more appropriate to be as conservative as possible and not put anything on the wire, in case some part of it was sensitive.

Lastly, I refactored AggregatedRawResponse, pulling out a base class, AggregatedRawResult. We use that class to accumulate transformation results. It was extremely confusing to see an HTTP response and the word 'response' on so many of the fields. Now the word 'result' is used, and the usage in various contexts makes more sense.

Signed-off-by: Greg Schohn <[email protected]>
Description
Json transformations can work when bodies aren't present and when the bodies are of 'ndjson' types (like OpenSearch bulk requests).
There was also a test (testMalformedPayload_andTypeMappingUri_IsPassedThrough) that sent malformed JSON in a payload. The test didn't do much, and a comment admitted that we weren't likely doing the right thing. Moving a buffer to Unpooled caused the test to fail w/ a leak. Upon closer inspection, errors while parsing JSON didn't cause the payload contents to be dumped into the inlinedBinaryBody key of the payload. Now that content is captured and passed forward (plus, refCounted appropriately, though still as an Unpooled to minimize risk from transformations [see code comment]). In the case of this existing test, which has been moved to HttpJsonTransformingConsumerTest, the JsonTransformerForOpenSearch23PlusTargetTransformerProvider that the test uses throws an exception when the json payload isn't present. That was also not being handled properly. Now it is handled by marking the transformation as an error, w/ the exception wrapped in a "TransformationException". In that case, the transformation status is marked as an error and the contents of the entire message will be empty. If a transformation threw, it feels more appropriate to be as conservative as possible and not put anything on the wire, in case some part of it was sensitive.
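The error-handling policy described here (wrap the cause, mark the status as an error, and emit nothing from the original message) can be sketched as follows. Class and method names are illustrative, not the replayer's actual types:

```java
import java.util.function.UnaryOperator;

// Sketch of the conservative failure policy: on any transformation error,
// wrap the cause in a TransformationException and produce an EMPTY result
// rather than forwarding possibly-sensitive partial content to the wire.
public class TransformGuard {
    static class TransformationException extends RuntimeException {
        TransformationException(Throwable cause) { super(cause); }
    }

    static class Result {
        final byte[] bytes;                       // what would go on the wire
        final TransformationException error;      // null on success
        Result(byte[] bytes, TransformationException error) {
            this.bytes = bytes;
            this.error = error;
        }
    }

    static Result applySafely(UnaryOperator<byte[]> transform, byte[] input) {
        try {
            return new Result(transform.apply(input), null);
        } catch (Exception e) {
            // conservative: nothing from the original message is emitted
            return new Result(new byte[0], new TransformationException(e));
        }
    }
}
```
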
Lastly, I refactored AggregatedRawResponse, pulling out a base class, AggregatedRawResult. We use that class to accumulate transformation results. It was extremely confusing to see an HTTP response and the word 'response' on so many of the fields. Now the word 'result' is used, and the usage in various contexts makes more sense.
Issues Resolved
https://opensearch.atlassian.net/browse/MIGRATIONS-1984
Testing
Unit testing and docker compose testing.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.