[Index Configuration Tool] Unified Dockerfile for ICT and Data Prepper #234
index_configuration_tool/Dockerfile
# Copy only source code
COPY ./*.py .

# update PATH
ENV PATH=/root/.local:$PATH

# make sure you include the -u flag to have our stdout logged
ENTRYPOINT [ "python", "-u", "./main.py" ]
CMD python -u ./main.py -r $DATA_PREPPER_PATH/pipelines/pipelines.yaml $DATA_PREPPER_PATH/pipelines/pipelines.yaml; $DATA_PREPPER_PATH/bin/data-prepper
This is more of a me-and-docker question, but why switch from ENTRYPOINT to CMD?
Unrelated on this line:
If there are no indices to migrate, the Docker container should ideally just shut down without running Data Prepper. Today, Data Prepper would still run and incorrectly migrate all indices. This is because the Dockerfile simply defines two commands in sequence.
This seems to be largely because we're reading from and writing to the same YAML file, and then running with that YAML file. Would the situation be "safer" if we e.g. renamed pipelines.yaml to original-pipelines.yaml, used that as the source for ./main.py but output to pipelines.yaml and then ran data-prepper? Would that ensure that only the correct indices get migrated?
Incorrectly migrating indices (with no step for the user to manually intervene) seems like a fairly large bug to continue with.
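To make the "two commands in sequence" problem concrete, here is a minimal Python sketch. It is not the actual Dockerfile logic. Both commands are stand-ins run through the current Python interpreter, `run_steps` is a hypothetical helper, and the convention that the first step exits non-zero when there is nothing to migrate is an assumption for illustration only; nothing in this thread says what main.py returns in that case.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[Sequence[str]], stop_on_failure: bool) -> List[int]:
    """Run each command in order and return the exit codes of the ones that ran."""
    exit_codes: List[int] = []
    for command in steps:
        code = subprocess.run(command).returncode
        exit_codes.append(code)
        # Without this check, later steps run no matter what happened earlier,
        # which is the behaviour of `first; second` in a shell.
        if stop_on_failure and code != 0:
            break
    return exit_codes


# Stand-ins (assumptions): the first command plays the role of the index
# configuration step and exits 1; the second plays the role of Data Prepper.
ict_step = [sys.executable, "-c", "import sys; sys.exit(1)"]
data_prepper_step = [sys.executable, "-c", "print('data-prepper would start here')"]

if __name__ == "__main__":
    print(run_steps([ict_step, data_prepper_step], stop_on_failure=False))
    print(run_steps([ict_step, data_prepper_step], stop_on_failure=True))
```

With stop_on_failure=False the second step still runs after the first one fails, which mirrors the `;` separator in the CMD line above. With stop_on_failure=True the second step is skipped.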
This is more of a me-and-docker question, but why switch from ENTRYPOINT to CMD?
Good question! I was using CMD because the base Data Prepper Dockerfile uses CMD, but it turns out ENTRYPOINT in the child image overrides the parent CMD. Updated.
This seems to be largely because we're reading from and writing to the same YAML file, and then running with that YAML file. Would the situation be "safer" if we e.g. renamed pipelines.yaml to original-pipelines.yaml, used that as the source for ./main.py but output to pipelines.yaml and then ran data-prepper? Would that ensure that only the correct indices get migrated?
Agree that this is a reasonable mitigation. I've changed the path for the input YAML file for ICT to not live in the data-prepper pipelines directory. While this mitigates this bug, the Docker image still runs the commands in sequence, which causes Data Prepper to start but fail because no pipeline configuration is found. I'm tracking a fix for this in https://opensearch.atlassian.net/browse/MIGRATIONS-1218
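One possible shape for that follow-up, sketched in Python under stated assumptions: check that a pipeline configuration was actually written before deciding whether to launch anything. The helper names `has_pipeline_config` and `choose_command` are hypothetical, and treating an empty or whitespace-only file as "no configuration" is an assumption, not something ICT is known to do.

```python
from pathlib import Path
from typing import List


def has_pipeline_config(config_path: Path) -> bool:
    """Return True only if the file exists and contains non-whitespace text."""
    return config_path.is_file() and config_path.read_text().strip() != ""


def choose_command(config_path: Path, data_prepper_bin: str) -> List[str]:
    """Return the command to launch, or an empty list if there is nothing to run."""
    if has_pipeline_config(config_path):
        return [data_prepper_bin]
    # No generated configuration: the caller can exit cleanly instead of
    # starting a process that would fail on a missing pipeline file.
    return []
```

A wrapper script could call choose_command after the index configuration step and exit with status 0 when the returned list is empty, so the container shuts down without starting Data Prepper.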
… tool and Data Prepper This change updates the Dockerfile definition for index_configuration_tool to use Data Prepper as the base image. This will allow us to run the index_configuration_tool Python logic to configure indices on the target cluster before kicking off Data Prepper execution to move data over. Note that since these steps must occur in sequence, the child Docker image overrides the parent's CMD definition and must manually invoke the data-prepper executable. Signed-off-by: Kartik Ganesh <[email protected]>
06c482d to d35aa3b
Codecov Report
@@ Coverage Diff @@
## main #234 +/- ##
=========================================
Coverage 54.69% 54.69%
Complexity 490 490
=========================================
Files 65 65
Lines 2565 2565
Branches 235 235
=========================================
Hits 1403 1403
Misses 1035 1035
Partials 127 127
Signed-off-by: Kartik Ganesh <[email protected]>
Description
This change updates the Dockerfile definition for index_configuration_tool (ICT) to use Data Prepper as the base image. This will allow us to run (as a single Docker image) the index_configuration_tool Python logic to configure indices on the target cluster before kicking off Data Prepper execution to move data over. Note that since these steps must occur in sequence, the child Docker image overrides the parent's CMD definition and must manually invoke the data-prepper executable.
This is a POC implementation, so there are some limitations:
- … `_bulk` call fails during data migration, the index-creation steps performed by ICT are not rolled back. These must be manually removed before running the Docker image again. This is necessary because ICT errs on the side of caution: the current logic does not validate `doc_count` for identical indices, so we may end up incorrectly overwriting documents on the target cluster. Even if the `doc_count` is identical, it would require a much deeper check to verify whether the contents of the index are identical.
- … `doc_count` though (like with identical indices above)
Issues Resolved
relates #165 and #163
Testing
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.