[Fetch Migration] Improvements to subprocess handling #372
Conversation
Codecov Report
@@ Coverage Diff @@
## main #372 +/- ##
============================================
- Coverage 66.46% 66.44% -0.03%
+ Complexity 731 727 -4
============================================
Files 81 80 -1
Lines 3170 3165 -5
Branches 286 286
============================================
- Hits 2107 2103 -4
+ Misses 879 878 -1
Partials 184 184
I've added comments to the code as best as I could follow it.
Some documentation in the main files about what you're doing, what the dependencies are doing, and what the responsibilities are for everything would help make a review from me more valuable. I don't know if there are issues with the current patch because I can't fully assimilate the starting case yet.
__DP_EXECUTABLE_SUFFIX = "/bin/data-prepper"
__PIPELINE_OUTPUT_FILE_SUFFIX = "/pipelines/pipeline.yaml"

-def run(dp_base_path: str, dp_config_file: str, dp_endpoint: str):
+def run(dp_base_path: str, dp_config_file: str, dp_endpoint: str) -> Optional[int]:
     dp_exec_path = dp_base_path + __DP_EXECUTABLE_SUFFIX
     output_file = dp_base_path + __PIPELINE_OUTPUT_FILE_SUFFIX
     metadata_migration_params = MetadataMigrationParams(dp_config_file, output_file, report=True)
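To illustrate why the return type changes to `Optional[int]`: the orchestrator can bubble the monitored subprocess's return code up as its own exit code. This is only a sketch under assumptions; the stub `run` and the `main` helper are illustrative, not the PR's exact code.

```python
import sys
from typing import List, Optional

def run(dp_base_path: str, dp_config_file: str, dp_endpoint: str) -> Optional[int]:
    # Illustrative stub: the real function starts Data Prepper, monitors it,
    # and returns the subprocess's return code (or None if unavailable)
    return 0

def main(argv: List[str]) -> int:
    # Bubble the monitored subprocess's result up as the orchestrator's exit code;
    # treat a missing return code as failure
    rc = run(argv[0], argv[1], argv[2])
    return rc if rc is not None else 1
```

A caller such as `sys.exit(main(sys.argv[1:]))` would then propagate success (zero) or failure (non-zero) to the container runtime.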
Can your code confirm that Data Prepper is bound only to localhost? I could see a lot of difficulty happening if a second managing agent came along and started acting on it - or if somebody else DDoS'ed the API (while it was causing a lot of load on the source cluster).
Yes, this code needs an update to stop accepting dp_endpoint. Currently the localhost configuration is driven only by the Dockerfile:

ENTRYPOINT python3 -u ./fetch_orchestrator.py $DATA_PREPPER_PATH $FM_CODE_PATH/input.yaml http://localhost:4900
This is obviously not a strong check. I'll make that change in a follow-up PR.
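As an interim sketch of a stronger check (hypothetical, not part of this PR): until dp_endpoint is removed, the orchestrator could reject any endpoint that does not point at the local host.

```python
from urllib.parse import urlparse

def validate_local_endpoint(dp_endpoint: str) -> str:
    # Hypothetical guard (not in this PR): only accept Data Prepper endpoints
    # that target the local host, until dp_endpoint is removed entirely
    host = urlparse(dp_endpoint).hostname
    if host not in ("localhost", "127.0.0.1", "::1"):
        raise ValueError("Data Prepper endpoint must be local, got: %s" % host)
    return dp_endpoint
```

Note this validates only the configured URL; it does not verify what interface the Data Prepper process itself is actually bound to.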
Also includes a minor optimization to short-circuit logic when there are no source indices found, along with a unit test. Signed-off-by: Kartik Ganesh <[email protected]>
The migration_monitor module now includes a separate method to monitor a Data Prepper subprocess. In addition to interacting/monitoring via Data Prepper API calls (which is the existing implementation) the new, "local" monitoring method adds logic to poll the running subprocess and its return code. Unit tests for these use-cases have been added/updated. Signed-off-by: Kartik Ganesh <[email protected]>
This enables quitting the Fetch Migration workflow with a return_code bubbled up from migration_monitor.py, which can help indicate a successful (return code zero) or non-successful execution. Signed-off-by: Kartik Ganesh <[email protected]>
Signed-off-by: Kartik Ganesh <[email protected]>
Force-pushed from 1d28731 to 6af8de7
lgtm
CodeCov failures are looking at Java code, which this PR doesn't touch. I'll go ahead and merge this.
Description

The migration_monitor.py module now includes a separate function (monitor_local) to monitor a Data Prepper subprocess.

Previously, the migration monitor only interacted with the Data Prepper process via API calls. This has the limitation that the monitor cannot differentiate between a subprocess that is intermittently unresponsive (or not yet started) and one that is no longer running. As a result, the API calls simply retry indefinitely if no response is received, which resulted in an indefinite polling loop if the subprocess had crashed or exited (say, due to a bad configuration, an OOM error, etc.)

The new function adds logic to poll the running subprocess and its return code (via Python's Popen object); this is in addition to the monitoring via API calls. Unit tests for these use-cases have been added/updated.

This PR also includes an additional commit that fixes index_operations.py to filter out system indices, so they are excluded from metadata migration. Short-circuit logic for empty source clusters has been added, along with related unit tests.

Testing
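The local-monitoring behavior described above can be sketched as follows. This is illustrative only, assuming a subprocess.Popen handle; the function signature and poll interval are assumptions, not the PR's exact code.

```python
import subprocess
import time
from typing import Optional

def monitor_local(proc: subprocess.Popen,
                  poll_interval_seconds: float = 1.0) -> Optional[int]:
    # Poll the subprocess directly, in addition to any API-based checks.
    # A non-None return code means the process has exited (crash, OOM,
    # bad config, etc.), which ends the previously indefinite retry loop.
    while True:
        return_code = proc.poll()
        if return_code is not None:
            return return_code
        # ... existing Data Prepper API progress checks would run here ...
        time.sleep(poll_interval_seconds)
```

Because the loop returns as soon as poll() reports an exit, a crashed Data Prepper process no longer leaves the monitor spinning on unanswered API calls.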
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.