Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS SigV4 support in Fetch Migration #394

Merged
merged 6 commits into from
Nov 10, 2023

Conversation

kartg
Copy link
Member

@kartg kartg commented Nov 8, 2023

Description

  • Category: Enhancement, New feature, Refactoring

This PR enables support for SigV4 signed requests to AWS OpenSearch / OpenSearch Serverless endpoints in Fetch Migration, using the requests-aws4auth library. To achieve this, this change adds several components:

1 - All “endpoint” related logic is now encapsulated in the EndpointInfo class, including logic to construct API paths instead of having callers compute this.

2 - Construction of EndpointInfo instances from the Data Prepper pipeline configuration (and its plugin configuration sub-sections) has been moved out of metadata_migration.py to a new endpoint_utils.py file for better abstraction.

3 - Use of SigV4 is inferred from the supplied DP pipeline (separately for source and sink), including detection of the serverless key to change the service name (‘es’ vs. ‘aoss’)

4 - Since AWS4Auth requires a region argument, Fetch Migration first checks the plugin configuration for an explicitly defined region. If this is not present (since it is an optional parameter), the code attempts to derive the region based on the service endpoint URL (since generated endpoint URLs usually include the region). If a region cannot be inferred, a ValueError is thrown.

Unit tests for all of these components have been added (or existing ones updated). A minor refactoring of logging in migration_monitor.py is also included, which improves unit test code coverage.

Testing

Migration to an OpenSearch Serverless collection has also been tested and verified via the CDK.

$ python -m coverage report --omit "*/tests/*"
Name                           Stmts   Miss  Cover
--------------------------------------------------
endpoint_info.py                  24      1    96%
endpoint_utils.py                107      1    99%
fetch_orchestrator.py             70      0   100%
fetch_orchestrator_params.py      22      0   100%
index_operations.py               38      0   100%
metadata_migration.py             62      0   100%
metadata_migration_params.py       7      0   100%
metadata_migration_result.py       5      0   100%
migration_monitor.py              89      0   100%
migration_monitor_params.py        6      0   100%
progress_metrics.py               91      0   100%
utils.py                          13      0   100%
--------------------------------------------------
TOTAL                            534      2    99%

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Nov 8, 2023

Codecov Report

Merging #394 (98008f0) into main (353ab11) will decrease coverage by 0.10%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main     #394      +/-   ##
============================================
- Coverage     63.64%   63.55%   -0.10%     
+ Complexity      718      715       -3     
============================================
  Files            82       82              
  Lines          3298     3298              
  Branches        303      303              
============================================
- Hits           2099     2096       -3     
- Misses         1011     1014       +3     
  Partials        188      188              
Flag Coverage Δ
unittests 63.55% <ø> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 2 files with indirect coverage changes

Copy link
Collaborator

@sumobrian sumobrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kartg kartg mentioned this pull request Nov 9, 2023
4 tasks
Copy link
Collaborator

@sumobrian sumobrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Minor changes -- no need to re-review. One thing to double check is the documentation. I believe this was documented in a sample yml but didn't double check.

FetchMigration/python/endpoint_utils.py Outdated Show resolved Hide resolved
FetchMigration/python/endpoint_utils.py Show resolved Hide resolved
FetchMigration/python/endpoint_utils.py Outdated Show resolved Hide resolved
Comment on lines 44 to 50
# Utility method to creat a test config section
def create_config_section(plugin_config: dict) -> dict:
valid_plugin = dict()
valid_plugin[random.choice(SUPPORTED_ENDPOINTS)] = plugin_config
config_section = copy.deepcopy(BASE_CONFIG_SECTION)
config_section[TEST_KEY].append(valid_plugin)
return config_section
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, can you explain what this is doing? especially the random.choice part please.
NIT: missing 'e' in "create" in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Data Prepper pipeline configuration file is composed of several "sections" - source and sink being the two familiar ones. Each section can be configured with one or more plugins. Every plugin has a name, and an associated plugin configuration.

This utility method helps in the creation of one of those "sections" for testing. The random.choice logic randomly picks a supported endpoint as the plugin name, and uses the supplied plugin_config argument as its plugin configuration value. Instead of source or sink, the section name uses the value of TEST_KEY

Fixed the typo as a part of the latest commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to go ahead and merge this. @okhasawn if you have any other comments, please continue to provide them and i'll incorporate them in a follow-up PR.

…tInfo class

This change adds a class method to construct API endpoint paths instead of having callers compute this on their own. It also includes better/normalized handling of slashes in endpoint and path URIs. A minor refactoring of logging in migration_monitor.py is also included, which improves unit test code coverage.

Signed-off-by: Kartik Ganesh <[email protected]>
Also adding package dependencies on AWSSigV4 libraries

Signed-off-by: Kartik Ganesh <[email protected]>
@kartg kartg force-pushed the fetch-migration-sigv4-support branch from 10a43d3 to d684a10 Compare November 10, 2023 04:47
Signed-off-by: Kartik Ganesh <[email protected]>
@kartg kartg force-pushed the fetch-migration-sigv4-support branch from d684a10 to 98008f0 Compare November 10, 2023 05:36
@kartg kartg merged commit 8989f00 into opensearch-project:main Nov 10, 2023
7 of 8 checks passed
@kartg kartg deleted the fetch-migration-sigv4-support branch November 10, 2023 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants