Parallel scan in Full Table strategy #46

Open
wants to merge 1 commit into
base: master

mikeperello-scopely

Description of change

According to the AWS documentation (linked under Additional info), a DynamoDB table can be scanned in parallel. This is useful for large tables because, by default, the Scan operation processes data sequentially and returns it to the application in 1 MB increments.

Manual QA steps

To run the tap in parallel, set the following attributes as environment variables:

  • parallel_segment: the ID of the segment this execution scans.
  • parallel_totalsegments: the total number of segments.
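As a rough illustration of how these two attributes could map onto a DynamoDB Scan request, here is a minimal sketch. It is not the code in this PR: the helper name `scan_params_from_env` and the validation rules are assumptions made for the example. Only `Segment` and `TotalSegments` are real Scan parameters.

```python
import os


def scan_params_from_env(table_name, env=None):
    """Build keyword arguments for a DynamoDB Scan from environment variables.

    Hypothetical helper: reads parallel_segment and parallel_totalsegments
    and translates them into the Scan parameters Segment and TotalSegments.
    """
    env = os.environ if env is None else env
    params = {"TableName": table_name}

    segment = env.get("parallel_segment")
    total = env.get("parallel_totalsegments")

    # Neither variable set: fall back to a plain sequential scan.
    if segment is None and total is None:
        return params

    # DynamoDB requires Segment and TotalSegments to be supplied together.
    if segment is None or total is None:
        raise ValueError(
            "parallel_segment and parallel_totalsegments must be set together"
        )

    segment, total = int(segment), int(total)
    # Segment IDs are zero-based, so valid values are 0 .. total - 1.
    if total < 1 or not 0 <= segment < total:
        raise ValueError("parallel_segment must be in the range 0..parallel_totalsegments-1")

    params["Segment"] = segment
    params["TotalSegments"] = total
    return params
```

The resulting dictionary could then be passed to a boto3 client as `client.scan(**params)`.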

Risks

Rollback steps

  • revert this branch

Additional info

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

From the AWS documentation above: the Scan operation can logically divide a table or secondary index into multiple segments, with multiple application workers scanning the segments in parallel. Each worker can be a thread.

To run a parallel scan, run multiple executions of the tap, each with a different parallel_segment value but the same parallel_totalsegments value.
For example:

  • Execution 1:

    • parallel_segment = 0
    • parallel_totalsegments = 2
  • Execution 2:

    • parallel_segment = 1
    • parallel_totalsegments = 2

⚠️ Segment IDs are zero-based: parallel_segment must start at 0 and go up to parallel_totalsegments - 1.
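To make the per-execution behaviour concrete, the sketch below shows how a single worker might read every item in its segment, following pagination. It is an illustration under assumptions, not the tap's actual implementation: the function name `scan_segment` is hypothetical, and `client` is assumed to behave like a boto3 DynamoDB client (for example, `boto3.client("dynamodb")`).

```python
def scan_segment(client, table_name, segment, total_segments):
    """Yield every item in one segment of a parallel scan.

    `client` is assumed to expose a boto3-style scan(**kwargs) method that
    returns a dict with "Items" and, while more pages remain,
    "LastEvaluatedKey".
    """
    kwargs = {
        "TableName": table_name,
        "Segment": segment,               # zero-based ID of this worker's segment
        "TotalSegments": total_segments,  # must be identical for every worker
    }
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])

        # DynamoDB omits LastEvaluatedKey once the segment is exhausted.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume this segment from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With the two executions above, Execution 1 would correspond to `scan_segment(client, table, 0, 2)` and Execution 2 to `scan_segment(client, table, 1, 2)`. Together they cover the whole table, with no item read by both.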

Add parallel segment and total number of segments to the scan.
@singer-bot

Hi @mikeperello-scopely, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@singer-bot

You did it @mikeperello-scopely!

Thank you for signing the Singer Contribution License Agreement.
