Parallel scan in Full Table strategy #46

mikeperello-scopely · 2022-07-15T07:50:13Z

Description of change

According to the AWS documentation (linked under Additional info), a DynamoDB table can be scanned in parallel. This is useful for large tables, because by default a Scan operation processes data sequentially and returns it to the application in 1 MB increments.

Manual QA steps

To run the tap in parallel, set the following attributes as environment variables:

parallel_segment: the ID of the segment this execution scans.
parallel_totalsegments: the total number of segments.
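As a rough illustration of how these two attributes could map onto a Scan request, here is a minimal sketch. `Segment` and `TotalSegments` are the parameter names of the DynamoDB Scan API; the helper `build_scan_params` and the way it reads its settings are assumptions made for this example and are not taken from full_table.py.

```python
import os


def build_scan_params(table_name, settings=None):
    """Build keyword arguments for a DynamoDB Scan call.

    Hypothetical helper for illustration; the tap's real code may differ.
    `settings` is any mapping and defaults to the process environment.
    """
    settings = os.environ if settings is None else settings
    params = {"TableName": table_name}

    segment = settings.get("parallel_segment")
    total = settings.get("parallel_totalsegments")

    # Neither attribute set: fall back to a plain sequential scan.
    if segment is None and total is None:
        return params
    # DynamoDB requires Segment and TotalSegments to be given together.
    if segment is None or total is None:
        raise ValueError(
            "parallel_segment and parallel_totalsegments must be set together"
        )

    segment, total = int(segment), int(total)
    # Segment IDs are zero-based and must be lower than TotalSegments.
    if total < 1 or not 0 <= segment < total:
        raise ValueError(
            f"parallel_segment must be in [0, {total - 1}], got {segment}"
        )

    params["Segment"] = segment
    params["TotalSegments"] = total
    return params
```

The returned dictionary could then be passed to a client call such as `client.scan(**params)`. Validating the values up front is a design choice of this sketch, so that a misconfigured execution fails before any request is sent.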

Risks

Rollback steps

Revert this branch.

Additional info

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

The Scan operation can logically divide a table or secondary index into multiple segments, with multiple application workers scanning the segments in parallel. Each worker can be a thread or an operating system process.
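The worker-per-segment idea can be sketched with threads as follows. This is only an illustration of the mechanism the AWS documentation describes, not what this PR does (the PR relies on separate executions of the tap). The function names are hypothetical, and `client` is assumed to be any object with a `scan(**kwargs)` method shaped like the boto3 DynamoDB client.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read every item of one segment, following pagination."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        # A missing LastEvaluatedKey means the segment is exhausted.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        # Collect results in segment order; result() re-raises worker errors.
        return [item for future in futures for item in future.result()]
```

With boto3 installed, the client would typically come from `boto3.client("dynamodb")`. Each segment keeps its own `ExclusiveStartKey`, which is why the pagination loop lives inside the per-segment worker.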

To run a parallel scan, run multiple executions of the tap, each with a different parallel_segment value but the same parallel_totalsegments value.
For example:

Execution 1:
- parallel_segment = 0
- parallel_totalsegments = 2
Execution 2:
- parallel_segment = 1
- parallel_totalsegments = 2

⚠️ parallel_segment values are zero-based: the first segment must be 0, and the last is parallel_totalsegments - 1.
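For a larger number of segments, the per-execution values can be generated rather than written by hand. The sketch below only produces the settings; `execution_settings` is a hypothetical helper, and how each execution of the tap is actually launched is left out because it is not described in this PR.

```python
def execution_settings(total_segments):
    """Return one settings dict per tap execution, numbered from 0."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            # Values are strings, as environment variables would be.
            "parallel_segment": str(segment),
            "parallel_totalsegments": str(total_segments),
        }
        for segment in range(total_segments)
    ]


for number, settings in enumerate(execution_settings(2), start=1):
    print(f"Execution {number}: {settings}")
```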


singer-bot · 2022-07-15T07:50:15Z

Hi @mikeperello-scopely, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

singer-bot · 2022-07-15T12:47:49Z

You did it @mikeperello-scopely!

Thank you for signing the Singer Contribution License Agreement.

Update full_table.py

42ad87e

Add parallel segment and total number of segments to the scan.

singer-bot added the cla-missing label Jul 15, 2022

singer-bot removed the cla-missing label Jul 15, 2022
