CDW on Public cloud requires ability to copy data between S3 buckets via aws cli #32

Open
hpasumarthi opened this issue Apr 17, 2023 · 1 comment

hpasumarthi commented Apr 17, 2023

Hello Team,
In CDW Public Cloud on AWS, the data lives in S3 buckets. Copying data with the hadoop CLI or distcp is not possible in Public Cloud (PC) environments because there are no Hadoop clusters.
Can hms-mirror be enhanced to use AWS CLI commands to copy data between the left and right table locations, i.e. the S3 buckets?

For example (cp needs --recursive to copy more than a single object):
aws s3 cp --recursive s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET
or
aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET

https://repost.aws/knowledge-center/move-objects-s3-bucket

The expectation is that, instead of running distcp commands, hms-mirror would use the AWS CLI to copy data from left to right.
Regards,
Hemanth
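
A minimal sketch of the mapping requested above, for illustration only. The function name `s3_sync_cmd` and the `/db/tbl` paths are hypothetical and are not part of hms-mirror. It rewrites the Hadoop-style `s3a://` scheme to the `s3://` scheme the AWS CLI expects and prints the command with `--dryrun`, so nothing is copied until the output has been reviewed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hms-mirror): print, but do not run,
# the aws s3 sync command for one left/right location pair.
s3_sync_cmd() {
    local left="$1" right="$2"
    # Hadoop tooling uses s3a:// URIs; the AWS CLI expects s3://.
    left="${left/#s3a:/s3:}"
    right="${right/#s3a:/s3:}"
    # --dryrun makes the CLI list the planned operations without copying.
    echo "aws s3 sync --dryrun $left $right"
}

# Example pair; the /db/tbl suffix is an assumed table path.
s3_sync_cmd "s3a://DOC-EXAMPLE-BUCKET-SOURCE/db/tbl" "s3a://DOC-EXAMPLE-BUCKET-TARGET/db/tbl"
```

Dropping `--dryrun` from the printed command would perform the actual copy.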

hpasumarthi commented Apr 18, 2023

I came up with a small script that converts the distcp locations into aws cli commands:

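#!/usr/bin/env bash
# Convert the s3a:// rows of a distcp workbook into aws s3 sync commands.
# Uses [[ ]], so it needs bash (or a sh that is bash-compatible).
if [ -f "$1" ]; then
    echo "##Working on file : $1"
else
    echo "##File in the path $1 does not exist."
    exit 1
fi

echo "##Run set/export AWS_DEFAULT_PROFILE=sso.dev before running commands below"
# Keep only rows with s3a:// locations and switch them to the s3:// scheme.
grep 's3a://' "$1" | sed 's/s3a:/s3:/g' | while IFS= read -r line
do
   # Field 3 of the '|'-separated row is the right location; xargs trims it.
   location_right=$(echo "$line" | cut -d '|' -f3 | xargs)
   # Field 4 holds the left locations, separated by <br>.
   # Note: \n in the sed replacement requires GNU sed.
   echo "$line" | cut -d '|' -f4 | sed 's/<br>/\n/g' | while read -r location_left
   do
     # The last path component is used as the table directory name.
     tbl_name="${location_left##*/}"
     if [[ "$location_left" == *"s3://"* ]]; then
        echo "aws s3 sync $location_left $location_right/$tbl_name"
     fi
   done
done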

Running the script against a distcp workbook prints the distcp locations as aws s3 sync commands:

% sh distcp_awscli.sh testdev_airlines_RIGHT_distcp_workbook.md

##Working on file : testdev_airlines_RIGHT_distcp_workbook.md
##Run set/export AWS_DEFAULT_PROFILE=sso.dev before running commands below
aws s3 sync s3://ps-uat2/testdev-iceberg/airlines-iceberg/flights s3://ps-uat22/testdev-iceberg/airlines-iceberg/flights
aws s3 sync s3://ps-uat2/testdev-iceberg/airlines-iceberg/flights_iceberg s3://ps-uat22/testdev-iceberg/airlines-iceberg/flights_iceberg
aws s3 sync s3://ps-uat2/testdev-iceberg/airlines-iceberg/planes s3://ps-uat22/testdev-iceberg/airlines-iceberg/planes
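
To make the row-to-command step easier to follow, here is a hedged sketch that processes a single workbook row. The row layout (`| database | right location | left locations joined by <br> |`) is an assumption inferred from the field numbers the script cuts on, not a documented hms-mirror format, and the function name `row_to_sync` is hypothetical. It splits on `<br>` with bash parameter expansion, so it does not depend on GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical function: turn one '|'-separated workbook row into
# aws s3 sync commands. The column layout is an assumption.
row_to_sync() {
    local row="${1//s3a:/s3:}"      # AWS CLI expects s3://, not s3a://
    local skip db right lefts rest left
    local nl=$'\n'
    # The leading '|' yields an empty first field, so the right
    # location lands in the third field and the left ones in the fourth.
    IFS='|' read -r skip db right lefts rest <<< "$row"
    read -r right <<< "$right"      # trim surrounding whitespace
    lefts="${lefts//<br>/$nl}"      # one left location per line
    while read -r left; do
        [[ "$left" == s3://* ]] || continue
        # Append the last path component of the source to the target.
        echo "aws s3 sync $left $right/${left##*/}"
    done <<< "$lefts"
}

# Sample row built from the bucket names in the output above (layout assumed).
row_to_sync "| testdev_airlines | s3a://ps-uat22/testdev-iceberg/airlines-iceberg | s3a://ps-uat2/testdev-iceberg/airlines-iceberg/flights<br>s3a://ps-uat2/testdev-iceberg/airlines-iceberg/planes |"
```

For this sample row the function prints the `flights` and `planes` sync commands in the same form as the output shown above.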
