-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simplify SageMaker docs #408
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
||
### Upload data to S3 | ||
|
||
We offer the dataset for this demo in a public bucket hosted in either the [`us-east-1`](https://s3.console.aws.amazon.com/s3/buckets/sagemaker-rapids-hpo-us-east-1/) or [`us-west-2`](https://s3.console.aws.amazon.com/s3/buckets/sagemaker-rapids-hpo-us-west-2/) regions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just incorrect. Those buckets exist and data exists there, but not any datasets references in this document or the things it links to.
We can safely remove this... the Higgs notebook linked to in this doc handles downloading a dataset and putting it up on S3 itself.
deployment/source/examples/rapids-sagemaker-higgs/notebook.ipynb
Lines 92 to 106 in 3578ead
"source": [ | |
"!mkdir dataset\n", | |
"!wget -P dataset https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz\n", | |
"!gunzip dataset/HIGGS.csv.gz" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"s3_data_dir = session.upload_data(path=\"dataset\", key_prefix=\"dataset/higgs-dataset\")" |
@@ -353,7 +353,7 @@ | |||
} | |||
], | |||
"source": [ | |||
"!docker build -t {estimator_info['ecr_image']} --build-arg RAPIDS_IMAGE={{ rapids_container }} ." | |||
"!docker build -t {estimator_info['ecr_image']} --build-arg RAPIDS_IMAGE={estimator_info['rapids_container']} ." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{ rapids_container }}
is already templated into this estiamtor_info
dictioanry a few lines up.
Just using a dictionary reference here instead of repeating the templating reduces the risk of inconsistencies, and makes interactive testing (where I found it convenient to git clone
the repo and manually replace these with different image tags) a bit easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great thanks @jameslamb
Contributes to #402.
Contributes to #382 (by removing an image with a reference to the no-longer-supported
rapidsai/core
Docker image).The main purpose of this PR is to reduce duplication in the SageMaker example at https://docs.rapids.ai/deployment/nightly/cloud/aws/sagemaker/. If you read that page from top to bottom, today it feels like 2 very-similar pages that were concatenated together. Details like starting a SageMaker Notebook instance and setting up a lifecycle configuration are repeated.
It also contains an incomplete example of how to use SageMaker Estimators. This proposes replacing that and instead encouraging the use of https://docs.rapids.ai/deployment/nightly/examples/rapids-sagemaker-higgs/notebook/, which contains an end-to-end complete example of using SageMaker Estimators.