Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify SageMaker docs #408

Merged
merged 3 commits into from
Aug 8, 2024
Merged

Conversation

jameslamb
Copy link
Member

Contributes to #402.

Contributes to #382 (by removing an image with a reference to the no-longer-supported rapidsai/core Docker image).

The main purpose of this PR is to reduce duplication in the SageMaker example at https://docs.rapids.ai/deployment/nightly/cloud/aws/sagemaker/. If you read that page from top to bottom, today it feels like 2 very-similar pages that were concatenated together. Details like starting a SageMaker Notebook instance and setting up a lifecycle configuration are repeated.

It also contains an incomplete example of how to use SageMaker Estimators. This proposes replacing that and instead encouraging the use of https://docs.rapids.ai/deployment/nightly/examples/rapids-sagemaker-higgs/notebook/, which contains an end-to-end complete example of using SageMaker Estimators.

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change cloud/aws Amazon Web Service cloud labels Aug 7, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB


### Upload data to S3

We offer the dataset for this demo in a public bucket hosted in either the [`us-east-1`](https://s3.console.aws.amazon.com/s3/buckets/sagemaker-rapids-hpo-us-east-1/) or [`us-west-2`](https://s3.console.aws.amazon.com/s3/buckets/sagemaker-rapids-hpo-us-west-2/) regions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just incorrect. Those buckets exist and data exists there, but not any datasets references in this document or the things it links to.

We can safely remove this... the Higgs notebook linked to in this doc handles downloading a dataset and putting it up on S3 itself.

"source": [
"!mkdir dataset\n",
"!wget -P dataset https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz\n",
"!gunzip dataset/HIGGS.csv.gz"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"s3_data_dir = session.upload_data(path=\"dataset\", key_prefix=\"dataset/higgs-dataset\")"

@@ -353,7 +353,7 @@
}
],
"source": [
"!docker build -t {estimator_info['ecr_image']} --build-arg RAPIDS_IMAGE={{ rapids_container }} ."
"!docker build -t {estimator_info['ecr_image']} --build-arg RAPIDS_IMAGE={estimator_info['rapids_container']} ."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{ rapids_container }} is already templated into this estiamtor_info dictioanry a few lines up.

Just using a dictionary reference here instead of repeating the templating reduces the risk of inconsistencies, and makes interactive testing (where I found it convenient to git clone the repo and manually replace these with different image tags) a bit easier.

@jameslamb jameslamb mentioned this pull request Aug 7, 2024
Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great thanks @jameslamb

@jacobtomlinson jacobtomlinson merged commit e03c138 into rapidsai:main Aug 8, 2024
3 checks passed
@jameslamb jameslamb deleted the sagemaker-example branch August 8, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cloud/aws Amazon Web Service cloud improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants