[Jobs] Allowing to specify intermediate bucket for file upload #4257
base: master
Conversation
Thanks @zpoint! Left some comments about the design, correctness and lifecycle management for the bucket contents.
sky/utils/schemas.py
Outdated
jobs_configs['properties']['bucket'] = {
    'type': 'string',
    'pattern': '^(https|s3|gs|r2|cos)://.+',
    'required': []
}
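The pattern in the snippet above can be exercised directly; a minimal sketch (the `is_valid_bucket_uri` helper name is hypothetical, not part of SkyPilot):

```python
import re

# Pattern copied from the proposed schema entry above.
BUCKET_PATTERN = re.compile(r'^(https|s3|gs|r2|cos)://.+')

def is_valid_bucket_uri(uri: str) -> bool:
    """Return True if the URI matches the schema's bucket pattern."""
    return BUCKET_PATTERN.match(uri) is not None

print(is_valid_bucket_uri('s3://my-bucket'))  # True
print(is_valid_bucket_uri('my-bucket'))       # False: missing scheme
```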
Maybe we should include it in controller_resources_schema and enable this feature for SkyServe as well?
Should we add this to our docs https://github.com/skypilot-org/skypilot/blob/master/docs/source/reference/config.rst?
Added to controller_resources_schema. Could you point me to the code that needs to change to enable this for SkyServe?
Thanks @zpoint! I'm trying to run this yaml:
The task output should show me contents of my workdir and file_mount. Instead, I get:
Related to this: #4257 (comment)?
sky/data/storage.py
Outdated
is_sky_managed: Optional[bool] = None,
sync_on_reconstruction: Optional[bool] = True,
_bucket_sub_path: Optional[str] = None,  # pylint: disable=invalid-name
_allow_bucket_creation: bool = True):  # pylint: disable=invalid-name
Why is this _allow_bucket_creation arg needed?
Can we instead check store.is_sky_managed field after the store's initialization to check if bucket was created, and if it was created then raise an error?
Eg. near L727 in controller_utils.py, we can initialize the store object and check is_sky_managed, and raise error appropriately.
store = store_cls(name=bucket_name,
                  source=workdir,
                  _bucket_sub_path=bucket_sub_path,
                  **store_kwargs)
if store.is_sky_managed:
    # The bucket was just created; this should not happen since the
    # bucket is expected to already exist. Clean up and raise.
    store.delete()
    raise ...
Let me know if I'm missing something. It would be great if we could avoid adding new arguments to the Store/Storage interface.
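The check-after-initialization pattern proposed above can be sketched in isolation; `FakeStore` and `init_store_for_existing_bucket` below are stand-ins for the real SkyPilot classes, used only to illustrate the control flow:

```python
class StorageBucketCreateError(Exception):
    """Stand-in for sky.exceptions.StorageBucketCreateError."""

class FakeStore:
    """Minimal stand-in for a SkyPilot Store subclass."""
    def __init__(self, name: str, bucket_exists: bool):
        self.name = name
        # SkyPilot marks a store as sky-managed when it had to create
        # the bucket itself; an existing bucket stays unmanaged.
        self.is_sky_managed = not bucket_exists
        self.deleted = False

    def delete(self):
        # Would delete the freshly created bucket here.
        self.deleted = True

def init_store_for_existing_bucket(store_cls, name: str, **kwargs):
    """Construct the store, then fail loudly if a bucket was created."""
    store = store_cls(name=name, **kwargs)
    if store.is_sky_managed:
        # Bucket was created; it should have existed already.
        store.delete()
        raise StorageBucketCreateError(
            f'Jobs bucket {name!r} does not exist.')
    return store
```

This keeps the "the bucket must pre-exist" policy at the call site (e.g. controller_utils.py) instead of threading a new flag through the Store constructor.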
We can do that in controller_utils.py. This avoids adding a parameter and keeps the logic outside of the store. Since the restriction on creating a bucket is not going to be reused elsewhere, handling it in controller_utils.py is fine.
@romilbhardwaj Review reminder 😊
Thanks @zpoint! Tested functionality - works nicely. Left some comments on the code.
sky/utils/controller_utils.py
Outdated
f'Configured to use a non-existent bucket: {store.name}, '
'please check the jobs->bucket in your sky config.')
Suggested change:
- f'Configured to use a non-existent bucket: {store.name}, '
- 'please check the jobs->bucket in your sky config.')
+ f'Jobs bucket {store.name!r} does not exist. '
+ 'Please check jobs.bucket configuration in your SkyPilot config.')
BTW, the logging shown above is quite confusing ("Created S3 bucket", "Deleted S3 bucket"). Is there a way to hide it / different way to check if the bucket already exists?
I 12-23 14:28:46 controller_utils.py:691] ⚙︎ Translating workdir to SkyPilot Storage...
I 12-23 14:28:47 storage.py:1590] Created S3 bucket 'romil-jobs-bucket2' in us-east-1
I 12-23 14:28:54 storage.py:1386] Deleted S3 bucket romil-jobs-bucket2.
sky.exceptions.StorageBucketCreateError: Configured to use a non-existent bucket: romil-jobs-bucket2, please check the jobs->bucket in your sky config.
Found a way to hide the above logs by using sky_logging.silent():
⚙︎ Translating workdir and file_mounts with local source paths to SkyPilot Storage...
sky.exceptions.StorageBucketCreateError: Jobs bucket 'zpointggg-bucket-s3-intermediate' does not exist. Please check jobs.bucket configuration in your SkyPilot config.
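Conceptually, a helper like sky_logging.silent() can be a small context manager that temporarily raises the logger's threshold; a minimal sketch of the idea, not SkyPilot's actual implementation:

```python
import contextlib
import logging

@contextlib.contextmanager
def silent(logger_name: str = 'sky'):
    """Temporarily suppress all log records emitted under `logger_name`."""
    logger = logging.getLogger(logger_name)
    old_level = logger.level
    # CRITICAL + 1 silences everything, including CRITICAL itself.
    logger.setLevel(logging.CRITICAL + 1)
    try:
        yield
    finally:
        logger.setLevel(old_level)
```

Wrapping the bucket-existence probe in such a context manager would hide the "Created S3 bucket" / "Deleted S3 bucket" lines while leaving all other logging untouched.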
Thanks @zpoint! LGTM, we can merge this after the storage-related smoke tests pass.
Feature for #3978

Supports specifying jobs: bucket in ~/.sky/config.yaml (s3, gcs, azure, r2, ibm) as the intermediate bucket for file uploads, instead of the auto-generated skypilot-filemounts-{username}-{run_id} bucket.

Adds _bucket_sub_path (supports both COPY and MOUNT based on the _bucket_sub_path; if a subpath is specified, the user will only see the subpath instead of the whole bucket). The leading _ means internal use only; remove the _ to expose it in the future.

pytest -s tests/test_smoke.py::test_managed_jobs_intermediate_storage
pytest -s tests/test_smoke.py::TestStorageWithCredentials::test_bucket_sub_path --gcp
verify that _bucket_sub_path works as expected.

For managed jobs:
Previously:
workdir -> {tmp-bucket-A}/workdir
file_mount -> {tmp-bucket-B}{i}
tmp_files -> {tmp-bucket-C}/tmp-files

With an intermediate bucket:
workdir -> {intermediate-bucket}/job-{run_id}/workdir
file_mount -> {intermediate-bucket}/job-{run_id}/local-file-mounts/{i}
tmp_files -> {intermediate-bucket}/job-{run_id}/tmp-files

The {intermediate-bucket}/job-{run_id} contents will be deleted automatically, as before. Even when using the same bucket, each job creates its own subdirectory based on the {run-id}, and the subdirectories for auto-created files are cleaned up after the job finishes.
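The layout above is plain string composition over the configured bucket and run id; a minimal sketch (function and argument names are hypothetical, not SkyPilot's actual API):

```python
def job_storage_prefix(intermediate_bucket: str, run_id: str) -> str:
    """Root prefix for a single managed job inside the shared bucket."""
    return f'{intermediate_bucket}/job-{run_id}'

def job_subpaths(intermediate_bucket: str, run_id: str,
                 num_file_mounts: int) -> dict:
    """Map each upload category to its subpath, mirroring the layout above."""
    prefix = job_storage_prefix(intermediate_bucket, run_id)
    return {
        'workdir': f'{prefix}/workdir',
        'file_mounts': [f'{prefix}/local-file-mounts/{i}'
                        for i in range(num_file_mounts)],
        'tmp_files': f'{prefix}/tmp-files',
    }

print(job_subpaths('s3://my-bucket', 'abc123', 2)['workdir'])
# s3://my-bucket/job-abc123/workdir
```

Because every path is scoped under job-{run_id}, deleting that one prefix after the job finishes cleans up everything without touching other jobs sharing the bucket.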
This allows users who don't have permission to create a bucket to manually specify a bucket name under ~/.sky/config.yaml.

Test plan

smoke test
custom test
(sky) ➜ cat ~/.sky/config.yaml
jobs:
  bucket: s3://zpoint-bucket-s3/
Launch
Looks good
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh