s3fs.copy fails when using `endpoint_url=endpoint_url` for `s3fs.S3FileSystem(...)` #824

Comments
Can you please include the traceback, so I can see which branch within `cp_file` is failing and where?
Thanks @martindurant for taking a look into this! Here is my traceback and some insights from debugging.

Traceback:

```
---------------------------------------------------------------------------
NoSuchBucket Traceback (most recent call last)
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/s3fs/core.py:112, in _error_wrapper(func, args, kwargs, retries)
111 try:
--> 112 return await func(*args, **kwargs)
113 except S3_RETRYABLE_ERRORS as e:
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/aiobotocore/client.py:383, in AioBaseClient._make_api_call(self, operation_name, api_params)
382 error_class = self.exceptions.from_code(error_code)
--> 383 raise error_class(parsed_response, operation_name)
384 else:
NoSuchBucket: An error occurred (NoSuchBucket) when calling the CopyObject operation: The specified bucket does not exist
The above exception was the direct cause of the following exception:
FileNotFoundError Traceback (most recent call last)
/Users/robinholzinger/robin/test/s3fs_test/debug.ipynb Cell 2 line 1
12 dst_path = f'{base_path}{dummyfile_name}_dst'
14 print(fs.read_text(src_path, encoding='utf-8'))
---> 15 fs.copy(src_path, dst_path)
16 # print(fs.read_text(dst_path, encoding='utf-8'))
17 # fs.rm_file(dst_path)
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
115 @functools.wraps(func)
116 def wrapper(*args, **kwargs):
117 self = obj or args[0]
--> 118 return sync(self.loop, func, *args, **kwargs)
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
101 raise FSTimeoutError from return_result
102 elif isinstance(return_result, BaseException):
--> 103 raise return_result
104 else:
105 return return_result
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
54 coro = asyncio.wait_for(coro, timeout=timeout)
55 try:
---> 56 result[0] = await coro
57 except Exception as ex:
58 result[0] = ex
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/fsspec/asyn.py:390, in AsyncFileSystem._copy(self, path1, path2, recursive, on_error, maxdepth, batch_size, **kwargs)
388 if on_error == "ignore" and isinstance(ex, FileNotFoundError):
389 continue
--> 390 raise ex
File ~/micromamba/envs/s3fs-test/lib/python3.11/asyncio/tasks.py:452, in wait_for(fut, timeout)
449 loop = events.get_running_loop()
451 if timeout is None:
--> 452 return await fut
454 if timeout <= 0:
455 fut = ensure_future(fut, loop=loop)
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/s3fs/core.py:1745, in S3FileSystem._cp_file(self, path1, path2, preserve_etag, **kwargs)
1740 await self._copy_etag_preserved(
1741 path1, path2, size, total_parts=int(parts_suffix)
1742 )
1743 elif size <= MANAGED_COPY_THRESHOLD:
1744 # simple copy allowed for <5GB
-> 1745 await self._copy_basic(path1, path2, **kwargs)
1746 else:
1747 # if the preserve_etag is true, either the file is uploaded
1748 # on multiple parts or the size is lower than 5GB
1749 assert not preserve_etag
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/s3fs/core.py:1626, in S3FileSystem._copy_basic(self, path1, path2, **kwargs)
1624 if ver1:
1625 copy_src["VersionId"] = ver1
-> 1626 await self._call_s3(
1627 "copy_object", kwargs, Bucket=buc2, Key=key2, CopySource=copy_src
1628 )
1629 except ClientError as e:
1630 raise translate_boto_error(e)
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/s3fs/core.py:339, in S3FileSystem._call_s3(self, method, *akwarglist, **kwargs)
337 logger.debug("CALL: %s - %s - %s", method.__name__, akwarglist, kw2)
338 additional_kwargs = self._get_s3_method_kwargs(method, *akwarglist, **kwargs)
--> 339 return await _error_wrapper(
340 method, kwargs=additional_kwargs, retries=self.retries
341 )
File ~/micromamba/envs/s3fs-test/lib/python3.11/site-packages/s3fs/core.py:139, in _error_wrapper(func, args, kwargs, retries)
137 err = e
138 err = translate_boto_error(err)
--> 139 raise err
FileNotFoundError: The specified bucket does not exist
```

Here are some highlighted snippets from the stack trace, with variable values shown as comments:

```python
# core.py:1742
async def _cp_file(...):
    # 'channel1/subdir/dummy_file' = self._strip_protocol('channel1/subdir/dummy_file')
    path1 = self._strip_protocol(path1)
    # `channel1`, `subdir/dummy_file`, None = self.split_path('channel1/subdir/dummy_file')
    # NOTE: split is already wrong semantically, because path does not contain bucket_name;
    # this does not cause the error here though
    bucket, key, vers = self.split_path(path1)
    ...
    # await self._copy_basic('channel1/subdir/dummy_file', 'channel1/subdir/dummy_file_dst', **{})
    # NOTE: issue has not propagated to this point
    await self._copy_basic(path1, path2, **kwargs)

# core.py:1613
async def _copy_basic(self, path1, path2, **kwargs):
    # `channel1`, `subdir/dummy_file`, None = self.split_path('channel1/subdir/dummy_file')
    buc1, key1, ver1 = self.split_path(path1)
    # `channel1`, `subdir/dummy_file_dst`, None = self.split_path('channel1/subdir/dummy_file_dst')
    buc2, key2, ver2 = self.split_path(path2)
    # NOTE: here the wrong splits seem to have more severe consequences
    ...
    # {"Bucket": `channel1`, "Key": `subdir/dummy_file`}
    copy_src = {"Bucket": buc1, "Key": key1}
    # await self._call_s3("copy_object", kwargs, Bucket=`channel1`, Key=`subdir/dummy_file_dst`, CopySource={"Bucket": `channel1`, "Key": `subdir/dummy_file`})
    await self._call_s3("copy_object", kwargs, Bucket=buc2, Key=key2, CopySource=copy_src)
    ...
```

Other methods, like `fs.rm_file('channel1/subdir/testfile')`, work fine:

```python
# core.py:1803
async def _rm_file(self, path, **kwargs):
    # `channel1`, `subdir/testfile`, None = self.split_path('channel1/subdir/testfile')
    bucket, key, _ = self.split_path(path)
    # await self._call_s3("delete_object", Bucket=`channel1`, Key=`subdir/testfile`)
    await self._call_s3("delete_object", Bucket=bucket, Key=key)
```

The copy succeeds once we patch the call so that `CopySource` carries the real bucket name and the full key:

```python
# await self._call_s3("copy_object", kwargs, Bucket=`channel1`, Key=`subdir/dummy_file_dst`, CopySource={"Bucket": `rh-devbox`, "Key": `channel1/subdir/dummy_file`})
await self._call_s3("copy_object", kwargs, Bucket=buc2, Key=key2, CopySource=copy_src)
```
Sorry, I'm not immediately seeing what you mean by "bad splits".
When calling …
I suppose we expect non-regionalised endpoints in this config; the …
@martindurant Thanks for the input, your remark about the non-regionalised endpoints led me to inspect my `endpoint_url`. It seems that not including the bucket name (`rh-devbox`) in the paths is what breaks `copy`: `split_path` then treats the first key component (`channel1`) as the bucket. Overall I was a bit unlucky that the other operations kept working and masked this. The docs seem to be a bit outdated concerning the `endpoint_url` parameter.
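For anyone landing here, a minimal sketch of a configuration that avoids the bad split (the endpoint URL is illustrative; the bucket name goes into the paths):

```python
import s3fs

# Endpoint WITHOUT a bucket baked into the URL (illustrative):
fs = s3fs.S3FileSystem(endpoint_url="https://s3.my-provider.example")

# Include the bucket name in the path, so split_path() yields the real bucket:
fs.copy(
    "rh-devbox/channel1/subdir/dummy_file",
    "rh-devbox/channel1/subdir/dummy_file_dst",
)
```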
The docs are talking about endpoints to non-AWS services, which don't seem to have this complication. If you want to add some clarifying text there about your situation, that would be appreciated!
Working with the `move` and `copy` functions of `s3fs`, I encountered the problem that the current implementation seems to cause issues when specifying an `endpoint_url=endpoint_url` in the constructor of `s3fs.S3FileSystem`.

(V1) Without `endpoint_url`, all functions (read, write, move, copy) work as expected when specifying paths with a `<bucket_name>/` prefix.

(V2) When using `endpoint_url`, the read, write, remove, ... operations still work without issues; however, `copy` (and therefore `move`) fails with `FileNotFoundError: The specified bucket does not exist`.
**Examples (setup)**
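The original example code blocks were lost from this page capture; what follows is a reconstruction from the traceback above, with the endpoint URL replaced by an illustrative value:

```python
import s3fs

# Shared setup (names inferred from the traceback; endpoint is illustrative):
dummyfile_name = "dummy_file"
endpoint_url = "https://rh-devbox.s3.example.com"  # hypothetical bucket-specific endpoint
```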
**Example 1 - without `endpoint_url` (fully working)**
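A sketch of the working variant, assuming `rh-devbox` is the bucket (as suggested by the patched `CopySource` in the comments):

```python
# V1: default AWS endpoint; paths carry the bucket name.
fs = s3fs.S3FileSystem()

base_path = "rh-devbox/channel1/subdir/"
src_path = f"{base_path}{dummyfile_name}"
dst_path = f"{base_path}{dummyfile_name}_dst"

fs.pipe_file(src_path, b"hello world")           # write
print(fs.read_text(src_path, encoding="utf-8"))  # read
fs.copy(src_path, dst_path)                      # copy succeeds here
fs.rm_file(dst_path)
```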
**Example 2: with `endpoint_url` (copy not working)**
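A sketch of the failing variant, reconstructed from the notebook lines visible in the traceback:

```python
# V2: custom endpoint_url; paths omit the bucket name.
fs = s3fs.S3FileSystem(endpoint_url=endpoint_url)

base_path = "channel1/subdir/"
src_path = f"{base_path}{dummyfile_name}"
dst_path = f"{base_path}{dummyfile_name}_dst"

print(fs.read_text(src_path, encoding="utf-8"))  # read works
fs.copy(src_path, dst_path)                      # raises FileNotFoundError
# print(fs.read_text(dst_path, encoding='utf-8'))
# fs.rm_file(dst_path)
```

**Error:** `FileNotFoundError: The specified bucket does not exist` (full traceback in the first comment above).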