Cannot use UPath on S3 with pandas: PermissionError/Access Denied #241

Open
ba1dr opened this issue Jul 23, 2024 · 3 comments
Labels
bug 🐛 Something isn't working

ba1dr commented Jul 23, 2024

import pandas as pd
from upath import UPath

AWS_KEY = "AKIAxxxxxxx"
AWS_SECRET = "xxxxxxxxxxxxxxx"

bucket = 'upathtest'
fkey = "folder1/folder2/test1.xlsx"
s3base = UPath(f"s3://{bucket}", key=AWS_KEY, secret=AWS_SECRET)
s3path = s3base / fkey

print(list(s3base.iterdir()))      # THIS WORKS!
with s3path.open('w') as ff:
    ff.write("test1,test2")        # This works too

df = pd.DataFrame()
df.to_excel(s3path)           # !! This fails
Traceback

Traceback (most recent call last):
  File "/mypath/venv/lib/python3.11/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/aiobotocore/client.py", line 411, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mypath/try_fss.py", line 57, in <module>
    main()
  File "/mypath/try_fss.py", line 53, in main
    test03()
  File "/mypath/try_fss.py", line 47, in test03
    pd.read_csv(s3path)
  File "/mypath/venv/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/pandas/io/common.py", line 728, in get_handle
    ioargs = _get_filepath_or_buffer(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/pandas/io/common.py", line 443, in _get_filepath_or_buffer
    ).open()
      ^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/core.py", line 147, in open
    return self.__enter__()
           ^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/core.py", line 105, in __enter__
    f = self.fs.open(self.path, mode=mode)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/spec.py", line 1303, in open
    f = self._open(
        ^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/s3fs/core.py", line 689, in _open
    return S3File(
           ^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/s3fs/core.py", line 2183, in __init__
    super().__init__(
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/spec.py", line 1742, in __init__
    self.size = self.details["size"]
                ^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/spec.py", line 1755, in details
    self._details = self.fs.info(self.path)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/mypath/venv/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/s3fs/core.py", line 1375, in _info
    out = await self._call_s3(
          ^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/s3fs/core.py", line 366, in _call_s3
    return await _error_wrapper(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mypath/venv/lib/python3.11/site-packages/s3fs/core.py", line 145, in _error_wrapper
    raise err
PermissionError: Forbidden

I also tried passing client_kwargs, which does not work either.

aioboto_client_kwargs = {
    'aws_access_key_id': AWS_KEY,
    'aws_secret_access_key': AWS_SECRET,
}
s3base = UPath(f"s3://{bucket}", client_kwargs=aioboto_client_kwargs)
...
# same error

The AWS user has the AmazonS3FullAccess policy attached.

Collaborator

ap-- commented Jul 23, 2024

Thank you for opening the issue.

The implementation of _get_filepath_or_buffer in pandas.io.common converts the provided UPath instance into a string, which drops its storage_options.
pandas then tries to open the resulting S3 URI without those storage options, i.e. without the key and secret you passed to UPath.

This happens because UPath incorrectly pretends to be a local path. That will be fixed when we move to the correct base class, PathBase, which will no longer provide a __fspath__ dunder for non-local paths.
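
To illustrate why the credentials go missing, here is a minimal sketch that uses only the standard library. FakeRemotePath is a hypothetical stand-in, not part of universal_pathlib or pandas; it only mimics an object that carries storage_options and also implements __fspath__. Any consumer that reduces such an object to a string keeps the URI and loses the options:

```python
import os


class FakeRemotePath:
    """Hypothetical stand-in for a remote path object (not the real UPath)."""

    def __init__(self, uri, **storage_options):
        self.uri = uri
        self.storage_options = storage_options

    def __fspath__(self):
        # Only the URI string is returned; storage_options cannot travel with it.
        return self.uri


path = FakeRemotePath(
    "s3://upathtest/folder1/folder2/test1.xlsx",
    key="AKIAxxxxxxx",
    secret="xxxxxxxxxxxxxxx",
)

# What a consumer is left with after converting the object to a string.
as_string = os.fspath(path)
print(as_string)                              # s3://upathtest/folder1/folder2/test1.xlsx
print(hasattr(as_string, "storage_options"))  # False
```

The plain string has no way to carry the key and secret, which is consistent with the 403 on HeadObject in the traceback above.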

In the future we could also try to add support for arbitrary PathBase subclasses in pandas. But at least for universal_pathlib the mentioned changes in UPath should happen first.

All that being said, you can either pass the buffer you opened in the with block directly to .to_excel(), or provide the storage_options explicitly, as shown here:

import pandas as pd
from upath import UPath

pth = UPath("s3://some-bucket/some-file", key=..., secret=...)

df = pd.DataFrame()
df.to_excel(pth, storage_options=pth.storage_options)   
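
The other option mentioned above, handing pandas an already opened buffer, could be wrapped roughly as follows. This is a sketch under assumptions: to_excel_via_buffer is a hypothetical helper name, it relies only on the path object having an .open("wb") method (which both pathlib.Path and UPath provide), and it has not been run against S3 here.

```python
def to_excel_via_buffer(df, path, **to_excel_kwargs):
    """Write df as Excel through a handle opened by the path object itself.

    Hypothetical helper: because the path object opens the file, it can
    apply its own storage options, and pandas only ever sees the
    binary file handle.
    """
    with path.open("wb") as fh:
        df.to_excel(fh, **to_excel_kwargs)
```

With the names from the snippet above, this would be called as to_excel_via_buffer(df, pth).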

Let me know if that helps,
Andreas

ap-- added the bug 🐛 label Jul 23, 2024
Author

ba1dr commented Jul 24, 2024

Thank you for the answer. However, this does not help much, as the idea was to simply replace the Path objects with UPath without changing every call site. I am refactoring a large piece of code and was hoping this would let it work transparently with any path object.

Collaborator

ap-- commented Jul 24, 2024

Given the current implementations of pandas and universal_pathlib, you can achieve what you're asking for by not providing credentials explicitly and instead setting them via any of the methods s3fs supports, described here: https://s3fs.readthedocs.io/en/latest/#credentials
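
One of those methods is the standard AWS environment variables. A minimal sketch, assuming the credentials may be set process-wide; export_aws_credentials is a hypothetical helper name, and the key values are the placeholders from the original report:

```python
import os


def export_aws_credentials(key, secret):
    """Hypothetical helper: expose credentials via the standard AWS
    environment variables, which botocore (used by s3fs) looks up when
    no explicit key/secret is passed."""
    os.environ["AWS_ACCESS_KEY_ID"] = key
    os.environ["AWS_SECRET_ACCESS_KEY"] = secret


export_aws_credentials("AKIAxxxxxxx", "xxxxxxxxxxxxxxx")

# With the environment set, the path would be built without credentials, e.g.:
#   s3path = UPath("s3://upathtest") / "folder1/folder2/test1.xlsx"
#   df.to_excel(s3path)
```

Because the credentials no longer live in storage_options, nothing is lost when pandas reduces the path to a string. The trade-off is that every S3 access in the process then uses the same credentials.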

I also recommend subscribing to #193 to be notified once work starts on moving UPath to its correct base class, which will be available in future versions of stdlib pathlib (and is backported in pathlib-abc).

ap-- added this to the v0.3.0 milestone Aug 23, 2024