
Access Denied when IAM policy gives access (Read/Write/Listing) to only a prefix area #847

Open
pdemarti opened this issue Feb 9, 2024 · 14 comments


pdemarti commented Feb 9, 2024

Say I have the following IAM policy for a user named "test-user" (as a Python dict instead of JSON, for concision):

{'Version': '2012-10-17',
 'Statement': [{'Effect': 'Allow',
   'Action': ['s3:PutObject', 's3:GetObject', 's3:DeleteObject'],
   'Resource': ['arn:aws:s3:::my_bucket/some/prefix/${aws:username}/*']},
  {'Effect': 'Allow',
   'Action': 's3:ListBucket',
   'Resource': 'arn:aws:s3:::my_bucket',
   'Condition': {'StringLike': {'s3:prefix': ['some/prefix/${aws:username}/*']}}}]}
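
For reference, a sketch of one way to attach such an inline policy with boto3 (here `policy` stands for the dict above, and the policy name is made up):

import json

import boto3

# Attach the dict shown above as an inline user policy
# (PolicyName is arbitrary/hypothetical)
boto3.client('iam').put_user_policy(
    UserName='test-user',
    PolicyName='prefix-scoped-s3',
    PolicyDocument=json.dumps(policy),
)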

This works just fine:

import boto3
from botocore.exceptions import ClientError

prefix = 'some/prefix/test-user'
prefix_slash = f'{prefix}/'

# valid path
boto3.client('s3').list_objects_v2(Bucket='my_bucket', Prefix=prefix_slash)

# should be unreadable
try:
    boto3.client('s3').list_objects_v2(Bucket='my_bucket', Prefix=prefix)
    raise RuntimeError(f'{prefix} (no slash) was expected to fail')
except ClientError:
    pass

But the following code gives me Access Denied:

import s3fs

fs = s3fs.S3FileSystem(client_kwargs={'region_name': 'us-east-1'})

topdir = 'my_bucket/some/prefix/test-user'
fs.ls(f'{topdir}/')

If, however, I change the listing permission in IAM so the whole bucket can be listed:

[..., {'Effect': 'Allow',
   'Action': 's3:ListBucket',
   'Resource': 'arn:aws:s3:::my_bucket'}
]

Then it instantly works.

Another puzzling thing: if I change the IAM permission back, the code above keeps working, even after starting a new Python interpreter.

Versions

>>> import s3fs
>>> import botocore
>>> {m.__name__: m.__version__ for m in [s3fs, botocore]}
{'s3fs': '2023.4.0', 'botocore': '1.29.76'}

pdemarti commented Feb 9, 2024

The reason this is a problem: I want to let a user upload data to "their" area, without being able to see other areas in the bucket.

E.g., this fails in s3fs/core.py:140:

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list('abc'), index=['foo', 'bar'])
filename = 'fubar.csv'
s3loc = f's3://{topdir}/{filename}'
df.to_csv(s3loc)


pdemarti commented Feb 9, 2024

I guess I can circumvent s3fs altogether:

import io

import boto3

with io.StringIO() as f:
    df.to_csv(f)
    boto3.resource('s3').Bucket(bucket_name).Object(f'{prefix}/{filename}').put(Body=f.getvalue())

martindurant (Member) commented

It would be useful to see the exception: the calls in s3fs and the response message. You can also turn on logging for "s3fs" to see the chain of calls being made.
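
For reference, a minimal way to turn that logging on with the standard logging module (using the "s3fs" logger name mentioned above):

import logging

# print s3fs's debug-level records, which include the chain of S3 calls being made
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s')
logging.getLogger('s3fs').setLevel(logging.DEBUG)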

I notice that you provide a region for s3fs but not for boto3, which is a possible difference between the two.

> if I change the IAM permission back, the code above keeps working, even after starting a new Python interpreter

This is indeed puzzling; it suggests that something is being cached, perhaps even on AWS's side.

> I can circumvent s3fs altogether

Of course, s3fs is only a convenience layer, so no one is required to use it :)

pdemarti (Author) commented

As I bumped again against this today, I thought I would provide a bit more info.

First thing: the fs.ls(topdir) fails only if that prefix on S3 doesn't exist (i.e., it is "empty").
As soon as I put at least one file under topdir (however deep), then fs.ls() works.

The pandas operation, however, always fails.

Attached are (sanitized) log files and stack traces from 4 cases:

  1. using s3fs with non-empty topdir (no error, log);
  2. using s3fs with empty topdir (stacktrace, log);
  3. using pandas with non-empty topdir (stacktrace, log);
  4. using pandas with empty topdir (stacktrace, log).

martindurant (Member) commented

Thanks for the info, @pdemarti. I'll try to decode it. This was generated using your code at the top of this thread?

pdemarti (Author) commented

> Thanks for the info, @pdemarti. I'll try to decode it. This was generated using your code at the top of this thread?

Not exactly (there is error capture, logger setup, and sanitizing of the stack traces and log files). But, basically, the calls to s3fs and pandas are exactly as per the code above. I also set os.environ['AWS_PROFILE'] = 'test-user' to make sure this gets through to all flavors of calls.

martindurant (Member) commented

What version of fsspec do you have? The pandas route is trying to ensure that the "directory" exists and is writable, but it (rightly) doesn't have the permissions to check anything above the given path; this code has ignored PermissionError for about 4 months now.

martindurant (Member) commented

> using s3fs with empty topdir

I'm not sure what we should do about this error. When doing ls(path), there are three possibilities:

  • path is a directory (it contains stuff), and we have permission to list it, so all is OK
  • path is a regular file, which would be revealed by listing the directory above it
  • path doesn't exist.

It's checking the second possibility that is throwing the error. In fact, it should be FileNotFound, right?

martindurant (Member) commented

The following would change ls(no-directory) to FileNotFound:

@@ -996,14 +996,17 @@ class S3FileSystem(AsyncFileSystem):
         else:
             files = await self._lsdir(path, refresh, versions=versions)
             if not files and "/" in path:
-                files = await self._lsdir(
-                    self._parent(path), refresh=refresh, versions=versions
-                )
-                files = [
-                    o
-                    for o in files
-                    if o["name"].rstrip("/") == path and o["type"] != "directory"
-                ]
+                try:
+                    files = await self._lsdir(
+                        self._parent(path), refresh=refresh, versions=versions
+                    )
+                    files = [
+                        o
+                        for o in files
+                        if o["name"].rstrip("/") == path and o["type"] != "directory"
+                    ]
+                except IOError:
+                    pass
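
Note that in Python 3, IOError is an alias of OSError and PermissionError is an OSError subclass, so this also swallows the permission error raised when listing the parent is forbidden.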

pdemarti (Author) commented

> using s3fs with empty topdir
>
> I'm not sure what we should do about this error. When doing ls(path), there are three possibilities:
>
> * path is a directory (it contains stuff), and we have permission to list it, so all is OK
> * path is a regular file, which would be revealed by listing the directory above it
> * path doesn't exist.
>
> It's checking the second possibility that is throwing the error. In fact, it should be FileNotFound, right?

One issue with this is that if IAM only allows read access (listing and get object) to s3://my_bucket/prefix/*, then listing s3://my_bucket/prefix/ is allowed, but s3://my_bucket/prefix (no ending slash) is not. Thus, doing an fs.ls('bucket/key') should probably do the following (see the sketch after this list):

  1. if key ends with '/': list objects with Prefix=f'{key}';
  2. otherwise:
    a. if key exists (as an Object), then that's the result of the listing;
    b. else (or if getting PermissionDenied): try listing f'{key}/'.
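
A minimal sketch of that lookup order in plain boto3 (ls_like is a hypothetical name, and error handling is reduced to the bare minimum):

import boto3
from botocore.exceptions import ClientError

def ls_like(bucket: str, key: str) -> list:
    # Hypothetical listing that follows the order proposed above
    s3 = boto3.client('s3')
    if key and not key.endswith('/'):
        try:
            # 2a. if key exists as an Object, that is the whole listing
            s3.head_object(Bucket=bucket, Key=key)
            return [key]
        except ClientError:
            # 2b. on 404 or AccessDenied, fall back to listing key + '/'
            key += '/'
    # 1. list objects and "directories" under the slash-terminated prefix
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key, Delimiter='/')
    return ([o['Key'] for o in resp.get('Contents', [])]
            + [p['Prefix'] for p in resp.get('CommonPrefixes', [])])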

Also, S3 being an object store, "directories" are of course just an illusion; they are "auto-created". For example, writing an object s3://my_bucket/foo/bar/x/y/z/tada.parquet is doable (if IAM-permitted) even if, prior to that write, there is no other object sharing the same prefix. The bucket could even be entirely empty. In this context, a non-POSIX view that would help is that any prefix ending with slash unconditionally exists. It might be "empty" (if there are no objects with a key starting with that prefix), but in any case, the prefix can be used to immediately put a new object "under it". No mkdir necessary.
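
Concretely, a single put_object call (using the sanitized names from this thread) succeeds even into an entirely empty bucket:

import boto3

# No mkdir needed: all intermediate "directories" come into existence implicitly
boto3.client('s3').put_object(
    Bucket='my_bucket',
    Key='foo/bar/x/y/z/tada.parquet',
    Body=b'...',
)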

To summarize, I believe the following should help with both scenarios in this issue (fs.ls() and pd.to_parquet()):

Assuming IAM read/write/list access to s3://bucket/some/prefix/*:

  • fs.ls('bucket/'): PermissionDenied;
  • fs.ls('bucket/some/'): PermissionDenied;
  • fs.ls('bucket/some/prefix'): PermissionDenied;
  • fs.ls('bucket/some/prefix/'): allowed and always found: either empty if there are no objects with that prefix, or the objects and/or dirs of these objects.
  • open for write (or upload object) to 'bucket/some/filename': PermissionDenied;
  • open for write (or upload object) to 'bucket/some/prefix/filename': success, whether other objects with prefix 'bucket/some/prefix/' exist or not; no listing of parent dirs required.
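
A hypothetical test of those expectations (assuming credentials carrying exactly the policy above, and that s3fs maps AccessDenied to PermissionError):

import pytest
import s3fs

fs = s3fs.S3FileSystem()

# listing outside the permitted prefix: a permission error, not a generic one
for path in ['bucket/', 'bucket/some/', 'bucket/some/prefix']:
    with pytest.raises(PermissionError):
        fs.ls(path)

# listing the permitted prefix (trailing slash) always succeeds, possibly empty
assert isinstance(fs.ls('bucket/some/prefix/'), list)

# writing under the permitted prefix needs no listing of parent "dirs"
with fs.open('bucket/some/prefix/filename', 'wb') as f:
    f.write(b'data')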


pdemarti commented Mar 14, 2024

> What version of fsspec do you have? The pandas route is trying to ensure that the "directory" exists and is writable, but it (rightly) doesn't have the permissions to check anything above the given path; this code has ignored PermissionError for about 4 months now.

Aha! First, apologies: I was using an old version (2023.4.0). I redid the checks with the latest from the default miniconda3 channel: 2023.10.0. It has the same behavior as described above.

However, that version predates the fix you mention. And indeed, I just tested with the latest version on conda-forge:

{'fsspec': '2024.2.0', 's3fs': '2024.2.0'}

And we get, for the 4 cases described earlier:

  • no error using s3fs with non-empty topdir;
  • still fails using s3fs with empty topdir (stacktrace, log);
  • no error using pandas with non-empty topdir;
  • no error using pandas with empty topdir.

So, I think the only thing to consider would be to successfully return an empty list when doing a fs.ls() of a prefix that doesn't exist (same spirit as auto_mkdir, I suppose). Essentially, similar to:

>>> import boto3
>>> bkt = boto3.resource('s3').Bucket(bucket_name)
>>> list(bkt.objects.filter(Prefix='some/prefix/test-user/'))
[]

martindurant (Member) commented

> fs.ls('bucket/some/prefix/'): allowed and always found: either empty if there are no objects with that prefix, or the objects and/or dirs of these objects.

Of course, there is no way to tell the difference between not finding a path and a "virtual empty" directory; otherwise, all possible directories always exist, and that seems wrong.

> open for write (or upload object) to 'bucket/some/prefix/filename': success, whether other objects with prefix 'bucket/some/prefix/' exist or not; no listing of parent dirs required.

This is the specific case in your workflow that was not working as expected; but this has nothing to do with pandas writing.

pdemarti (Author) commented

> > fs.ls('bucket/some/prefix/'): allowed and always found: either empty if there are no objects with that prefix, or the objects and/or dirs of these objects.
>
> Of course, there is no way to tell the difference between not finding a path and a "virtual empty" directory; otherwise, all possible directories always exist, and that seems wrong.

Yes, this is somewhat subjective, but I understand your reluctance. That said, not all possible directories would always exist: if a path exists as an object, then it is a "file", not a "directory", and, although S3 allows it, it feels wrong to create objects under path/....

In the other cases, is there any actionable value in raising FileNotFound for a prefix that doesn't exist, since there is no mkdir for S3?

In any case, FileNotFound would still be preferable to PermissionDenied in that case.

martindurant (Member) commented

> it feels wrong to create objects under path/...

Oh, but people do, trust me!
