trigger a warning when trying to append to a file on google cloud storage #533

Open
Courvoisier13 opened this issue Mar 8, 2023 · 4 comments

Comments

@Courvoisier13

When I tried to append to a file on GCS using pandas, the file was overwritten instead. That is because GCS objects are immutable. However, there was no warning, and it took a while to debug.

The mode='a' argument of pandas' to_csv is passed on to fsspec.open. It would be nice to have a warning or an error in this case.
See pandas-dev/pandas#51821

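import pandas as pd

# Example data (remaining columns elided in the original report).
df = pd.DataFrame({
    'account-start': ['2017-02-03', '2017-03-03', '2017-01-01'],
    'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
    'balance': [-1432.32, 10.43, 30000.00],
    'db-id': [1234, 2424, 251],
    'proxy-id': [525, 1525, 2542],
    'rank': [52, 525, 32],
    ...
})

# gs_path and temp_gs_path are not defined in the report; presumably gs:// URLs.
header = True
to_csv_mode = 'w'
with pd.read_csv(gs_path, chunksize=1) as reader:
    for r in reader:
        # The first chunk is written with mode='w' and a header. Later chunks use
        # mode='a', which on GCS silently overwrites the file instead of appending.
        r.to_csv(temp_gs_path, index=False, header=header, mode=to_csv_mode)
        header = False
        to_csv_mode = 'a'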
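
The warning requested above could look roughly like the following. This is only a sketch of the idea, not gcsfs or fsspec code: the helper name `check_append_mode` and the `GCS_PREFIXES` tuple are hypothetical, and the check is a plain string test on the URL prefix rather than anything the libraries do today.

```python
import warnings

# URL prefixes treated as Google Cloud Storage in this sketch (an assumption).
GCS_PREFIXES = ("gs://", "gcs://")


def check_append_mode(path: str, mode: str) -> None:
    """Emit a UserWarning when append mode is requested for a GCS URL."""
    if "a" in mode and path.startswith(GCS_PREFIXES):
        warnings.warn(
            f"Append mode {mode!r} requested for {path!r}; GCS objects are "
            "immutable, so the existing object may be overwritten.",
            UserWarning,
            stacklevel=2,
        )
```

Calling such a helper before handing the path to `to_csv` would have surfaced the problem on the second chunk, where the mode switches to 'a'.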
@martindurant martindurant transferred this issue from fsspec/filesystem_spec Mar 8, 2023
@martindurant
Member

(transferred to gcsfs)

@martindurant
Member

Append in GCS is possible via compose (currently used by the merge() method): we upload the extra data to a scratch file and, on close/commit, do a merge to rewrite the final destination. S3fs does something similar, except that in that case the new data does not need to be written to an actual key, which would then have to be removed when done.
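
The scratch-file approach described above can be sketched as follows. This is an illustration under assumptions, not the gcsfs implementation: `append_via_compose` is a hypothetical helper, the scratch-object naming is made up, and `InMemoryFS` is a stand-in that mimics only the `exists`, `pipe`, `merge(path, paths)` and `rm` methods so the example runs without credentials. With a real `gcsfs.GCSFileSystem` the same calls would go to the bucket, and the compose step would rewrite the destination from the existing object plus the scratch object.

```python
import uuid


def append_via_compose(fs, path, data):
    """Append bytes to `path` by uploading a scratch object and merging it in."""
    if not fs.exists(path):
        # Nothing to append to yet, so a plain upload is enough.
        fs.pipe(path, data)
        return
    # Hypothetical naming scheme for the temporary object.
    scratch = f"{path}.append-{uuid.uuid4().hex}"
    fs.pipe(scratch, data)
    try:
        # Rewrite the destination as: existing content followed by new content.
        fs.merge(path, [path, scratch])
    finally:
        # The scratch object is no longer needed, whether or not the merge worked.
        fs.rm(scratch)


class InMemoryFS:
    """Hypothetical stand-in exposing only the four methods used above."""

    def __init__(self):
        self.store = {}

    def exists(self, path):
        return path in self.store

    def pipe(self, path, value):
        self.store[path] = value

    def merge(self, path, paths):
        self.store[path] = b"".join(self.store[p] for p in paths)

    def rm(self, path):
        del self.store[path]


fs = InMemoryFS()
append_via_compose(fs, "bucket/out.csv", b"a,b\n")
append_via_compose(fs, "bucket/out.csv", b"1,2\n")
print(fs.store["bucket/out.csv"])
```

A file-like implementation would do the upload and merge on close/commit rather than per call, but the sequence of operations is the same.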

@Courvoisier13 , are you interested in working on this?

@Courvoisier13
Author

I have indeed built my own solution using compose. I'm a bit swamped right now, but maybe I can come back to this when I have more time.

@martindurant
Member

It would be very helpful!
