Google Spreadsheet's metadata is being re-fetched too often #99

mephinet · 2024-01-04T11:39:16Z

gspread-pandas' Sheet class has implemented some logic to know when sheet metadata needs to be refetched - cf refresh_spread_metadata. However, the current implementation fails to make use of its potential. When accessing a multi-sheet spread using sheet_to_df, debug logs show that the spreadsheet's metadata endpoint is hit multiple times - in my example, 9 times to fetch 2 sheets.

I have created a standalone pytest that reproduces this behaviour: https://gist.github.com/mephinet/7415a4641a65fbf0d51f0eae11ee21dc
I failed to add it to the betamax'ed test suite because I ran into timeouts...

Tested with gspread-pandas version 3.2.3 on Arch Linux, as well as with the git master.

The text was updated successfully, but these errors were encountered:

mephinet · 2024-01-04T15:36:16Z

To show the potential: changing the Sheet.sheets decorator from @property to @cached_property reduces the requests from 9 down to 3:

one from open_spread -> open_by_url
one from open_spread -> refresh_metadata (which is unnecessary here)
and one from open_sheet (which is caused by accessing the sheets property)

Fixing this issue would improve the performance of gspread-pandas by a relevant factor as I'm currently seeing 429 responses caused by these metadata requests, causing a significant slowdown.

aiguofer · 2024-02-13T17:10:04Z

Hey, sorry I haven't had much time to maintain this and e-mails seem to get lost in my inbox. The reason I originally implemented all this refresh metadata logic is because there's a few methods that depend on available metadata. If we don't refresh it, the values will be wrong. For example, number of rows and cols and number of frozen rows and cols all depend on that metadata.

The best way to avoid these refreshes would be to parse the responses for the different methods that modify the sheet, extract the updated values, and update the metadata. Unfortunately, that is likely a pretty manual process since the response format depends on whether it's a batch operation or not.

I'm not sure if gspread v6 changes any of this, but unfortunately I don't have much time to look into this. any PRs would be welcome though!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Google Spreadsheet's metadata is being re-fetched too often #99

Google Spreadsheet's metadata is being re-fetched too often #99

mephinet commented Jan 4, 2024

mephinet commented Jan 4, 2024

aiguofer commented Feb 13, 2024

Google Spreadsheet's metadata is being re-fetched too often #99

Google Spreadsheet's metadata is being re-fetched too often #99

Comments

mephinet commented Jan 4, 2024

mephinet commented Jan 4, 2024

aiguofer commented Feb 13, 2024