You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
gspread-pandas' Sheet class has implemented some logic to know when sheet metadata needs to be refetched - cf refresh_spread_metadata. However, the current implementation fails to make use of its potential. When accessing a multi-sheet spread using sheet_to_df, debug logs show that the spreadsheet's metadata endpoint is hit multiple times - in my example, 9 times to fetch 2 sheets.
To show the potential: changing the Sheet.sheets decorator from @property to @cached_property reduces the requests from 9 down to 3:
one from open_spread -> open_by_url
one from open_spread -> refresh_metadata (which is unnecessary here)
and one from open_sheet (which is caused by accessing the sheets property)
Fixing this issue would improve the performance of gspread-pandas by a relevant factor as I'm currently seeing 429 responses caused by these metadata requests, causing a significant slowdown.
Hey, sorry I haven't had much time to maintain this and e-mails seem to get lost in my inbox. The reason I originally implemented all this refresh metadata logic is because there's a few methods that depend on available metadata. If we don't refresh it, the values will be wrong. For example, number of rows and cols and number of frozen rows and cols all depend on that metadata.
The best way to avoid these refreshes would be to parse the responses for the different methods that modify the sheet, extract the updated values, and update the metadata. Unfortunately, that is likely a pretty manual process since the response format depends on whether it's a batch operation or not.
I'm not sure if gspread v6 changes any of this, but unfortunately I don't have much time to look into this. any PRs would be welcome though!
gspread-pandas'
Sheet
class has implemented some logic to know when sheet metadata needs to be refetched - cfrefresh_spread_metadata
. However, the current implementation fails to make use of its potential. When accessing a multi-sheet spread usingsheet_to_df
, debug logs show that the spreadsheet's metadata endpoint is hit multiple times - in my example, 9 times to fetch 2 sheets.I have created a standalone pytest that reproduces this behaviour: https://gist.github.com/mephinet/7415a4641a65fbf0d51f0eae11ee21dc
I failed to add it to the betamax'ed test suite because I ran into timeouts...
Tested with gspread-pandas version 3.2.3 on Arch Linux, as well as with the git master.
The text was updated successfully, but these errors were encountered: