
Fix storage uploads: use preferred chunk size #1743

Merged
lferran merged 5 commits into main from fix-storage-uploads on Jan 19, 2024


lferran
Contributor

lferran commented on Jan 19, 2024

Description

Problem:

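  • Chunked uploads to GCS/S3 storage require every chunk except the last one to meet a backend-specific size (for GCS, at least 262144 bytes, ideally an exact multiple of it)
  • The total size of the upload must be provided with the last chunk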
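The two constraints above can be illustrated with a small re-chunking helper. This is a minimal sketch, not the code merged in this PR: `rechunk` is a hypothetical name, and it assumes the incoming data arrives as an iterable of arbitrarily sized `bytes` parts (for example, pieces of a download). It buffers the parts and emits chunks of exactly `chunk_size` bytes, holding one full chunk back so the final chunk can be flagged with `is_last=True`, which is the point where the total size would be sent.

```python
from typing import Iterable, Iterator, Optional, Tuple


def rechunk(parts: Iterable[bytes], chunk_size: int) -> Iterator[Tuple[bytes, bool]]:
    """Hypothetical helper: re-slice arbitrary-sized parts into fixed-size chunks.

    Every chunk except the final one has exactly `chunk_size` bytes; the final
    chunk holds the remainder. Yields (chunk, is_last) pairs. An empty input
    yields a single empty final chunk so the upload can still be finalized.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer = bytearray()
    # A full chunk held back until we know whether more data follows it.
    pending: Optional[bytes] = None
    for part in parts:
        buffer.extend(part)
        while len(buffer) >= chunk_size:
            if pending is not None:
                yield pending, False
            pending = bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        # Leftover bytes form the (short) final chunk.
        if pending is not None:
            yield pending, False
        yield bytes(buffer), True
    elif pending is not None:
        # Data ended exactly on a chunk boundary: the held-back chunk is last.
        yield pending, True
    else:
        yield b"", True
```

With GCS, one would call it with a chunk size that is a multiple of 262144 bytes, so that small parts (such as the 5813-byte request in the error below) are merged into a full chunk instead of being sent on their own.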

We were not hitting this before because, for Nuclia-hosted NucliaDB, it is NUA processing that pushes the files to the bucket, already under the right key.

However, for NucliaDB Onprem, processed files need to be downloaded from the NUA processing API and uploaded to NucliaDB's blob storage.
We had never hit this issue so far because we had always used the PG and local file backends, which impose no restrictions on chunk sizes.
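Since the chunk-size requirement depends on the storage backend, one option is to derive a preferred chunk size per backend. The sketch below is illustrative only: `preferred_chunk_size` and the backend strings are hypothetical, not nucliadb's API. It rounds a desired size up to a multiple of 256 KiB for GCS, enforces S3's 5 MiB minimum part size for multipart uploads (which applies to every part except the last), and leaves PG and local file storage unconstrained.

```python
# Hypothetical constants and backend names, for illustration only.
GCS_CHUNK_MULTIPLE = 256 * 1024     # 262144 bytes, as in the GCS error message
S3_MIN_PART_SIZE = 5 * 1024 * 1024  # S3 multipart: non-final parts must be >= 5 MiB


def preferred_chunk_size(backend: str, desired: int) -> int:
    """Return a chunk size that satisfies the (assumed) backend constraint."""
    if desired <= 0:
        raise ValueError("desired must be positive")
    if backend == "gcs":
        # Ceiling division, then scale back up to the next 256 KiB multiple.
        return -(-desired // GCS_CHUNK_MULTIPLE) * GCS_CHUNK_MULTIPLE
    if backend == "s3":
        return max(desired, S3_MIN_PART_SIZE)
    if backend in ("pg", "local"):
        # No chunk-size restrictions on these backends.
        return desired
    raise ValueError(f"unknown backend: {backend!r}")
```

Rounding up (rather than down) keeps every non-final GCS chunk at or above the 262144-byte minimum even when the caller asks for something small.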

Right now, uploads to the GCS bucket fail with errors like this one:

Giving up _append(...) after 4 tries (nucliadb_utils.storages.gcs.GoogleCloudException: 400: Invalid request.  The number of bytes uploaded is required to be equal or greater than 262144, except for the final request (it's recommended to be the exact multiple of 262144).  The received request contained 5813 bytes, which does not meet this requirement.)
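For reference, in the GCS resumable upload protocol each chunk carries a `Content-Range` header: intermediate chunks leave the total size unknown (`/*`), and only the final chunk states it. Below is a minimal sketch of building that header; `content_range` is a hypothetical helper name, and it enforces the exact 256 KiB multiple that the error message above recommends for non-final chunks.

```python
GCS_CHUNK_MULTIPLE = 256 * 1024  # 262144 bytes


def content_range(offset: int, chunk_len: int, is_last: bool) -> str:
    """Hypothetical helper: build the Content-Range value for one chunk."""
    if not is_last and (chunk_len == 0 or chunk_len % GCS_CHUNK_MULTIPLE != 0):
        # This is the situation behind the 400 error: a small non-final chunk.
        raise ValueError(
            f"non-final chunk of {chunk_len} bytes is not a multiple of {GCS_CHUNK_MULTIPLE}"
        )
    if is_last:
        total = offset + chunk_len
        if chunk_len == 0:
            # Finalize without sending more data: only the total size is given.
            return f"bytes */{total}"
        return f"bytes {offset}-{total - 1}/{total}"
    # Intermediate chunk: total size still unknown.
    return f"bytes {offset}-{offset + chunk_len - 1}/*"
```

In this scheme the 5813-byte request from the log would only be valid as the last chunk, for example `content_range(262144, 5813, True)`, which yields `bytes 262144-267956/267957`.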

I had already fixed this for export/import, but it is now hitting us on NucliaDB Onprem when ingesting messages from NUA.

How was this PR tested?

Unit tests & local tests

lferran requested a review from a team on January 19, 2024 12:27

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c3f6c03) 82.20% compared to head (b368cf2) 82.22%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1743      +/-   ##
==========================================
+ Coverage   82.20%   82.22%   +0.01%     
==========================================
  Files         336      336              
  Lines       19609    19608       -1     
==========================================
+ Hits        16119    16122       +3     
+ Misses       3490     3486       -4     
| Flag | Coverage | Δ |
|---|---|---|
| ingest | 69.05% <ø> | (ø) |
| nucliadb | 70.35% <100.00%> | (+<0.01%) ⬆️ |
| reader | 79.41% <100.00%> | (ø) |
| sdk | 40.58% <ø> | (ø) |
| search | 79.04% <ø> | (ø) |
| standalone | 88.29% <ø> | (ø) |
| train | 63.71% <ø> | (ø) |
| utils | 81.41% <100.00%> | (+0.16%) ⬆️ |
| writer | 85.15% <ø> | (ø) |

Flags with carried forward coverage won't be shown.


lferran merged commit 4950044 into main on Jan 19, 2024
83 checks passed
lferran deleted the fix-storage-uploads branch on January 19, 2024 20:52

lferran
Contributor Author

lferran commented on Jan 22, 2024

[sc-8583]


This pull request has been linked to Shortcut Story #8583: Fix GCS upload bug.
