
S3 fixes for non AWS instances #12454

Open
qqmyers wants to merge 5 commits into IQSS:develop from QualitativeDataRepository:S3FixesForB2StorJ


@qqmyers qqmyers commented Jun 11, 2026

Member

What this PR does / why we need it: QDR has used Storj and is now switching to Backblaze B2.

This PR fixes a critical issue we found using B2 that may also be relevant for other stores that do not support tagging (Dataverse uses a tag to mark files as temporary until the upload completes). In short, while the .disable-tagging=true setting stops Dataverse from writing a tag, Dataverse still tried to delete the tag even with that flag set (resulting in B2 deleting the file :-)). With the change here, Dataverse only tries to delete the tag if it set one, i.e. when disable-tagging is false or not set. (This appears to be the only change needed for B2, so I've added it to the list of compatible S3 implementations.)
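To illustrate the guard described above, here is a minimal sketch. It is not the code in this PR: `TagClient`, `RecordingTagClient`, and `TempTagGuard` are hypothetical names, and the client is a stand-in rather than a real S3 SDK class. The only point is that the delete-tagging request is skipped whenever tagging is disabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an S3 client (not the AWS SDK or Dataverse API).
interface TagClient {
    void deleteObjectTagging(String key);
}

// Records every delete-tagging request so the guard's effect can be observed.
class RecordingTagClient implements TagClient {
    final List<String> deleteTagCalls = new ArrayList<>();

    @Override
    public void deleteObjectTagging(String key) {
        deleteTagCalls.add(key);
    }
}

class TempTagGuard {
    private final TagClient client;
    // Assumed to mirror a store-level "disable-tagging" setting.
    private final boolean taggingDisabled;

    TempTagGuard(TagClient client, boolean taggingDisabled) {
        this.client = client;
        this.taggingDisabled = taggingDisabled;
    }

    // Sends a delete-tagging request only when tagging is enabled,
    // i.e. only when a temporary tag could have been written earlier.
    // Returns true if a request was sent.
    boolean removeTempTag(String key) {
        if (taggingDisabled) {
            return false; // no tag was written, so nothing to delete
        }
        client.deleteObjectTagging(key);
        return true;
    }
}
```

With `taggingDisabled` set to true, `removeTempTag` never touches the store, which is the behavior a store without tagging support needs.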

For Storj, which otherwise works fine, we noticed that the /cleanStorage API call, which deletes unnecessary files (i.e. from upload attempts where the file was sent to S3 but the addFiles call to Dataverse didn't happen or failed), fails when there are many files in the dataset (probably > 1000). This PR updates the code to send maxKeys(1000) whenever objects are listed; apparently, without this, Storj doesn't send a continuation token.
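The listing behavior described above can be sketched as a loop that always passes an explicit page size and follows the continuation token until none is returned. This is an illustration under assumptions, not the PR's code: `ListPage`, `ObjectLister`, `InMemoryLister`, and `S3Listing` are hypothetical names, and the in-memory lister only imitates a paginating store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result of one list request.
final class ListPage {
    final List<String> keys;
    final String nextToken; // null when there are no further pages

    ListPage(List<String> keys, String nextToken) {
        this.keys = keys;
        this.nextToken = nextToken;
    }
}

// Hypothetical stand-in for an S3 list-objects call.
interface ObjectLister {
    ListPage listPage(String prefix, int maxKeys, String continuationToken);
}

// In-memory fake that paginates by index, used only for illustration.
class InMemoryLister implements ObjectLister {
    private final List<String> allKeys;
    int requests = 0;

    InMemoryLister(List<String> allKeys) {
        this.allKeys = allKeys;
    }

    @Override
    public ListPage listPage(String prefix, int maxKeys, String token) {
        requests++;
        int i = (token == null) ? 0 : Integer.parseInt(token);
        List<String> page = new ArrayList<>();
        while (i < allKeys.size() && page.size() < maxKeys) {
            String key = allKeys.get(i++);
            if (key.startsWith(prefix)) {
                page.add(key);
            }
        }
        String next = (i < allKeys.size()) ? Integer.toString(i) : null;
        return new ListPage(page, next);
    }
}

class S3Listing {
    static final int MAX_KEYS = 1000; // explicit page size on every request

    // Collects all keys under a prefix by following continuation tokens.
    static List<String> listAll(ObjectLister lister, String prefix) {
        List<String> keys = new ArrayList<>();
        String token = null;
        do {
            ListPage page = lister.listPage(prefix, MAX_KEYS, token);
            keys.addAll(page.keys);
            token = page.nextToken;
        } while (token != null);
        return keys;
    }
}
```

Keeping the loop in one helper like `listAll` is also one way to avoid repeating the pagination logic in each place that lists objects.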

Along with the fix for Storj and /cleanStorage, the PR refactors the code that lists objects in S3 to reduce duplicate/unnecessary code.

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this: The changes have been tested at QDR. If more than regression testing is desired, one can get a free B2 account (up to 10GB) for testing and set up a B2 store; it should work for creating datasets, etc., like any other S3 store. For Storj, the problem can only be seen with more than 1K files and only with the /cleanStorage API call. Happy to try to provide details if someone wants to recreate the issue.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@qqmyers qqmyers added the GDCC: QDR of interest to QDR label Jun 11, 2026
