CORE-123: new file-pairing API for data uploader #1452
This looks good to me. I just added a comment on implementing a paging solution to handle a large number of files.
src/main/scala/org/broadinstitute/dsde/firecloud/dataaccess/HttpGoogleServicesDAO.scala
val fileList: List[GcsObjectName] =
  googleServicesDao.listBucket(workspaceBucket, Option(matchingOptions.prefix), recursive)

logger.info(s"found ${fileList.length} files")
do we want to place a limit on this based on the number of files returned?
added in 9505b6e
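For illustration only, one way to enforce such a limit is to read the listing page by page and stop as soon as a cap is exceeded. This is a minimal sketch, not the change made in 9505b6e: `Page`, `BoundedListing`, `listUpTo`, `fetchPage`, and `maxFiles` are hypothetical names, and plain strings stand in for `GcsObjectName`.

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for one page of a bucket listing (not a type from this PR).
final case class Page(names: List[String], nextToken: Option[String])

object BoundedListing {
  /** Collects object names page by page. Returns Left with a message as soon as
    * more than `maxFiles` names have been seen, otherwise Right with all names.
    * `fetchPage` is an assumed callback: it takes the previous page token
    * (None for the first page) and returns the next page.
    */
  def listUpTo(maxFiles: Int)(fetchPage: Option[String] => Page): Either[String, List[String]] = {
    @tailrec
    def loop(token: Option[String], acc: Vector[String]): Either[String, List[String]] = {
      val page = fetchPage(token)
      val combined = acc ++ page.names
      if (combined.length > maxFiles)
        Left(s"more than $maxFiles files found; narrow the prefix")
      else
        page.nextToken match {
          case Some(next) => loop(Some(next), combined) // more pages remain
          case None       => Right(combined.toList)     // listing is complete
        }
    }
    loop(None, Vector.empty)
  }
}
```

Stopping inside the loop means an oversized bucket is rejected without listing every remaining page first.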
Introduces a new API at
This API will:
The driving use case for this API is the "Data Uploader" in the Terra UI, though we may find that scripters and notebook users also want to use the API.
I have tested this locally against a bucket of ~100,000 files, and the file-matching portion of the algorithm executes in < 2 seconds. The end result is a 30 MB TSV, so the API is slow overall, but the size is unavoidable at that scale.