Storage service: refactor out the use of lsar in get_base_directory() #992
Labels: Needs sponsorship, Status: refining (the issue needs additional details to ensure that requirements are clear)
Please describe the problem you'd like to be solved
Currently the storage service uses lsar to find the base directory of a compressed AIP, code here: https://github.com/artefactual/archivematica-storage-service/blob/83d7b5a7da79c158cb99bd0e3426b92fdde0d3f0/storage_service/locations/models/package.py#L343-L361
This code is potentially inefficient: its memory use grows as the AIP grows (in particular with the number of files and directories it contains), both in the output from lsar and in the size of the directories list. It might struggle on a very large AIP. Not a bug per se, but potentially room for improvement.
Describe the solution you'd like to see implemented
We know an AIP will only be compressed in a handful of formats (because Archivematica created them!). Use the Python standard library to open the file directly and iterate over its members, rather than loading the entire listing into one big JSON string.
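A minimal sketch of the standard-library approach, assuming the AIP is a tar archive (optionally gzip-, bzip2- or xz-compressed) whose members all sit under a single top-level directory. The names get_base_directory_from_tar and top_level_name are hypothetical, not existing storage service code, and the tarfile.is_tarfile check only stands in for whatever format detection utils.py already provides. Python's standard library has no 7z reader, so 7z-compressed AIPs would still need another tool or a third-party library.

```python
import os
import tarfile
import tempfile
from pathlib import PurePosixPath
from typing import Optional


def top_level_name(member_name: str) -> Optional[str]:
    """Return the first path component of a member name, ignoring "." and "/"."""
    # PurePosixPath drops "." components; a leading "/" shows up as its own part.
    parts = [part for part in PurePosixPath(member_name).parts if part != "/"]
    return parts[0] if parts else None


def get_base_directory_from_tar(path: str) -> str:
    """Hypothetical helper: return the top-level directory name of a tar AIP.

    Assumes all members share one top-level directory, so the first member
    with a real path component is enough and the rest of the archive is
    never listed.
    """
    if not tarfile.is_tarfile(path):
        # e.g. a 7z AIP: the standard library cannot read it.
        raise ValueError(f"{path} is not a tar archive")
    # "r:*" lets tarfile detect gzip, bzip2 or xz compression transparently.
    with tarfile.open(path, mode="r:*") as tar:
        member = tar.next()
        while member is not None:
            name = top_level_name(member.name)
            if name is not None:
                return name
            member = tar.next()
    raise ValueError(f"{path} has no named members")


if __name__ == "__main__":
    # Build a tiny gzip-compressed, AIP-like archive and look up its base directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "myaip-0000")
        os.makedirs(os.path.join(src, "data"))
        with open(os.path.join(src, "data", "file.txt"), "w") as fh:
            fh.write("hello")
        archive = os.path.join(tmp, "myaip.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(src, arcname="myaip-0000")
        print(get_base_directory_from_tar(archive))  # prints: myaip-0000
```

Stopping at the first named member is what keeps memory flat: tarfile records every TarInfo it reads in TarFile.members, so a loop over the whole archive would still grow with the number of files. The returned name has no trailing slash, which may or may not match what the current lsar-based code returns.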
Pseudo-code:
There's already some code to identify compression formats in utils.py.
Describe alternatives you've considered
None.
Additional context
For Artefactual use:
Before you close this issue, you must check off the following: