Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: reingest failing on 1.9.1 #594

Closed
sromkey opened this issue Mar 21, 2019 · 8 comments
Closed

Problem: reingest failing on 1.9.1 #594

sromkey opened this issue Mar 21, 2019 · 8 comments
Assignees
Labels
Severity: critical A high-priority situation that is causing the software to fail catastrophically, making it unusable.
Milestone

Comments

@sromkey
Copy link
Contributor

sromkey commented Mar 21, 2019

Expected behaviour
Reingest should succeed.

Current behaviour
It is not. The error in the UI just says "Error re-ingesting package: An unknown error occurred."

Error in the SS logs:

Traceback (most recent call last):
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/local/lib/python2.7/site-packages/tastypie/resources.py", line 220, in wrapper
    response = callback(request, *args, **kwargs)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 99, in wrapper
    result = func(resource, request, bundle, **kwargs)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 1090, in reingest_request
    processing_config)
  File "/usr/lib/archivematica/storage-service/locations/models/package.py", line 1909, in start_reingest
    local_path, temp_dir = self.extract_file()
  File "/usr/lib/archivematica/storage-service/locations/models/package.py", line 1438, in extract_file
    rc = subprocess.check_output(command)
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/local/lib/python2.7/site-packages/gevent/subprocess.py", line 349, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
CalledProcessError: Command '['unar', '-force-overwrite', '-o', u'/var/archivematica/storage_service/tmpqnXa7R', u'/var/archivematica/sharedDirectory/www/AIPsStore/c18b/11f4/4900/4c23/93f5/8fa2/b2e8/b491/smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491.7z']' returned non-zero exit status 1```

When trying to run the unar command by hand:

```root@ss191bionic:/var/log/archivematica/storage-service# unar -force-overwrite -o /var/archivematica/storage_service/tmpqnXa7R /var/archivematica/sharedDirectory/www/AIPsStore/c18b/11f4/4900/4c23/93f5/8fa2/b2e8/b491/smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491.7z
/var/archivematica/sharedDirectory/www/AIPsStore/c18b/11f4/4900/4c23/93f5/8fa2/b2e8/b491/smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491.7z: 7-Zip
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/logs/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/submissionDocumentation/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/submissionDocumentation/transfer-smoke_test-a138ef57-41d5-429e-b24e-11208939a806/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/thumbnails/  (dir)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/bag-info.txt  (71 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/bagit.txt  (55 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/METS.c18b11f4-4900-4c23-93f5-8fa2b2e8b491.xml  (404026 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/README.html  (7859 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/fileFormatIdentification.log  (180 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/filenameCleanup.log  (8618 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/logs/checksum.md5-Check-20190321T162646Z  (784 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/logs/checksum.sha1-Check-20190321T162646Z  (784 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/logs/checksum.sha256-Check-20190321T162647Z  (784 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/logs/fileFormatIdentification.log  (3134 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/logs/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/logs/filenameCleanup.log  (824 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/View_from_lookout_over_Queenstown_towards_the_Remarkables_in_spring-deb1de24-af9f-4c3d-be9f-281db42548da.tif  (47941948 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/View_from_lookout_over_Queenstown_towards_the_Remarkables_in_spring.jpg  (6271469 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/beihai-881a8ffc-c807-47d3-abdf-e6e8858dfaa3.tif  (12441895 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/beihai.tif  (12446432 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/bird.mp3  (5992608 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/checksum.md5  (309 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/checksum.sha1  (349 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/checksum.sha256  (469 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/directory_tree.txt  (325 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/metadata.csv  (3268 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/metadata/transfers/smoke_test-a138ef57-41d5-429e-b24e-11208939a806/rights.csv  (1129 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/ocr-image-ace4d78d-a962-498a-a6d1-ec772da20d70.tif  (819510 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/ocr-image.png  (14644 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/piiTestDataCreditCardNumbers.txt  (277 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/submissionDocumentation/transfer-smoke_test-a138ef57-41d5-429e-b24e-11208939a806/Agreement-Gift-MBRS-project.pdf  (100831 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/submissionDocumentation/transfer-smoke_test-a138ef57-41d5-429e-b24e-11208939a806/METS.xml  (89452 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/thumbnails/19c3dd08-0dfd-438e-ab4d-1a144158f282.jpg  (4967 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/thumbnails/3bb30737-7712-454f-bc4c-664afdf4bb2c.jpg  (3377 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/thumbnails/bb1edad1-641f-42f5-bae1-ad0afb51ac28.jpg  (1096 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/thumbnails/d098317b-b978-4eac-aac3-4e6e6c493b06.jpg  (6030 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/thumbnails/ec0315b3-7b02-415f-b57f-3cccdf9f9918.jpg  (1096 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/manifest-sha256.txt  (4299 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/tagmanifest-md5.txt  (145 B)... OK.
  smoke_test-c18b11f4-4900-4c23-93f5-8fa2b2e8b491/data/objects/bird-890307d0-7fd8-43ee-8e23-cd35cf1a688a.wav  (26431800 B)... Failed! (Wrong checksum)
Extraction to directory "/var/archivematica/storage_service/tmpqnXa7R" failed (1 file failed.)```

**Steps to reproduce**
Make an AIP, attempt to do full reingest


**Your environment (version of Archivematica, OS version, etc)**
1.9.1 on bionic, with the AM and SS installed on separate VMs.


---
**For Artefactual use:**
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

- All PRs related to this issue are properly linked 👍
- All PRs related to this issue have been merged 👍
- Test plan for this issue has been implemented and passed 👍
- Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
@sromkey sromkey added the Severity: critical A high-priority situation that is causing the software to fail catastrophically, making it unusable. label Mar 21, 2019
@sromkey sromkey added this to the 1.9.1 milestone Mar 21, 2019
@ross-spencer ross-spencer assigned sevein and mamedin and unassigned ross-spencer Mar 21, 2019
@mamedin
Copy link

mamedin commented Mar 21, 2019

I could reproduce the issue with the transfer archivematica-sampladate>training-resources>DemoTransfer with the AM and SS installed on separate VMs. When using AM and SS in the same VM this issue doesn't happen.

I have compared both environments when running a reingest.

a) Pipeline and SS in Separate VMs: SS uses unar to extract the 7z AIP file.

INFO      2019-03-21 13:06:23  locations.models.package:package:extract_file:1437:  Extracting file with: ['unar', '-force-overwrite', '-o', u'/var/archivematica/storage_service/tmpbAJTCN', u'/var/archivematica/sharedDirectory/www/AIPsStore/92c1/dd6f/135f/4174/aff8/be41/2504/ab83/mamedin-demotransfer-191-newpath-92c1dd6f-135f-4174-aff8-be412504ab83.7z'] to /var/archivematica/storage_service/tmpbAJTCN/mamedin-demotransfer-191-newpath-92c1dd6f-135f-4174-aff8-be412504ab83

b) Standard AM: SS uses 7z to extract the 7z AIP file:

INFO      2019-03-21 13:05:00  locations.models.package:package:extract_file:1437:  Extracting file with: ['7z', 'x', '-bd', '-y', '-o/var/archivematica/storage_service/tmpw20S3f', u'/var/archivematica/sharedDirectory/www/AIPsStore/d6ea/3170/a8d8/4743/91ea/5fa2/4ac1/da17/demotransfer-d6ea3170-a8d8-4743-91ea-5fa24ac1da17.7z'] to /var/archivematica/storage_service/tmpw20S3f/demotransfer-d6ea3170-a8d8-4743-91ea-5fa24ac1da17

I run the same "unar" command for each AIP and I got the same error in both enviroments:

  demotransfer-d6ea3170-a8d8-4743-91ea-5fa24ac1da17/data/objects/bird-3557551d-ade6-4cf4-8b1a-30b2a83e726e.wav  (26431800 B)... Failed! (Wrong checksum)
Extraction to current directory failed! (1 file failed.)

@ross-spencer
Copy link
Contributor

The reason for the switch is in package.extract_files():

        if self.is_compressed:
            # The command used to extract the compressed file at
            # full_path was, previously, universally::
            #
            #     $ unar -force-overwrite -o extract_path full_path
            #
            # The problem with this command is that unar treats __MACOSX .rsrc
            # ("resource fork") files differently than 7z and tar do. 7z and
            # tar convert these .rsrc files to ._-prefixed files. Similar
            # behaviour with unar can be achieved by passing `-k hidden`.
            # However, while a command like::
            #
            #     $ unar -force-overwrite -k hidden -o extract_path full_path
            #
            # preserves the .rsrc MACOSX files as ._-prefixed files, it does so
            # differently than 7z/tar do: the resulting .-prefixed files have
            # different sizes than those created via unar. This makes
            # ``bag.validate`` choke.
            if self.full_pointer_file_path:
                compression = utils.get_compression(self.full_pointer_file_path)
            else:
                compression = None  # no pointer file :. command will be unar

And I am just trying to bottom out this issue here: #599 but if there is no pointer file, which is the behavior we are seeing, the default is to unar. There are a number of ways of solving this, short- to longer- term. Once 599 is understood those options will be a little clearer.

@ross-spencer
Copy link
Contributor

ross-spencer commented Mar 22, 2019

As shown in #599 a pointer file isn't created in this type of deployment. And noted above, as we see in this line here: compression = None # no pointer file :. command will be unar - unar is the fallback tool used for packages during reingest.

Note: the discussion below is contingent on #599. If we can ensure pointer files are created through configuration then this problem disappears for now.

There are three tools used for unpacking an archive in the function below:

def _get_decompr_cmd(compression, extract_path, full_path):
    """Returns a decompression command (as a list), given ``compression``
    (one of ``COMPRESSION_ALGORITHMS``), the destination path
    ``extract_path`` and the path of the archive ``full_path``.
    """
    if compression in (utils.COMPRESSION_7Z_BZIP, utils.COMPRESSION_7Z_LZMA):
        return ['7z', 'x', '-bd', '-y', '-o{0}'.format(extract_path),
                full_path]
    elif compression == utils.COMPRESSION_TAR_BZIP2:
        return ['/bin/tar', 'xvjf', full_path, '-C', extract_path]
    return ['unar', '-force-overwrite', '-o', extract_path, full_path]
  • /bin/tar
  • 7z
  • unar

7z can technically (caveat: just a little testing) cover all three use cases here. unar can too, but it seems that there is an upstream problem which is part of the in the issue reported here which locally I can confirm:

  demo-normalize-12c8aeaf-4d03-44d3-969d-510613776e77/manifest-sha256.txt  (3736 B)... OK.
  demo-normalize-12c8aeaf-4d03-44d3-969d-510613776e77/tagmanifest-md5.txt  (145 B)... OK.
  demo-normalize-12c8aeaf-4d03-44d3-969d-510613776e77/data/objects/bird-be8cb7ad-ce54-419c-9151-edf5aa11a277.wav  (26431800 B)... Failed! (Wrong checksum)
  • unar is the only tool that can perform the function of all the tools in a single step.
  • for bz tar 7z needs two jumps.

So the answer isn't straight forward.

I would also like to understand the implications of using tar, two outputs shown below using both 7z and unar all the timestamps in the bag show today:

ross-spencer@artefactual:~/Desktop/temp/aips/rein/unarFolder/11-f7fba8b8-dfad-4ec9-b6ea-3f5059b21a06/data/objects$ ls -la
total 24
drwxr-xr-x 6 ross-spencer ross-spencer 4096 Mar 22 21:53 .
drwxr-xr-x 4 ross-spencer ross-spencer 4096 Mar 22 21:53 ..
drwxr-xr-x 3 ross-spencer ross-spencer 4096 Mar 22 21:53 metadata
drwxr-xr-x 2 ross-spencer ross-spencer 4096 Mar 22 21:53 one_file
drwxr-xr-x 3 ross-spencer ross-spencer 4096 Mar 22 21:53 one_zip
drwxr-xr-x 3 ross-spencer ross-spencer 4096 Mar 22 21:53 submissionDocumentation
---
ross-spencer@artefactual:~/Desktop/temp/aips/rein/7zFolder/11-f7fba8b8-dfad-4ec9-b6ea-3f5059b21a06/data/objects$ ls -la
total 24
drwx------ 6 ross-spencer ross-spencer 4096 Mar 22 21:53 .
drwx------ 4 ross-spencer ross-spencer 4096 Mar 22 21:53 ..
drwx------ 3 ross-spencer ross-spencer 4096 Mar 22 21:53 metadata
drwx------ 2 ross-spencer ross-spencer 4096 Mar 22 21:53 one_file
drwx------ 3 ross-spencer ross-spencer 4096 Mar 22 21:53 one_zip
drwx------ 3 ross-spencer ross-spencer 4096 Mar 22 21:53 submissionDocumentation

Actual solutions

  1. The pointer file is quite crucial, but in lieu of one, we could use a system tool like file to provide an indicator of the file format and call a tool based on that:
  • e.g. if a pointer file doesn't exist, use file``
$ file --mime-type -F : *
11-f7fba8b8-dfad-4ec9-b6ea-3f5059b21a06.tar:            application/x-tar
12-81b44c2c-e65f-4871-818e-056f79382e18.tar.bz2:        application/x-bzip2
13-ebd86be1-7153-4ef2-a2dc-b49010a11e3d.7z:             application/x-7z-compressed
demo-f56695de-01dd-46b6-86cf-3b84f8ed20a8.7z:           application/x-7z-compressed
demo-normalize-12c8aeaf-4d03-44d3-969d-510613776e77.7z: application/x-7z-compressed
  • The pointer file will be regenerated at the end of the re-ingest process.
  1. Another approach could be to keep the order of execution as we do now. If there isn't a pointer file, use unar, but then if there is an error, fallback to 7z. This solution would be a little quicker to implement.

NB If there were time to work on the upstream issue, i.e. identify why unar isn't working with our zip containing normalized files, then that would be great to follow up on as well.

@ross-spencer
Copy link
Contributor

NB. The different package types are currently listed like this by file:

file -F : --mime-type *
type_3-46db9581-544f-4810-a5a4-f3c988dc0ae5.7z:      application/x-7z-compressed
type_4-00368858-7e61-4f39-9938-2ee86c5c600e.7z:      application/x-7z-compressed
type_5-3ae92cc8-6042-4f6b-b44d-e9de951e5172.tar:     application/x-tar
type_6-84f6e507-ccb5-4b86-92d0-ed0458808286.7z:      application/x-7z-compressed
type_7-f78ed85f-1972-4b0f-b43e-cbc43b7ecb08.tar.bz2: application/x-bzip2

@mamedin
Copy link

mamedin commented Mar 25, 2019

I found the same error decompressing with the unar command this AIP 7z file in bionic, xenial and CentOS.

I built from sources the unar command and I got the same error.

@ross-spencer
Copy link
Contributor

@sromkey sromkey removed the Status: in progress Issue that is currently being worked on. label Apr 1, 2019
@evelynPM evelynPM self-assigned this Apr 2, 2019
@evelynPM
Copy link

evelynPM commented Apr 3, 2019

Full AIP reingest is now working on am19bionic.qa.

@sromkey sromkey added Status: verified and removed Status: review The issue's code has been merged and is ready for testing/review. labels Apr 3, 2019
@sallain
Copy link
Member

sallain commented Apr 11, 2019

Added to release notes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Severity: critical A high-priority situation that is causing the software to fail catastrophically, making it unusable.
Projects
None yet
Development

No branches or pull requests

6 participants