Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: Pointer files are not being saved in 1.9.1 when the storage service is on a different VM #599

Closed
ross-spencer opened this issue Mar 22, 2019 · 6 comments
Assignees
Milestone

Comments

@ross-spencer
Copy link
Contributor

ross-spencer commented Mar 22, 2019

Expected behaviour

After storing a compressed AIP I can download a pointer file from the server.

Current behaviour

A 404 error occurs.

image

Steps to reproduce

Store a compressed AIP on the bionic server and try to download the pointer.

Your environment (version of Archivematica, OS version, etc)

AM19 Bionic.

Additional context

Related to #594


For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
@ross-spencer
Copy link
Contributor Author

ross-spencer commented Mar 22, 2019

The code in the storage service that determines whether or not a pointer file is created looks like this:

    def should_have_pointer_file(self, package_full_path=None,
                                 package_type=None):
        """Returns ``True`` if the package is both an AIP/AIC and is a file.
        Note: because storage in certain locations (e.g., GPG encrypted
        locations) can result in packaging and hence transformation of an AIP
        directory to an AIP file, this predicate may return ``True`` after
        ``move_from_storage_service`` is called but ``False`` before.
        """
        if not package_full_path:
            package_full_path = os.path.join(
                self.current_location.space.path,
                self.current_location.relative_path,
                self.current_path)
        if not package_type:
            package_type = self.package_type
        isfile = os.path.isfile(package_full_path)
        isaip = package_type in (Package.AIP, Package.AIC)
        ret = isfile and isaip
        if not ret:
            if not isfile:
                LOGGER.info('Package should not have a pointer file because %s'
                            ' is not a file', package_full_path)
            if not isaip:
                LOGGER.info('Package should not have a pointer file because it'
                            ' is not an AIP or an AIC; it is a(n) %s', package_type)
        return ret

The code can determine that we are looking at the AIP, but at the time of checking it cannot determine whether it is a file os.path.isfile() were it a file, it would be a '7z' package and thus need a pointer file.

The reason the storage service cannot check whether this is a file is that the file is still on the dashboard's server. We can see the path in the logging output: /var/archivematica/sharedDirectory/currentlyProcessing/rs-logging-test-a467eeb0-91b5-45a8-98c6-e58a3b81f77e/rs-logging-test-a467eeb0-91b5-45a8-98c6-e58a3b81f77e.7z is not a file

The storage service does not contain this location:

root@ss191bionic:/# find * | grep currentlyProcessing
root@ss191bionic:/# 

where it does exist in the Dashboard:

root@am191bionic:/# find * | grep currentlyProcessing
var/archivematica/sharedDirectory/currentlyProcessing

If a pointer file is stored it should end up here (for the package above): var/archivematica/storage_service/a467/eeb0/91b5/45a8/98c6/e58a/3b81/f77e/pointer.a467...

Questions and solutions

Questions

  1. Is this a configuration issue? Should a mount exist in the Storage Service to be able to see the Dashboard shared directory? The mount could be created at:
root@ss191bionic:/var/archivematica/sharedDirectory# pwd
/var/archivematica/sharedDirectory
  1. More intriguingly, is this anticipated with the design of the storage service? (This is a gap in my knowledge) but shouldn't the Storage Service be able to see all locations and encapsulate all locations for an Archivematica instance?
    i. The documentation only says that /var/archivematica/sharedDirectory should be associated with 'exactly one pipeline' and so this would change per instance.

Possible solutions

First, it seems this is unlikely to be a regression. So we should ask if this should be fixed for 1.9.1.

Depending also on some of the answers above then the most likely solution seems:

  1. We can create the pointer file later on in the process, where the storage service can see the location the AIP is stored: var/archivematica/sharedDirectory/www/AIPsStore/a467/eeb0/91b5/45a8/98c6/e58a/3b81/f77e/rs-logging-test-a467eeb0-91b5-45a8-98c6-e58a3b81f77e.7z

  2. I have run out of steam this evening, so while I think I had ideas here, they're not coming right now, so might get back to this next week...

NB There are a lot of moving parts in the store_aip work in the storage service. It would be nice to wrap all of them into a call that is made right at the end of all processing (creating an AIP, re-ingesting an AIP) so that a pointer file is created dynamically from the information it can derive from the package at runtime. Or at the very least maintain state somewhere like the database up until a package is uploaded and then make a single call to create_pointer_file_for_package(...). This could easily be moved to another module, and extensions to AM such as creating a pointer for all AIP types could be a lot easier to implement.

Additionally: If we can create pointer files for this type of deployment we can resolve part of the issue in the original ticket: #594

@ross-spencer
Copy link
Contributor Author

From @jhsimpson in Slack:

I will step through it here, if you or anyone else can check my steps here to make sure I am not getting this wrong, that would be helpful
@mamedin has configured this environment using a pipeline local filesystem, and it is configured correctly as he stated

so in the storage service there are 2 Spaces defined
one is local filesystem and that Space has the aip storage location in it
that is a filesytem on the ss machine, local to it, and only accessible from it
the other Space is of type pipeline local filesystem
and has the currently processing location defined in it
that is a filesystem from the am machine, and the ss machine has ssh access to it
and that is all working as it should be
there does appear to be a bug that was introduced in 1.7.0 for this configuration
I am not sure we have a client that has this configuration that has tried aip reingest on 1.7.x or newer
here is what I think the bug is, let me step through it
storage service logs stardate March 20 11:44:55:
`INFO      2019-03-20 11:44:55  locations.models.package:package:store_aip:652:  store_aip called in Package class of SS`
so the store_aip method is here: `https://github.com/artefactual/archivematica-storage-service/blob/stable/0.14.x/storage_service/locations/models/package.py#L630`
this method calls two other methods one after the other: `_store_aip_to_pending()` and then `_store_aip_ensure_pointer_file()`
in `_store_aip_to_pending()` there is a line of code that sets the value for a variable called `origin_full_path` https://github.com/artefactual/archivematica-storage-service/blob/stable/0.14.x/storage_service/locations/models/package.py#L682
that is set to the path of the space plus the relative path of the aip
it does not have any concept of the Space that path works in
later in the method, that variable is used when calling `should_have_pointer()` https://github.com/artefactual/archivematica-storage-service/blob/stable/0.14.x/storage_service/locations/models/package.py#L699
in `should_have_pointer()` the code uses os.path.isfile to determine if the aip is a file or a folder https://github.com/artefactual/archivematica-storage-service/blob/stable/0.14.x/storage_service/locations/models/package.py#L614
but it does not realize that the path it is checking is not accessible like that

@ross-spencer
Copy link
Contributor Author

Fixes

  1. Immediate fix to enable creation of pointer files for AIPs arriving from a remote origin: Ensure that the storage service can see an AIP to test whether a pointer file needs creating artefactual/archivematica-storage-service#443

  2. A longer-term improvement (not yet scoped) for making reingest more reliable, and thus, enabling creation of pointer files for AIP instances where pointer files haven't previously been created: WIP: Develop integration testing around handling AM package-types and make package-extract more robust artefactual/archivematica-storage-service#441

@sromkey
Copy link
Contributor

sromkey commented Apr 3, 2019

I can now download a pointer file from the dashboard which I assume means this is working!

@sromkey sromkey added Status: verified and removed Status: review The issue's code has been merged and is ready for testing/review. labels Apr 3, 2019
@ross-spencer
Copy link
Contributor Author

Deffo! 🎆

@sallain
Copy link
Member

sallain commented Apr 11, 2019

Added to release notes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants