re-archive hand remediated moabs that may need special treatment. #1402

Open
jmartin-sul opened this issue Feb 24, 2020 · 9 comments
Labels
recovery: retrieval and reconstitution from cloud archives
replication_failure: failure to replicate specific object(s), whether due to cloud provider hiccup or bug in our code

Comments

@jmartin-sul
Member

jmartin-sul commented Feb 24, 2020

Some Moabs are/were invalid and need remediation; they were archived to cloud (because pres cat trusted that robots handed over good objects, or they went bad from hand edits later). should they be re-archived once they've been remediated?

remediation tickets by tag: moab_remediation (label description: online moab that may need remediation, e.g. missing files, extraneous files, corrupted content). accurate as of 2020-04-23, but click through to see if anything is missing before closing this.

Discussed F2F w/ @andrewjbtw and he said he prefers to re-create and re-push the archives for all remediated Moabs. Even in cases where the remediation was just to remove an extraneous datastream file (i.e. no expected content was missing/corrupt), the upside of that approach is that if we do ever have to retrieve the archives from the cloud, we won't be scratching our heads and wondering/worrying about why the archived Moabs don't fully validate. The downside in general of re-archiving is that we have to do the work of deleting the old archive versions and re-pushing new versions, and that we generally frown on messing with copies that are already considered archived (good archival practice would be that the thing is written and then left undisturbed forever if possible).

@andrewjbtw would also like @julianmorley's opinion on the difficulty of clearing the old archive copies (as well as any reservations he might have that we haven't thought of).

if we decide to re-archive, using the druid lists we've collected on the above linked issues should make it easy to trigger archiving for said druids using the existing pres cat code. [answered, see comments]

@jmartin-sul
Member Author

jmartin-sul commented Feb 28, 2020

discussed in preservation planning meeting yesterday with @julianmorley, and he is ok with re-archiving these moabs. it will be a bit of a pain, since it may all have to be done by manual deletion through the AWS/IBM web consoles (slight chance he'll script it; he's mulling it over). it'll also result in some notification spam (because we get alerts for deleted archives, since we expect such deletions to be very rare). but we decided that these things are tolerable/worth it.

once julian has deleted the bad archives, we will need to:

  • delete the ZippedMoabVersions and their associated ZipParts for the deleted archives (from pres cat's DB)
  • feed those druids to ZipMaker for re-archiving of the remediated moabs.
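
The two follow-up steps above could be sketched roughly as below. This is only an illustration using in-memory stand-ins, not pres cat's actual code: the `ZipRecord` class, the `catalog` list, and the `enqueue` callback are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ZipRecord:
    """Hypothetical stand-in for one ZippedMoabVersion row plus its ZipParts."""
    druid: str
    version: int
    endpoint: str
    parts: List[str] = field(default_factory=list)


def purge_and_requeue(
    catalog: List[ZipRecord],
    druids: Iterable[str],
    enqueue: Callable[[str], None],
) -> List[ZipRecord]:
    """Drop catalog records for the given druids, then queue each for re-archiving.

    Returns the removed records so they can be logged or reviewed.
    """
    targets = set(druids)
    removed = [rec for rec in catalog if rec.druid in targets]
    # Step 1: delete the zip records (parts go away with their parent record).
    catalog[:] = [rec for rec in catalog if rec.druid not in targets]
    # Step 2: hand each druid to the zip-making step (hypothetical callback).
    for druid in sorted(targets):
        enqueue(druid)
    return removed
```

Returning the removed records is a design choice for the sketch: it leaves a trail of exactly which versions and endpoints were cleared before the re-push.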

this is on hold until the moabs are actually remediated.

@jmartin-sul changed the title from "some moabs are invalid due only to extraneous files; they were archived to cloud. should they be re-archived after cleanup?" to "[BLOCKED] re-archive remediated moabs" on Feb 28, 2020
@jmartin-sul
Member Author

if it's more manageable, it's fine to spin off subtickets from this for specific druid lists from this overall effort. it is possible that we won't get all the remediation done all at once, and then we may want to do the re-archiving as we go also. also fine to just do it all under this ticket. whatever is easiest for the people doing the re-archiving.

@andrewjbtw reopened this Apr 17, 2020
@jmartin-sul changed the title from "[BLOCKED] re-archive remediated moabs" to "re-archive remediated moabs" on Apr 23, 2020
@jmartin-sul
Member Author

this is unblocked for the things that are linked.

once we have a process for this, it should be documented alongside moab remediation notes, and we should also instruct moab remediators to file re-archiving tickets for moabs that they remediate (since we expect there will be occasional remediations on an ongoing basis).

@ndushay
Contributor

ndushay commented May 11, 2020

Has Julian been asked to delete the archived copies from the cloud? Should that be done all in a lump as a first step?

@jmartin-sul
Member Author

> Has Julian been asked to delete the archived copies from the cloud? Should that be done all in a lump as a first step?

i'd be hesitant to do that without a clear plan for pushing through what needs to be re-archived very shortly after that deletion, since for the most part, the mis-archived moabs are better than having no archived moabs (e.g. the hundreds with extraneous datastream xml).

as a first step, maybe the thing to do is to gather an explicit central list of druids to re-archive, based on the tickets linked from this one?
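
As a rough sketch of that first step, the per-ticket druid lists could be merged into one de-duplicated list along these lines. The druid pattern used here (two letters, three digits, two letters, four digits, with an optional `druid:` prefix) is an assumption for illustration, and `merge_druid_lists` is a made-up helper, not part of pres cat.

```python
import re
from typing import Iterable, List, Tuple

# Assumed druid shape: 2 letters, 3 digits, 2 letters, 4 digits,
# with an optional "druid:" prefix. Adjust if the real rules differ.
DRUID_RE = re.compile(r"^(?:druid:)?([a-z]{2}[0-9]{3}[a-z]{2}[0-9]{4})$")


def merge_druid_lists(lists: Iterable[Iterable[str]]) -> Tuple[List[str], List[str]]:
    """Merge per-ticket druid lists into one sorted, de-duplicated list.

    Returns (druids, rejected), where rejected holds entries that did not
    look like druids and should be checked by hand.
    """
    druids = set()
    rejected = []
    for ticket_list in lists:
        for raw in ticket_list:
            entry = raw.strip().lower()
            if not entry:
                continue  # skip blank lines in a pasted list
            match = DRUID_RE.match(entry)
            if match:
                druids.add(match.group(1))
            else:
                rejected.append(raw)
    return sorted(druids), rejected
```

Keeping the rejected entries separate, rather than silently dropping them, means a typo in one ticket's list gets noticed before anything is deleted.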

then a clear coordinated plan to delete and re-push, with the re-push following very shortly after the deletion.

we'll also need to delete pres cat's database records for the zips we're deleting from the cloud. my gut feeling (without having thought much about that part yet) is that we want to delete our DB records first.

if we wanted to be extra careful, we could delete and re-push without doing all cloud endpoints at once, so that we're not wiping all cloud archive copies for a moab simultaneously.
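
One way to picture the "not all endpoints at once" idea is the sketch below, which orders the work one endpoint at a time and then replays the plan to check how many cloud copies a moab has at its lowest point. The action names, the endpoint names, and the choice to delete DB records first (only a gut feeling above) are all assumptions for illustration, not an agreed procedure.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (action, endpoint, druid)

# Assumed per-endpoint ordering; DB records first is a guess, not a decision.
ACTIONS = ("delete_db_records", "delete_cloud_copy", "repush", "verify")


def staggered_plan(druids: List[str], endpoints: List[str]) -> List[Step]:
    """Order delete/re-push steps so only one endpoint is emptied at a time."""
    plan = []
    for endpoint in endpoints:
        # Finish every action on this endpoint before touching the next one.
        for action in ACTIONS:
            for druid in druids:
                plan.append((action, endpoint, druid))
    return plan


def min_copies_during(plan: List[Step], druid: str, endpoints: List[str]) -> int:
    """Replay the plan and return the fewest cloud copies the druid ever has."""
    holding = set(endpoints)  # assume every endpoint starts with a copy
    fewest = len(holding)
    for action, endpoint, step_druid in plan:
        if step_druid != druid:
            continue
        if action == "delete_cloud_copy":
            holding.discard(endpoint)
        elif action == "repush":
            holding.add(endpoint)
        fewest = min(fewest, len(holding))
    return fewest
```

With three endpoints, replaying the plan for any druid never drops below two copies, which is the property the staggered approach is after.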

@ndushay added the replication_failure label and removed the archiving label on Dec 16, 2022
@ndushay self-assigned this Dec 16, 2022
@ndushay
Contributor

ndushay commented Dec 16, 2022

I'm going to close this ticket because we now have audits that will find problems with both on prem storage and with replicated content. Plus, it's 2 1/2 years old.

@ndushay closed this as completed Dec 16, 2022
@andrewjbtw

That's not what this ticket is about. It's about specific moabs that were manipulated and need special treatment. It is unfortunate that it's stayed open for so long (I take a lot of the responsibility) but it's not done and has important information.

@andrewjbtw reopened this Dec 16, 2022
@andrewjbtw

It might be superseded by what I'm doing with yet more moabs that need to be edited, so all could be re-archived under a new ticket that combines those with the ones identified here. I haven't reached the point of filing the new ticket because it's very difficult to edit moabs by hand to make changes that affect previous/all versions.

@ndushay removed their assignment Dec 16, 2022
@ndushay
Contributor

ndushay commented Jan 19, 2023

@andrewjbtw Justin Littman has done a massive remediation of all Moabs with replication errors. If you would like us to check on specific ones as you work on them, perhaps list druids here?

@ndushay changed the title from "re-archive remediated moabs" to "re-archive hand remediated moabs that may need special treatment." on Jan 19, 2023
3 participants