Problem: there is no ADR for Archivematica reporting functionality #24

peterVG · 2020-05-05T22:19:49Z

There is pent-up demand from Archivematica users for reporting functionality. They want statistics about what their Archivematica deployments are doing and when, as well as detailed breakdowns of the content in their Archivematica systems. While the existing Archival Storage search and hit display does provide some useful information, it does not aggregate this information or present it in management-style reports. Work on a comprehensive reporting feature has been delayed because it hasn't been clear which source of Archivematica statistical and content information is canonical, or which of the available sources is the most convenient, comprehensive, and performant for building reports. There is also some confusion between logging and reporting functionality. All of this is complicated by the fact that production Archivematica deployments are often split over multiple processing pipelines. This ADR should address these problems and provide options for moving forward with one or more solutions.

peterVG · 2020-05-06T00:26:56Z

See reporting/0011-reporting.md

ross-spencer · 2020-05-08T18:48:28Z

@peterVG this is starting to take good shape.

As you noted in Slack, using the PR functionality (you can create a WIP/draft PR now in GitHub) will be good for doing more detailed revision.

Some thoughts that I hope will help until our meeting Monday:

Examples

In the context and problem statement, I wonder if you can break the examples of reporting out into categories with fewer specific examples, e.g. repository maintenance reporting (might describe how many packages, how many deletions, etc.); file format reporting (might describe the number of formats, the number of significant properties).

I think that would then feed into the considered options as to what data is in, and which data is out, of scope in this ADR. (My feeling is that we won't be able to tackle it all).

Exhausting our data sources

It might start to look overwhelming, but I think we can add to the list of data sources in Archivematica. I'd like to exhaust them here, at least for discussion.

I was thinking we'd at least need to add:

- System logs, e.g. Nginx/MCPServer (vs. logs which are also ancillary contents of an AIP, for now).
- Prometheus.
- SIPs as a first-class package due to the SFU work. (I might need to describe my language around this in person! -- but yeah, I do see a world where folks can offload their SIPs from backlog instead of always creating AIPs as they do now.)

I was trying to think of others. I flip-flop on the API a lot. There is definitely information to be extracted from there, which can be a different rendering from what is in the database. It also might not be the same API as the one we might create for this work?

I like that you've noted we might need to generate information. It's a good question, if we generate it in Archivematica where do we keep it? (Enhance the METS? Other DB tables?) Is Archivematica already saturated with regards to new information?

I wonder then if the Technical forces section could then start to be split into:

Sources of information (and their longevity)
Long-term support of components, e.g. theoretically, our ES index could be replaced by any client with another indexing solution. We could replace it ourselves.
Challenges, e.g.
- The aggregation of data across multiple clients/multiple servers is a really great point, and I think it conjures up a particular topology.
- Securing the data is a great one you've picked out as well. (I wonder if future ADRs will have a separate Security section?)
- Some data not being there is another great one.
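
To make the aggregation challenge a little more concrete, here is a minimal Python sketch of one way per-pipeline figures could be combined behind a common interface. Everything in it is an assumption for discussion: `ReportSource`, `StaticSource`, `aggregate`, the pipeline names, and the counts are all hypothetical and do not correspond to existing Archivematica code or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Protocol


class ReportSource(Protocol):
    """Hypothetical interface: anything that can report package counts."""

    name: str

    def package_counts(self) -> Counter:
        ...


@dataclass
class StaticSource:
    """Stand-in for a real source (database, ES index, METS, logs)."""

    name: str
    counts: dict

    def package_counts(self) -> Counter:
        return Counter(self.counts)


def aggregate(sources: Iterable[ReportSource]) -> Counter:
    """Sum per-source counts into one deployment-wide total."""
    total = Counter()
    for source in sources:
        total.update(source.package_counts())
    return total


# Made-up figures for two hypothetical pipelines.
pipelines = [
    StaticSource("pipeline-a", {"aips": 10, "deletions": 1}),
    StaticSource("pipeline-b", {"aips": 5, "deletions": 0}),
]
totals = aggregate(pipelines)
```

Because the aggregator only depends on the interface, a source could in principle be swapped (for example, a different indexing solution) without changing the report code.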

There may be other sections after we chat. I think this will then help draw out more decisions we want to make.

Emphasizing the use of this data

And just one last thing: it would be good to keep in sight where this data ends up. I think there may be plenty of places (mgmt reports, etc.), but for Archivematica it will be good to keep in mind in this ADR that it might then consume its own reports somehow to drive re-ingest, or PAR-like actions.

One potential impact, say (hypothetically), is that we might write something that extracts the data, provides nice reports, and on top of that nice visualizations. But we might also keep in mind that, for the thing we write, we might also write an API so that its output can then be worked back into Archivematica (or indeed into visualization tools). Certainly, we'll write some form of interface that we can cleanly work with and abstract from.
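
As a rough illustration of that idea, the sketch below keeps a report as structured data and renders it two ways: as text for a management report and as JSON that another tool (or Archivematica itself) could consume. The `FormatReport` class, its field names, and the sample values are hypothetical, chosen only for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FormatReport:
    """Hypothetical report record; field names are illustrative only."""

    pipeline: str
    format_counts: dict


def to_json(report: FormatReport) -> str:
    """Machine-readable rendering, e.g. for an API response."""
    return json.dumps(asdict(report), sort_keys=True)


def to_text(report: FormatReport) -> str:
    """Human-readable rendering, e.g. for a management report."""
    lines = [f"File formats for {report.pipeline}"]
    for fmt, count in sorted(report.format_counts.items()):
        lines.append(f"  {fmt}: {count}")
    return "\n".join(lines)


# Made-up sample data.
report = FormatReport("pipeline-a", {"PDF": 3, "TIFF": 7})
as_json = to_json(report)
as_text = to_text(report)
```

Both renderings come from the same record, so a visualization tool and a printed report would stay consistent with each other.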

peterVG assigned peterVG and ross-spencer May 5, 2020
