Set mets:agent for bashlib processors #1263

kba · 2024-07-29T12:03:02Z

No description provided.

bertsky · 2024-07-31T21:58:11Z

Indeed.

In the Python core, we use workspace.mets.add_agent during Processor.run_processor.

In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done. Or we already include it in ocrd__wrap.

kba · 2024-08-06T10:51:49Z

> In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done.

Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

> Or we already include it in ocrd__wrap.

That would mean that the agent is added before any processing takes place, whereas in run_processor we only add the agent if the processing succeeds. So I think there's no way around bashlib processors adding the agent themselves as the last step of the script.
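To make the ordering constraint concrete, here is a minimal Python sketch. It is an illustration only, not code from core: run_step and record_agent are hypothetical callables standing in for the processor's actual work and for whatever ends up writing the mets:agent.

```python
from typing import Callable


def process_then_record(run_step: Callable[[], None],
                        record_agent: Callable[[], None]) -> None:
    """Run the processing step, then record the agent as the last step.

    Both callables are hypothetical placeholders for this sketch.
    If run_step raises, the exception propagates and record_agent
    is never called, so a failed run leaves no agent entry behind.
    """
    run_step()
    record_agent()
```

In a bashlib script the same effect would presumably come from calling the agent step last under `set -e`, which is why wrapping it into ocrd__wrap (which runs before processing) does not fit.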

bertsky · 2024-08-06T12:03:53Z

> Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

To be consistent with what exactly?

In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.

Adding a general purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.
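A rough sketch of that translation step, using only the Python standard library rather than the core API. The attribute layout, the "name vVERSION" format and the ocrd namespace URI are my assumptions about what the resulting mets:agent should look like, and build_agent is a hypothetical helper, not an existing function:

```python
import json
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Assumption: namespace of the "option" attribute on each mets:note.
OCRD_NS = "https://ocr-d.de"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("ocrd", OCRD_NS)


def build_agent(executable, version, other_role,
                input_file_grp, output_file_grp, parameter, page_id=""):
    """Translate resolved processor CLI values into a mets:agent element.

    Hypothetical helper: one mets:name plus one mets:note per CLI option.
    """
    agent = ET.Element(f"{{{METS_NS}}}agent", {
        "TYPE": "OTHER",
        "OTHERTYPE": "SOFTWARE",
        "ROLE": "OTHER",
        "OTHERROLE": other_role,
    })
    name = ET.SubElement(agent, f"{{{METS_NS}}}name")
    name.text = f"{executable} v{version}"  # assumed naming scheme
    notes = [
        ("input-file-grp", input_file_grp),
        ("output-file-grp", output_file_grp),
        ("parameter", json.dumps(parameter)),  # parameters stored as JSON text
        ("page-id", page_id),
    ]
    for option, value in notes:
        note = ET.SubElement(agent, f"{{{METS_NS}}}note",
                             {f"{{{OCRD_NS}}}option": option})
        note.text = value
    return agent


if __name__ == "__main__":
    # "ocrd-example" and the file group names are made up for the demo.
    demo = build_agent("ocrd-example", "0.1", "preprocessing",
                       "OCR-D-IMG", "OCR-D-BIN", {"level": 2})
    print(ET.tostring(demo, encoding="unicode"))
```

Doing this once in Python is much less painful than assembling the same name and note values via shell in every processor.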

kba · 2024-08-11T11:34:49Z

> > Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.
>
> To be consistent with what exactly?

I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

> In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.

Also true, there probably won't be a need to do ocrd workspace add-agent beyond bashlib, so I'm fine with either.

> Adding a general purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.

Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

bertsky · 2024-08-11T19:10:44Z

> I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

ok, got it. But then we should also have get-agent etc.

So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.

> Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.

We could also do both, so:

- offer a bare-bones ocrd workspace add-agent
- offer an ocrd bashlib add-agent [CLI-params]

kba · 2024-08-12T10:14:12Z

> > I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.
>
> ok, got it. But then we should also have get-agent etc.
>
> So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.

It's probably not a good investment of effort to offer generic CLI getters/setters for something we only need for bashlib (@maxnth raised this question, hence this issue). So, I'm good with just ocrd bashlib add-agent.

> > Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.
>
> Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.

Agreed, so we'd have an ocrd bashlib add-agent subcommand that accepts options for --executable, --other-role and the usual CLI arguments (-I, -O, -g, -P etc.) and adds a mets:agent just like at the end of run_processor.
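Roughly, the option surface could look like the sketch below. It uses argparse from the standard library purely to stay self-contained; the real subcommand would live alongside the existing ocrd bashlib commands, and parse_add_agent_args, the long option names and the defaults are assumptions for illustration, not an existing interface.

```python
import argparse
import json


def parse_add_agent_args(argv):
    """Parse the options of a hypothetical `ocrd bashlib add-agent`."""
    parser = argparse.ArgumentParser(prog="ocrd bashlib add-agent")
    parser.add_argument("-m", "--mets", default="mets.xml")
    parser.add_argument("--executable", required=True)
    parser.add_argument("--other-role", required=True)
    parser.add_argument("-I", "--input-file-grp", default="")
    parser.add_argument("-O", "--output-file-grp", default="")
    parser.add_argument("-g", "--page-id", default="")
    # -P takes a key and a value and may be repeated.
    parser.add_argument("-P", "--param-override", nargs=2, action="append",
                        default=[], metavar=("KEY", "VALUE"))
    args = parser.parse_args(argv)

    # Collect the -P pairs into one dict; values that parse as JSON
    # (numbers, booleans, objects) are decoded, anything else stays a string.
    parameter = {}
    for key, raw in args.param_override:
        try:
            parameter[key] = json.loads(raw)
        except json.JSONDecodeError:
            parameter[key] = raw
    args.parameter = parameter
    return args
```

A bashlib processor would then call this as its last step with the same arguments it received itself, plus the executable name and role.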

Considering that most of the time I use processors via ocrd process rather than directly, and ocrd_network also relies on it, we could instead add the agent at the end of run_cli. We should not do both, obviously, and the CLI should work self-contained, so that's probably not a real solution.

What about processingStep PAGE-XML metadata? Should we also add an option for --page-xml, so that it is also consistent with Processor.add_metadata in the v3 API?
