Set mets:agent for bashlib processors #1263

Open
kba opened this issue Jul 29, 2024 · 6 comments

Comments

@kba
Member

kba commented Jul 29, 2024

No description provided.

@bertsky
Collaborator

bertsky commented Jul 31, 2024

Indeed.

In Pythonic core, we use workspace.mets.add_agent during Processor.run_processor.

In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done. Or we already include it in ocrd__wrap.

@kba
Member Author

kba commented Aug 6, 2024

> In Bashlib, we could add a new subcommand ocrd bashlib add-agent -m mets.xml [other-params], and wrap that in some exported function ocrd__add_agent in lib.bash to be used by processors when done.

Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

> Or we already include it in ocrd__wrap.

That would mean that the agent is added before any processing takes place, whereas in run_processor we only add the agent if the processing succeeds. So I think there's no way around bashlib processors adding the agent themselves as the last step of the script.
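The ordering constraint described here can be illustrated with a minimal sketch. It is not code from OCR-D core: `run_and_record`, `process` and `add_agent` are hypothetical names used only to show that the agent is recorded strictly after successful processing.

```python
from typing import Callable


def run_and_record(process: Callable[[], None],
                   add_agent: Callable[[], None]) -> bool:
    """Run `process`; call `add_agent` only if it finished without raising.

    Both callables are placeholders (assumptions for this sketch), not
    functions from OCR-D core.
    """
    try:
        process()
    except Exception:
        # Processing failed: leave the METS header untouched.
        return False
    # Processing succeeded: record the agent as the very last step.
    add_agent()
    return True
```

In a bashlib processor the equivalent would presumably be to call the add-agent step as the last command of the script, so that it is never reached if an earlier command aborts the script.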

@bertsky
Collaborator

bertsky commented Aug 6, 2024

> Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.

To be consistent with what exactly?

In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.

Adding a general purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.
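To make the translation step concrete, here is a rough sketch of turning processor CLI parameters into a mets:agent element with mets:name and mets:note children. It uses only the Python standard library, not the OCR-D API. The function name `build_agent`, the `name` format and the `option=value` note encoding are assumptions chosen for illustration and may differ from what core actually writes.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_agent(executable: str, version: str, otherrole: str,
                cli_params: dict) -> ET.Element:
    """Build a mets:agent element describing one processor run.

    Hypothetical helper: the attribute values mark the agent as software
    with a free-form role; the note encoding is an assumption.
    """
    agent = ET.Element(f"{{{METS_NS}}}agent", {
        "TYPE": "OTHER",
        "OTHERTYPE": "SOFTWARE",
        "ROLE": "OTHER",
        "OTHERROLE": otherrole,
    })
    name = ET.SubElement(agent, f"{{{METS_NS}}}name")
    name.text = f"{executable} v{version}"
    # One mets:note per CLI parameter (illustrative encoding only).
    for option, value in cli_params.items():
        note = ET.SubElement(agent, f"{{{METS_NS}}}note")
        note.text = f"{option}={value}"
    return agent
```

For example, `build_agent("ocrd-foo", "1.0", "preprocessing", {"input-file-grp": "OCR-D-IMG"})` yields an agent with one name and one note child. Doing the same mapping in shell for every processor is the part that would be repetitive.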

@kba
Member Author

kba commented Aug 11, 2024

> > Exactly, but I would prefer an ocrd workspace add-agent subcommand for consistency.
>
> To be consistent with what exactly?

I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

> In ocrd bashlib we already have input-files [CLI-params]. Consistency (to me) would mean that we should add add-agent [CLI-params] there, because we also have to resolve all processor CLI parameters here, and it's also bashlib-specific.

Also true, there probably won't be a need to do ocrd workspace add-agent beyond bashlib, so I'm fine with either.

> Adding a general purpose add-agent to ocrd workspace would mean one still needs to translate the CLI parameters into mets:agent and mets:name / mets:note parameters somehow via shell in every processor.

Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

@bertsky
Collaborator

bertsky commented Aug 11, 2024

> I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.

ok, got it. But then we should also have get-agent etc.

So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.

> Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.

Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.

We could also do both. So

  • offer a bare-bones ocrd workspace add-agent
  • offer an ocrd bashlib add-agent [CLI-params]

@kba
Member Author

kba commented Aug 12, 2024

> > I mean consistent with e.g. ocrd workspace set-id, i.e. have all the METS metadata functionality in ocrd workspace.
>
> ok, got it. But then we should also have get-agent etc.
>
> So the ocrd workspace add-agent would be very hard to use in itself, but at least we could say the CLI is complete.

It's probably not a good investment of effort to offer generic CLI getters/setters for something we only need for bashlib (@maxnth raised this question, hence this issue). So, I'm good with just ocrd bashlib add-agent.

> > Yes, no way around that, though we can wrap that in an ocrd__add_agent function which relies on the other ocrd__ variables.
>
> Indeed. But doing it in Python (i.e. ocrd bashlib instead of lib.bash) is still easier.

Agreed, so we'd have an ocrd bashlib add-agent subcommand that accepts options for --executable, --other-role and the usual CLI arguments (-I, -O, -g, -P etc.) and adds a mets:agent just like at the end of run_processor.
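As a rough illustration of the option set described here, the following sketch uses argparse from the standard library. It is not the actual subcommand: the -m default, the long option names and the way -P collects key/value pairs are assumptions modelled on the usual processor CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Option parser for a hypothetical `ocrd bashlib add-agent`."""
    parser = argparse.ArgumentParser(prog="ocrd bashlib add-agent")
    parser.add_argument("-m", "--mets", default="mets.xml",
                        help="path of the METS file to update")
    parser.add_argument("--executable", required=True,
                        help="name of the processor executable")
    parser.add_argument("--other-role", default="",
                        help="value for the OTHERROLE attribute")
    parser.add_argument("-I", "--input-file-grp", default="")
    parser.add_argument("-O", "--output-file-grp", default="")
    parser.add_argument("-g", "--page-id", default="")
    # -P KEY VALUE may be repeated; each use appends one [KEY, VALUE] pair.
    parser.add_argument("-P", "--param-override", nargs=2, action="append",
                        default=[], metavar=("KEY", "VALUE"))
    return parser
```

A bashlib script could then pass its own CLI arguments through unchanged as its final step, and the Python side would resolve them into the agent's name and notes.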

Since I mostly run processors through ocrd process rather than directly, and ocrd_network also relies on it, we could instead add the agent at the end of run_cli. But we obviously should not do both, and the CLI should work self-contained, so that's probably not a real solution.

What about processingStep PAGE-XML metadata? Should we also add an option for --page-xml, so that it is also consistent with Processor.add_metadata in the v3 API?
