Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add ability to copy more fields from the report #2513

Merged
merged 3 commits into from
Jul 25, 2024

Conversation

kamil-certat
Copy link
Contributor

Sometimes collectors sets more context information, e.g. the source file name. Currently, parsers usually remove those non-standard information. With this change, we let the user decide to copy more information to the event.

In addition, the currently copied fields were documented.

Real use case: the HTTP collector sets filename when extracting data from an archive. This filename contains an information we want to include in the final event, which we cannot retrieve in any other way.

Sometimes collectors sets more context information, e.g.
the source file name. Currently, parsers usually remove
those non-standard information. With this change, we let
the user decide to copy more information to the event.

In addition, the currently copied fields were documented.
@kamil-certat kamil-certat marked this pull request as ready for review July 9, 2024 13:05
Copy link
Member

@sebix sebix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by the parameter name. Custom fields sound very generic and is ambiguous, also it lacks any mention of report or event.

What about replicate_report_fields(_to_event)? It's longer, but it speaks by itself.

Comment on lines 1249 to 1255
if self.copy_custom_fields:
for key in self.copy_custom_fields:
if key not in report:
continue
for event in events:
event.add(key, report.get(key), overwrite=False)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this code is in intelmq.lib.bot.ParserBot.process, it will not be active in all parsers as the above documentation suggests.

Adding it to intelmq.lib.message.Report.__init__ (and passing the parameter in intelmq.lib.bot.Bot.new_event will cover all cases AFAIK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you mean some parsers do not use the process? Then they do not use the default_fields, right?

Copy link
Member

@sebix sebix Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process can be overwritten

Copy link
Contributor Author

@kamil-certat kamil-certat Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, it could be, but I didn't know it's a broad practice - I've looked at the source code, and you're right, a lot of parsers rely on overriding the whole process. I'll move the code to the new_event, and I think we should consider moving support of default_fields there as well as this is something I'd expect all parsers to support (but it's a separated thing).

CHANGELOG.md Outdated Show resolved Hide resolved
@sebix sebix added the feature Indicates new feature requests or new features label Jul 9, 2024
@kamil-certat
Copy link
Contributor Author

@sebix I'm open to name changes, but hmm, I'm not convinced with your suggestion either. Actually, my original idea was copy_fields (as direct similarity to default_fields) or copy_report_fields. However, I realised it would then create a confusion that those are all copied (not parsed) fields, which is not true - creating an event from the report has a hardcoded list of fields to replicate. This is how additional came to play :D

What would you say about:

  • extended_report_fields
  • copy_additional_report_fields
  • replicate_extra_report_fields (although this is too narrow, with a different harmonisation it could be more than extra.*)
  • extendend_report_fields_to_copy (actually, this suggests we want a list, while copy_... sounds like Y/N question)

...or any other? I think that the last sounds currently the best

@sebix
Copy link
Member

sebix commented Jul 22, 2024

Some remarks by @th-certbund:

  • "custom" is also very generic
  • "report" is ambiguous too
  • What about "copy_collector_provided_fields" or similar?

@kamil-certat kamil-certat requested a review from sebix July 25, 2024 08:54
Copy link
Member

@aaronkaplan aaronkaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, I read it and it's understandable.

@aaronkaplan
Copy link
Member

thank you @kamil-certat !!

@aaronkaplan aaronkaplan merged commit 9037088 into certtools:develop Jul 25, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: bots feature Indicates new feature requests or new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants