add parser for Windows Event Log XML output #442
Comments
|
Since "Windows Event Log XML output" is not always proper XML (see https://github.com/dfirlabs/evtx-specimens), it might be better to have libevtx/pyevtx expose string/value names |
Cannot parse "named values" from the XML since value names are not unique, see example below
Likely would need the WEVT templates here |
When dealing with "named values" that are not unique, has the team thought about enumeration of field names (i.e. creating Am I correct in thinking that going the "WEVT templates" route would thus require a template for every possible event? If so would the team be amenable to the community providing (via survey or other) the top X events that they'd like to have parsed to begin the fun? |
Who is "the team"? What "community"? What are "top X events", and for what purpose? Who is going to do the survey, you? This is an open source project; feel free to contribute.
There are strings; this is the most reliable way. This does not address that [From https://osdfir.blogspot.com/2021/10/common-misconceptions-about-windows.html]:
More research is needed into what each of these "values" means in the context of a specific version of an event log provider.
Unclear to me what you mean, please elaborate
No, not every event has a corresponding WEVT_TEMPLATE resource |
Thanks for the feedback @joachimmetz. I'll tap out. |
@rj-chap can you be more clear and explain your expressions. This is an international audience, not native English speakers, and I don't assume you mean https://www.urbandictionary.com/define.php?term=tap%20out |
This is now my favorite GitHub issue ever. 100%. I just meant I would leave the conversation and watch the development unfold. But we've gone this far, might as well keep going! No way I can resist after a fantastic response such as that.
As you note in your linked article, correlation between the values (or params) and their location(s) within (or even association to) event message strings are often convoluted. Personally my focus is on raw
I'd love to see In your linked reference, I would like to see the raw field data, such as:
Correlation of values such as Your thoughts? |
This comment is related to the event strings (information extracted from e.g. EventData) and their application in the event message. This is not only convoluted, it might be non-existent. Additional information is necessary to indicate so. The event strings used in the message string at least have some context to indicate how they should be used.
That is only one of the forms in which event strings can be specified; also see: https://github.com/libyal/libevtx/blob/main/documentation/Windows%20XML%20Event%20Log%20(EVTX).asciidoc#event-data
There is nothing in the EventXML that makes Since they are not unique they cannot be used as keys, see one of the previous examples. Suffixing can be a work-around, but that would mean we're altering data which has its own set of challenges.
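To make the key collision concrete, here is a minimal Python sketch. It assumes the name/value pairs have already been extracted; the sample pairs and the helper `suffix_duplicate_names` are hypothetical and not part of Plaso or libevtx. It shows how a plain dictionary silently drops a repeated name and what the suffixing work-around would look like.

```python
from collections import Counter


def suffix_duplicate_names(pairs):
    """Return a dict keyed by name, suffixing repeated names with _1, _2, ...

    Hypothetical helper for illustration only. Note that the suffixed keys
    no longer match the names stored in the event, which is the "altering
    data" concern, and a suffixed key could itself clash with a real name.
    """
    totals = Counter(name for name, _ in pairs)
    seen = Counter()
    result = {}
    for name, value in pairs:
        if totals[name] > 1:
            seen[name] += 1
            key = f"{name}_{seen[name]}"
        else:
            key = name
        result[key] = value
    return result


# Made-up pairs in which the name "Name" appears twice.
pairs = [("Name", "first"), ("Name", "second"), ("Other", "third")]

# Using the names directly as keys keeps only the last value for "Name".
naive = dict(pairs)

# Suffixing keeps both values, but under keys that are not in the source data.
suffixed = suffix_duplicate_names(pairs)
```

Keeping the values as an ordered list of pairs, as in `pairs` above, avoids both the data loss and the renaming.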
the Also what do you mean with "Correlation" in this context? This is just a way of "parameter substitution"
It is unclear to me what problem you are trying to solve or what point you are trying to make. What is the underlying analysis method you are trying to improve? Where is that documented? |
I'd like to begin by stating that I truly appreciate plaso along with all of the work that has gone into the project. I have nothing but respect for everyone who has devoted their time and passion to the project. As such I am not trying to prove any point or indicate that any of my suggestions are "right" or "correct." For that matter I do not intend to cause any consternation on your behalf. What I am trying to do is provide potential ways to assist the project in the ways that I can.
I am not a developer. I am a DFIR analyst. The analysis method that I am attempting to improve is the ability for an analyst to query the data set generated by the tool. I'm looking at the ingestion of plaso-generated data into a log aggregation and/or SIEM tool for bulk analysis. A simple example being plaso data pushed into Elastic to be analyzed via Kibana or TimeSketch. Whether an analyst uses this bulk analysis approach or simply intends to analyze plaso-generated data within a CSV locally, the goal is to provide the ability to identify data quickly and efficiently.
As-is, analysts can identify Event IDs ( This is where my ideas in this and my initial Issue (#3988) come into play. For a simple event such as a
For situations such as these, it would be absolutely amazing to have these values parsed by plaso's EVTX parser. Right now, the I think the easiest way to re-frame what I've been suggesting up to now would be to ask if it's possible for the EVTX parser to be updated to identify and extract these well-named potential fields for now, while additional research is conducted to find a solid way to parse all event log data in a more robust fashion. |
Parsing EventXML as XML is not an option, since it is not proper XML. So basically you're asking to move a complex and maintenance heavy solution into Plaso. While a more robust way is to just map the strings to predefined fields in ELK based on the message identifier / event provider version. |
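As a rough illustration of the mapping approach described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the provider name, the message identifier, the field names and the helper `map_strings` are hypothetical, and a real deployment would implement the equivalent in its ELK ingest configuration rather than in Plaso.

```python
# Hypothetical lookup table: (provider, message identifier, provider version)
# maps to the field names for the event strings, in positional order.
FIELD_NAMES = {
    ("Example-Provider", 0x00001000, 1): ("user_name", "source_address"),
}


def map_strings(provider, message_identifier, version, strings):
    """Map positional event strings to predefined field names.

    Falls back to the unnamed list of strings when no mapping is known or
    when the number of strings does not match, instead of guessing.
    """
    names = FIELD_NAMES.get((provider, message_identifier, version))
    if names is None or len(names) != len(strings):
        return {"strings": list(strings)}
    return dict(zip(names, strings))


known = map_strings("Example-Provider", 0x00001000, 1, ["alice", "10.0.0.1"])
unknown = map_strings("Other-Provider", 0x00002000, 1, ["x", "y"])
```

Keying on the provider version as well as the message identifier matters because the meaning of a string position can differ between versions of the same provider.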
Ah, understood. My initial feedback was aimed at discussing ways to handle the more difficult situations, but I'm obviously out of my element. Moving forward in our conversation, I ditched the idea of dealing with the more difficult scenarios and figured a quick fix to provide some additional context would work. I was thinking of looping through each line in the Abstracting the idea in bash:
On paper, the above seems like a quick fix to provide some additional context while a more solid solution is researched & built. It's easy for us non-developers to think things are "easy" when they are not. I think overall I'm looking for shortcuts that, to a legitimate developer such as yourself, only add maintenance and disorganization. It's not just a loop to identify possible Data Names and values... it's creating a new list to hold the dictionaries, appending to the list when a line matches the expression, looping through the array to unfold the items before returning, updating everything else to deal with additional fields that might be returned, and all the other things I can't think of because I'm not a dev. Thanks for all the feedback Joachim. The last thing I'd want to do is introduce more headache to the project. |
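For reference, the line-by-line idea described above might look roughly like the following in Python. This is not the bash snippet from the original comment; it is a minimal sketch that assumes each named value sits on its own line in the form `<Data Name="...">...</Data>`, and the sample XML and the helper `extract_named_values` are made up for illustration.

```python
import re

# Assumption: one <Data Name="...">value</Data> element per line.
DATA_LINE = re.compile(r'<Data Name="([^"]*)">(.*?)</Data>')


def extract_named_values(xml_string):
    """Collect (name, value) pairs from lines that match DATA_LINE.

    Returns a list rather than a dict because names may repeat. Unnamed
    Data elements, values spanning multiple lines and escaped characters
    are not handled.
    """
    pairs = []
    for line in xml_string.splitlines():
        match = DATA_LINE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Made-up sample with a repeated name and an unnamed Data element.
sample = """<EventData>
  <Data Name="Name">first</Data>
  <Data Name="Name">second</Data>
  <Data>unnamed</Data>
</EventData>"""

extracted = extract_named_values(sample)
```

In this sample the unnamed element is skipped entirely and the repeated name shows up twice, which are the kinds of edge cases raised elsewhere in this thread.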
What do you mean with "more difficult situations"? Data format edge cases? A good general first step (as developer, analyst, or whatever hat you wear) is to understand the data and its edge cases. See https://osdfir.blogspot.com/2020/09/testing-digital-forensic-data.html for a more detailed write up.
If you do that in your analysis scripts and you're fully aware of this limitation, that could be acceptable for the case at hand. IMHO it is not a method that is applicable for a tool like Plaso, which is used for many different cases.
This has nothing to do with "developers versus non-developer". The fact that someone can code does not (necessarily) make them a developer. The fact that someone can analyze does not (necessarily) make them an analyst. This is confirmation bias (or tunnel vision) and a very concerning development in the DFIR field.
IMHO having a "researched solution" is a prerequisite for a method to be considered forensics in the first place. If one cannot reason about the method and the findings it produces in a discrete and transparent way, it is not a method that should be used in a forensics context. |
The case at hand is working an incident. As for this or similar methods' place in plaso, totally understood. This is why entities who perform IR at scale have parsers built on top of plaso or build their own tool outright for parsing. I was attempting to bridge the gap in these cases, as plaso makes things so much simpler for many folks around the world.
I think overall we differ in opinion on methods used to perform incident response. My general notes are not "forensic" in nature, as you note. But attempting to review every single |
As I said "a more robust way is to just map the strings to predefined fields in ELK based on the message identifier / event provider version." Again, why are you focused (tunneled/scoped) on the xml_string? Presenting the xml_string as the only way to scale this problem sounds like a "broken record". It is not the only way. Think outside the box, you don't need to look at the xml_string to do things at scale. Have a read of https://osdfir.blogspot.com/2021/10/pearls-and-pitfalls-of-timeline-analysis.html.
There will always be custom needs, but AFAIK "these entities" have not "contributed" to this project as in a PR. There are a lot of tools built on top of other DFIR FOSS tools as well, with very little contribution back to these projects. This is mostly a different problem.
Then the best thing you can do is to research this gap. Do not just dump an idea and expect it to be implemented. Do the leg work, help out, contribute, build this "community" you are dreaming of. But do it the right way, not yet another broken analysis methodology based on assumptions.
Even in IR these principles apply: if you cannot sufficiently rely on your methodologies, you draw the wrong conclusions, e.g. missing a secondary backdoor, missing a compromised host, exposing system privileges while collecting data, misjudging whether your mitigation is actually sufficient, etc., especially at scale, independent of technology. Or you create a huge number of (false) positives, wearing down your analysts. However, this is more a separate holistic conversation than one about a single feature. |
Note that Windows Event Log XML output (as exported by Windows EventViewer) is not necessarily proper XML. Also see: https://github.com/dfirlabs/evtx-specimens and #3595
Extract TimeCreated from xml_string
Changes to extract EVTX TimeCreated #442 #2651
Have libevtx expose xml_string creation time
Depends on add (content) creation time to evtxexport output libyal/libevtx#21
Changed winevtx parser to not parse XML string #3595 #3596