
[Core, plugin] : Virtual mappings dumping and caching #1237

Draft · wants to merge 7 commits into develop

Conversation

Abyss-W4tcher
Contributor

Hi 👋,

Following an observation made while investigating WindowsAArch64 (#161 (comment)), I noticed that multiple plugins were scanning and physically mapping the entire virtual address space:

if sections is None:
    sections = [
        (self.minimum_address, self.maximum_address - self.minimum_address)
    ]

in order to later do byte searching on the virtual (kernel) layer directly. However, on large memory samples or heavily populated address spaces, this process takes quite some time and needs to be repeated on each plugin run.

As the input (the section to scan) and the output (the virtual->physical mappings) are constant for a given memory sample, there is no need to repeat the calculation.
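To make the idea concrete, here is a minimal sketch of that kind of memoization (the names, file layout, and compute_mappings callable are illustrative, not the PR's actual implementation):

import json
import lzma
import os

def cached_mappings(cache_path, section, compute_mappings):
    """Hypothetical helper: return the virtual->physical mappings for `section`,
    computing them only once and re-reading the xz-compressed JSON afterwards."""
    key = f"{section[0]:#x}-{section[1]:#x}"
    cache = {}
    if os.path.exists(cache_path):
        with lzma.open(cache_path, "rt") as f:
            cache = json.load(f)
    if key not in cache:
        # Cache miss: run the expensive virtual-layer walk and persist the result.
        cache[key] = compute_mappings(section)
        with lzma.open(cache_path, "wt") as f:
            json.dump(cache, f)
    return cache[key]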


This PR introduces a new plugin windows.virtmapscanner and a new framework argument --virtmap-cache-path. Here is a demo:

  • Classic windows.filescan run:
time vol3.py -f memory_layer.raw -c conf.json windows.filescan > /dev/null 
Formatting...0.00               PDB scanning finished                  
0,02s user 0,04s system 0% cpu 4:52,12 total
  • Generate the virtual mappings cache with windows.virtmapscanner, and save it to a file:
# This command can be run once when first analyzing a memory sample
time vol3.py -f memory_layer.raw --save-config conf.json -o virtmapcache_dir/ windows.virtmapscanner
Formatting...0.00               PDB scanning finished                        
Volatility 3 Framework 2.8.0
  |                                 Sections | Virtual mappings cache file output
* | [(18446673704965373952, 70368744177663)] |             virtmapcache.json.xz
0,00s user 0,05s system 0% cpu 5:17,29 total
  • Run the windows.filescan plugin, providing the framework with the virtual mappings cache filepath:
time vol3.py -f memory_layer.raw -c conf.json --virtmap-cache-path virtmapcache_dir/virtmapcache.json.xz windows.filescan > /dev/null
Formatting...0.00               PDB scanning finished                  
0,00s user 0,03s system 0% cpu 1:54,60 total

The time spent running windows.virtmapscanner quickly pays for itself! This applies not only to the windows.filescan plugin, but to any plugin that scans an entire virtual layer (windows.netscan, windows.threads ...).

I'll be glad to hear your thoughts and suggested improvements on this idea!


Notes:

  • The volshell integration hasn't been done yet
  • This applies to Linux too; the Linux plugin code should be 99% the same

@ikelos
Member

ikelos commented Aug 14, 2024

Thanks, it looks like an interesting idea. I'm afraid it's another one that will likely take me some time to consider the implications of, and I'm also on holiday next week, but do please nudge me after a month if I haven't added any comments on it by then...

Member

@ikelos ikelos left a comment


So I'm very squeamish about this whole thing. It's a nice idea but it may easily lead to us returning inaccurate results, and it's kinda bolted on the side of the framework. I think there might be a nicer way of doing this, where basically whenever a layer would go to do this work, it checks if the value is already cached and, if not, generates the cache as it does the manual bit. Then on the next run the cache will have a hit? We already have a caching mechanism/plan built in for dealing with symbol tables (which hardly gets used these days), so I feel that with judicious use of uniquely naming the data so it can be found again, we might be able to make this all automatic/in the background and workable for everyone.

The difficulty will be identifying the specific image we've got and making sure we accurately map it to the cache. I was going to suggest that the DTB and a hash of the top-level table should uniquely identify a particular image, but this applies to all translation layers of any type. 5:S Some of them will have extremely large section_start/section_end segments, so it's really only Intel that has so many of them, and I'm surprised this saves that much time, since the mapping method should just read the page tables from the layer and use them pretty efficiently. It depends how often the tables are reread, but I believe we cache them during a run?

Looking at your stats, this takes 7' 11" total for a task that normally takes 4' 36", so it isn't profitable until the second run at the earliest, and really takes about 3 runs to start being genuinely useful. I'd be very interested to see from a profiler where the bulk of that time is spent, both without the map and during the process of building and then using it.

My biggest concern is that it's either very manual (which may lead to a lot of user error and make the feature not worth using) or we make it automatic (and then if it makes a mistake it's a pig to identify and guide users through fixing it).

Either way, I would definitely need to see repeated runs of this with the results being verified as identical, and with the wrong file being passed in to see if it can detect errors if we decide to use it automatically (such as two paused VM snapshots of the same machine where lots of different processes have been started between the runs).

Caching stuff is fraught with issues which is why I'm not going to be rushing this in and even when we do get it in a smoother state, I'll probably still be debating whether it's worthwhile. This might take me some time to figure out what's the worst performing bit of the existing code, and then if there's anything that can be done to improve that specific aspect...

# Avoid decompressing and deserializing the file
# more than once. Saves time, but costs more RAM.
if not hasattr(self, "_virtmap_cache_dict"):
    with open(filepath, "rb") as f:
Member


Ugh, file handling is tricky. If this is running as a web interface on a remote server, standard open operations won't work. This should use a constructed ResourceAccessor class's open method so that it can open URLs and compressed files automatically. It will then accept a URL rather than a file path.
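A sketch of what that could look like, assuming the ResourceAccessor interface from volatility3.framework.layers.resources (the virtmap_cache_url variable is illustrative):

from volatility3.framework.layers import resources

# ResourceAccessor.open handles local paths, URLs and common compression
# formats (such as .xz) transparently, instead of a bare built-in open().
with resources.ResourceAccessor().open(virtmap_cache_url, "rb") as f:
    data = f.read()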

@@ -551,6 +597,12 @@ def _scan_iterator(
assumed to have no holes
"""
for section_start, section_length in sections:
# Check the virtmap cache and use it if available
Member


If this is essentially going to generate it, why don't we... try to look it up in a cache and, if that fails, do the rest of the code and then store those results into the cache? Rather than a completely separate plugin, that speeds up everything without costing any additional time? It also means the data can be stored in the local on-disk cache (like processed symbol files), which saves us having to worry about the user getting the data into the system, or messing with the UI at all?
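In other words, something like this read-through pattern, wrapped around the existing per-section work (a sketch only; the helper names and the in-memory dict standing in for the on-disk cache are hypothetical):

# In-memory stand-in for the on-disk cache; a real version would persist
# results under constants.CACHE_PATH, keyed by a strong layer identifier.
_section_cache = {}

def scan_sections_with_cache(layer_identifier, sections, compute_section_mappings):
    """Yield mapping results per section, computing each section only on a cache miss."""
    for section in sections:
        key = (layer_identifier, section)
        if key not in _section_cache:
            # Cache miss: do the normal (expensive) work, then remember the result.
            _section_cache[key] = list(compute_section_mappings(*section))
        yield from _section_cache[key]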

Contributor Author


Generating it here would fit if automatic mapping cache generation were implemented. However, this would result in filling the cache and adding processing time without the user requesting it?

An additional plugin also provides a single, unified way to save the cache, instead of it being generated more or less at random depending on which plugin is run first. As said, it is not a "ground-breaking" feature, and explicit core integration might come with concerns for users (this PR doesn't really add much more core logic if the feature isn't turned on).

Member


But, isn't that exactly what this is generating here? Why not just save it rather than generating it in a different plugin as part of a separate step? This could also build up the cache as needed, so if parts haven't been computed yet, it does the full task, but if they are it uses the cache?

Implementing a cache (that might be wrong unless it's matched to the image very carefully) already seems like quite a deep-rooted feature to me...

@@ -398,6 +406,49 @@ def run(self):
plugin_config_path,
interfaces.configuration.HierarchicalDict(json_val),
)
if args.virtmap_cache_path:
Member


I'm just not so keen on adding this functionality this way. What happens if we discover we've missed something in the file format, or there's a more efficient way of storing the data, or something else crops up? The more we expose this to the user and make them do work to get it, the more can go wrong. There's a comment further down that suggests a different way of working the cache that might solve a bunch of problems?

Contributor Author

@Abyss-W4tcher Abyss-W4tcher Sep 3, 2024


I agree; however, both situations (manual / automatic) might produce different issues. With a manual implementation, at least it is easy to remove the argument and see if it was the problem (although it hopefully shouldn't be, but I understand the concern).

Member


True, but it'll still be bug reports we're trying to diagnose it from, so recommending --clear-cache seems just as straightforward.

A list containing mappings for a specific section of this layer"""

# Check if layer is fully constructed first
if self.context.config.get(
Member


What object has the requirement on this? It should be that the layer that uses it has an optional requirement on it, so that it'll get saved into any config files that get constructed. The layer identifier isn't even close to unique (almost all plugins use the same layer name, and class) so this will go badly when you have multiple layers you want to use this on (or a config you want to work for multiple images).

Contributor Author


The layer_identifier should be unique? Example layer_identifiers:

  • volatility3.framework.layers.intel.WindowsIntel32e.layer_name -> identifies the kernel layer (TranslationLayerRequirement name)
  • volatility3.framework.layers.intel.WindowsIntel32e.layer_name_Process5948 -> identifies the process 5948

I might have missed something, but it shouldn't be possible to have a duplicate layer string identifier in the layers pool?

This specific "config"/"cache" is intended to be used for a unique memory capture, as even a dump from the same kernel a few seconds later would have different mappings.

Member


Layer name is unique for a run of volatility, but they'll likely all say primary1 or memory_layer1 or something. Across runs they're unlikely to even be different.

The process layers, similarly, won't be different across different images that have processes with the same pid... I don't think a dump from a few seconds later would have a different cache? The layer name would likely be the same, and many of the process layer names would too, but also, it could match a wildly different image...

@@ -247,7 +250,12 @@ def run(self):
default=[],
action="append",
)

parser.add_argument(
Member


I'm not yet comfortable with this machinery. It seems really bodged into the workings. 5:S

Member


Sorry, that's not a helpful comment. It's just a feeling I get, but it doesn't feel like it's been smoothly integrated into how everything works; rather, it has lots of external moving bits (like files the user has to pass in, which get stashed in a non-unique place in the config to get used where they're needed). As I say, I think there may be a better way that evades the file issues, improves on the uniqueness, and wouldn't require user interaction.

Contributor Author


The use of this cache might silently fill the filesystem without the user noticing, although the generated file is xz compressed. Indeed, everything could be stored in the cache, but this would imply that the --clear-cache command might also remove it when trying to fix a symbol cache problem.

Member


Unintentionally filling the disk is a potential issue, but we can build cache checks into runs to clear out old files if it becomes a problem. What's the expected filesize for one of these xz files once it's made?

We'd want --clear-cache to do exactly that: it's supposed to set you back to square one, so you can rule the cache out as a contributing factor to a problem.

@Abyss-W4tcher
Contributor Author

Abyss-W4tcher commented Sep 3, 2024

Thanks for the review :)

The main idea was to provide an optional feature that lets users save time by avoiding regenerating the same deterministic data (the input and the logic never change) on every plugin run.

As you pointed out, it is not integrated "automatically" in the framework, as I wanted it to be managed by the user, leaving them the task of selecting the right cache file for the right dump.

This hasn't been tested on a lot of samples, but basically it calls the internal framework API and stores the result on disk (like a persisted functools.lru_cache()). Since I patched the AArch64 layer scanning, the time spent on each plugin run to scan the kernel space has drastically decreased. However, on huge samples, or when a lot of contiguous memory segments are allocated, it is still worthwhile to do the scanning once and then re-import the results afterwards.


A good use case to consider is a developer implementing a scanning plugin who needs a lot of trial and error to find needles in the virtual pages. This saves them quite some time on each test.


Either way, I would definitely need to see repeated runs of this with the results being verified as identical, and with the wrong file being passed in to see if it can detect errors if we decide to use it automatically (such as two paused VM snapshots of the same machine where lots of different processes have been started between the runs).

If this plugin doesn't output the same results twice, then the problem would reside inside the _layer_scan API, and directly affect the "normal" scanning process?

With the wrong file being passed, I am afraid that the time spent verifying the correctness of the mappings would require regenerating the mappings, hence defeating the initial idea of this feature (saving time and CPU cycles)?


Making a hash of the DTB would work, except for samples taken from a different runtime kernel :/ . I racked my brain over this problem a bit, but I couldn't find an easy solution.

Computing a SHA256 of the whole sample would take too long, and would be needed on each run (storing it in metadata just circles back to the same problem)...

@ikelos
Member

ikelos commented Sep 3, 2024

Sure, I applaud the idea and I think it could be worthwhile using it all the time, but that would need three things:

  • Generating the map to disk when the map is first generated by the framework normally (I've got an idea of reading and hashing the first mapped page, or a few different indicators that should yield a "unique" value, but at the moment that is not unique at all; see the sketch after this list).
  • A good way of uniquely mapping caches to input files (at the moment that's manual; both manual and automatic come with the chance of getting something wrong, but at least with automatic we can do our best to only engage it when we're sure it matches)
  • A way to turn it off to check it's not causing problems (in the automatic route that would be via --clear-cache)
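As a rough illustration of the first bullet's "hash the first mapped page" idea (hypothetical names; as noted above, this is not actually unique enough on its own):

import hashlib

def tentative_layer_id(layer) -> str:
    """Sketch: combine the DTB with a hash of the layer's first page.
    page_map_offset is the DTB on Intel translation layers; if absent, 0 is used."""
    dtb = getattr(layer, "page_map_offset", 0)
    # Read the first page, padding unmapped bytes with zeroes.
    first_page = layer.read(layer.minimum_address, 0x1000, pad=True)
    return hashlib.sha256(dtb.to_bytes(8, "little") + first_page).hexdigest()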

@Abyss-W4tcher
Contributor Author

Abyss-W4tcher commented Sep 9, 2024

Alright, I am processing all your comments. I am thinking about an automated design, but the core of the problem lies in identifying a memory sample and, more granularly, a layer.

Additionally, instead of storing all the layers (of a sample) inside one cache file, which would have to be loaded entirely whenever needed, an approach where each layer's cache is written directly to its own file would speed up search and access. Even if we get layers from different samples "mixed" in the cache, this shouldn't be a problem as long as the layer identifier is strong.

For example, we might consider the following sample code:

import hashlib
import json
import os

from volatility3.framework import constants

stacked_layers = [f"{l.config['class']}.{l.name}" for l in current_layer.context.layers.values()]
"""['volatility3.framework.layers.physical.FileLayer.base_layer', 'volatility3.framework.layers.crash.WindowsCrashDump64Layer.memory_layer', 'volatility3.framework.layers.arm.WindowsIntel32e.layer_name']"""

layer_identifier = hashlib.sha256(b"unique identifier")  # DTB, first mapped page ...
virtmap_filename = os.path.join(
    constants.CACHE_PATH,
    "virtmap_" + layer_identifier.hexdigest() + ".cache",
)
# layer_mappings format is {"section": _scan_iterator_results}
cache_content = {"metadata": {"stacked_layers": stacked_layers}, "mappings": layer_mappings}
with open(virtmap_filename, "w+") as f:
    json.dump(cache_content, f)

I don't know if any other metadata would be interesting? This could make it easy to recognize a file if any manual investigation is needed.

The layer_identifier can be computed once on layer class instantiation, and used when:

  • Handling a section scan request, to determine if a mapping exists in the cache, reading it directly if available
  • Writing a freshly scanned section to disk (or appending the section results to the existing data), allowing the calculations to be re-imported directly on a later run

Determining if a cache file already exists is straightforward, and does not require a direct content scan. Of course, this relies on the cache filenames not being messed with.
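The lookup side could then be roughly the following, reusing the imports and filename scheme from the snippet above (the str(section) key format is an assumption):

def load_cached_mappings(layer_identifier, section):
    """Return the cached _scan_iterator results for `section`, or None on a miss."""
    virtmap_filename = os.path.join(
        constants.CACHE_PATH,
        "virtmap_" + layer_identifier.hexdigest() + ".cache",
    )
    if not os.path.exists(virtmap_filename):
        return None
    with open(virtmap_filename, "r") as f:
        cache_content = json.load(f)
    return cache_content["mappings"].get(str(section))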

As you pointed out, this automatic feature should be easily toggleable (on/off), and also never enabled if the cache is not available (if #410 gets merged one day).

What are your thoughts on this design? It would drop the plugin and most of the CLI code, concentrating the logic mostly inside layers.py.

@ikelos
Member

ikelos commented Sep 10, 2024

Yeah, so a per-layer design is ok, but identifying the layers is going to be quite a task. It's probably easiest to do it on a hash of a certain number of bytes, but if that's not spread across the entire layer, then things like multiple memory snapshots may come back reporting the same layer cache hit. Honestly, it's going to be really difficult identifying one layer from another layer reliably. If there's a clever way of doing that, then go for it. Am I right in thinking that these maps will be dependent on the DTB (and therefore a process layer won't have the same sections as a kernel layer) or not?

@Abyss-W4tcher
Contributor Author

These maps depend on the DTB (per-process and kernel), and the number of sections depends on the virtual pages allocated to the layer.

This is a difficult task, because the highest-level mappings might look the same between two samples, while PTEs could have been mapped and unmapped in between, resulting in small skipped sections. Apart from doing a SHA256 of the entire sample on each Volatility run, relying explicitly on the sample filepath (like the config?), or trying sketchy things like gathering system clocks from the upper layer, there isn't, for now, an efficient way to uniquely identify a layer.

Initially, I designed this feature to work the same way as the config, which mostly relies on the user not overwriting one memory sample's filepath with another. However, a wrong config won't get the stacking very far, whereas a wrong layer cache will silently return inaccurate data 🤔.

In the current state, I am afraid that the automatic path would introduce more issues than benefits... So I guess it should be left as a side feature for now, as increasing the risk of automatically returning inaccurate results is what we really want to avoid?
