LLVM MCA that performs analysis online.
Before you get started on Ubuntu 22.04, you will need, at the very least:
sudo apt install build-essential binutils cmake ninja-build
This project uses upstream LLVM:
git clone https://github.com/llvm/llvm-project.git
To build and install it, here are some suggested CMake configurations:
cd llvm-project
git am /path/to/LLVM-MCA-Daemon/patches/*.patch # Important: add custom LLVM patches
mkdir install
mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=$(realpath ../install) \
-DBUILD_SHARED_LIBS=ON \
-DLLVM_TARGETS_TO_BUILD="X86;ARM;PowerPC" \
-DLLVM_USE_LINKER=gold \
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-pc-linux-gnu \
../llvm
ninja llvm-mca llvm-mc LLVMDebugInfoDWARF
ninja install
Note that LLVM-MCAD uses a modular design, so components related to QEMU/BinaryNinja/Vivisect are not built by default. Please check out their prerequisites inside their respective folders under the plugins directory.
To build LLVM-MCAD itself, you only need one additional CMake argument: LLVM_DIR. This should point to LLVM's CMake subdirectory inside the install prefix you gave above as CMAKE_INSTALL_PREFIX. Here is an example:
mkdir .build && cd .build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
-DLLVM_DIR=$(realpath ../llvm-project/install/lib/cmake/llvm) \
-DLLVM_MCAD_ENABLE_PLUGINS=all \
../
ninja all
For instance, if you built llvm-project using the previous steps, LLVM_DIR should be "/path/to/llvm-project/install/lib/cmake/llvm".
Note that plugins under the plugins folder are not built by default. Please add -DLLVM_MCAD_ENABLE_PLUGINS=all if you want to build all of them, or give a semicolon-separated list of the choices qemu, tracer, binja, or vivisect to select specifically which plugins to build. Quote the list in the shell (e.g. -DLLVM_MCAD_ENABLE_PLUGINS="qemu;binja") so that the semicolons are not treated as command separators.
Here are some other CMake arguments you can tweak:
- LLVM_MCAD_ENABLE_ASAN: Enable the address sanitizer.
- LLVM_MCAD_ENABLE_TCMALLOC: Use tcmalloc and its heap profiler.
- LLVM_MCAD_ENABLE_PROFILER: Use the CPU profiler from gperftools.
- LLVM_MCAD_FORCE_ENABLE_STATS: Enable LLVM statistics even in non-debug builds.
We also ship LLVM-MCAD with Docker. Simply run ./up from the docker directory. Then use MCAD like so:
$ # LLVM-MCA-Daemon directory is located inside /work
$ # Port mappings are different for different Brokers. 50051 - Vivisect, 50052 - Binja
$ docker run -p 50052:50052 mcad_dev --debug -mtriple="armv7-linux-gnueabihf" -mcpu="cortex-a57" --use-call-inst --use-return-inst --noalias=false -load-broker-plugin=/work/LLVM-MCA-Daemon/build/plugins/binja-broker/libMCADBinjaBroker.so
Here is an example of using llvm-mcad, the main command-line tool, with the qemu-broker Broker plugin (please refer to the plugins/qemu-broker folder for more details on how to build this plugin).
First, on the server side:
# Server
cd .build
# ARM
./llvm-mcad -mtriple="armv7-linux-gnueabihf" -mcpu="cortex-a57" \
--load-broker-plugin=$PWD/plugins/qemu-broker/libMCADQemuBroker.so \
-broker-plugin-arg-host="localhost:9487"
# X86
./llvm-mcad -mtriple="x86_64-unknown-linux-gnu" -mcpu="skylake" \
--load-broker-plugin=$PWD/plugins/qemu-broker/libMCADQemuBroker.so \
-broker-plugin-arg-host="localhost:9487"
# PowerPC 64-bit little endian
./llvm-mcad -mtriple="powerpc64le-linux-gnu" -mcpu="pwr10" \
--load-broker-plugin=$PWD/plugins/qemu-broker/libMCADQemuBroker.so \
-broker-plugin-arg-host="localhost:9487"
Then, on the client side:
# Client
# ARM
/path/to/qemu/build/qemu-arm -L /usr/arm-linux-gnueabihf \
-plugin /path/to/llvm-mcad/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,\
arg="-addr=127.0.0.1",arg="-port=9487" \
-d plugin ./hello_world.arm
# X86
/path/to/qemu/build/qemu-x86_64 \
-plugin /path/to/llvm-mcad/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,\
arg="-addr=127.0.0.1",arg="-port=9487" \
-d plugin ./hello_world.x86_64
# PowerPC 64-bit little endian
/path/to/qemu/build/qemu-ppc64le \
-plugin /path/to/llvm-mcad/.build/plugins/qemu-broker/Qemu/libQemuRelay.so,\
arg="-addr=127.0.0.1",arg="-port=9487" \
-d plugin ./hello_world.ppc64le
Here are some other important command-line arguments:
- -mca-output=<file>: Print the MCA analysis result to a file instead of STDOUT.
- -broker=<asm|raw|plugin>: Select the Broker to use.
  - asm: Uses the AsmFileBroker, which reads the input from an assembly file. This Broker essentially turns the tool into the original llvm-mca.
  - raw: Uses the RawBytesBroker. Functionality TBD.
  - plugin: Uses the Broker plugin loaded by the -load-broker-plugin flag.
- -load-broker-plugin=<plugin library file>: Load a Broker plugin. This option implicitly selects the plugin Broker kind.
- -broker-plugin-arg.*: Supply additional arguments to the Broker plugin. For example, if -broker-plugin-arg-foo=bar is given, the plugin will receive the -foo=bar argument when it registers with the core component.
- -cache-sim-config=<config file>: Please refer to this document for more details.
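The forwarding rule for -broker-plugin-arg.* amounts to stripping a prefix and re-attaching a dash. The C++ sketch below illustrates that rewriting only; extractBrokerPluginArgs is a hypothetical helper, not the function MCAD actually uses, and accepting both the single- and double-dash spellings is an assumption that mirrors LLVM's command-line library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not MCAD's actual code): collect every
// "-broker-plugin-arg-<name>=<value>" flag and rewrite it as "-<name>=<value>",
// which is the form the Broker plugin is described as receiving.
std::vector<std::string>
extractBrokerPluginArgs(const std::vector<std::string> &CmdArgs) {
  // Check the longer (double-dash) spelling first.
  const std::string Prefixes[] = {"--broker-plugin-arg-", "-broker-plugin-arg-"};
  std::vector<std::string> PluginArgs;
  for (const std::string &Arg : CmdArgs) {
    for (const std::string &Prefix : Prefixes) {
      if (Arg.compare(0, Prefix.size(), Prefix) == 0) {
        // Keep everything after the prefix and prepend a single dash.
        PluginArgs.push_back("-" + Arg.substr(Prefix.size()));
        break;
      }
    }
  }
  return PluginArgs;
}
```

With the qemu-broker server command shown earlier, -broker-plugin-arg-host="localhost:9487" reaches the program (after the shell removes the quotes) as -broker-plugin-arg-host=localhost:9487, and this sketch would turn it into -host=localhost:9487.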
We use LLVM's LIT testing infrastructure.
$ cd test
$ ./my-lit.py -v .
LLVM-MCAD is roughly split into two parts: the core component and the Broker. The core component manages and runs LLVM MCA. It uses a modified version of MCA that analyzes input instructions incrementally. That is, instead of retrieving all native instructions (from an assembly file, for example) and analyzing them at once, incremental MCA continuously fetches small batches of instructions and analyzes each batch before fetching the next one. The "native instructions" we're discussing here, the input to MCA, are represented by llvm::MCInst instances. The Broker is the component that supplies these MCInst batches to the core component.
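To make the batch-by-batch model concrete, here is a minimal, self-contained C++ sketch that does not depend on LLVM. Inst is a hypothetical stand-in for llvm::MCInst, and FetchFn, RunningTotals, and analyzeIncrementally are illustrative names rather than MCAD's API. Where the real core component would feed each batch to MCA, the loop body here merely counts instructions and uOps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for llvm::MCInst so the sketch compiles without LLVM.
struct Inst {
  unsigned Opcode;
  unsigned NumUOps; // hypothetical per-instruction uOp count
};

// Hypothetical fetch callback: appends up to MaxLen instructions to Batch and
// returns how many it appended; returning 0 means the input is exhausted.
using FetchFn =
    std::function<std::size_t(std::vector<Inst> &Batch, std::size_t MaxLen)>;

struct RunningTotals {
  std::size_t NumBatches = 0;
  std::size_t NumInsts = 0;
  std::size_t NumUOps = 0;
};

// Incremental analysis: pull one small batch, analyze it, drop it, repeat.
// Only one batch is held at a time, no matter how long the input is.
RunningTotals analyzeIncrementally(const FetchFn &Fetch, std::size_t BatchSize) {
  assert(BatchSize > 0 && "batch size must be positive");
  RunningTotals Totals;
  std::vector<Inst> Batch;
  Batch.reserve(BatchSize);
  while (true) {
    Batch.clear();
    if (Fetch(Batch, BatchSize) == 0)
      break; // no more input
    ++Totals.NumBatches;
    for (const Inst &I : Batch) { // placeholder for "run MCA on this batch"
      ++Totals.NumInsts;
      Totals.NumUOps += I.NumUOps;
    }
  }
  return Totals;
}
```

For example, a source yielding 10 instructions with a batch size of 4 is consumed in three batches (4, 4, 2) before the fourth call returns 0.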
A Broker exposes a simple interface, defined by the Broker class, to the core component. You can either choose one of the two built-in Brokers, AsmFileBroker and RawBytesBroker (WIP), or create your own Broker via the Broker plugin. There are essentially no assumptions about a Broker's execution model, so you can create either a simple Broker like AsmFileBroker, which simply reads from an assembly file, or a complex one like qemu-broker in the plugins folder, which interfaces with QEMU over a TCP socket and uses multi-threading.
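The following C++ sketch illustrates the idea of a small, pull-based Broker interface. The Broker class here is a simplified, hypothetical stand-in (its fetch signature is an assumption, not the one declared by MCAD, and it hands out a toy Inst instead of llvm::MCInst). TextBroker is a toy in the spirit of AsmFileBroker: it treats each non-empty line as one "instruction" and defers parsing until the first fetch call.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::MCInst: just the instruction text.
struct Inst {
  std::string Text;
};

// Simplified, hypothetical Broker interface (not MCAD's real declaration).
class Broker {
public:
  virtual ~Broker() = default;
  // Append up to MaxLen instructions to Batch and return how many were added.
  // Returning 0 tells the core component that there is no more input.
  virtual std::size_t fetch(std::vector<Inst> &Batch, std::size_t MaxLen) = 0;
};

// Toy Broker: one instruction per non-empty line of Source. Parsing happens
// on the first fetch() call rather than in the constructor.
class TextBroker : public Broker {
  std::string Source;
  std::vector<Inst> Parsed;
  std::size_t Pos = 0;
  bool IsParsed = false;

  void parse() {
    std::istringstream In(Source);
    std::string Line;
    while (std::getline(In, Line))
      if (!Line.empty())
        Parsed.push_back(Inst{Line});
    IsParsed = true;
  }

public:
  explicit TextBroker(std::string Src) : Source(std::move(Src)) {}

  bool parsed() const { return IsParsed; }

  std::size_t fetch(std::vector<Inst> &Batch, std::size_t MaxLen) override {
    if (!IsParsed)
      parse(); // on-demand initialization
    std::size_t N = 0;
    while (N < MaxLen && Pos < Parsed.size()) {
      Batch.push_back(Parsed[Pos++]);
      ++N;
    }
    return N;
  }
};
```

Because the core component only ever calls fetch(), the same loop can drive this toy Broker or a far more elaborate one without knowing how either produces its instructions.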
Currently we only display the MCA result using SummaryView, which prints only basic information such as the total cycle count and the number of uOps. In the future, we plan to support a wider variety of MCA views, or even another plugin system for customized views.
Goals:
- LLVM MCA for dynamic analysis. A lightweight, general-purpose execution trace simulation and analysis framework.
- Able to perform online analysis on target programs that run for a long duration.
- Able to scale up with the input.
- Acceptable (analysis) performance
- Low memory footprint
- Good interoperability with upstream LLVM
- Be able to upstream this project (the core component) in the future.
Non-goals:
- A fixed number of input sources.
  - Unlike the original llvm-mca tool, you can create your own Broker plugin to harvest MCInst from arbitrary media such as execution traces or object files.
- Multi-threading in the core component.
  - The core component should have a simple execution model, so we don't run MCA on a separate thread. You can, and are encouraged to, run your custom Broker on a separate thread to increase throughput.
- Making any assumptions about a Broker plugin's execution model.
- Managing a Broker plugin's lifecycle.
  - We don't have explicit callbacks for a Broker plugin's lifecycle. Broker developers are expected to manage the lifecycle on their own and are encouraged to execute tasks on demand (e.g. AsmFileBroker only parses the assembly file when its fetch method is first invoked).
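Since the core component stays single-threaded, a Broker that wants more throughput can do its own work on a background thread. The C++ sketch below uses hypothetical names (Inst, ThreadedBroker) rather than MCAD's real interface: a producer thread pushes instructions into a mutex-protected queue, fetch() drains up to MaxLen of them, and the thread is started lazily on the first fetch() call, in keeping with the on-demand style described above. The Source vector is an assumption standing in for a real input such as a socket.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::MCInst.
struct Inst {
  unsigned Opcode;
};

class ThreadedBroker {
  std::vector<Inst> Source; // stands in for the real input (socket, trace, ...)
  std::deque<Inst> Queue;   // hand-off buffer between producer and fetch()
  std::mutex Mtx;
  std::condition_variable CV;
  bool Done = false; // set once the producer has pushed everything
  std::thread Worker;

  // Runs on the worker thread.
  void produce() {
    for (const Inst &I : Source) {
      {
        std::lock_guard<std::mutex> Lock(Mtx);
        Queue.push_back(I);
      }
      CV.notify_one();
    }
    {
      std::lock_guard<std::mutex> Lock(Mtx);
      Done = true;
    }
    CV.notify_one();
  }

public:
  explicit ThreadedBroker(std::vector<Inst> Src) : Source(std::move(Src)) {}

  ~ThreadedBroker() {
    if (Worker.joinable())
      Worker.join();
  }

  // Called from the core component's thread. Blocks until at least one
  // instruction is queued or the producer is finished; returns 0 only when
  // all input has been consumed.
  std::size_t fetch(std::vector<Inst> &Batch, std::size_t MaxLen) {
    assert(MaxLen > 0);
    if (!Worker.joinable()) // lazy start on the first call
      Worker = std::thread(&ThreadedBroker::produce, this);
    std::unique_lock<std::mutex> Lock(Mtx);
    CV.wait(Lock, [this] { return !Queue.empty() || Done; });
    std::size_t N = 0;
    while (N < MaxLen && !Queue.empty()) {
      Batch.push_back(Queue.front());
      Queue.pop_front();
      ++N;
    }
    return N;
  }
};
```

The queue preserves the producer's order, so the core component sees instructions in the same sequence they were produced. A production Broker would likely also bound the queue so a fast producer cannot outrun the analysis indefinitely.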
The main branch is configured to work with upstream LLVM. We try our best to keep it compatible, but please raise an issue if it fails to build. LLVM_COMMIT_ID.txt contains the latest commit ID that MCAD was reliably tested with.
The fse23 branch contains code that was part of the publication presented at FSE'23. Please cite as follows:
@inproceedings{fse2023mcad,
author = {Hsu, Min-Yih and Hetzelt, Felicitas and Gens, David and Maitland, Michael and Franz, Michael},
title = {A Highly Scalable, Hybrid, Cross-Platform Timing Analysis Framework Providing Accurate Differential Throughput Estimation via Instruction-Level Tracing},
year = {2023},
isbn = {9798400703270},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3611643.3616246},
doi = {10.1145/3611643.3616246},
booktitle = {Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
pages = {821–831},
numpages = {11},
keywords = {combining static and dynamic analyses, differential throughput analysis, performance, throughput analysis},
location = {San Francisco, CA, USA},
series = {ESEC/FSE 2023}
}
However, the fse23 branch (and the broker-improvements branch) contains outdated code that requires a custom LLVM, available here (branch dev-incremental-mca).
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract Number N66001-20-C-4027 and 140D0423C0063. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the DARPA, NIWC Pacific, or its Contracting Agent, the U.S. Department of the Interior, Interior Business Center, Acquisition Services Directorate, Division III.