Revamp Testing Infrastructure and run Multi-Kernel Tests in CI #111

rhysre · 2022-06-29T02:41:38Z

This PR revamps the multi-kernel test setup in line with concerns recently
brought up by obs-profiling, extends it to support arm64 tests, and sets the
whole thing up to run in CI via GitHub actions.

Background

Concerns were brought up by obs-profiling about the initial iteration of the
testing framework locking them into something that wasn't adequate with regards
to cross platform compatibility.

Specifically, obs-profiling wanted the core of the framework to be very easy to
run on different architectures generally (be it generating a test initramfs for
x86 on an arm64 Mac or running a test in an arm64 guest on an x86 host).
Practically speaking, this means writing the core components of the framework
in Go, as it's super easy to cross compile for different architectures and
platforms.

While these concerns were less of a problem for AWP, obs-profiling still wanted
to be able to share common logic pertaining to running BPF tests with us as to
follow the DRY principle. The concern was that either obs-profiling would have
to create something else from scratch, thus performing redundant work, or use a
framework that wasn't quite what they were looking for.

Upon discussion with @florianl, the common logic that both teams will need is
really just limited to initramfs generation and test binary running (i.e. the
init process in the VM). Anything beyond that (e.g. specifics of the tests,
driver scripts to invoke QEMU) becomes pretty team specific.

As a result, we decided that both teams can just go install his
Bluebox project to provide this core
functionality of initramfs generation and a VM init process. Bluebox is written
in Go and easily compiles to run on arm64/x86 hosts or generate initramfs's for
arm64/x86 guests. Anything beyond what Bluebox provides is team-specific and
can be implemented by the team writing the tests.

This will maximise the amount of shared code between our testing setups (which
is good from a DRY standpoint) while also allowing each team to build what
works best for them on top of Bluebox.

This PR provides the AWP-side realiziation of that work, and also sets up the
whole thing to run in CI via GitHub actions.

Changes Made

All initramfs generation logic has been moved into Bluebox. Thus the init/
directory and parts of run_tests.sh that were responsible for invoking cpio
to generate an initramfs have been removed. That job is now done by Bluebox.

The make container make target has been revamped to build a container that
can cross-compile the repo for x86_64 and arm64. See
docker/Dockerfile.builder. Building the repo for different architectures is
now as simple as make ARCH=x86_64 or make ARCH=aarch64. Debug builds are
analgous, the target name is just build-debug.

The build_mainline_kernels.sh script has been updated to build all arm64 and
x86_64 kernel images we need, and output kernel binaries with debug info, so
kernel code can be stepped through with QEMU in case of a gnarly BPF error (see
the debugging section in the updated testing/README.md). These binaries are
stored in the GCS bucket under the debug_binaries directory of each
architecture directory.

CI

As mentioned, a GitHub actions workflow has been added to run tests on multiple
kernels. It pulls kernel images from a GCS bucket and runs tests against them.

For this initial iteration, only mainline kernels (built with the
build_mainline_kernels.sh script) are run. Distro-specific kernels can be
easily added to the test matrix later via an extra line in the yaml.

Kernel images are cached via the cache action and only re-downloaded from GCS
if the containing directory changes to save bandwith and GCS costs.

It takes ~45s per kernel to run the kernel tests in a GitHub actions runner.
x86_64 and arm64 tests are run in parallel, and each encompass 8 kernels
currently (v5.11 - v5.18), so it takes ~6m to run them all. Unfortunately the
GitHub actions runners don't support KVM, which would drastically speed this
up. In the future, we can look at a self-hosted runner with KVM support if we
want to be really aggressive about testing on many, many kernels.

Whether tests pass or fail, a summary of the run and individual results are
archived in GitHub actions (see the summary page). results-mainline-{arch}
contains the console output for the tests run on each kernel and
run-summary-{arch}.txt contains a summary of the pass/fail status of each
test on that particular architecture.

Bugs

A handful of bugs were discovered and fixed to get the tests running on all
kernels (see commits). You'll note only one test is still failing --
TestFileRename on v5.11/aarch64. I'm opening this PR to get feedback now
while I continue to try to fix that, as fixing it has become a bit of a time
sink. (Thank you @mmat11 for helping me squash the last bug 🚀 )

CC: @norrietaylor @SeanHeelan @m-sample

EventsTrace was going into a print "\n" loop on a misconfigured kernel, which was hard to diagnose with the current error logging.

See comment, libbpf_probe_bpf_prog_type will return true on both aarch64 and x86_64 as it checks for the ability to _load_, not the ability to _attach_, which is what will fail on aarch64.

- Move the core initramfs and init process logic to Bluebox - Set up the whole thing to run in CI with mainline kernels v5.11-v5.18 - Redo testing/README.md for new changes

This file contains the result of all bpf_prink's, which often encompasses a wealth of information when probes fail to do something.

GPL/EventProbe/PathResolver.h

nicholasberlin

Nice work. A couple of comments.

non-GPL/LibEbpfEvents/LibEbpfEvents.c

testing/README.md

Co-authored-by: Nicholas Berlin <[email protected]>

We were attempting to dereference a struct dentry into a struct dentry *, which resulted in garbage data. Add FUNC_ARG_READ_PTREGS_NODEREF to skip the dereference step and use it to fix the test failures.

Additionally, remove nonexistent .PHONY targets and simplify build-debug by removing _internal-build-debug target.

Enabling bpf_printk has a huge performance penalty in the emulated VMs running in CI. Running aarch64 tests on Linux 5.11 without bpf_printk takes ~1m30s while the same run without bpf_printk takes ~2m10s. While this is larger than I'd expect, I can see it happening on an emulated VM, every bpf_printk call corresponds to a lot of extra instructions that have to be emulated. Disable the flag. This reduces the debug info present in CI artifacts, but if tests fail, devs should be able to run the failing test locally with -DENABLE_BPF_PRINTK to get debug output.

- Log stacktrace to stdout so logs are all serialized - Log banner warning that the trace file is always empty on CI - I can see this eating a day of a dev's time otherwise :P

testing/scripts/build_mainline_kernels.sh

testing/scripts/run_single_test.sh

.github/workflows/formatting-build.yml

.github/workflows/multikernel-tester.yml

docker/Dockerfile.builder

testing/README.md

testing/run_tests.sh

testing/scripts/build_mainline_kernels.sh

This logic should properly detect a lack of BPF trampoline support on x86 kernels that don't have it enabled (unlike the previous simple arch check). It's also incredibly subject to change in a new kernel release (what if taskstats_exit dissapears?), so add a multi-kernel test for feature probing. This should be nicely extensible to other features in the future.

.github/workflows/multikernel-tester.yml

Differences in clang-format versions were causing a headache here. clang-format 13 was fine with the code as-is, but clang-format 11 would make this change: diff --git a/non-GPL/TcLoader/TcLoader.c b/non-GPL/TcLoader/TcLoader.c index 9a8bc0c..1ed7c1d 100644 --- a/non-GPL/TcLoader/TcLoader.c +++ b/non-GPL/TcLoader/TcLoader.c @@ -371,10 +371,10 @@ static int netlink_qdisc(int cmd, unsigned int flags, const char *ifname) int rv = -1; struct rtnetlink_handle qdisc_rth = {.fd = -1}; struct netlink_msg qdisc_req = { - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), - .n.nlmsg_flags = NLM_F_REQUEST | flags, - .n.nlmsg_type = cmd, - .t.tcm_family = AF_UNSPEC, + .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), + .n.nlmsg_flags = NLM_F_REQUEST | flags, + .n.nlmsg_type = cmd, + .t.tcm_family = AF_UNSPEC, }; if (!ifname) { This seems to be a result of the AlignConsecutiveAssignments directive in .clang-format (removing it makes the two versions agree). Dockerize the format step to cleanly solve the problem for all such cases in the future.

An option cannot have > 1 argument. Just put the list kernel images at the end (no specific flag required).

Co-Authored-By: Florian Lehner <[email protected]>

- Use readonly/local - Make conditionals more concise with || and && - Pull everything into functions, no naked code - Standardize on lowercase for local vars - Make command line arg format consistent - Rename run_single_test.sh to invoke_qemu.sh, pull out all test logic for better separation of concerns - Update README.md to suggest usage of invoke_qemu.sh instead of directly invoking QEMU - Add ~/go/bin to PATH in bash instead of workflow YAML - Greatly clean up GNU parallel invocation

Co-Authored-By: Lovel Rishi <[email protected]>

We could have restored the set of aarch64 kernels in the x86 tests with the workflow as it was.

docker/Dockerfile.builder

Makefile

Co-Authored-By: Lovel Rishi <[email protected]>

- Add tag-container target and cleanup / separate docker related vars - Remove DOCKER_IMG_UBUNUT_VERSION

Co-Authored-By: Florian Lehner <[email protected]>

We're not currently using it in CI, and all dev work on my end has been done without it. We can discuss enabling it if/when we start using a custom runner, but for now, remove the clutter.

- Make KVM probing an option, turned off by default (it seems to break debugging on my machine, and provides no real benefit anyways nested in a VBox VM) - Only provide one -append flag to QEMU, it will ignore all but the last, causing the -d option to not work

florianl

Thanks for the update! 🙏

GPL/EventProbe/PathResolver.h

lrishi

lgtm 👍🏽 Great work Rhys!! Thanks for all the collab Florian!

* Fix invalid enum relocation * Improve error logging on JSON unmarshal failure EventsTrace was going into a print "\n" loop on a misconfigured kernel, which was hard to diagnose with the current error logging. * Fix broken probe_set_features logic See comment, libbpf_probe_bpf_prog_type will return true on both aarch64 and x86_64 as it checks for the ability to _load_, not the ability to _attach_, which is what will fail on aarch64. * Rework multi-kernel-tester and run in CI - Move the core initramfs and init process logic to Bluebox - Set up the whole thing to run in CI with mainline kernels v5.11-v5.18 - Redo testing/README.md for new changes * Add debug build target * Clarify format / test-format targets, format code * Disable fail-fast in multikernel test action * Clarify and cleanup formatting/build CI workflow * Dump contents of tracefs trace file on test fail This file contains the result of all bpf_prink's, which often encompasses a wealth of information when probes fail to do something. * Fix typo in README.md Co-authored-by: Nicholas Berlin <[email protected]> * Fix test failures on Linux 5.11/aarch64 We were attempting to dereference a struct dentry into a struct dentry *, which resulted in garbage data. Add FUNC_ARG_READ_PTREGS_NODEREF to skip the dereference step and use it to fix the test failures. * Fix broken build-debug target in Makefile Additionally, remove nonexistent .PHONY targets and simplify build-debug by removing _internal-build-debug target. * Remove -DENABLE_BPF_PRINTK from CI builds Enabling bpf_printk has a huge performance penalty in the emulated VMs running in CI. Running aarch64 tests on Linux 5.11 without bpf_printk takes ~1m30s while the same run without bpf_printk takes ~2m10s. While this is larger than I'd expect, I can see it happening on an emulated VM, every bpf_printk call corresponds to a lot of extra instructions that have to be emulated. Disable the flag. This reduces the debug info present in CI artifacts, but if tests fail, devs should be able to run the failing test locally with -DENABLE_BPF_PRINTK to get debug output. * Improve logging on test failure - Log stacktrace to stdout so logs are all serialized - Log banner warning that the trace file is always empty on CI - I can see this eating a day of a dev's time otherwise :P * Fix BPF tramp detection, add multi-kernel tests This logic should properly detect a lack of BPF trampoline support on x86 kernels that don't have it enabled (unlike the previous simple arch check). It's also incredibly subject to change in a new kernel release (what if taskstats_exit dissapears?), so add a multi-kernel test for feature probing. This should be nicely extensible to other features in the future. * Document unintuitive API * Add comment to TestFeaturesCorrect RE: x86 * Dockerize code formatting Makefile target Differences in clang-format versions were causing a headache here. clang-format 13 was fine with the code as-is, but clang-format 11 would make this change: diff --git a/non-GPL/TcLoader/TcLoader.c b/non-GPL/TcLoader/TcLoader.c index 9a8bc0c..1ed7c1d 100644 --- a/non-GPL/TcLoader/TcLoader.c +++ b/non-GPL/TcLoader/TcLoader.c @@ -371,10 +371,10 @@ static int netlink_qdisc(int cmd, unsigned int flags, const char *ifname) int rv = -1; struct rtnetlink_handle qdisc_rth = {.fd = -1}; struct netlink_msg qdisc_req = { - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), - .n.nlmsg_flags = NLM_F_REQUEST | flags, - .n.nlmsg_type = cmd, - .t.tcm_family = AF_UNSPEC, + .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), + .n.nlmsg_flags = NLM_F_REQUEST | flags, + .n.nlmsg_type = cmd, + .t.tcm_family = AF_UNSPEC, }; if (!ifname) { This seems to be a result of the AlignConsecutiveAssignments directive in .clang-format (removing it makes the two versions agree). Dockerize the format step to cleanly solve the problem for all such cases in the future. * Fix incorrect docker tag s/_TAG/_PULL_TAG/g Should be using the remote instead of the local. * Fix clang-format not running in a container find was running in the container, clang-format was not due to how the command was being interpreted aargh. * Fix incorrect script args in comment Co-Authored-By: Florian Lehner <[email protected]> * Remove time estimate in README Co-Authored-By: Florian Lehner <[email protected]> * Cleanup run_tests.sh arguments -d really should have allowed users to pass individual images if they want something more specific tested. Also remove old information from usage text and update for new -k arg. * Add note about debootstrap to builder script Co-Authored-By: Florian Lehner <[email protected]> * Fix improper getopts usage An option cannot have > 1 argument. Just put the list kernel images at the end (no specific flag required). * Fix incorrect variable name * Fix faulty bash list logic * Change BPFTOOL_VERSION to LINUX_TOOLS_VERSION Co-Authored-By: Florian Lehner <[email protected]> * Add gen_initramfs.sh script, update debug docs * Greatly clean up bash scripts - Use readonly/local - Make conditionals more concise with || and && - Pull everything into functions, no naked code - Standardize on lowercase for local vars - Make command line arg format consistent - Rename run_single_test.sh to invoke_qemu.sh, pull out all test logic for better separation of concerns - Update README.md to suggest usage of invoke_qemu.sh instead of directly invoking QEMU - Add ~/go/bin to PATH in bash instead of workflow YAML - Greatly clean up GNU parallel invocation * Fix caching logic in GH actions workflow Co-Authored-By: Lovel Rishi <[email protected]> * Fix inspecific restore key We could have restored the set of aarch64 kernels in the x86 tests with the workflow as it was. * Remove reduntant apt-get update Co-Authored-By: Lovel Rishi <[email protected]> * Cleanup Makefile - Add tag-container target and cleanup / separate docker related vars - Remove DOCKER_IMG_UBUNUT_VERSION * Fix broken test-format target * Add section to README.md on userspace debugging Co-Authored-By: Florian Lehner <[email protected]> * Add missing arg to gen_initramfs.sh help * Remove KVM section from README.md We're not currently using it in CI, and all dev work on my end has been done without it. We can discuss enabling it if/when we start using a custom runner, but for now, remove the clutter. * Fixups to scripts/invoke_qemu.sh - Make KVM probing an option, turned off by default (it seems to break debugging on my machine, and provides no real benefit anyways nested in a VBox VM) - Only provide one -append flag to QEMU, it will ignore all but the last, causing the -d option to not work * Add missing -o to find invocation in Makefile Co-authored-by: Nicholas Berlin <[email protected]> Co-authored-by: Florian Lehner <[email protected]> Co-authored-by: Lovel Rishi <[email protected]>

rhysre added 8 commits June 28, 2022 19:22

Fix invalid enum relocation

08ed72d

Improve error logging on JSON unmarshal failure

30b9336

EventsTrace was going into a print "\n" loop on a misconfigured kernel, which was hard to diagnose with the current error logging.

Fix broken probe_set_features logic

d922958

See comment, libbpf_probe_bpf_prog_type will return true on both aarch64 and x86_64 as it checks for the ability to _load_, not the ability to _attach_, which is what will fail on aarch64.

Rework multi-kernel-tester and run in CI

8ffa182

- Move the core initramfs and init process logic to Bluebox - Set up the whole thing to run in CI with mainline kernels v5.11-v5.18 - Redo testing/README.md for new changes

Add debug build target

de17e6d

Clarify format / test-format targets, format code

70a8a37

Disable fail-fast in multikernel test action

8da3d76

Clarify and cleanup formatting/build CI workflow

e08ca4f

rhysre requested review from florianl, mmat11, nicholasberlin, stanek-michal and a team June 29, 2022 02:41

Dump contents of tracefs trace file on test fail

dbbfe32

This file contains the result of all bpf_prink's, which often encompasses a wealth of information when probes fail to do something.

rhysre force-pushed the rhysre/testing-infra-changes branch from b53a9a4 to dbbfe32 Compare June 29, 2022 03:17

mmat11 reviewed Jun 29, 2022

View reviewed changes

GPL/EventProbe/PathResolver.h Show resolved Hide resolved

nicholasberlin reviewed Jun 29, 2022

View reviewed changes

non-GPL/LibEbpfEvents/LibEbpfEvents.c Outdated Show resolved Hide resolved

non-GPL/LibEbpfEvents/LibEbpfEvents.c Show resolved Hide resolved

testing/README.md Outdated Show resolved Hide resolved

rhysre requested a review from a team June 29, 2022 16:34

rhysre and others added 5 commits June 29, 2022 11:34

Fix typo in README.md

12a88fc

Co-authored-by: Nicholas Berlin <[email protected]>

Fix test failures on Linux 5.11/aarch64

225e700

We were attempting to dereference a struct dentry into a struct dentry *, which resulted in garbage data. Add FUNC_ARG_READ_PTREGS_NODEREF to skip the dereference step and use it to fix the test failures.

Fix broken build-debug target in Makefile

9a34c68

Additionally, remove nonexistent .PHONY targets and simplify build-debug by removing _internal-build-debug target.

Improve logging on test failure

956a1c9

- Log stacktrace to stdout so logs are all serialized - Log banner warning that the trace file is always empty on CI - I can see this eating a day of a dev's time otherwise :P

florianl reviewed Jul 6, 2022

View reviewed changes

rhysre added 3 commits July 7, 2022 00:11

Document unintuitive API

c6ec037

Add comment to TestFeaturesCorrect RE: x86

06dd84f

lrishi reviewed Jul 7, 2022

View reviewed changes

.github/workflows/multikernel-tester.yml Outdated Show resolved Hide resolved

rhysre and others added 8 commits July 7, 2022 22:05

Fix improper getopts usage

46481d6

An option cannot have > 1 argument. Just put the list kernel images at the end (no specific flag required).

Fix incorrect variable name

006d4a3

Fix faulty bash list logic

48df98b

Change BPFTOOL_VERSION to LINUX_TOOLS_VERSION

45a6b6b

Co-Authored-By: Florian Lehner <[email protected]>

Add gen_initramfs.sh script, update debug docs

aef8f9e

Fix caching logic in GH actions workflow

c2294c9

Co-Authored-By: Lovel Rishi <[email protected]>

Fix inspecific restore key

74eefae

We could have restored the set of aarch64 kernels in the x86 tests with the workflow as it was.

lrishi reviewed Jul 9, 2022

View reviewed changes

docker/Dockerfile.builder Show resolved Hide resolved

lrishi reviewed Jul 10, 2022

View reviewed changes

Makefile Show resolved Hide resolved

rhysre and others added 7 commits July 11, 2022 17:45

Remove reduntant apt-get update

0bbb901

Co-Authored-By: Lovel Rishi <[email protected]>

Cleanup Makefile

6344e6c

- Add tag-container target and cleanup / separate docker related vars - Remove DOCKER_IMG_UBUNUT_VERSION

Fix broken test-format target

266e30a

Add section to README.md on userspace debugging

936b063

Co-Authored-By: Florian Lehner <[email protected]>

Add missing arg to gen_initramfs.sh help

7264703

Remove KVM section from README.md

21a043c

We're not currently using it in CI, and all dev work on my end has been done without it. We can discuss enabling it if/when we start using a custom runner, but for now, remove the clutter.

rhysre requested review from nicholasberlin, mmat11, lrishi and florianl July 14, 2022 17:27

Add missing -o to find invocation in Makefile

4e10c93

florianl approved these changes Jul 18, 2022

View reviewed changes

mmat11 reviewed Jul 18, 2022

View reviewed changes

GPL/EventProbe/PathResolver.h Show resolved Hide resolved

mmat11 approved these changes Jul 18, 2022

View reviewed changes

lrishi approved these changes Jul 18, 2022

View reviewed changes

lrishi mentioned this pull request Jul 19, 2022

Update comments in readme file #112

Open

lrishi merged commit 2ceb1ba into main Jul 19, 2022

lrishi deleted the rhysre/testing-infra-changes branch July 19, 2022 12:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamp Testing Infrastructure and run Multi-Kernel Tests in CI #111

Revamp Testing Infrastructure and run Multi-Kernel Tests in CI #111

rhysre commented Jun 29, 2022 •

edited

Loading

nicholasberlin left a comment

florianl left a comment

lrishi left a comment

Revamp Testing Infrastructure and run Multi-Kernel Tests in CI #111

Revamp Testing Infrastructure and run Multi-Kernel Tests in CI #111

Conversation

rhysre commented Jun 29, 2022 • edited Loading

Background

Changes Made

CI

Bugs

nicholasberlin left a comment

Choose a reason for hiding this comment

florianl left a comment

Choose a reason for hiding this comment

lrishi left a comment

Choose a reason for hiding this comment

rhysre commented Jun 29, 2022 •

edited

Loading