Fix deadlock with tap/tun devices for arm guests #1232

Draft: wants to merge 5 commits into base: dev

AndrewFasano

No description provided.

codomania and others added 4 commits October 20, 2022 08:33
A guest boot hangs while probing the network interface when
iommu_platform=on is used.

The following qemu cli hangs without this patch:

# $QEMU \
  -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 4<>/dev/vhost-net \
  -device virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
  ...

Commit c471ad0 (vhost_net: device IOTLB support) took care of
setting vhostfd to non-blocking when QEMU itself opens /dev/vhost-net, but
when the fd is passed in from the QEMU command line we also need to ensure
that it is set to non-blocking.

Fixes: c471ad0 ("vhost_net: device IOTLB support")
Cc: [email protected]
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
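
As a rough illustration of the fix described in the commit message above, the sketch below puts an inherited file descriptor into non-blocking mode with plain POSIX `fcntl()`. The helper name `set_fd_nonblocking` is hypothetical and is not QEMU's own function; the actual patch presumably calls QEMU's internal helper on the vhostfd received from the command line.

```c
#include <fcntl.h>

/*
 * Hypothetical helper (not part of QEMU): ensure that a file descriptor
 * inherited from the command line is in non-blocking mode.
 * Returns 0 on success, -1 on failure with errno set by fcntl().
 */
static int set_fd_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;              /* e.g. EBADF for an invalid descriptor */
    }
    if (flags & O_NONBLOCK) {
        return 0;               /* already non-blocking, nothing to do */
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

A caller would invoke this once on the descriptor given via `vhostfd=` before handing it to the vhost backend, and treat a -1 return as a fatal setup error.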
When QEMU sets up a tap based network device backend, it mostly ignores errors
reported from various ioctl() calls it makes, assuming the TAP file descriptor
is valid. This assumption can easily be violated when the user is passing in a
pre-opened file descriptor. At best, the ioctls may fail with a -EBADF, but if
the user passes in a bogus FD number that happens to clash with a FD number that
QEMU has opened internally for another reason, a wide variety of errnos may
result, as the TUNGETIFF ioctl number may map to a completely different command
on a different type of file.

By ignoring all these errors, QEMU sets up a zombie network backend that will
never pass any data. Even worse, when QEMU shuts down, or that network backend
is hot-removed, it will close this bogus file descriptor, which could belong to
another QEMU device backend.

There's no obvious, guaranteed-reliable way to detect that an FD genuinely is a
TAP device, as opposed to a UNIX socket, a pipe, or something else. Checking
the errno from probing the vnet hdr flag does, however, catch the most common
cases: calling TUNGETIFF returns EBADF for an invalid FD, and ENOTTY when the
FD is a UNIX socket or pipe, which catches accidental collisions with FDs used
for stdio or the monitor socket.

Previously, the example below, where bogus FD 9 collides with the FD used for
the chardev, produced:

$ ./x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hostnet0,fd=9 \
  -chardev socket,id=charchannel0,path=/tmp/qga,server,nowait \
  -monitor stdio -vnc :0
qemu-system-x86_64: -netdev tap,id=hostnet0,fd=9: TUNGETIFF ioctl() failed: Inappropriate ioctl for device
TUNSETOFFLOAD ioctl() failed: Bad address
QEMU 2.9.1 monitor - type 'help' for more information
(qemu) Warning: netdev hostnet0 has no peer

which gives a running QEMU with a zombie network backend.

With this change applied, QEMU prints an error message and exits immediately
instead of carrying on and causing a bigger disaster:

$ ./x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hostnet0,fd=9 \
  -chardev socket,id=charchannel0,path=/tmp/qga,server,nowait \
  -monitor stdio -vnc :0
qemu-system-x86_64: -netdev tap,id=hostnet0,vhost=on,fd=9: Unable to query TUNGETIFF on FD 9: Inappropriate ioctl for device

Reported-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Tested-by: Dr. David Alan Gilbert <[email protected]>
Message-id: [email protected]
[lv: to simplify, don't check on EINVAL with TUNGETIFF as it exists since v2.6.27]
Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
When using C11 atomics, non-seqcst reads and writes do not participate
in the total order of seqcst operations.  In util/async.c and util/aio-posix.c,
in particular, the pattern that we use

          write ctx->notify_me                 write bh->scheduled
          read bh->scheduled                   read ctx->notify_me
          if !bh->scheduled, sleep             if ctx->notify_me, notify

needs to use seqcst operations for both the write and the read.  In
general this is something that we do not want, because there can be
many sources that are polled in addition to bottom halves.  The
alternative is to place a seqcst memory barrier between the write
and the read.  This also comes with a disadvantage, in that the
memory barrier is implicit on strongly-ordered architectures and
it wastes a few dozen clock cycles.

Fortunately, ctx->notify_me is never written concurrently by two
threads, so we can assert that and relax the writes to ctx->notify_me.
The resulting solution works and performs well on both aarch64 and x86.

Note that the atomic_set/atomic_read combination is not an atomic
read-modify-write, and therefore it is even weaker than a C11 ATOMIC_RELAXED
read-modify-write; on x86, even an ATOMIC_RELAXED read-modify-write compiles
to a locked operation.

Analyzed-by: Ying Fang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Tested-by: Ying Fang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
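
The write/barrier/read pattern from the commit message above can be sketched with C11 atomics as shown below. This is an illustration under stated assumptions, not the code in util/async.c or util/aio-posix.c: the globals `notify_me` and `scheduled` stand in for `ctx->notify_me` and `bh->scheduled`, and all function names are hypothetical. Each side does a relaxed write, a seqcst fence, then a relaxed read, so at least one side is guaranteed to observe the other's write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ctx->notify_me and bh->scheduled. */
static atomic_int notify_me;
static atomic_bool scheduled;

/* Waiter side: announce the intent to sleep, then re-check for work. */
static bool waiter_may_sleep(void)
{
    /*
     * Assumption taken from the commit message: only one thread ever
     * writes notify_me, so a plain load followed by a store suffices
     * and no atomic read-modify-write is needed.
     */
    int old = atomic_load_explicit(&notify_me, memory_order_relaxed);
    atomic_store_explicit(&notify_me, old | 1, memory_order_relaxed);

    /* Full barrier between the write above and the read below. */
    atomic_thread_fence(memory_order_seq_cst);

    return !atomic_load_explicit(&scheduled, memory_order_relaxed);
}

/* Waiter side: clear the "about to sleep" flag after waking up. */
static void waiter_done(void)
{
    int old = atomic_load_explicit(&notify_me, memory_order_relaxed);
    atomic_store_explicit(&notify_me, old & ~1, memory_order_relaxed);
}

/* Scheduler side: publish the work, then check whether to notify. */
static bool schedule_needs_notify(void)
{
    atomic_store_explicit(&scheduled, true, memory_order_relaxed);

    /* Matching full barrier between the write and the read. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&notify_me, memory_order_relaxed) != 0;
}
```

With both fences in place, the outcome in which the waiter sees no scheduled work and the scheduler sees no waiter cannot occur, which is the lost-wakeup case the commit is guarding against.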
These iothread mutex locks/unlocks were dropped somewhere
between QEMU 2.9.1 and our current codebase. Assuming they
don't break record/replay, they probably should still be here.

Without these, PANDA would deadlock about 75% of the time with
the vexpress-a9 ARM guest machine when using -netdev tap and
virtio networking.
@zestrada

Can confirm this works for the target where we first saw the bug. There appears to be an issue with the barebox minimal test case (with or without tap device):

/out/panda-taphang/build/arm-softmmu/panda-system-arm -M vexpress-a9 -nographic -no-reboot -monitor telnet::4443,server,nowait -kernel ./images/barebox-vexpress-ca9.img -m 1024m             
audio: Could not init `oss' audio driver                                         

barebox 2022.10.0 #1 Wed Oct 19 23:03:50 UTC 2022


Board: V2P-CA9
panda-system-arm: /out/panda-taphang/memory.c:918: memory_region_transaction_commit: Assertion `qemu_mutex_iothread_locked()' failed.
Aborted (core dumped)

@zestrada

Just a quick comment in case others run into this issue. The systems that were deadlocking were Linux kernels that somehow hit a race condition in QEMU while the guest was executing the calibrate_delay kernel function. A workaround has been to take the loops-per-jiffy value reported by a successful boot (e.g., one without a tap device) and pass it on the kernel command line (e.g., lpj=43648), so that the kernel skips the delay calibration. Your mileage may vary.

Development

Successfully merging this pull request may close these issues.

PANDA hangs using network tap with QEMU vexpress-a9 ARM board