Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock with tap/tun devices for arm guests #1232

Draft
wants to merge 5 commits into
base: dev
Choose a base branch
from
Draft

Commits on Oct 20, 2022

  1. tap: set vhostfd passed from qemu cli to non-blocking

    A guest boot hangs while probing the network interface when
    iommu_platform=on is used.
    
    The following qemu cli hangs without this patch:
    
    # $QEMU \
      -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 4<>/dev/host-net \
      -device virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
      ...
    
    Commit: c471ad0 (vhost_net: device IOTLB support) took care of
    setting vhostfd to non-blocking when QEMU opens /dev/host-net but if
    the fd is passed from qemu cli then we need to ensure that fd is set
    to non-blocking.
    
    Fixes: c471ad0 ("vhost_net: device IOTLB support")
    Cc: [email protected]
    Cc: Michael S. Tsirkin <[email protected]>
    Cc: Jason Wang <[email protected]>
    Signed-off-by: Brijesh Singh <[email protected]>
    Signed-off-by: Jason Wang <[email protected]>
    codomania authored and AndrewFasano committed Oct 20, 2022
    Configuration menu
    Copy the full SHA
    732bcce View commit details
    Browse the repository at this point in the history
  2. net: detect errors from probing vnet hdr flag for TAP devices

    When QEMU sets up a tap based network device backend, it mostly ignores errors
    reported from various ioctl() calls it makes, assuming the TAP file descriptor
    is valid. This assumption can easily be violated when the user is passing in a
    pre-opened file descriptor. At best, the ioctls may fail with a -EBADF, but if
    the user passes in a bogus FD number that happens to clash with a FD number that
    QEMU has opened internally for another reason, a wide variety of errnos may
    result, as the TUNGETIFF ioctl number may map to a completely different command
    on a different type of file.
    
    By ignoring all these errors, QEMU sets up a zombie network backend that will
    never pass any data. Even worse, when QEMU shuts down, or that network backend
    is hot-removed, it will close this bogus file descriptor, which could belong to
    another QEMU device backend.
    
    There's no obvious guaranteed reliable way to detect that a FD genuinely is a
    TAP device, as opposed to a UNIX socket, or pipe, or something else. Checking
    the errno from probing vnet hdr flag though, does catch the big common cases.
    ie calling TUNGETIFF will return EBADF for an invalid FD, and ENOTTY when FD is
    a UNIX socket, or pipe which catches accidental collisions with FDs used for
    stdio, or monitor socket.
    
    Previously the example below where bogus fd 9 collides with the FD used for the
    chardev saw:
    
    $ ./x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hostnet0,fd=9 \
      -chardev socket,id=charchannel0,path=/tmp/qga,server,nowait \
      -monitor stdio -vnc :0
    qemu-system-x86_64: -netdev tap,id=hostnet0,fd=9: TUNGETIFF ioctl() failed: Inappropriate ioctl for device
    TUNSETOFFLOAD ioctl() failed: Bad address
    QEMU 2.9.1 monitor - type 'help' for more information
    (qemu) Warning: netdev hostnet0 has no peer
    
    which gives a running QEMU with a zombie network backend.
    
    With this change applied we get an error message and QEMU immediately exits
    before carrying on and making a bigger disaster:
    
    $ ./x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hostnet0,fd=9 \
      -chardev socket,id=charchannel0,path=/tmp/qga,server,nowait \
      -monitor stdio -vnc :0
    qemu-system-x86_64: -netdev tap,id=hostnet0,vhost=on,fd=9: Unable to query TUNGETIFF on FD 9: Inappropriate ioctl for device
    
    Reported-by: Dr. David Alan Gilbert <[email protected]>
    Signed-off-by: Daniel P. Berrange <[email protected]>
    Tested-by: Dr. David Alan Gilbert <[email protected]>
    Message-id: [email protected]
    [lv: to simplify, don't check on EINVAL with TUNGETIFF as it exists since v2.6.27]
    Signed-off-by: Laurent Vivier <[email protected]>
    Signed-off-by: Jason Wang <[email protected]>
    berrange authored and AndrewFasano committed Oct 20, 2022
    Configuration menu
    Copy the full SHA
    42ef16b View commit details
    Browse the repository at this point in the history
  3. async: use explicit memory barriers

    When using C11 atomics, non-seqcst reads and writes do not participate
    in the total order of seqcst operations.  In util/async.c and util/aio-posix.c,
    in particular, the pattern that we use
    
              write ctx->notify_me                 write bh->scheduled
              read bh->scheduled                   read ctx->notify_me
              if !bh->scheduled, sleep             if ctx->notify_me, notify
    
    needs to use seqcst operations for both the write and the read.  In
    general this is something that we do not want, because there can be
    many sources that are polled in addition to bottom halves.  The
    alternative is to place a seqcst memory barrier between the write
    and the read.  This also comes with a disadvantage, in that the
    memory barrier is implicit on strongly-ordered architectures and
    it wastes a few dozen clock cycles.
    
    Fortunately, ctx->notify_me is never written concurrently by two
    threads, so we can assert that and relax the writes to ctx->notify_me.
    The resulting solution works and performs well on both aarch64 and x86.
    
    Note that the atomic_set/atomic_read combination is not an atomic
    read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED;
    on x86, ATOMIC_RELAXED compiles to a locked operation.
    
    Analyzed-by: Ying Fang <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Tested-by: Ying Fang <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Stefan Hajnoczi <[email protected]>
    bonzini authored and AndrewFasano committed Oct 20, 2022
    Configuration menu
    Copy the full SHA
    9b3da77 View commit details
    Browse the repository at this point in the history
  4. bugfix: restore iothread mutex locks and unlocks

    These iothread mutex locks/unlocks were dropped somewhere
    between qemu 2.9.1 and our current codebase. Assuming they
    don't break record/replay, they probalby should still be here.
    
    Without these, PANDA would deadlock about 75% of the time with
    the vexpress-a9 ARM guest machine when using -netdev tap and
    virtio networking.
    AndrewFasano committed Oct 20, 2022
    Configuration menu
    Copy the full SHA
    ebab9de View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    b5aad15 View commit details
    Browse the repository at this point in the history