Bug 2122788 - virtio-net TX stall after packet bursts (probably in qemu)
Summary: virtio-net TX stall after packet bursts (probably in qemu)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Quan Wenli
URL:
Whiteboard:
Depends On: 2135806
Blocks: 2101375
Reported: 2022-08-30 19:27 UTC by Stefano Brivio
Modified: 2023-05-09 07:44 UTC

Fixed In Version: qemu-kvm-7.2.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:20:04 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-132767 0 None None None 2022-08-30 19:32:55 UTC
Red Hat Product Errata RHSA-2023:2162 0 None None None 2023-05-09 07:20:28 UTC

Comment 1 Laurent Vivier 2022-09-08 10:43:00 UTC
I'm able to reproduce the problem on your system with your binary but if I build QEMU from master on your system I'm not able to reproduce the problem.

Could you provide your configure command line?

Comment 2 David Gibson 2022-09-09 01:55:32 UTC
(In reply to Laurent Vivier from comment #1)
> I'm able to reproduce the problem on your system with your binary but if I
> build QEMU from master on your system I'm not able to reproduce the problem.
> 
> Could you provide your configure command line?

I'm not sure about Stefano, but I've just been using my distro qemu, specifically qemu-system-x86-6.2.0-14.fc36.x86_64.  Maybe the bug has been fixed upstream already?

Comment 3 Stefano Brivio 2022-09-13 03:26:19 UTC
Sorry, I missed this comment:

(In reply to Laurent Vivier from comment #1)
> I'm able to reproduce the problem on your system with your binary but if I
> build QEMU from master on your system I'm not able to reproduce the problem.
> 
> Could you provide your configure command line?

It's: ../configure --target-list=x86_64-softmmu

based on 09ed077d7f plus the latest version of your AF_UNIX socket patchset.

Feature summary (from meson.log):

qemu 7.0.91

  Directories
    Install prefix               : /usr/local
    BIOS directory               : share/qemu
    firmware path                : share/qemu-firmware
    binary directory             : /usr/local/bin
    library directory            : /usr/local/lib/x86_64-linux-gnu
    module directory             : lib/x86_64-linux-gnu/qemu
    libexec directory            : /usr/local/libexec
    include directory            : /usr/local/include
    config directory             : /usr/local/etc
    local state directory        : /var/local
    Manual directory             : /usr/local/share/man
    Doc directory                : /usr/local/share/doc
    Build directory              : /home/sbrivio/qemu/build
    Source path                  : /home/sbrivio/qemu
    GIT submodules               : ui/keycodemapdb tests/fp/berkeley-testfloat-3 tests/fp/berkeley-softfloat-3 dtc slirp

  Host binaries
    git                          : git
    make                         : make
    python                       : /usr/bin/python3 (version: 3.10)
    sphinx-build                 : NO
    gdb                          : /usr/bin/gdb
    iasl                         : NO
    genisoimage                  : 
    smbd                         : /usr/sbin/smbd

  Configurable features
    Documentation                : NO
    system-mode emulation        : YES
    user-mode emulation          : NO
    block layer                  : YES
    Install blobs                : YES
    module support               : NO
    fuzzing support              : NO
    Audio drivers                : oss
    Trace backends               : log
    D-Bus display                : NO
    QOM debugging                : NO
    vhost-kernel support         : YES
    vhost-net support            : YES
    vhost-user support           : YES
    vhost-user-crypto support    : YES
    vhost-user-blk server support: YES
    vhost-vdpa support           : YES
    build guest agent            : YES

  Compilation
    host CPU                     : x86_64
    host endianness              : little
    C compiler                   : cc -m64 -mcx16
    Host C compiler              : cc -m64 -mcx16
    C++ compiler                 : c++ -m64 -mcx16
    CFLAGS                       : -O2 -g
    CXXFLAGS                     : -O2 -g
    QEMU_CFLAGS                  : -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi -fstack-protector-strong
    QEMU_CXXFLAGS                : -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wundef -Wwrite-strings -fno-strict-aliasing -fno-common -fwrapv -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi -fstack-protector-strong
    QEMU_OBJCFLAGS               : -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -Wno-psabi
    QEMU_LDFLAGS                 : -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -fstack-protector-strong
    profiler                     : NO
    link-time optimization (LTO) : NO
    PIE                          : YES
    static build                 : NO
    malloc trim support          : YES
    membarrier                   : NO
    debug stack usage            : NO
    mutex debugging              : NO
    memory allocator             : system
    avx2 optimization            : YES
    avx512f optimization         : NO
    gprof enabled                : NO
    gcov                         : NO
    thread sanitizer             : NO
    CFI support                  : NO
    strip binaries               : NO
    sparse                       : NO
    mingw32 support              : NO

  Cross compilers
    x86_64                       : cc

  Targets and accelerators
    KVM support                  : YES
    HAX support                  : NO
    HVF support                  : NO
    WHPX support                 : NO
    NVMM support                 : NO
    Xen support                  : NO
    TCG support                  : YES
    TCG backend                  : native (x86_64)
    TCG plugins                  : YES
    TCG debug enabled            : NO
    target list                  : x86_64-softmmu
    default devices              : YES
    out of process emulation     : YES
    vfio-user server             : NO

  Block layer support
    coroutine backend            : ucontext
    coroutine pool               : YES
    Block whitelist (rw)         : 
    Block whitelist (ro)         : 
    Use block whitelist in tools : NO
    VirtFS support               : YES
    build virtiofs daemon        : YES
    Live block migration         : YES
    replication support          : YES
    bochs support                : YES
    cloop support                : YES
    dmg support                  : YES
    qcow v1 support              : YES
    vdi support                  : YES
    vvfat support                : YES
    qed support                  : YES
    parallels support            : YES
    FUSE exports                 : NO
    VDUSE block exports          : YES

  Crypto
    TLS priority                 : NORMAL
    GNUTLS support               : NO
    libgcrypt                    : NO
    nettle                       : NO
    AF_ALG support               : NO
    rng-none                     : NO
    Linux keyring                : YES

  Dependencies
    SDL support                  : NO
    SDL image support            : NO
    GTK support                  : NO
    pixman                       : YES 0.40.0
    VTE support                  : NO
    slirp support                : YES 4.7.0
    libtasn1                     : NO
    PAM                          : NO
    iconv support                : YES
    curses support               : YES
    virgl support                : NO
    curl support                 : NO
    Multipath support            : NO
    PNG support                  : YES 1.6.37
    VNC support                  : YES
    VNC SASL support             : NO
    VNC JPEG support             : YES 2.1.2
    OSS support                  : YES
    ALSA support                 : YES 1.2.6.1
    PulseAudio support           : NO
    JACK support                 : NO
    brlapi support               : NO
    vde support                  : NO
    netmap support               : NO
    l2tpv3 support               : YES
    Linux AIO support            : NO
    Linux io_uring support       : NO
    ATTR/XATTR support           : YES
    RDMA support                 : NO
    PVRDMA support               : NO
    fdt support                  : internal
    libcap-ng support            : YES
    bpf support                  : NO
    spice protocol support       : NO
    rbd support                  : NO
    smartcard support            : NO
    U2F support                  : NO
    libusb                       : NO
    usb net redir                : NO
    OpenGL support (epoxy)       : NO
    GBM                          : NO
    libiscsi support             : NO
    libnfs support               : NO
    seccomp support              : YES 2.5.4
    GlusterFS support            : NO
    TPM support                  : YES
    libssh support               : NO
    lzo support                  : NO
    snappy support               : NO
    bzip2 support                : NO
    lzfse support                : NO
    zstd support                 : NO
    NUMA host support            : YES
    capstone                     : NO
    libpmem support              : NO
    libdaxctl support            : NO
    libudev                      : NO
    FUSE lseek                   : NO
    selinux                      : YES 3.3

  Subprojects
    libvduse                     : YES
    libvhost-user                : YES

  User defined options
    Native files                 : /home/sbrivio/qemu/build/config-meson.cross
    backend                      : ninja
    prefix                       : /usr/local
    werror                       : true

Note that there are two binaries installed on my system: one is from the distribution package (based on 7.0.0) in /usr/bin; the other, which I described above, is in /usr/local/bin (and takes precedence).

Would it make sense that I try and rebase the version under /usr/local/bin/ to latest upstream?

Comment 4 Laurent Vivier 2022-09-13 15:46:43 UTC
From a previous analysis (Stefano?):

> virtio_net_tx_bh() calls virtio_net_flush_tx()
> 
> - if qemu_sendv_packet_async() returns 0 (where returning 0
>   is a summary for a number of conditions...), which for some reason
>   happens less frequently with a higher tx_burst, virtio_net_flush_tx()
>   returns -EBUSY
> 
> - in that case, virtio_net_tx_bh() doesn't reschedule. Sure, as the
>   comment says, notifications are re-enabled by the callback for
>   qemu_sendv_packet_async(), i.e. virtio_net_tx_complete(), which
>   however doesn't take care of rescheduling
> 
> - ...not even if we were in the middle of a "burst",
>   virtio_net_tx_complete() does its job and calls virtio_net_flush_tx()
>   one last time, and that's it. Perhaps it's fine in most cases, I'm
>   not sure why I don't hit the stall reliably on -EBUSY from
>   virtio_net_flush_tx().

In the case of net/socket.c (net/stream.c with my patch series) the only reason qemu_sendv_packet_async() sends back 0 is when net_socket_receive() (net_stream_receive()) cannot send the full buffer.

A quick hack (with stream.c) that retries sending the buffer instead of returning 0 seems to fix the problem:

diff --git a/net/stream.c b/net/stream.c
index a84295f209..3ab2966bb9 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -85,7 +85,7 @@ static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf,
     size_t remaining;
     ssize_t ret;
 
-
+again:
     remaining = iov_size(iov, 2) - s->send_index;
     nlocal_iov = iov_copy(local_iov, 2, iov, 2, s->send_index, remaining);
     ret = qio_channel_writev(s->ioc, local_iov, nlocal_iov, NULL);
@@ -98,6 +98,7 @@ static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf,
     }
     if (ret < (ssize_t)remaining) {
         s->send_index += ret;
+        goto again;
         s->ioc_write_tag = qio_channel_add_watch(s->ioc, G_IO_OUT,
                                                  net_stream_writable, s, NULL);
         return 0;

I'm investigating why net_socket_writable() (net_stream_writable()) doesn't send the remaining data of the buffer.

Comment 5 Laurent Vivier 2022-09-20 07:30:22 UTC
Setting vectors parameter of virtio-net device to 1 seems to be another workaround:

  ... -device virtio-net-pci,vectors=1,...

Comment 6 Stefano Brivio 2022-09-21 17:40:55 UTC
(In reply to Laurent Vivier from comment #5)
> Setting vectors parameter of virtio-net device to 1 seems to be another
> workaround:
> 
>   ... -device virtio-net-pci,vectors=1,...

I enthusiastically tried that to replace the current workaround in passt's testsuite, but I just hit the issue (just once) even with that. It doesn't happen as frequently, though, it might be a different issue, and I didn't have means to collect more information about it on the spot.

Comment 7 Chao Yang 2022-09-23 10:21:36 UTC
Wenli

Are you able to reproduce in QE's lab?

Comment 8 Quan Wenli 2022-09-26 06:06:16 UTC
(In reply to Chao Yang from comment #7)
> Wenli
> 
> Are you able to reproduce in QE's lab?

I tried 10 times with the 9.2 downstream version qemu-kvm-7.1.0-1.el9.x86_64/5.14.0-70.13.1.el9_0.x86_64 (host) and could not reproduce the issue.

Comment 9 Stefano Brivio 2022-11-10 12:20:34 UTC
This is now fixed in upstream qemu by:

commit df8d07081718c29d04d106583d9c300128686cda
Author: Laurent Vivier <lvivier>
Date:   Thu Oct 20 11:58:45 2022 +0200

    virtio-net: fix bottom-half packet TX on asynchronous completion

and it will make its way into RHEL 9.2 together with the planned rebase to qemu 7.2: https://bugzilla.redhat.com/show_bug.cgi?id=2135806. Setting this ticket as blocked by that one.

Comment 10 John Ferlan 2022-11-27 13:15:37 UTC
Updating the rest of the fields necessary for inclusion by rebase

Comment 13 Yanan Fu 2022-12-20 09:18:34 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 16 Quan Wenli 2022-12-28 09:25:56 UTC
Tested with qemu-kvm-7.2.0-1.el9 10 times, following the steps from comment #0. iperf3 works well and performance looks good, so I'm setting this to VERIFIED.


[root@dell-per440-18 ~]# iperf3 -c ${gw} -Z -P 5 -l 1M -i1 -t30 -O5 --pacing-timer 100000
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  13.8 GBytes  3.94 Gbits/sec    0             sender
[  5]   0.00-30.01  sec  13.8 GBytes  3.94 Gbits/sec                  receiver
[  7]   0.00-30.00  sec  13.7 GBytes  3.93 Gbits/sec    0             sender
[  7]   0.00-30.01  sec  13.7 GBytes  3.92 Gbits/sec                  receiver
[  9]   0.00-30.00  sec  13.7 GBytes  3.91 Gbits/sec    0             sender
[  9]   0.00-30.01  sec  13.7 GBytes  3.91 Gbits/sec                  receiver
[ 11]   0.00-30.00  sec  13.6 GBytes  3.90 Gbits/sec    0             sender
[ 11]   0.00-30.01  sec  13.6 GBytes  3.90 Gbits/sec                  receiver
[ 13]   0.00-30.00  sec  13.3 GBytes  3.81 Gbits/sec    0             sender
[ 13]   0.00-30.01  sec  13.3 GBytes  3.81 Gbits/sec                  receiver
[SUM]   0.00-30.00  sec  68.0 GBytes  19.5 Gbits/sec    0             sender
[SUM]   0.00-30.01  sec  68.1 GBytes  19.5 Gbits/sec                  receiver

Comment 20 errata-xmlrpc 2023-05-09 07:20:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162

