Description of problem:

If a guest is transmitting data through a vhost-user-backed interface when the backend is suddenly closed, qemu enters an infinite loop, potentially freezing the guest.

Version-Release number of selected component (if applicable):

I've reproduced this issue with AV8.0 (qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8), but I'd guess the problem can still be reproduced in more versions. The function where it gets stuck is virtqueue_drop_all().

How reproducible:
Always

Steps to Reproduce:

1. Run a guest VM with a vhost-user netdev in server mode. An example libvirt XML section:

    <interface type='vhostuser'>
      <mac address='52:54:00:e6:da:91'/>
      <source type='unix' path='/tmp/vhost-user1' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </interface>

2. Run testpmd in the host and set fwd to rxonly:

    $ testpmd -l 0,20,21,22,23 --socket-mem=1024 -n 4 \
        --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' \
        --vdev 'net_vhost1,iface=/tmp/vhost-user2,client=1' \
        --no-pci -- --rxq=1 --txq=1 --portmask=f -a \
        --forward-mode=rxonly --nb-cores=4

3. In the guest, run testpmd in txonly mode:

    $ testpmd -l 0,1 \
        --socket-mem 1024 \
        -n 2 \
        -- \
        --portmask=3 \
        -i
    testpmd> set fwd txonly
    testpmd> start

4. Check that packets are flowing:

    testpmd> show port stats all

5. Kill the host's testpmd

Actual results:
Qemu's IO thread freezes

Expected results:
Qemu's IO thread should not freeze

Additional info:

An example backtrace of the infinite loop is:

    #0  0x000055d888ca1e69 in virtqueue_get_head ()
    #1  0x000055d888ca1f82 in virtqueue_drop_all ()
    #2  0x000055d888c89884 in virtio_net_drop_tx_queue_data ()
    #3  0x000055d888e13669 in virtio_bus_cleanup_host_notifier ()
    #4  0x000055d888ca64d6 in vhost_dev_disable_notifiers ()
    #5  0x000055d888c8d5d2 in vhost_net_stop_one ()
    #6  0x000055d888c8dad2 in vhost_net_stop ()
    #7  0x000055d888c8a5b4 in virtio_net_set_status ()
    #8  0x000055d888e33dc6 in qmp_set_link ()
    #9  0x000055d888e394aa in chr_closed_bh ()
    #10 0x000055d888f3c066 in aio_bh_poll ()
    #11 0x000055d888f3f394 in aio_dispatch ()
    #12 0x000055d888f3bf42 in aio_ctx_dispatch ()
    #13 0x00007f84b7c6b67d in g_main_dispatch (context=0x55d88abd1c70) at gmain.c:3176
    #14 g_main_context_dispatch (context=0x55d88abd1c70) at gmain.c:3829
    #15 0x000055d888f3e618 in main_loop_wait ()
    #16 0x000055d888d314f9 in main_loop ()
    #17 0x000055d888bf19c4 in main ()
This issue was detected while testing BZ 1738768
top perf yields:

    Samples: 391K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 117702907563 lost: 0/0 drop: 0/0
      Children      Self  Shared Object
    -   93.23%    65.50%  qemu-system-x86_64
       - 7.05% 0x2af0970
            g_main_context_dispatch
            aio_ctx_dispatch
            aio_dispatch
            aio_dispatch_handlers
            virtio_queue_host_notifier_read
            virtio_queue_notify_vq
            virtio_net_handle_tx_bh
          - virtio_net_drop_tx_queue_data
             - 13.68% virtqueue_drop_all
                - 25.91% virtqueue_push
                   - 9.96% virtqueue_fill
                      + 5.31% vring_used_write
                        2.03% virtqueue_unmap_sg
                      + 1.35% trace_virtqueue_fill
                   - 8.85% virtqueue_flush
                      - 7.37% vring_used_idx_set
                         - 3.09% address_space_cache_invalidate
                            + 2.54% invalidate_and_set_dirty
                         - 1.59% virtio_stw_phys_cached
                            + 2.03% stw_le_phys_cached
                        0.95% vring_get_region_caches
                     8.58% rcu_read_unlock
                     4.43% rcu_read_lock
                - 4.61% virtqueue_get_head
                   - 1.85% vring_avail_ring
                      - 1.31% virtio_lduw_phys_cached
                         - 1.39% lduw_le_phys_cached
                            - 1.23% address_space_lduw_le_cached
                               - 0.95% lduw_le_p
                  2.01% virtio_queue_empty
       + 4.20% qemu_thread_start

So I guess:
- it's taking a lot to drop all the packets (maybe because the guest is still writing to the queue)
- a lot of guest notifications are being received

+Jason
Updating with some findings. It's not an infinite loop (although it seems that way); it's more of an event flood. The issue that IMHO could be fixed is that notifications are left enabled when the backend closes; I'll look more deeply into that. On the other hand, the question remains: why would testpmd keep sending notifications (i.e., keep trying to transmit) when the link has gone down? So the scope of this issue is, I guess, also limited to having a misbehaving guest.
Hi Adrián,

8.1.1 AV and 8.2.0 AV both hit this issue. Do you plan to fix this on 8.1.1? If yes, I'll clone a new one to rhel8.2. Thanks.

Best regards,
Pei
Hi Pei. Yes, we'll need a bz on 8.1.1 as well. Thanks
(In reply to Adrián Moreno from comment #6)
> Hi Pei. Yes, we'll need a bz on 8.1.1 as well.
> Thanks

Thanks Adrián. I've cloned this BZ to RHEL8.2-AV:

Bug 1790360 - qemu-kvm: event flood when vhost-user backed virtio netdev is unexpectedly closed while guest is transmitting