Bug 2002686

Summary: hot unplug NIC cannot unplug the NIC device successfully
Product: Red Hat Enterprise Linux 9
Reporter: leidwang <leidwang>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Networking
QA Contact: Lei Yang <leiyang>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: high
CC: jinzhao, juzhang, leiyang, lijin, qizhu, virt-maint, yanghliu, ybendito, yfu, yvugenfi
Version: 9.0
Keywords: Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-02 05:38:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description leidwang@redhat.com 2021-09-09 13:38:37 UTC
Description of problem:

Hot-unplugging a NIC does not remove the NIC device successfully (Windows guest). This issue can only be reproduced when running a loop via automation.

Version-Release number of selected component (if applicable):
'kvm_version': '5.14.0-1.el9.x86_64'
'qemu_version': 'qemu-kvm-core-6.1.0-1.el9.x86_64'
virtio-win-prewhql-0.1-207.iso

How reproducible:
1/5

Steps to Reproduce:
1. Boot up a guest:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-device i6300esb,bus=pcie-pci-bridge-0,addr=0x1 \
-watchdog-action reset \
-m 30720 \
-object memory-backend-ram,size=30720M,id=mem-machine_mem  \
-smp 20,maxcpus=20,cores=10,threads=1,dies=1,sockets=2  \
-cpu 'Cascadelake-Server-noTSX',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
-device intel-hda,bus=pcie-pci-bridge-0,addr=0x2 \
-device hda-duplex \
-chardev socket,server=on,wait=off,path=/tmp/avocado_r9m7yx5x/monitor-qmpmonitor1-20210903-185410-Y0bYPe9H,id=qmp_id_qmpmonitor1  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,server=on,wait=off,path=/tmp/avocado_r9m7yx5x/monitor-catch_monitor-20210903-185410-Y0bYPe9H,id=qmp_id_catch_monitor  \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idpxqPBw \
-chardev socket,server=on,wait=off,path=/tmp/avocado_r9m7yx5x/serial-serial0-20210903-185410-Y0bYPe9H,id=chardev_serial0 \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-object rng-random,filename=/dev/random,id=passthrough-EK1tyvLX \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device virtio-rng-pci,id=virtio-rng-pci-AH3whJxz,rng=passthrough-EK1tyvLX,bus=pcie-root-port-1,addr=0x0  \
-chardev socket,id=seabioslog_id_20210903-185410-Y0bYPe9H,path=/tmp/avocado_r9m7yx5x/seabios-20210903-185410-Y0bYPe9H,server=on,wait=off \
-device isa-debugcon,chardev=seabioslog_id_20210903-185410-Y0bYPe9H,iobase=0x402 \
-device ich9-usb-ehci1,id=usb1,addr=0x1d.0x7,multifunction=on,bus=pcie.0 \
-device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0x0,firstport=0,bus=pcie.0 \
-device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.0x2,firstport=2,bus=pcie.0 \
-device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.0x4,firstport=4,bus=pcie.0 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device qemu-xhci,id=usb2,bus=pcie-root-port-2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb2.0,port=1 \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/win2022-64-virtio-scsi_avocado-vt-vm1.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
-device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on  \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-net none \
-no-hpet \
-enable-kvm \
-device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
-device virtio-balloon-pci,id=balloon0,bus=pcie-root-port-4,addr=0x0 \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6 \
-device pcie-root-port,id=pcie_extra_root_port_1,addr=0x3.0x1,bus=pcie.0,chassis=7 \
-device pcie-root-port,id=pcie_extra_root_port_2,addr=0x3.0x2,bus=pcie.0,chassis=8 \
-device pcie-root-port,id=pcie_extra_root_port_3,addr=0x3.0x3,bus=pcie.0,chassis=9

2. Hotplug a NIC to the guest:
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idZfngmG', 'fd': '91', 'vhost': True}, 'id': 'dGUZvJ53'}
{'execute': 'device_add', 'arguments': {'id': 'idLYu96K', 'driver': 'virtio-net-pci', 'netdev': 'idZfngmG', 'mac': '9a:be:8b:7a:3b:03', 'bus': 'pcie_extra_root_port_0', 'addr': '0x0'}, 'id': 'zb64omOe'}

3. Get the IP address of the new NIC and ping the guest's new IP from the host
4. Pause and resume the VM
5. Ping the guest's new IP from the host
6. Unplug the NIC from the guest:
{'execute': 'device_del', 'arguments': {'id': 'idLYu96K'}, 'id': 'dZGdQSCA'}
7. Check whether the NIC was unplugged successfully
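The QMP exchange in steps 2 and 6 can be sketched as a small Python client. The command payloads below are taken from this reproducer; `send_qmp` is a hypothetical helper shown only to illustrate the wire protocol (greeting, capability negotiation, then newline-delimited JSON):

```python
import json
import socket

def qmp_netdev_add(netdev_id, fd):
    # Step 2a: create the tap backend from a pre-opened fd, with vhost on.
    return {"execute": "netdev_add",
            "arguments": {"type": "tap", "id": netdev_id,
                          "fd": fd, "vhost": True}}

def qmp_device_add(dev_id, netdev_id, mac, bus):
    # Step 2b: plug a virtio-net-pci device into a free root port.
    return {"execute": "device_add",
            "arguments": {"id": dev_id, "driver": "virtio-net-pci",
                          "netdev": netdev_id, "mac": mac,
                          "bus": bus, "addr": "0x0"}}

def qmp_device_del(dev_id):
    # Step 6: request hot-unplug. Success is signalled later by a
    # DEVICE_DELETED event, not by the command reply itself.
    return {"execute": "device_del", "arguments": {"id": dev_id}}

def send_qmp(path, commands):
    # Minimal QMP session over the monitor's unix socket.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        f = s.makefile("rw")
        json.loads(f.readline())                      # server greeting
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        f.readline()                                  # capabilities reply
        for cmd in commands:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            print(f.readline().strip())               # command reply
```

Step 7 is typically verified by waiting for the asynchronous `DEVICE_DELETED` event on the monitor rather than by the `device_del` reply, which only acknowledges that the unplug request was delivered.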

Actual results:
Device idLYu96K is not unplugged by guest

Expected results:
Device idLYu96K is unplugged by guest
Additional info:

Comment 3 Lei Yang 2022-10-14 03:26:53 UTC
Hit the same issue on RHEL 8.8 with network function testing.

Test Version:
kernel-4.18.0-430.el8.x86_64
qemu-kvm-6.2.0-22.module+el8.8.0+16816+1d3555ec.x86_64
virtio-win-prewhql-0.1-227.iso

Comment 4 Lei Yang 2022-10-31 03:14:44 UTC
Hit same issue

Test Version:
kernel-5.14.0-179.el9.x86_64
qemu-kvm-7.1.0-3.el9.x86_64
libvirt-8.8.0-1.el9.x86_64
swtpm-0.7.0-3.20211109gitb79fd91.el9.x86_64
edk2-ovmf-20220826gitba0e0e4c6a-1.el9.noarch

Comment 16 ybendito 2023-02-11 13:01:42 UTC
Note that the problem happens only under avocado in the hot-plug flow and is not reproducible with manual execution. This is due to an inconsistency in TAP creation in avocado between the case where the network adapter is attached from the beginning (avocado creates it with the vnet_hdr option) and the case where it is hot-plugged (avocado creates it without the vnet_hdr option), see
https://github.com/avocado-framework/avocado-vt/blob/master/virttest/qemu_vm.py#L4232
https://github.com/avocado-framework/avocado-vt/blob/master/virttest/qemu_vm.py#L2845

For example, libvirt always creates the TAP with vnet_hdr for virtio-net:
https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_interface.c#L435
https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_interface.c#L240

QEMU, when run from the command line, also creates the TAP with vnet_hdr by default.
So, if I'm not mistaken, the mainstream flow always uses a TAP with vnet_hdr, although QEMU has an option to create it without vnet_hdr.

However, even if the TAP was created without vnet_hdr, the networking should still work (with some optional virtio-net features disabled), and there is no explanation yet for why the tests sometimes pass and sometimes fail. IMO this lowers the priority of this bug, but it does not excuse the erroneous behavior of the network stack.

My suggestions (to be discussed):
1. Add a printout when the TAP is created, to make visible in the logs whether vnet_hdr was specified at creation time, i.e. here
https://github.com/avocado-framework/avocado-vt/blob/master/virttest/utils_net.py#L1325
2. Make the default mode of vnet_hdr True (at least for virtio-net) in all the tests
3. Add a parameter that allows redefining vnet_hdr as False for testing purposes
4. Create an additional test that verifies the network functions with vnet_hdr=no as well, with and without vhost; such a test will reproduce the problem (vhost + no vnet_hdr) with a Windows VM and probably with a Linux VM as well
5. Open a separate BZ according to the results of the test
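The vnet_hdr difference discussed above comes down to a single flag in the TUNSETIFF ioctl when the TAP is created. A minimal sketch, using the standard Linux constants from <linux/if_tun.h>; `open_tap` is illustrative only (it needs CAP_NET_ADMIN and is not part of the avocado code):

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h> and <linux/if.h> (x86_64 values).
TUNSETIFF    = 0x400454CA
IFF_TAP      = 0x0002
IFF_NO_PI    = 0x1000
IFF_VNET_HDR = 0x4000

def tap_flags(vnet_hdr):
    # The only difference between the two avocado code paths discussed
    # above is whether IFF_VNET_HDR is set when the TAP is created.
    flags = IFF_TAP | IFF_NO_PI
    if vnet_hdr:
        flags |= IFF_VNET_HDR
    return flags

def open_tap(name, vnet_hdr=True):
    # Requires CAP_NET_ADMIN; shown for illustration only.
    fd = os.open("/dev/net/tun", os.O_RDWR)
    ifr = struct.pack("16sH", name.encode(), tap_flags(vnet_hdr))
    fcntl.ioctl(fd, TUNSETIFF, ifr)
    return fd
```

Without IFF_VNET_HDR the TAP carries bare Ethernet frames, so QEMU cannot pass the virtio-net header through, and offload-related virtio-net features get disabled; with vhost=on this is the combination the tests hit.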

Comment 17 ybendito 2023-02-11 13:05:57 UTC
(In reply to ybendito from comment #16)
> ...when the network adapter is hot-plugged (in this case the avocado
> creates it with vnet_hdr option), see

Typo, the last one is WITHOUT vnet_hdr

Comment 18 leidwang@redhat.com 2023-02-14 02:31:03 UTC
(In reply to ybendito from comment #16)

Thanks a lot Yuri.

I will check your suggestions one by one, and update the QE automation code or test plan.

Thanks,
Leidong

Comment 19 ybendito 2023-02-14 18:17:39 UTC
Just for the record: in my smoke test with a Fedora 36 VM, the guest virtio-net with vhost=on,vnet_hdr=off does not acquire an IP address.
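The failing combination from this smoke test can be expressed directly on the QEMU command line. A hypothetical fragment (the ifname, script settings, and bus are placeholders taken from this reproducer, not from an actual test log):

```shell
# Back a virtio-net NIC with a TAP created without vnet_hdr while vhost
# is enabled -- the combination reported above as failing to get an IP.
-netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no,vhost=on,vnet_hdr=off \
-device virtio-net-pci,netdev=hostnet0,mac=9a:be:8b:7a:3b:03,bus=pcie_extra_root_port_0
```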

Comment 20 leidwang@redhat.com 2023-03-02 03:23:27 UTC
Discussed this bz with Lei; this issue can be reproduced on a Linux guest, so changing the component to qemu-kvm/networking. Thanks!

Comment 21 Lei Yang 2023-03-02 05:38:11 UTC
From QE's perspective, the current bug and Bug 2084003 are the same issue as Bug 1958175, so closing it as DUPLICATE. Please correct me if I'm wrong.

Thanks
Lei

*** This bug has been marked as a duplicate of bug 1958175 ***