Bug 2033279
Summary: | [wrb][qemu-kvm 6.2] The hot-unplugged device can not be hot-plugged back | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Yanghang Liu <yanghliu> |
Component: | qemu-kvm | Assignee: | Kevin Wolf <kwolf> |
qemu-kvm sub component: | Devices | QA Contact: | Yanghang Liu <yanghliu> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | | |
Priority: | high | CC: | ailan, alex.williamson, chayang, coli, jinzhao, juzhang, kwolf, leiyang, lizhu, mark, mst, pezhang, pkrempa, virt-maint, yafu, yalzhang, yanghliu, yicui, ymankad |
Version: | 8.6 | Keywords: | Regression, TestBlocker, Triaged |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-6.2.0-6.module+el8.6.0+14165+5e5e76ac | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-05-10 13:24:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Yanghang Liu
2021-12-16 12:25:08 UTC
> Version-Release number of selected component (if applicable):
> qemu-kvm-6.2.0-1.rc2.scrmod+el8.6.0+13458+219ac088.wrb211124.x86_64
> libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64
The hot-unplugged PF/VF can be hot-plugged back successfully in the following test env:
qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85.x86_64
libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64
I am still not sure whether the root cause of this bug is in libvirt or qemu-kvm, but according to comment 1, I am opening this bug against qemu-kvm first and marking it as a regression. Feel free to move this bug to libvirt once we find that the root cause is in libvirt.

I also encountered this issue when testing with wrb qemu.

No issue with the combination below:
libvirt-7.10.0-1.module+el8.6.0+13502+4f24a11d.x86_64
qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85.x86_64

But when I update qemu-kvm to 6.2.0-1.rc1.scrmod+el8.6.0+13325+d4e3491c.wrb21117.x86_64, the issue occurs. So I think some change in the wrb qemu-kvm may have caused this libvirt 'noncooperation'.

1. Start vm with 1 interface:
# virsh domiflist rhel
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet4       network   default   e1000e   52:54:00:c0:a0:9d

2. After the vm boots up successfully, hot-unplug the interface:
# virsh detach-interface rhel network 52:54:00:c0:a0:9d
Interface detached successfully

Checking in the guest OS, the interface is detached. But checking the guest xml, the interface still exists, which is not expected.
# virsh domiflist rhel
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet4       network   default   e1000e   52:54:00:c0:a0:9d

# virsh dumpxml rhel | grep /interface -B7
    <interface type='network'>
      <mac address='52:54:00:c0:a0:9d'/>
      <source network='default' portid='d3ed5141-8efd-4d69-be40-c8512530ea25' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='e1000e'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

This bug exists in the following test env:
qemu-kvm-6.2.0-1.el9.x86_64
libvirt-7.10.0-1.el9.x86_64

Tested with:
qemu-kvm-6.2.0-1.el9.x86_64
libvirt-7.10.0-1.el9.x86_64
For the virtiofs and watchdog devices, I also hit the same issue as in Comment #3: devices are hot-unplugged in the guest, but not removed from the guest xml.
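The manual check above (looking for the detached MAC in the live domain XML) can be scripted. This is only a minimal sketch, assuming the XML text comes from `virsh dumpxml`; the helper name `interface_in_domain_xml` and the trimmed sample document are hypothetical and not part of libvirt.

```python
import xml.etree.ElementTree as ET


def interface_in_domain_xml(domain_xml: str, mac: str) -> bool:
    """Return True if an <interface> with the given MAC is present in the XML."""
    root = ET.fromstring(domain_xml)
    wanted = mac.lower()
    # iter() also visits the root element, so a bare <interface> snippet works too.
    for iface in root.iter("interface"):
        mac_elem = iface.find("mac")
        if mac_elem is not None and mac_elem.get("address", "").lower() == wanted:
            return True
    return False


# Trimmed-down domain XML (assumed wrapper) carrying the interface from the report.
SAMPLE_XML = """
<domain>
  <devices>
    <interface type='network'>
      <mac address='52:54:00:c0:a0:9d'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='e1000e'/>
      <alias name='net0'/>
    </interface>
  </devices>
</domain>
"""

if __name__ == "__main__":
    # After a completed hot-unplug this is expected to be False;
    # with the behaviour reported here it stays True.
    print(interface_in_domain_xml(SAMPLE_XML, "52:54:00:c0:a0:9d"))
```

Running the check again after `virsh detach-interface` reports success gives a quick signal of whether libvirt has actually dropped the device from the live definition.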
This is the same with Bug 2036669

Keep this bug open for this issue still exists in qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64.

> This bug is the same with Bug 2036669
> This issue can still be reproduced in qemu-kvm-6.2.0-4.el9.x86_64, while it is fixed in qemu-kvm-6.2.0-5.el9.x86_64.
> Keep this bug open for this issue still exists in qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64.

Hi Michael, Kevin and Yash

It seems to me that a same bug has been fixed in qemu-kvm-6.2.0-5.el9.x86_64.
May I ask if we can fix this bug on RHEL.8.6 as this bug is Regression and TestBlocker ?

The original description of this bug doesn't contain any JSON -device in the command line, and it includes a correct DEVICE_DELETED event in the observed QMP traffic. Is this still true? If so, both the condition to trigger the bug and the result are different from bug 2036669, so this looks entirely unrelated.

(In reply to Kevin Wolf from comment #19)
> The original description of this bug doesn't contain any JSON -device in the command line, and it includes a correct DEVICE_DELETED event in the observed QMP traffic.
> Is this still true?

Hi Kevin,

The information I added in the description indicates that "This bug cannot be reproduced when the -device qemu cmd is not in JSON format"

I think this result is consistent with your bug.

> Additional info:
>(1) Only using qemu-kvm to test the same scenario in the same test env *does not reproduce this bug*   <--- Please pay attention to the info I highlight here.
>The Simplified qemu command line is as following:
...
>-device vfio-pci,host=0000:e3:0a.0,bus=root.4,id=pf1 \
> The related qmp:
> {"execute":"device_del","arguments":{"id":"vf1"}}
> {"return": {}}
> {"timestamp": {"seconds": 1639658800, "microseconds": 685326}, "event": "DEVICE_DELETED", "data": {"device": "vf1", "path": "/machine/peripheral/pf1"}}
> {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:e3:0a.0","id":"vf1","bus":"root.4"}}
> {"return": {}}

Besides, let me translate the reproducer into a qemu command line/qmp to make this question clearer for us

Test env:
qemu-kvm-6.2.0-4.el9.x86_64
libvirt-7.10.0-1.el9.x86_64

> Steps to Reproduce:
> 1.start a vm with a PF/VF
>
> # virt-install --machine=q35 --noreboot --name=rhel86 --memory=4096
> --vcpus=4 --graphics type=vnc,port=5986,listen=0.0.0.0 --network
> bridge=switch,model=virtio,mac=52:54:00:00:86:86 --import --noautoconsole
> --disk
> path=/home/images/RHEL86.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --hostdev pci_0000_e3_0a_0
>
> The device xml:
>
> <hostdev mode='subsystem' type='pci' managed='yes'>
>   <driver name='vfio'/>
>   <source>
>     <address domain='0x0000' bus='0xe3' slot='0x0a' function='0x0'/>
>   </source>
> </hostdev>

The related qemu cmd line:
-device {"driver":"vfio-pci","host":"0000:e3:0a.0","id":"hostdev0"}

> 2.Hot-unplug the PF/VF
>
> # virsh detach-device-alias rhel86 hostdev0
> Device detach request sent successfully    <--- But the PF/VF xml still
> exists in the vm

The related qmp:
{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-405"}
{"return": {}, "id": "libvirt-405"}

No related event follows, i.e. there is no output such as:
{"timestamp": {"seconds": 1643339608, "microseconds": 630965}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}

> 3.check the PF/VF info in the vm
>
> # lspci or # ifconfig   <-- There is no any info about the hot-unplugged
> PF/VF
>
> # dmesg
> [ 37.105546] pcieport 0000:00:02.3: pciehp: Slot(0-3): Attention button
> pressed
> [ 37.107395] pcieport 0000:00:02.3: pciehp: Slot(0-3): Powering off due to
> button press
> [ 42.634339] iavf 0000:04:00.0: Hardware reset detected
>
> 4. Hot-plug the PF/VF back to the vm
>
> # virsh attach-device rhel86 /tmp/device/0000\:e3\:0a.0.xml
> error: Failed to attach device from /tmp/device/0000:e3:0a.0.xml
> error: Requested operation is not valid: PCI device 0000:e3:0a.0 is in use
> by driver QEMU, domain rhel86

The "Hot-plug the PF/VF back to the vm" op is blocked by libvirt because the "Hot-unplug the PF/VF" op has not finished yet.

Sorry, I missed that this information was related to the case where it does *not* reproduce. Then yes, we can use this bug to fix it in 8.6.

Note that in 9.0, the problem was first worked around in libvirt, but fixing just QEMU should be enough.

rhel-8.6 will get (already probably got) libvirt-8.0 which has the workaround, as it is an upstreamed patch, so the code base is identical to rhel-9 in this regard.

QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

> Steps to Reproduce:
> 1.start a vm with a PF/VF
>
> # virt-install --machine=q35 --noreboot --name=rhel86 --memory=4096
> --vcpus=4 --graphics type=vnc,port=5986,listen=0.0.0.0 --network
> bridge=switch,model=virtio,mac=52:54:00:00:86:86 --import --noautoconsole
> --disk
> path=/home/images/RHEL86.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --hostdev pci_0000_e3_0a_0
>
> The device xml:
>
> <hostdev mode='subsystem' type='pci' managed='yes'>
>   <driver name='vfio'/>
>   <source>
>     <address domain='0x0000' bus='0xe3' slot='0x0a' function='0x0'/>
>   </source>
> </hostdev>
>
>
> 2.Hot-unplug the PF/VF
> # virsh detach-device-alias rhel86 hostdev0
> 3.check the PF/VF info in the vm
> # lspci or # ifconfig
> # dmesg
> 4. Hot-plug the PF/VF back to the vm
> # virsh attach-device rhel86 /tmp/device/0000\:e3\:0a.0.xml

Verification Result: PASS

This bug can be reproduced in the following test env:
qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64
libvirt-7.10.0-1.module+el8.6.0+13502+4f24a11d.x86_64

This bug has been fixed in the following test env:
qemu-kvm-6.2.0-6.module+el8.6.0+14167+61b0e671.x86_64
libvirt-7.10.0-1.module+el8.6.0+13502+4f24a11d.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759
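
For anyone re-checking this on another build: the difference between the failing and the working case in the comments above is whether a DEVICE_DELETED event follows the device_del command. The following is a minimal sketch, assuming the QMP traffic has been captured as one JSON message per line (how it is captured is left open); `unconfirmed_unplugs` is a hypothetical helper, not a QEMU or libvirt API.

```python
import json


def unconfirmed_unplugs(qmp_lines):
    """Return device ids that got a device_del but no later DEVICE_DELETED event."""
    pending = []
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("execute") == "device_del":
            dev_id = msg.get("arguments", {}).get("id")
            if dev_id is not None and dev_id not in pending:
                pending.append(dev_id)
        elif msg.get("event") == "DEVICE_DELETED":
            # Events for devices without an id carry no "device" key.
            dev_id = msg.get("data", {}).get("device")
            if dev_id in pending:
                pending.remove(dev_id)
    return pending


# QMP traffic copied from the comments above: the libvirt (JSON -device) case ...
FAILING_LOG = [
    '{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-405"}',
    '{"return": {}, "id": "libvirt-405"}',
]

# ... and the plain qemu-kvm command-line case, which does not reproduce the bug.
WORKING_LOG = [
    '{"execute":"device_del","arguments":{"id":"vf1"}}',
    '{"return": {}}',
    '{"timestamp": {"seconds": 1639658800, "microseconds": 685326}, '
    '"event": "DEVICE_DELETED", "data": {"device": "vf1", "path": "/machine/peripheral/pf1"}}',
]

if __name__ == "__main__":
    print(unconfirmed_unplugs(FAILING_LOG))
    print(unconfirmed_unplugs(WORKING_LOG))
```

A non-empty result means the unplug was requested but never confirmed, which matches the state in which libvirt still reports the PCI device as in use by the domain.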