Bug 2033279 - [wrb][qemu-kvm 6.2] The hot-unplugged device can not be hot-plugged back
Summary: [wrb][qemu-kvm 6.2] The hot-unplugged device can not be hot-plugged back
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-16 12:25 UTC by Yanghang Liu
Modified: 2022-05-10 13:38 UTC (History)

Fixed In Version: qemu-kvm-6.2.0-6.module+el8.6.0+14165+5e5e76ac
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-10 13:24:21 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gitlab redhat/rhel/src/qemu-kvm qemu-kvm merge_requests 103 0 None None None 2022-01-28 16:53:42 UTC
Red Hat Issue Tracker RHELPLAN-105997 0 None None None 2021-12-16 12:29:41 UTC
Red Hat Product Errata RHSA-2022:1759 0 None None None 2022-05-10 13:25:03 UTC

Description Yanghang Liu 2021-12-16 12:25:08 UTC
Description of problem:
The hot-unplugged PF/VF can not be hot-plugged back

Version-Release number of selected component (if applicable):
host:
qemu-kvm-6.2.0-1.rc2.scrmod+el8.6.0+13458+219ac088.wrb211124.x86_64
libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64

How reproducible:
100%

Steps to Reproduce:
1.start a vm with a PF/VF

# virt-install --machine=q35 --noreboot --name=rhel86 --memory=4096 --vcpus=4 --graphics type=vnc,port=5986,listen=0.0.0.0  --network bridge=switch,model=virtio,mac=52:54:00:00:86:86 --import --noautoconsole --disk path=/home/images/RHEL86.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --hostdev pci_0000_e3_0a_0  

The device xml:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0xe3' slot='0x0a' function='0x0'/>
      </source>
    </hostdev>


2.Hot-unplug the PF/VF

# virsh detach-device-alias rhel86 hostdev0
Device detach request sent successfully    <--- But the PF/VF xml still exists in the vm

3.check the PF/VF info in the vm

# lspci or # ifconfig  <-- There is no info about the hot-unplugged PF/VF

# dmesg
[   37.105546] pcieport 0000:00:02.3: pciehp: Slot(0-3): Attention button pressed
[   37.107395] pcieport 0000:00:02.3: pciehp: Slot(0-3): Powering off due to button press
[   42.634339] iavf 0000:04:00.0: Hardware reset detected

4. Hot-plug the PF/VF back to the vm

# virsh attach-device rhel86 /tmp/device/0000\:e3\:0a.0.xml 
error: Failed to attach device from /tmp/device/0000:e3:0a.0.xml
error: Requested operation is not valid: PCI device 0000:e3:0a.0 is in use by driver QEMU, domain rhel86
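The address in the libvirt error above is the same device as in the hostdev xml from step 1. As a minimal sketch (the helper names `hostdev_pci_address` and `libvirt_nodedev_name` are made up for illustration and are not libvirt APIs), the mapping between the xml, the PCI address in the error message, and the `pci_0000_e3_0a_0` name passed to `--hostdev` can be written as:

```python
import xml.etree.ElementTree as ET

# The hostdev xml from step 1 of this report.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0xe3' slot='0x0a' function='0x0'/>
  </source>
</hostdev>
"""


def hostdev_pci_address(xml_text: str) -> str:
    """Return the host PCI address (dddd:bb:ss.f) of a <hostdev> element."""
    addr = ET.fromstring(xml_text).find("./source/address")
    if addr is None:
        raise ValueError("no <source><address> element found")
    # The attributes are hex strings such as '0xe3'; int(..., 16) accepts the prefix.
    domain = int(addr.get("domain", "0x0"), 16)
    bus = int(addr.get("bus"), 16)
    slot = int(addr.get("slot"), 16)
    function = int(addr.get("function"), 16)
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def libvirt_nodedev_name(pci_address: str) -> str:
    """Hypothetical helper: build the pci_dddd_bb_ss_f style name used in step 1."""
    return "pci_" + pci_address.replace(":", "_").replace(".", "_")


address = hostdev_pci_address(HOSTDEV_XML)
print(address)                        # 0000:e3:0a.0
print(libvirt_nodedev_name(address))  # pci_0000_e3_0a_0
```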



Actual results:
The PF/VF xml still exists in the vm after hot-unplugging the PF/VF device
The hot-unplugged PF/VF can not be hot-plugged back

Expected results:
The hot-unplugged PF/VF can be hot-plugged back successfully

Additional info:

(1) Testing the same scenario in the same test env with qemu-kvm alone (without libvirt) *does not reproduce this problem*

The simplified qemu command line is as follows:
/usr/libexec/qemu-kvm -name rhel86 -M q35 -enable-kvm \
-monitor stdio \
-nodefaults \
-m 4G \
-boot menu=on \
-cpu host \
-smp 8,sockets=4,cores=2,threads=1,maxcpus=8 \
-qmp tcp:0:5555,server,nowait \
-device pcie-root-port,id=root.1,chassis=1,addr=0x2.0,multifunction=on \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-device pcie-root-port,id=root.5,chassis=5,addr=0x2.4 \
-device pcie-root-port,id=root.6,chassis=6,addr=0x2.5 \
-device pcie-root-port,id=root.7,chassis=7,addr=0x2.6 \
-device pcie-root-port,id=root.8,chassis=8,addr=0x2.7 \
-blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/home/images/RHEL86.qcow2,aio=threads \
-blockdev node-name=drive-virtio-disk0,driver=qcow2,cache.direct=on,cache.no-flush=off,file=back_image \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0,bus=root.1 \
-device VGA,id=video1,bus=root.2 \
-vnc :0 \
-device virtio-net-pci,netdev=nic1,id=vnet0,mac=52:54:00:00:86:86,bus=root.3 \
-netdev tap,id=nic1,script=/etc/qemu-ifup,vhost=on \
-device vfio-pci,host=0000:e3:0a.0,bus=root.4,id=pf1


The related qmp:
{"execute":"device_del","arguments":{"id":"vf1"}}
{"return": {}}
{"timestamp": {"seconds": 1639658800, "microseconds": 685326}, "event": "DEVICE_DELETED", "data": {"device": "vf1", "path": "/machine/peripheral/pf1"}} 
{"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:e3:0a.0","id":"vf1","bus":"root.4"}}
{"return": {}}

Comment 1 Yanghang Liu 2021-12-16 12:32:28 UTC
> Version-Release number of selected component (if applicable):
> qemu-kvm-6.2.0-1.rc2.scrmod+el8.6.0+13458+219ac088.wrb211124.x86_64
> libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64

The hot-unplugged PF/VF can be hot-plugged back successfully in the following test env:
qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85.x86_64
libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64

Comment 2 Yanghang Liu 2021-12-16 12:36:52 UTC
I am still not sure whether the root cause of this bug is in libvirt or qemu-kvm,

but based on comment 1, I am opening this bug against qemu-kvm first and marking it as a regression.

Feel free to move this bug to libvirt once we find that the root cause is in libvirt.

Comment 3 yalzhang@redhat.com 2021-12-17 02:15:53 UTC
I also encountered this issue when testing with wrb qemu. 
No issue with the combination below:
libvirt-7.10.0-1.module+el8.6.0+13502+4f24a11d.x86_64
qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85.x86_64

But when I update qemu-kvm to 6.2.0-1.rc1.scrmod+el8.6.0+13325+d4e3491c.wrb21117.x86_64, the issue occurs. So I think some change in the wrb qemu-kvm may have caused this libvirt 'noncooperation'.

1. Start vm with 1 interface:
# virsh domiflist rhel 
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet4       network   default   e1000e   52:54:00:c0:a0:9d

2. After the vm boots up successfully, hot-unplug the interface:
# virsh detach-interface rhel network  52:54:00:c0:a0:9d
Interface detached successfully

Checking in the guest OS, the interface is detached.
But in the guest xml, the interface still exists, which is not expected.
# virsh domiflist rhel 
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet4       network   default   e1000e   52:54:00:c0:a0:9d

# virsh dumpxml rhel | grep /interface -B7
    <interface type='network'>
      <mac address='52:54:00:c0:a0:9d'/>
      <source network='default' portid='d3ed5141-8efd-4d69-be40-c8512530ea25' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='e1000e'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
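The check above (the interface is gone in the guest but still listed in the domain xml) can be automated. This is only a sketch: `interface_present` is a hypothetical helper, and the `<domain>` wrapper in the sample is a trimmed stand-in for real `virsh dumpxml rhel` output.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for `virsh dumpxml rhel`; only the interface from this comment is kept.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <interface type='network'>
      <mac address='52:54:00:c0:a0:9d'/>
      <target dev='vnet4'/>
      <model type='e1000e'/>
      <alias name='net0'/>
    </interface>
  </devices>
</domain>
"""


def interface_present(domain_xml: str, mac: str) -> bool:
    """Return True if the domain xml still lists an <interface> with this MAC."""
    root = ET.fromstring(domain_xml)
    wanted = mac.lower()
    for mac_elem in root.findall(".//interface/mac"):
        if mac_elem.get("address", "").lower() == wanted:
            return True
    return False


# After a completed hot-unplug this is expected to be False; in this bug it stays True.
print(interface_present(SAMPLE_DOMAIN_XML, "52:54:00:c0:a0:9d"))  # True
```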

Comment 6 Yanghang Liu 2021-12-20 07:50:10 UTC
This bug exists in the following test env:
qemu-kvm-6.2.0-1.el9.x86_64
libvirt-7.10.0-1.el9.x86_64

Comment 12 Lili Zhu 2021-12-30 06:20:36 UTC
Tested with:
qemu-kvm-6.2.0-1.el9.x86_64
libvirt-7.10.0-1.el9.x86_64

The same issue as in comment #3 also occurs with virtiofs and watchdog devices: the devices are hot-unplugged in the guest, but not removed from the guest xml.

Comment 13 yalzhang@redhat.com 2022-01-04 01:50:38 UTC
This is the same as Bug 2036669

Comment 17 Yanghang Liu 2022-01-27 02:38:36 UTC
Keeping this bug open because the issue still exists in qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64.

Comment 18 Yanghang Liu 2022-01-27 03:22:59 UTC
 
>  This bug is the same with Bug 2036669 

>  This issue can still be reproduced in qemu-kvm-6.2.0-4.el9.x86_64, while it is fixed in qemu-kvm-6.2.0-5.el9.x86_64.

>  Keep this bug open for this issue still exists in qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64.


Hi Michael, Kevin and Yash

It seems to me that the same bug has been fixed in qemu-kvm-6.2.0-5.el9.x86_64.

May I ask if we can fix this bug on RHEL 8.6 as well, since it is a Regression and a TestBlocker?

Comment 19 Kevin Wolf 2022-01-27 11:43:53 UTC
The original description of this bug doesn't contain any JSON -device in the command line, and it includes a correct DEVICE_DELETED event in the observed QMP traffic.

Is this still true?

If so, both the condition to trigger the bug and the result are different from bug 2036669, so this looks entirely unrelated.

Comment 20 Yanghang Liu 2022-01-28 04:11:28 UTC
(In reply to Kevin Wolf from comment #19)

> The original description of this bug doesn't contain any JSON -device in the command line, and it includes a correct DEVICE_DELETED event in the observed QMP traffic.
> Is this still true?

Hi Kevin,

The information I added in the description indicates that "This bug cannot be reproduced when the -device qemu cmd is not in JSON format"

I think this result is consistent with your bug.


> Additional info:

>(1) Only using qemu-kvm to test the same scenario in the same test env *does not reproduce this bug*              <--- Please pay attention to the info I highlight here.

>The Simplified qemu command line is as following:
...
>-device vfio-pci,host=0000:e3:0a.0,bus=root.4,id=pf1 \

> The related qmp:
> {"execute":"device_del","arguments":{"id":"vf1"}}
> {"return": {}}
> {"timestamp": {"seconds": 1639658800, "microseconds": 685326}, "event": "DEVICE_DELETED", "data": {"device": "vf1", "path": "/machine/peripheral/pf1"}} 
> {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:e3:0a.0","id":"vf1","bus":"root.4"}}
> {"return": {}}

Comment 21 Yanghang Liu 2022-01-28 05:15:03 UTC
Besides, let me translate the reproducer into a qemu command line/qmp to make this question clearer for us

Test env:
qemu-kvm-6.2.0-4.el9.x86_64
libvirt-7.10.0-1.el9.x86_64


> Steps to Reproduce:
> 1.start a vm with a PF/VF
> 
> # virt-install --machine=q35 --noreboot --name=rhel86 --memory=4096
> --vcpus=4 --graphics type=vnc,port=5986,listen=0.0.0.0  --network
> bridge=switch,model=virtio,mac=52:54:00:00:86:86 --import --noautoconsole
> --disk
> path=/home/images/RHEL86.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --hostdev pci_0000_e3_0a_0  
> 
> The device xml:
> 
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0000' bus='0xe3' slot='0x0a' function='0x0'/>
>       </source>
>     </hostdev>

The related qemu cmd line:
-device {"driver":"vfio-pci","host":"0000:e3:0a.0","id":"hostdev0"} 
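The difference that matters here is the form of the `-device` argument: the standalone qemu-kvm run in the description used the key=value form, while the libvirt-generated command line uses a JSON object. A rough sketch for telling the two apart when scanning command lines (`parse_device_arg` is a hypothetical helper, and the key=value branch is deliberately simplistic, e.g. it ignores QEMU's `,,` comma escaping):

```python
import json
from typing import Dict, Tuple


def parse_device_arg(arg: str) -> Tuple[str, Dict[str, object]]:
    """Classify a -device argument as "json" or "keyval" and return its properties."""
    text = arg.strip()
    if text.startswith("{"):
        props = json.loads(text)
        if not isinstance(props, dict):
            raise ValueError("JSON -device argument must be an object")
        return "json", props
    # Simplified key=value parsing; the first bare token is taken as the driver.
    props = {}
    for index, item in enumerate(text.split(",")):
        key, sep, value = item.partition("=")
        if index == 0 and not sep:
            props["driver"] = key
        else:
            props[key] = value
    return "keyval", props


# The two forms that appear in this bug report.
libvirt_form = '{"driver":"vfio-pci","host":"0000:e3:0a.0","id":"hostdev0"}'
manual_form = "vfio-pci,host=0000:e3:0a.0,bus=root.4,id=pf1"

print(parse_device_arg(libvirt_form)[0])  # json
print(parse_device_arg(manual_form)[0])   # keyval
```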

> 2.Hot-unplug the PF/VF
> 
> # virsh detach-device-alias rhel86 hostdev0
> Device detach request sent successfully    <--- But the PF/VF xml still
> exists in the vm

The related qmp:

{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-405"}
{"return": {}, "id": "libvirt-405"}

No DEVICE_DELETED event follows, i.e. there is no output like: "{"timestamp": {"seconds": 1643339608, "microseconds": 630965}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}"
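The symptom in the QMP traffic is a `device_del` that is acknowledged but never followed by a `DEVICE_DELETED` event for the same id. A small sketch for spotting this in a captured QMP log, assuming one JSON message per line (`pending_unplugs` is a hypothetical helper, not part of QEMU or libvirt):

```python
import json
from typing import Iterable, List


def pending_unplugs(qmp_lines: Iterable[str]) -> List[str]:
    """Return device ids that got a device_del but no later DEVICE_DELETED event."""
    pending: List[str] = []
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a single JSON message
        if not isinstance(msg, dict):
            continue
        if msg.get("execute") == "device_del":
            dev = msg.get("arguments", {}).get("id")
            if dev is not None and dev not in pending:
                pending.append(dev)
        elif msg.get("event") == "DEVICE_DELETED":
            dev = msg.get("data", {}).get("device")
            if dev in pending:
                pending.remove(dev)
    return pending


# Traffic from this comment (JSON -device, unplug never completes).
broken = [
    '{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-405"}',
    '{"return": {}, "id": "libvirt-405"}',
]
# Traffic from the description (key=value -device, unplug completes).
working = [
    '{"execute":"device_del","arguments":{"id":"vf1"}}',
    '{"return": {}}',
    '{"timestamp": {"seconds": 1639658800, "microseconds": 685326}, "event": "DEVICE_DELETED", "data": {"device": "vf1", "path": "/machine/peripheral/pf1"}}',
]

print(pending_unplugs(broken))   # ['hostdev0']
print(pending_unplugs(working))  # []
```

A non-empty result is consistent with libvirt still treating the device as in use, which matches the "is in use by driver QEMU" error in step 4.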


> 3.check the PF/VF info in the vm
> 
> # lspci or # ifconfig  <-- There is no any info about the hot-unplugged
> PF/VF 
> 
> # dmesg
> [   37.105546] pcieport 0000:00:02.3: pciehp: Slot(0-3): Attention button
> pressed
> [   37.107395] pcieport 0000:00:02.3: pciehp: Slot(0-3): Powering off due to
> button press
> [   42.634339] iavf 0000:04:00.0: Hardware reset detected
> 
> 4. Hot-plug the PF/VF back to the vm
> 
> # virsh attach-device rhel86 /tmp/device/0000\:e3\:0a.0.xml 
> error: Failed to attach device from /tmp/device/0000:e3:0a.0.xml
> error: Requested operation is not valid: PCI device 0000:e3:0a.0 is in use
> by driver QEMU, domain rhel86

The "Hot-plug the PF/VF back to the vm" op is blocked by libvirt because the "Hot-unplug the PF/VF" op has not finished yet.

Comment 22 Kevin Wolf 2022-01-28 08:34:22 UTC
Sorry, I missed that this information was related to the case where it does *not* reproduce.

Then yes, we can use this bug to fix it in 8.6. Note that in 9.0, the problem was first worked around in libvirt, but fixing just QEMU should be enough.

Comment 23 Peter Krempa 2022-01-28 08:48:52 UTC
rhel-8.6 will get (already probably got) libvirt-8.0 which has the workaround, as it is an upstreamed patch, so the code base is identical to rhel-9 in this regard.

Comment 27 Yanan Fu 2022-02-09 06:14:20 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 28 Yanghang Liu 2022-02-09 09:38:11 UTC
> Steps to Reproduce:
> 1.start a vm with a PF/VF
> 
> # virt-install --machine=q35 --noreboot --name=rhel86 --memory=4096
> --vcpus=4 --graphics type=vnc,port=5986,listen=0.0.0.0  --network
> bridge=switch,model=virtio,mac=52:54:00:00:86:86 --import --noautoconsole
> --disk
> path=/home/images/RHEL86.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --hostdev pci_0000_e3_0a_0  
> 
> The device xml:
> 
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0000' bus='0xe3' slot='0x0a' function='0x0'/>
>       </source>
>     </hostdev>
> 
> 
> 2.Hot-unplug the PF/VF
> # virsh detach-device-alias rhel86 hostdev0


> 3.check the PF/VF info in the vm
> # lspci or # ifconfig
> # dmesg

> 4. Hot-plug the PF/VF back to the vm
> # virsh attach-device rhel86 /tmp/device/0000\:e3\:0a.0.xml 



Verification Result : PASS

  This bug can be reproduced in the following test env:
    qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64
    libvirt-7.10.0-1.module+el8.6.0+13502+4f24a11d.x86_64

  This bug has been fixed in the following test env:
    qemu-kvm-6.2.0-6.module+el8.6.0+14167+61b0e671.x86_64
    libvirt-7.10.0-1.module+el8.6.0+13502+4f24a11d.x86_64
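For triage scripts, the two data points above can be encoded as a simple check on the package name. This is a heuristic sketch only: it is not RPM's version comparison, `el8_status` is a hypothetical helper, it only covers qemu-kvm 6.2.0 builds for el8 (the el9 stream was fixed in a different release), and treating every release from 6 onward as fixed is an assumption based on "Fixed In Version".

```python
import re
from typing import Tuple

# Matches e.g. qemu-kvm-6.2.0-6.module+el8.6.0+14167+61b0e671.x86_64
NVR_RE = re.compile(r"^qemu-kvm-(\d+(?:\.\d+)*)-(\d+)\.")


def version_release(nvr: str) -> Tuple[Tuple[int, ...], int]:
    """Split a qemu-kvm package name into (version tuple, leading release number)."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError(f"unrecognized package name: {nvr}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def el8_status(nvr: str) -> str:
    """Return "affected", "fixed", or "unknown" based only on this report's data."""
    version, release = version_release(nvr)
    if "el8" not in nvr or version != (6, 2, 0):
        return "unknown"  # outside what this bug report covers
    return "fixed" if release >= 6 else "affected"


print(el8_status("qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64"))  # affected
print(el8_status("qemu-kvm-6.2.0-6.module+el8.6.0+14167+61b0e671.x86_64"))  # fixed
```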

Comment 30 errata-xmlrpc 2022-05-10 13:24:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759

