Bug 1025700
Summary: | qemu-kvm hang while hot unplug VF after release VF in host. | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Xu Han <xuhan> |
Component: | qemu-kvm | Assignee: | Bandan Das <bdas> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 7.0 | CC: | acathrow, alex.williamson, chayang, hhuang, juzhang, michen, mrezanin, sluo, virt-maint, xfu, xuhan |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-06-13 12:42:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Xu Han
2013-11-01 10:44:20 UTC
Please confirm how you create the VFs: using "max_vfs" with modprobe or using the sysfs interface. Also, please confirm whether you can reproduce with both of these methods of creating VFs.

Tested this issue with kernel-debug-3.10.0-86.el7.x86_64.

Scenario 1, create VFs via modprobe

1. Create VFs.
    # modprobe ixgbe max_vfs=2

2. Check VFs on host.
    # lspci | grep "Virtual Function"
    05:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    05:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    05:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    05:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

3. Bind one VF to vfio-pci.
    # echo "0000:05:10.0" > /sys/bus/pci/devices/0000:05:10.0/driver/unbind
    # echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/new_id

4. Boot guest with assigned VF.
    # /usr/libexec/qemu-kvm ...\
        -device vfio-pci,host=0000:05:10.0,id=vf0

5. Release VFs of PF '05:00.0' via sysfs interface on host.
    # echo 0 > /sys/bus/pci/devices/0000\:05\:00.0/sriov_numvfs
    -- This process hangs.

6. Check VFs on host.
    # lspci | grep "Virtual Function"
    05:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    05:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    05:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

7. Reboot guest.
    -- While the guest is booting, the messages below appear.
    qemu-kvm: vfio: hot reset info failed: Operation not permitted
    qemu-kvm: vfio: hot reset info failed: Operation not permitted
    [ 1455.284560] vfio_pci_disable: Failed to reset device 0000:05:10.0 (-11)

8. Hot unplug VF (qmp).
    { 'execute' : 'device_del', 'arguments' : { 'id' : 'vf0' } }
    {"return": {}}
    {"timestamp": {"seconds": 1392176134, "microseconds": 399986}, "event": "DEVICE_DELETED", "data": {"device": "vf0", "path": "/machine/peripheral/vf0"}}
    -- After the VF is hot unplugged, the release process from step 5 returns, and qemu-kvm does not hang.

9. Check VFs.
    # lspci | grep "Virtual Function"
    05:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    05:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
    -- VFs of PF '05:00.0' are released successfully.

10. Check dmesg on host.
    # dmesg
    ...
    [ 1201.604169] INFO: task bash:4253 blocked for more than 120 seconds.
    [ 1201.610541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1201.618459] bash D 0000000000000000 5232 4253 4249 0x00000080
    [ 1201.618466] ffff8802222e7c10 0000000000000046 00000000001d5540 ffff8802222e7fd8
    [ 1201.618472] ffff8802222e7fd8 00000000001d5540 ffff880226c08000 ffff880221c0ce40
    [ 1201.618477] ffff880221d1ec00 ffff880222616098 ffff880221d1f000 ffff880226383a00
    [ 1201.618487] Call Trace:
    [ 1201.618497] [<ffffffff81683159>] schedule+0x29/0x70
    [ 1201.618505] [<ffffffffa05a5312>] vfio_del_group_dev+0xc2/0x150 [vfio]
    [ 1201.618510] [<ffffffff81098270>] ? wake_up_bit+0x30/0x30
    [ 1201.618515] [<ffffffffa05ac12b>] vfio_pci_remove+0x1b/0x40 [vfio_pci]
    [ 1201.618521] [<ffffffff81345efb>] pci_device_remove+0x3b/0xb0
    [ 1201.618528] [<ffffffff8140c68f>] __device_release_driver+0x7f/0xf0
    [ 1201.618534] [<ffffffff8140c725>] device_release_driver+0x25/0x40
    [ 1201.618538] [<ffffffff8140be8c>] bus_remove_device+0x11c/0x1a0
    [ 1201.618541] [<ffffffff814086c2>] device_del+0x142/0x1e0
    [ 1201.618547] [<ffffffff8133f414>] pci_stop_bus_device+0x94/0xa0
    [ 1201.618551] [<ffffffff8133f502>] pci_stop_and_remove_bus_device+0x12/0x20
    [ 1201.618560] [<ffffffff8135f27f>] virtfn_remove+0xef/0x180
    [ 1201.618566] [<ffffffff8135fa44>] pci_disable_sriov+0x74/0x140
    [ 1201.618577] [<ffffffffa0579ef3>] ixgbe_disable_sriov+0x83/0x160 [ixgbe]
    [ 1201.618586] [<ffffffffa057a2d7>] ixgbe_pci_sriov_configure+0x77/0x180 [ixgbe]
    [ 1201.618594] [<ffffffff813478c7>] sriov_numvfs_store+0xc7/0x130
    [ 1201.618598] [<ffffffff814076b8>] dev_attr_store+0x18/0x30
    [ 1201.618604] [<ffffffff8126df0b>] sysfs_write_file+0xdb/0x150
    [ 1201.618608] [<ffffffff811ee090>] vfs_write+0xc0/0x1f0
    [ 1201.618613] [<ffffffff8120e607>] ? fget_light+0x3a7/0x510
    [ 1201.618616] [<ffffffff811eea9c>] SyS_write+0x4c/0xa0
    [ 1201.618620] [<ffffffff8168f659>] system_call_fastpath+0x16/0x1b
    [ 1201.618624] 5 locks held by bash/4253:
    [ 1201.618627] #0: (sb_writers#3){.+.+.+}, at: [<ffffffff811ee18b>] vfs_write+0x1bb/0x1f0
    [ 1201.618634] #1: (&buffer->mutex){+.+.+.}, at: [<ffffffff8126de6c>] sysfs_write_file+0x3c/0x150
    [ 1201.618643] #2: (s_active#225){.+.+.+}, at: [<ffffffff8126def3>] sysfs_write_file+0xc3/0x150
    [ 1201.618654] #3: (&iov->lock){+.+.+.}, at: [<ffffffff8135f277>] virtfn_remove+0xe7/0x180
    [ 1201.618661] #4: (&__lockdep_no_validate__){......}, at: [<ffffffff8140c71d>] device_release_driver+0x1d/0x40
    ...

Scenario 2, create VFs via sysfs interface

1. Create VFs.
    # echo 2 > /sys/bus/pci/devices/0000\:05\:00.0/sriov_numvfs

Steps 2 to 10 were the same as in Scenario 1.
Test results were the same as in Scenario 1.

--
Unloading the ixgbe module or binding the parent PF to vfio-pci while the VM is running with an assigned VF does not hit this issue either in the current kernel. After the previous operation returned, the VFs were not released, and qemu-kvm did not hang.

Additional info:
While trying to verify Bug 1045175, I hit the same log message "qemu-kvm: vfio: hot reset info failed: Operation not permitted" with an Intel dual-port 82576 SR-IOV NIC, but there was no call trace in dmesg.

(In reply to xuhan from comment #5)
> --
> Unloading ixgbe module or binding parent PF to vfio-pci while VM running
> with assigned VF would not hit this issue as well in current kernel. After
> previous operating returned, VFs would not released, and qemu-kvm not hung.

From your comment above, can you clarify whether, apart from the messages

    qemu-kvm: vfio: hot reset info failed: Operation not permitted
    qemu-kvm: vfio: hot reset info failed: Operation not permitted

you think there is any other issue with the new kernel? It looks like you don't see the qemu-kvm hang anymore, so the problem is fixed, right? Or am I missing something?

This issue itself is fixed by the new kernel. I am just not sure whether some operations between steps 5 and 8 could lead to other issues.

(In reply to xuhan from comment #8)
> This issue itself is fixed by the new kernel. I just not sure whether exist
> some operations between step5 and 8 would lead to other issue.

Between steps 5 and 8, I think the only potential issue is step 7:

7. reboot guest
    -- While guest booting, these messages below would be noticed.
    qemu-kvm: vfio: hot reset info failed: Operation not permitted
    qemu-kvm: vfio: hot reset info failed: Operation not permitted
    [ 1455.284560] vfio_pci_disable: Failed to reset device 0000:05:10.0 (-11)

This needs to be investigated, but I would prefer that we have a new bug for it.

The hang in step 5 is expected since the guest hasn't relinquished control yet (just like an rmmod when the driver is in use).
Thanks.
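When re-running the reproduction, the hung-task report from the dmesg output in step 10 can be picked out mechanically instead of by eye. The following is a minimal Python sketch under that assumption; the function name `find_hung_tasks` and the regular expression are illustrative, not part of any tool used in this bug, and the pattern is derived only from the single message format quoted above.

```python
import re

# Pattern based on the line quoted in this report, e.g.
# "[ 1201.604169] INFO: task bash:4253 blocked for more than 120 seconds."
# Other kernel versions may word the message differently (assumption).
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return a list of (command, pid, seconds) for each hung-task report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    sample = (
        "[ 1201.604169] INFO: task bash:4253 blocked for more than 120 seconds.\n"
        "[ 1201.618487] Call Trace:\n"
    )
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

Feeding it the saved output of `dmesg` after step 5 would be one way to confirm that the blocked task is the shell writing to sriov_numvfs rather than qemu-kvm.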
(In reply to Bandan Das from comment #9)
> Between step 5 and 8, I think the only potential issue is step 7 -
> 7. reboot guest
> -- While guest booting, these messages below would be noticed.
> qemu-kvm: vfio: hot reset info failed: Operation not permitted
> qemu-kvm: vfio: hot reset info failed: Operation not permitted
> [ 1455.284560] vfio_pci_disable: Failed to reset device 0000:05:10.0 (-11)
>
> This needs to be investigated but I would prefer if we have a new bug for
> it.
>
> The hang in step 5 is expected since the guest hasn't relinquished control
> yet (just like a rmmod when the driver is in use)
> Thanks.

Agreed. Thanks.

Closing this since QE verified the fix in recent kernels.

Reopening. Just got informed that QE needs to close this.

Verified this bug with component: kernel-3.10.0-109.el7.x86_64

Steps:

1. Check VF.
    # lspci | grep Emulex
    07:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
    07:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
    07:04.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) <- VF
    07:04.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) <- VF

2. Boot guest with assigned VF.
    # /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -cpu SandyBridge -m 4G -S \
        -smp 4,threads=1,cores=4,sockets=1 -enable-kvm -name RHEL-Server-7.0-64 \
        -uuid cca1433d-5bac-490f-a097-c5c80c1a083f -nodefconfig -nodefaults -k en-us \
        -rtc base=utc,clock=host,driftfix=slew -qmp tcp:0:5000,server,nowait \
        -boot order=c,menu=on -vga qxl -global qxl-vga.vram_size=67108864 \
        -spice port=6000,disable-ticketing -device virtio-scsi-pci,id=scsi0 \
        -drive file=/var/lib/libvirt/images/r7.img,if=none,id=drive-scsi0-0-0,cache=none,aio=native,rerror=stop,werror=stop \
        -device scsi-hd,drive=drive-scsi0-0-0,id=os-disk,bus=scsi0.0,bootindex=1 \
        -netdev tap,id=tap0,vhost=on,script=/etc/qemu-ifup,queues=2 \
        -device virtio-net-pci,netdev=tap0,mac=54:d3:89:1c:a0:7d,id=net0,vectors=6,mq=on \
        -device vfio-pci,host=07:04.0,id=vf0 \
        -monitor stdio

3. Unbind its parent PF.
    # echo "0000:07:00.0" > /sys/bus/pci/devices/0000\:07\:00.0/driver/unbind

4. Hot unplug VF via QMP.
    { 'execute' : 'device_del', 'arguments' : { 'id' : 'vf0' } }

Results:
After step 3, the process hangs.
After step 4, the VF is hot unplugged successfully, and the process from step 3 returns.
    # lspci | grep Emulex
    07:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
    07:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)

Based on the test results above, this bug has been fixed.

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
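The hot-unplug step in the verification above sends `device_del` by hand over the QMP socket opened with `-qmp tcp:0:5000,server,nowait`. A minimal Python sketch of the same exchange follows, for anyone who wants to script it. The helper names, the `127.0.0.1` host, and the 30-second timeout are assumptions for illustration; the device id `vf0`, the port, and the `DEVICE_DELETED` event format come from this report. It relies on QMP's documented behaviour of sending a greeting first and requiring `qmp_capabilities` before other commands.

```python
import json
import socket


def build_device_del(device_id):
    """Build the QMP 'device_del' command for the given device id."""
    return {"execute": "device_del", "arguments": {"id": device_id}}


def is_device_deleted(message, device_id):
    """Return True if a QMP message is the DEVICE_DELETED event for device_id."""
    return (
        message.get("event") == "DEVICE_DELETED"
        and message.get("data", {}).get("device") == device_id
    )


def hot_unplug(device_id, host="127.0.0.1", port=5000, timeout=30.0):
    """Send device_del over QMP and wait for the matching DEVICE_DELETED event.

    host and timeout are assumed values; port 5000 matches '-qmp tcp:0:5000'.
    A socket timeout aborts the wait by raising an exception.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8")

        def send(command):
            sock.sendall(json.dumps(command).encode("utf-8") + b"\n")

        def receive():
            line = reader.readline()
            if not line:
                raise ConnectionError("QMP connection closed")
            return json.loads(line)

        receive()                              # greeting sent by QEMU on connect
        send({"execute": "qmp_capabilities"})  # leave capabilities negotiation
        send(build_device_del(device_id))
        while True:
            message = receive()
            if "error" in message:
                raise RuntimeError(message["error"])
            if is_device_deleted(message, device_id):
                return message


if __name__ == "__main__":
    # Requires a running guest started with the -qmp option shown above.
    print(hot_unplug("vf0"))
```

Waiting for the event rather than only the `{"return": {}}` reply reflects what the report observed: the blocked host-side write returned only once the VF had actually been removed from the guest.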