Description of problem:
Boot a guest and hotplug/hotunplug a USB controller more than 1000 times. qemu will core dump.

Version-Release number of selected component (if applicable):
host info:
# uname -r
2.6.32-191.el6.x86_64
# rpm -qa|grep kvm
qemu-kvm-0.12.1.2-2.188.el6.x86_64
guest info:
rhel6.2 (64 bit)

How reproducible:
always

Steps to Reproduce:
1. Unbind a USB controller in the host.
2. Boot the guest without a USB controller:
/usr/libexec/qemu-kvm -m 4G -smp 4 -netdev tap,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:94:a3:8b -uuid 7c73a852-c316-4d61-b913-9dde17367a30 -drive file=/dev/migrate/data2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-blk-pci0 -boot c -spice disable-ticketing,port=5911 -vga qxl -qmp tcp:0:6666,server,nowait
3. Hotplug/hotunplug the USB controller 2000 times:
(1) device_add driver=pci-assign host=00:1d.0 id=usb100 iommu=1
(2) device_del id=usb100

Actual results:
qemu core dumps

Expected results:
guest works well

Additional info:
bt trace message:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7037700 (LWP 31970)]
0x0000000000470cc4 in slow_bar_readl (opaque=0x2157298, addr=44) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/device-assignment.c:195
(gdb) bt
#0 0x0000000000470cc4 in slow_bar_readl (opaque=0x2157298, addr=44) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/device-assignment.c:195
#1 0x00000000004eca2c in cpu_physical_memory_rw (addr=<value optimized out>, buf=<value optimized out>, len=4, is_write=0) at /usr/src/debug/qemu-kvm-0.12.1.2/exec.c:3546
#2 0x000000000042bd1c in handle_mmio (env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:868
#3 kvm_run (env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1020
#4 0x000000000042c009 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1699
#5 0x000000000042ce5f in kvm_main_loop_cpu (_env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1968
#6 ap_main_loop (_env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2018
#7 0x000000340f6077e1 in start_thread () from /lib64/libpthread.so.0
#8 0x000000340eee578d in clone () from /lib64/libc.so.6
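For anyone scripting step 3, the loop could be sketched roughly as below. This is only an illustration, not the script used in this report: `qmp_command` and `hotplug_cycles` are hypothetical helper names, `send` stands for whatever function writes one line to the QMP socket (tcp:0:6666 in the command line above), and the JSON encoding of the `device_add` arguments (in particular passing `iommu` as the string "1") is an assumption. A real QMP client must also send `qmp_capabilities` before any other command, which is not shown here.

```python
import json
import time


def qmp_command(name, **arguments):
    """Serialize one QMP command as a single JSON string."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def hotplug_cycles(send, cycles, delay_s=5.0, sleep=time.sleep):
    """Run `cycles` add/delete rounds, pausing after every operation.

    `send` is a caller-supplied callable (hypothetical) that delivers one
    command string to the QMP monitor. The device parameters mirror step 3
    of this report; the argument encoding is an assumption.
    """
    for _ in range(cycles):
        send(qmp_command("device_add", driver="pci-assign",
                         host="00:1d.0", id="usb100", iommu="1"))
        sleep(delay_s)  # give the guest time to finish the hotplug
        send(qmp_command("device_del", id="usb100"))
        sleep(delay_s)  # give the guest time to finish the unplug


if __name__ == "__main__":
    # Dry run: print the commands instead of sending them anywhere.
    hotplug_cycles(print, cycles=2, delay_s=0)
```

Passing `send` and `sleep` in as parameters keeps the loop independent of any particular socket code and makes a dry run possible.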
"device_add driver=pci-assign host=00:1d.0 id=usb100 iommu=1"

That looks more like a PCI passthrough issue than a USB emulation issue, reassigning ...
Added a 5-second sleep between hotplug and hot-unplug, and another 5-second sleep before every hotplug as well, but it still core dumps.
Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB device being used?
(In reply to comment #4)
> Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB
> device being used?

# lspci -vvv -s 00:1a.0
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04) (prog-if 20 [EHCI])
    Subsystem: Dell Device 0498
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at dad70000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: ehci_hcd
(In reply to comment #5)
> (In reply to comment #4)
> > Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB
> > device being used?
>
> # lspci -vvv -s 00:1a.0
> 00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family

Comment 0 indicates device 00:1d.0 is being used. Can you please confirm which device caused the problem, or whether both can trigger the bug? Thanks.
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB
> > > device being used?
> >
> > # lspci -vvv -s 00:1a.0
> > 00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family
>
> Comment 0 indicates device 00:1d.0 is being used, can you please confirm which
> device caused the problem, or maybe they both can trigger the bug? Thanks.

Sorry, I just confirmed it again: device 00:1d.0 is being used.

# lspci -vvv -s 00:1d.0
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04) (prog-if 20 [EHCI])
    Subsystem: Dell Device 0498
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 17
    Region 0: Memory at dad50000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: ehci_hcd
Please re-test with this qemu-kvm rpm:

https://brewweb.devel.redhat.com/taskinfo?taskID=4012380

I was able to reproduce the result, but not the exact scenario you describe in comment 0. The bug I found is a resource leak that results in a segfault once we overflow an internal resource. The inconsistency with your report is that this will occur at ~500 hotplug/unplug operations, not 1000 or 2000 as indicated here. Were you only able to get these high counts when not using a sleep between each hotplug and hotunplug operation? In comment 3 you indicate you added a sleep 5 for each; did you then get a failure after approximately 500 operations?
(In reply to comment #10)
> Please re-test with this qemu-kvm rpm:
>
> https://brewweb.devel.redhat.com/taskinfo?taskID=4012380
>
> I was able to reproduce the result, but not the exact scenario you describe in
> comment 0. The bug I found is a resource leak that results in a segfault once
> we overflow an internal resource. The inconsistency with your report is that
> this will occur at ~500 hotplug/unplug operations, not 1000 or 2000 as
> indicated here. Were you only able to get these high counts when not using a
> sleep between each hotplug and hotunplug operation? In comment 3 you indicate
> you added a sleep 5 for each, did you then get a failure after approximately
> 500 operations?

Sorry for the late reply. I can't reproduce this bug except on a Sandy Bridge host. I will get a Sandy Bridge host and re-test this bug as soon as possible.
(In reply to comment #10)
> Please re-test with this qemu-kvm rpm:
>
> https://brewweb.devel.redhat.com/taskinfo?taskID=4012380
>
> I was able to reproduce the result, but not the exact scenario you describe in
> comment 0. The bug I found is a resource leak that results in a segfault once
> we overflow an internal resource. The inconsistency with your report is that
> this will occur at ~500 hotplug/unplug operations, not 1000 or 2000 as
> indicated here. Were you only able to get these high counts when not using a
> sleep between each hotplug and hotunplug operation? In comment 3 you indicate
> you added a sleep 5 for each, did you then get a failure after approximately
> 500 operations?

Testing scenarios:
1. I re-tested this bug with the qemu build below.
   Test result: qemu works well (no core dump).
   https://brewweb.devel.redhat.com/taskinfo?taskID=4012380
2. Without a sleep between each hotplug and hotunplug operation, the bug can sometimes (not 100%) be reproduced; it still takes about 1000 hotplug/hotunplug operations to reproduce.
(In reply to comment #12)
>
> 2.without sleep between each hotplug and hotunplug operation
> sometimes(not 100%) can reproduce it,it still need to hotplug/unhotplug about
> 1000 times when reproducing.

This is not a realistic usage scenario test. PCI device hotplug occurs asynchronously to the device_del command, so you could very well be trying to add the device back before it's been removed. All hotplug testing should currently be done with a delay between each operation.
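Because the removal completes asynchronously, a test script may prefer to poll until the device is really gone rather than rely on a fixed sleep. The following is a minimal sketch of that idea only: `wait_until_gone` and the `is_present` callable are hypothetical, and how `is_present` would actually check for the device (for example by inspecting monitor output) is left to the caller and not specified in this report.

```python
import time


def wait_until_gone(is_present, timeout_s=30.0, interval_s=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until `is_present()` returns False or the timeout expires.

    `is_present` is a caller-supplied check (hypothetical) that reports
    whether the device is still visible. Returns True if the device went
    away in time, False if the timeout was reached first.
    """
    deadline = clock() + timeout_s
    while is_present():
        if clock() >= deadline:
            return False
        sleep(interval_s)
    return True


if __name__ == "__main__":
    # Simulated device that disappears on the third check.
    states = iter([True, True, False])
    print(wait_until_gone(lambda: next(states), interval_s=0))
```

A test loop would call this after device_del and only issue the next device_add when it returns True.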
(In reply to comment #13)
> (In reply to comment #12)
> >
> > 2.without sleep between each hotplug and hotunplug operation
> > sometimes(not 100%) can reproduce it,it still need to hotplug/unhotplug about
> > 1000 times when reproducing.
>
> This is not a realistic usage scenario test. PCI device hotplug occurs
> asynchronous to the device_del command, so you could very well be trying to add
> the device back before it's been removed. All hotplug testing should currently
> be done with a delay between each operation.

With a delay of 1 or 2 seconds between each operation, testing gives the same result (about 1000 times).
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > >
> > > 2.without sleep between each hotplug and hotunplug operation
> > > sometimes(not 100%) can reproduce it,it still need to hotplug/unhotplug about
> > > 1000 times when reproducing.
> >
> > This is not a realistic usage scenario test. PCI device hotplug occurs
> > asynchronous to the device_del command, so you could very well be trying to add
> > the device back before it's been removed. All hotplug testing should currently
> > be done with a delay between each operation.
>
> if delay 1 second or 2 seconds between each operation. testing get the same
> result(about 1000 times).

Is it also a segfault? Can you run in gdb and provide the backtrace to see if it's the same as Comment 0?
(In reply to comment #16)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > (In reply to comment #12)
> > > >
> > > > 2.without sleep between each hotplug and hotunplug operation
> > > > sometimes(not 100%) can reproduce it,it still need to hotplug/unhotplug about
> > > > 1000 times when reproducing.
> > >
> > > This is not a realistic usage scenario test. PCI device hotplug occurs
> > > asynchronous to the device_del command, so you could very well be trying to add
> > > the device back before it's been removed. All hotplug testing should currently
> > > be done with a delay between each operation.
> >
> > if delay 1 second or 2 seconds between each operation. testing get the same
> > result(about 1000 times).
>
> Is it also a segfault? Can you run in gdb and provide the backtrace to see if
> it's the same as Comment 0?

Sorry, my previous comments were confusing. To clarify: qemu works well after 1000 hotplug/unplug operations with your build.
Verified this bug with qemu-kvm-0.12.1.2-2.231.el6: qemu and the guest work well, so this bug is fixed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Running a guest and then hot-plugging/hot-unplugging a USB controller more than 1000 times.
Consequence: Qemu-kvm core dumps.
Fix: Implemented unregistering of MMIO BARs. The BARs were registered on hot-plug but never unregistered, which caused a leak.
Results: Qemu-kvm keeps running, and USB controller hot-plug and hot-unplug keep working.
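To illustrate why a missing unregister step only shows up after many cycles, here is a toy model in Python. It is not the qemu-kvm code: `SlotTable` and `cycles_until_exhausted` are hypothetical names, and the capacity of 500 is an assumption chosen only to echo the "~500 operations" figure mentioned earlier in this report. The real failure was a segfault, whereas this sketch raises an exception when the table is full.

```python
class SlotTable:
    """Fixed-size table of registration slots (toy stand-in, not qemu code)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity

    def register(self, owner):
        """Claim the first free slot and return its index."""
        for index, current in enumerate(self.slots):
            if current is None:
                self.slots[index] = owner
                return index
        raise RuntimeError("slot table exhausted")

    def unregister(self, index):
        """Release a slot so a later registration can reuse it."""
        self.slots[index] = None


def cycles_until_exhausted(capacity, cycles, unregister_on_unplug):
    """Return the cycle number at which registration fails, or None."""
    table = SlotTable(capacity)
    for n in range(1, cycles + 1):
        try:
            index = table.register("usb100")  # hot-plug
        except RuntimeError:
            return n
        if unregister_on_unplug:
            table.unregister(index)  # hot-unplug releases the slot
    return None


if __name__ == "__main__":
    print(cycles_until_exhausted(500, 2000, unregister_on_unplug=False))
    print(cycles_until_exhausted(500, 2000, unregister_on_unplug=True))
```

In the leaky variant every cycle consumes a slot for good, so the failure depends only on the number of cycles and not on the delay between them.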
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0746.html