Bug 738519
Summary: | Core dump when hotplug/hotunplug usb controller more than 1000 times | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | FuXiangChun <xfu> |
Component: | qemu-kvm | Assignee: | Alex Williamson <alex.williamson> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.2 | CC: | acathrow, juzhang, knoel, michen, minovotn, mkenneth, qzhou, shu, tburke, virt-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-0.12.1.2-2.231.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause:
Hot-plugging and hot-unplugging a USB controller in a running guest more than 1000 times.
Consequence:
qemu-kvm core dumps.
Fix:
Implemented unregistering of MMIO BARs. The BARs were registered but never unregistered, which caused a leak.
Results:
qemu-kvm keeps running, and USB controller hot-plug and hot-unplug keep working.
|
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2012-06-20 11:34:24 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
FuXiangChun
2011-09-15 04:48:30 UTC

"device_add driver=pci-assign host=00:1d.0 id=usb100 iommu=1"
That looks more like a PCI passthrough issue than a USB emulation issue, reassigning ...

Added a 5-second sleep between hot-plug and hot-unplug, and a 5-second sleep before every hot-plug as well, but it still core dumps.

Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB device being used?

(In reply to comment #4)
    # lspci -vvv -s 00:1a.0
    00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04) (prog-if 20 [EHCI])
        Subsystem: Dell Device 0498
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at dad70000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
            AFCap: TP+ FLR+
            AFCtrl: FLR-
            AFStatus: TP-
        Kernel driver in use: ehci_hcd

(In reply to comment #5)
Comment 0 indicates device 00:1d.0 is being used. Can you please confirm which device caused the problem, or whether both can trigger the bug? Thanks.

(In reply to comment #6)
Sorry, just confirmed it again: device 00:1d.0 is being used.
    # lspci -vvv -s 00:1d.0
    00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04) (prog-if 20 [EHCI])
        Subsystem: Dell Device 0498
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at dad50000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
            AFCap: TP+ FLR+
            AFCtrl: FLR-
            AFStatus: TP-
        Kernel driver in use: ehci_hcd

Please re-test with this qemu-kvm rpm:
https://brewweb.devel.redhat.com/taskinfo?taskID=4012380

I was able to reproduce the result, but not the exact scenario you describe in comment 0. The bug I found is a resource leak that results in a segfault once we overflow an internal resource. The inconsistency with your report is that this will occur at ~500 hotplug/unplug operations, not 1000 or 2000 as indicated here. Were you only able to get these high counts when not using a sleep between each hotplug and hotunplug operation? In comment 3 you indicate you added a sleep 5 for each; did you then get a failure after approximately 500 operations?
(In reply to comment #10)
Sorry for the late reply. I can't reproduce this bug except on a Sandy Bridge host. I will get a Sandy Bridge host as soon as possible and re-test this bug.

(In reply to comment #10)
Testing scenarios:
1. I re-tested this bug with the qemu build below. Test result: qemu works well (no core dump).
   https://brewweb.devel.redhat.com/taskinfo?taskID=4012380
2. Without a sleep between each hotplug and hotunplug operation, the bug can sometimes (not 100%) be reproduced; it still takes about 1000 hotplug/hotunplug operations to reproduce.
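The resource leak described in comment #10 can be illustrated with a toy model. This is only a sketch, not qemu-kvm code: `SlotTable`, `cycles_until_overflow`, and the capacity value are hypothetical names and numbers chosen for illustration, and the toy raises an exception where the real process segfaulted. It shows why registering a region on every hot-plug without unregistering it on hot-unplug fails after a fixed number of cycles, while unregistering lets the loop run indefinitely.

```python
class SlotTable:
    """Toy fixed-capacity table (hypothetical stand-in for an internal resource)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = set()

    def register(self):
        # Hand out the lowest free slot, or fail when the table is full.
        for slot in range(self.capacity):
            if slot not in self.used:
                self.used.add(slot)
                return slot
        raise OverflowError("no free slots left")

    def unregister(self, slot):
        self.used.discard(slot)


def cycles_until_overflow(capacity, max_cycles, unregister_on_unplug):
    """Return the 1-based cycle at which registration fails, or None if it never does."""
    table = SlotTable(capacity)
    for cycle in range(1, max_cycles + 1):
        try:
            slot = table.register()      # hot-plug: claim a slot
        except OverflowError:
            return cycle
        if unregister_on_unplug:
            table.unregister(slot)       # hot-unplug: release it (the fixed behaviour)
    return None


if __name__ == "__main__":
    print(cycles_until_overflow(4, 100, unregister_on_unplug=False))
    print(cycles_until_overflow(4, 100, unregister_on_unplug=True))
```

In this model the failing cycle depends only on the table capacity, not on timing, which matches the observation that the leak triggers at a roughly fixed operation count.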
(In reply to comment #12)
This is not a realistic usage scenario test. PCI device hotplug occurs asynchronously to the device_del command, so you could very well be trying to add the device back before it's been removed. All hotplug testing should currently be done with a delay between each operation.

(In reply to comment #13)
With a delay of 1 or 2 seconds between each operation, testing gives the same result (about 1000 times).

(In reply to comment #14)
Is it also a segfault? Can you run in gdb and provide the backtrace to see if it's the same as Comment 0?
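The advice in this thread to put a delay between each hotplug operation can be sketched as a small test driver. This is an assumption-laden sketch: `run_hotplug_test` and the `send` callable are hypothetical (how commands reach the qemu monitor is not shown in this report), and the `device_del usb100` form is assumed from the `id=usb100` used in the `device_add` line quoted earlier.

```python
import time


def run_hotplug_test(send, cycles, delay_s,
                     host="00:1d.0", dev_id="usb100", sleep=time.sleep):
    """Issue add/remove monitor commands with a pause after each one.

    `send` is a hypothetical callable that delivers one command string to the
    qemu monitor; `sleep` is injectable so the loop can be dry-run in tests.
    """
    add_cmd = f"device_add driver=pci-assign host={host} id={dev_id} iommu=1"
    del_cmd = f"device_del {dev_id}"   # assumed form, keyed on the device id
    for _ in range(cycles):
        send(add_cmd)
        sleep(delay_s)   # give the guest time to finish the hot-plug
        send(del_cmd)
        sleep(delay_s)   # removal is asynchronous; wait before re-adding


if __name__ == "__main__":
    # Dry run: print the commands instead of sending them, and skip real sleeping.
    run_hotplug_test(print, cycles=2, delay_s=5, sleep=lambda s: None)
```

A fixed delay is the simplest option and is what the thread recommends; it does not confirm that the device was actually removed before the next add.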
(In reply to comment #16)
Sorry that my previous comments confused you. To clarify: it works well after 1000 hot-plug/unplug operations with your build.

Verified the bug with qemu-kvm-0.12.1.2-2.231.el6: qemu and guest work well, so this bug is fixed.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
Hot-plugging and hot-unplugging a USB controller in a running guest more than 1000 times.
Consequence:
qemu-kvm core dumps.
Fix:
Implemented unregistering of MMIO BARs. The BARs were registered but never unregistered, which caused a leak.
Results:
qemu-kvm keeps running, and USB controller hot-plug and hot-unplug keep working.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2012-0746.html