Bug 738519 - Core dump when hot-plugging/hot-unplugging a USB controller more than 1000 times
Summary: Core dump when hot-plugging/hot-unplugging a USB controller more than 1000 times
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Alex Williamson
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-15 04:48 UTC by FuXiangChun
Modified: 2013-01-10 00:19 UTC
CC List: 10 users

Fixed In Version: qemu-kvm-0.12.1.2-2.231.el6
Doc Type: Bug Fix
Doc Text:
Cause: Running a guest and hot-plugging/hot-unplugging an assigned USB controller more than 1000 times. Consequence: qemu-kvm dumps core. Fix: MMIO BARs are now unregistered when the device is removed; previously they were never unregistered, which caused a resource leak. Results: qemu-kvm keeps running, and USB controller hot-plug and hot-unplug keep working.
Clone Of:
Environment:
Last Closed: 2012-06-20 11:34:24 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata | ID: RHBA-2012:0746 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: qemu-kvm bug fix and enhancement update | Last Updated: 2012-06-19 19:31:48 UTC

Description FuXiangChun 2011-09-15 04:48:30 UTC
Description of problem:
Boot a guest and hot-plug/hot-unplug an assigned USB controller more than 1000 times; qemu-kvm dumps core.

Version-Release number of selected component (if applicable):
host info:
# uname -r
2.6.32-191.el6.x86_64
# rpm -qa|grep kvm
qemu-kvm-0.12.1.2-2.188.el6.x86_64

guest info:
RHEL 6.2 (64-bit)

How reproducible:
always

Steps to Reproduce:
1. Unbind a USB controller on the host.
2. Boot the guest without a USB controller:
/usr/libexec/qemu-kvm  -m 4G -smp 4 -netdev tap,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:94:a3:8b -uuid 7c73a852-c316-4d61-b913-9dde17367a30  -drive file=/dev/migrate/data2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-blk-pci0 -boot c -spice disable-ticketing,port=5911  -vga qxl -qmp tcp:0:6666,server,nowait 

3. Hot-plug and hot-unplug the USB controller 2000 times:
  (1)device_add driver=pci-assign host=00:1d.0 id=usb100 iommu=1
  (2)device_del id=usb100
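
The cycle in step 3 can be scripted. The Python sketch below makes several assumptions. It targets the QMP socket opened by `-qmp tcp:0:6666,server,nowait` in step 2. It assumes that `device_add` accepts the same properties over QMP as the monitor command above, with the values passed as strings. It builds the QMP messages for each add/del cycle and sends them with a configurable delay after each command. The helpers `qmp_command`, `cycle_commands`, and `run_cycles`, along with their `send`/`sleep` parameters, are hypothetical and were not part of the original test.

```python
import json
import time
from typing import Callable, Dict, List


def qmp_command(name: str, **arguments: object) -> Dict[str, object]:
    """Build one QMP command message as a dict."""
    msg: Dict[str, object] = {"execute": name}
    if arguments:
        msg["arguments"] = dict(arguments)
    return msg


def cycle_commands(host: str, dev_id: str) -> List[Dict[str, object]]:
    """QMP messages for one hot-plug/hot-unplug cycle of an assigned device.

    Mirrors the monitor commands from step 3; passing property values as
    strings is an assumption about how this qemu-kvm parses device_add.
    """
    add = qmp_command("device_add", driver="pci-assign", host=host,
                      id=dev_id, iommu="1")
    delete = qmp_command("device_del", id=dev_id)
    return [add, delete]


def run_cycles(send: Callable[[str], None], cycles: int, host: str,
               dev_id: str, delay: float,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Send `cycles` add/del pairs, sleeping `delay` seconds after each command.

    `send` writes one JSON message to the QMP connection (hypothetical
    transport). QMP requires qmp_capabilities before any other command.
    Returns the total number of commands sent.
    """
    sent = 0
    send(json.dumps(qmp_command("qmp_capabilities")))
    sent += 1
    for _ in range(cycles):
        for msg in cycle_commands(host, dev_id):
            send(json.dumps(msg))
            sent += 1
            sleep(delay)
    return sent
```

In a real run, `send` would write to the TCP connection on port 6666. A complete client should also read the QMP greeting and each reply before sending the next command. That plumbing is omitted here.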

Actual results:
qemu-kvm dumps core.

Expected results:
The guest keeps working normally.

Additional info:

Backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7037700 (LWP 31970)]
0x0000000000470cc4 in slow_bar_readl (opaque=0x2157298, addr=44) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/device-assignment.c:195

(gdb) bt
#0  0x0000000000470cc4 in slow_bar_readl (opaque=0x2157298, addr=44) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/device-assignment.c:195
#1  0x00000000004eca2c in cpu_physical_memory_rw (addr=<value optimized out>, buf=<value optimized out>, len=4, is_write=0) at /usr/src/debug/qemu-kvm-0.12.1.2/exec.c:3546
#2  0x000000000042bd1c in handle_mmio (env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:868
#3  kvm_run (env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1020
#4  0x000000000042c009 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1699
#5  0x000000000042ce5f in kvm_main_loop_cpu (_env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1968
#6  ap_main_loop (_env=0x10903b0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2018
#7  0x000000340f6077e1 in start_thread () from /lib64/libpthread.so.0
#8  0x000000340eee578d in clone () from /lib64/libc.so.6

Comment 2 Gerd Hoffmann 2011-09-15 14:26:06 UTC
"device_add driver=pci-assign host=00:1d.0 id=usb100 iommu=1"

That looks more like a PCI passthrough issue than a USB emulation issue; reassigning ...

Comment 3 FuXiangChun 2011-09-16 10:39:19 UTC
I added a 5-second sleep between hot-plug and hot-unplug, and another 5-second sleep before every hot-plug, but it still dumps core.

Comment 4 Alex Williamson 2011-09-20 19:03:49 UTC
Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB device being used?

Comment 5 FuXiangChun 2011-09-21 01:42:50 UTC
(In reply to comment #4)
> Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB
> device being used?

# lspci -vvv -s 00:1a.0
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04) (prog-if 20 [EHCI])
	Subsystem: Dell Device 0498
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at dad70000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: ehci_hcd

Comment 6 Alex Williamson 2011-09-21 02:23:02 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Can you please provide output of 'sudo lspci -vvv -s 1d.0' to identify the USB
> > device being used?
> 
> # lspci -vvv -s 00:1a.0
> 00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family

Comment 0 indicates device 00:1d.0 is being used, can you please confirm which device caused the problem, or maybe they both can trigger the bug?  Thanks.

Comment 7 FuXiangChun 2011-09-21 02:45:35 UTC
(In reply to comment #6)
> Comment 0 indicates device 00:1d.0 is being used, can you please confirm which
> device caused the problem, or maybe they both can trigger the bug?  Thanks.

Sorry, I just confirmed it again: device 00:1d.0 is the one being used.

# lspci -vvv -s 00:1d.0
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04) (prog-if 20 [EHCI])
    Subsystem: Dell Device 0498
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 17
    Region 0: Memory at dad50000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: ehci_hcd

Comment 10 Alex Williamson 2012-02-06 21:00:00 UTC
Please re-test with this qemu-kvm rpm:

https://brewweb.devel.redhat.com/taskinfo?taskID=4012380

I was able to reproduce the result, but not the exact scenario you describe in comment 0.  The bug I found is a resource leak that results in a segfault once we overflow an internal resource.  The inconsistency with your report is that this will occur at ~500 hotplug/unplug operations, not 1000 or 2000 as indicated here.  Were you only able to get these high counts when not using a sleep between each hotplug and hotunplug operation?  In comment 3 you indicate you added a sleep 5 for each, did you then get a failure after approximately 500 operations?
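
Comment 10 attributes the crash to a resource leak that eventually overflows an internal resource. The Doc Text says the fix was to unregister the MMIO BARs on removal. The Python model below is a hypothetical illustration of that pattern, not qemu-kvm code. `HandlerTable`, its capacity of 64 slots, and the use of one slot per BAR are invented for the example. Each hot-plug registers handlers for the device's BARs. On hot-unplug, the leaky variant drops those handlers and the fixed variant unregisters them. The real process segfaulted; the model instead reports how many cycles completed before registration failed.

```python
from typing import Dict, List


class TableFull(Exception):
    """Raised when no free handler slot is left (models the overflow)."""


class HandlerTable:
    """Fixed-capacity table of MMIO handler slots (hypothetical model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: Dict[int, str] = {}

    def register(self, owner: str) -> int:
        """Claim the lowest free slot, or raise TableFull."""
        for index in range(self.capacity):
            if index not in self.slots:
                self.slots[index] = owner
                return index
        raise TableFull(f"all {self.capacity} slots in use")

    def unregister(self, index: int) -> None:
        """Release a previously claimed slot."""
        del self.slots[index]


def hotplug_cycles(table: HandlerTable, cycles: int, bars: int,
                   unregister_on_unplug: bool) -> int:
    """Run plug/unplug cycles; return how many completed before overflow."""
    for cycle in range(cycles):
        try:
            # Hot-plug: register one handler per BAR of the device.
            handles: List[int] = [table.register(f"dev{cycle}")
                                  for _ in range(bars)]
        except TableFull:
            return cycle
        if unregister_on_unplug:
            # Fixed behaviour: release the BAR handlers on device removal.
            for handle in handles:
                table.unregister(handle)
        # Leaky behaviour: the handles are simply dropped, so slots stay used.
    return cycles
```

Under these assumptions, `hotplug_cycles(HandlerTable(64), 1000, bars=1, unregister_on_unplug=False)` stops after 64 cycles. The same call with `unregister_on_unplug=True` completes all 1000. The actual threshold in qemu-kvm depends on its own table size and on how many entries each device uses, and this sketch does not model either.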

Comment 11 FuXiangChun 2012-02-13 09:38:33 UTC
(In reply to comment #10)
> Please re-test with this qemu-kvm rpm:
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=4012380
> 
> I was able to reproduce the result, but not the exact scenario you describe in
> comment 0.  The bug I found is a resource leak that results in a segfault once
> we overflow an internal resource.  The inconsistency with your report is that
> this will occur at ~500 hotplug/unplug operations, not 1000 or 2000 as
> indicated here.  Were you only able to get these high counts when not using a
> sleep between each hotplug and hotunplug operation?  In comment 3 you indicate
> you added a sleep 5 for each, did you then get a failure after approximately
> 500 operations?

Sorry for the late reply. I can only reproduce this bug on a Sandy Bridge host; I will get a Sandy Bridge host and re-test this bug as soon as possible.

Comment 12 FuXiangChun 2012-02-16 05:30:07 UTC
(In reply to comment #10)

Testing scenarios:
1. I re-tested this bug with the qemu-kvm build below. Result: qemu works well (no core dump).
   https://brewweb.devel.redhat.com/taskinfo?taskID=4012380

2. Without a sleep between each hot-plug and hot-unplug operation, the bug reproduces sometimes (not 100%), and it still takes about 1000 hot-plug/unplug cycles to reproduce.

Comment 13 Alex Williamson 2012-02-16 05:47:04 UTC
(In reply to comment #12)
> 
> 2.without sleep between each hotplug and hotunplug operation
>  sometimes(not 100%) can reproduce it,it still need to hotplug/unhotplug about
> 1000 times when reproducing.

This is not a realistic usage scenario test. PCI device hot-unplug completes asynchronously with respect to the device_del command, so you could very well be trying to add the device back before it has been removed. All hotplug testing should currently be done with a delay between each operation.
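
Because removal finishes after `device_del` returns, a fixed sleep only approximates waiting for it. A sturdier harness could poll until the device is actually gone before adding it back. The sketch below assumes a caller-supplied `is_present` check, for example one that inspects the guest's PCI device list through the monitor. How to implement that check is left open. `wait_until_removed` is a hypothetical helper, not a qemu-kvm facility.

```python
import time
from typing import Callable


def wait_until_removed(is_present: Callable[[], bool], timeout: float,
                       interval: float = 0.5,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_present` until it returns False or `timeout` seconds pass.

    Returns True if the device disappeared, False on timeout. `clock` and
    `sleep` are injectable so the loop can be tested without real delays.
    """
    deadline = clock() + timeout
    while True:
        if not is_present():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a hot-plug loop, each `device_del` would be followed by `wait_until_removed(...)`. The next `device_add` would then be sent only if it returned True. A timeout is itself a useful signal that removal stalled.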

Comment 14 FuXiangChun 2012-02-16 07:36:36 UTC
(In reply to comment #13)
> This is not a realistic usage scenario test.  PCI device hotplug occurs
> asynchronous to the device_del command, so you could very well be trying to add
> the device back before it's been removed.  All hotplug testing should currently
> be done with a delay between each operation.

With a 1- or 2-second delay between operations, testing gives the same result (about 1000 cycles).

Comment 16 Alex Williamson 2012-02-16 13:20:40 UTC
(In reply to comment #14)
> if delay 1 second or 2 seconds between each operation. testing get the same
> result(about 1000 times).

Is it also a segfault?  Can you run in gdb and provide the backtrace to see if it's the same as Comment 0?

Comment 19 FuXiangChun 2012-02-17 01:28:51 UTC
(In reply to comment #16)
> Is it also a segfault?  Can you run in gdb and provide the backtrace to see if
> it's the same as Comment 0?

Sorry, my previous comment was confusing. To clarify: with your build, everything works well after 1000 hot-plug/unplug cycles.

Comment 20 FuXiangChun 2012-02-17 09:46:10 UTC
Verified with qemu-kvm-0.12.1.2-2.231.el6: qemu and the guest work well.

So this bug is fixed.

Comment 22 Michal Novotny 2012-05-03 17:38:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Running a guest and hot-plugging/hot-unplugging an assigned USB controller more than 1000 times.

Consequence:
qemu-kvm dumps core.

Fix:
MMIO BARs are now unregistered when the device is removed. Previously they were never unregistered, which caused a resource leak.

Results:
qemu-kvm keeps running, and USB controller hot-plug and hot-unplug keep working.

Comment 23 errata-xmlrpc 2012-06-20 11:34:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0746.html

