Bug 1170871

Summary: qemu core dumped when unhotplug gpu card assigned to guest
Product: Red Hat Enterprise Linux 7 Reporter: Lin Chen <linchen>
Component: qemu-kvm-rhev Assignee: Alex Williamson <alex.williamson>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.1 CC: hhuang, juzhang, linchen, lmiksik, michen, mrezanin, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.1.2-22.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1180942 (view as bug list) Environment:
Last Closed: 2015-03-05 09:59:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1180942    

Description Lin Chen 2014-12-05 03:21:08 UTC
Description of problem:
Boot a guest with an assigned GPU card, then hot-unplug the GPU card from the qemu monitor. Qemu dumps core.

Version-Release number of selected component (if applicable):
inside host:
  uname  -r
  3.10.0-211.el7.x86_64
  rpm -qa |grep qemu
  qemu-kvm-rhev-2.1.2-14.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with an assigned GPU card:
/usr/libexec/qemu-kvm ... -device vfio-pci,host=06:00.0,id=GPU-k1,addr=06.0

2. Hot-unplug the GPU card from the qemu monitor:
(qemu) device_del GPU-k1

Actual results:
Qemu dumps core, and gdb shows the following backtrace:
(gdb) bt
#0  0x000055555640f070 in ?? ()
#1  0x00005555556e131d in qemu_devices_reset () at vl.c:1840
#2  qemu_system_reset (report=report@entry=true) at vl.c:1853
#3  0x00005555555dcbb3 in main_loop_should_exit () at vl.c:1984
#4  main_loop () at vl.c:2024
#5  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4607


Expected results:
The GPU card is hot-unplugged successfully.

Additional info:

Comment 2 Lin Chen 2014-12-05 04:46:59 UTC
Additionally, this only happens with a Linux guest. Inside the guest:
uname  -r
2.6.32-584.el6.x86_64

Comment 3 Alex Williamson 2014-12-05 14:08:34 UTC
Linux guests do not support GPU hot unplug, can this be reproduced with a Windows guest?  Please reproduce with the -debug qemu-kvm package.

Comment 4 Lin Chen 2014-12-08 09:25:54 UTC
(In reply to Alex Williamson from comment #3)
> Linux guests do not support GPU hot unplug, can this be reproduced with a
> Windows guest?  Please reproduce with the -debug qemu-kvm package.
Hi Alex,

1. For QE, even if Linux guests do not support GPU hot unplug, qemu shouldn't core dump.

2. QE tested it with a Windows guest and didn't hit the same issue.

3. Where can the -debug qemu-kvm package be downloaded? There is only a -debuginfo package in brewweb.

Thanks.

Comment 5 Alex Williamson 2014-12-08 14:55:38 UTC
(In reply to Lin Chen from comment #4)
> (In reply to Alex Williamson from comment #3)
> > Linux guests do not support GPU hot unplug, can this be reproduced with a
> > Windows guest?  Please reproduce with the -debug qemu-kvm package.
> Hi Alex,
> 
> 1.For QE, even if Linux guests do not support GPU hot unplug, qemu shouldn't
> core dump.
> 
> 2.QE tested it with a Windows guest and didn't hit the same issue.
> 
> 3.where to download the -debug qemu-kvm package? There is only -debuginfo
> package inside brewweb.

Yes, debuginfo is what I meant; it lets us get a more complete backtrace.

Comment 8 Alex Williamson 2015-01-12 05:01:00 UTC
Fixed by qemu.git b3e27c3aee8f5a96debfe0346e9c0e3a641a8516

A fairly effective test for this is to install the debuginfo package and run gdb on qemu before removing the device.  Set a breakpoint on vfio_intx_interrupt.  After removing the device with device_del or libvirt tools, vfio_intx_interrupt should not be called.  In the failing case, it continues to be called with the opaque data for the deleted device.  Since the error is continued use of an fd with freed data, reliable test cases can be temporary.
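The failure mode described above (an fd handler that keeps firing with the opaque pointer of a deleted device) can be illustrated with a minimal sketch. This is not QEMU's code and not the content of the referenced commit: the handler table, `set_fd_handler`, `dispatch_fd`, and the `toy_*` names are all hypothetical stand-ins. The only point it makes is the ordering on removal, where the handler is unregistered before the data it points to is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event loop's fd-handler table. */
typedef void (*fd_handler_fn)(void *opaque);

struct fd_handler {
    int fd;
    fd_handler_fn fn;   /* NULL means the slot is unused */
    void *opaque;
};

enum { MAX_HANDLERS = 8 };
struct fd_handler handlers[MAX_HANDLERS];

/* Install fn for fd, or remove the entry when fn is NULL.
 * Returns 0 on success, -1 if the table is full. */
int set_fd_handler(int fd, fd_handler_fn fn, void *opaque)
{
    struct fd_handler *free_slot = NULL;

    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        struct fd_handler *h = &handlers[i];
        if (h->fn != NULL && h->fd == fd) {
            h->fn = fn;          /* replace or clear the existing entry */
            h->opaque = opaque;
            return 0;
        }
        if (h->fn == NULL && free_slot == NULL) {
            free_slot = h;
        }
    }
    if (fn == NULL) {
        return 0;                /* nothing registered, nothing to remove */
    }
    if (free_slot == NULL) {
        return -1;
    }
    free_slot->fd = fd;
    free_slot->fn = fn;
    free_slot->opaque = opaque;
    return 0;
}

/* Simulate the fd becoming readable. Returns 1 if a handler ran, else 0. */
int dispatch_fd(int fd)
{
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (handlers[i].fn != NULL && handlers[i].fd == fd) {
            handlers[i].fn(handlers[i].opaque);
            return 1;
        }
    }
    return 0;
}

/* Hypothetical device whose interrupt handler dereferences its opaque data. */
struct toy_device {
    int fd;
    int irq_count;
};

void toy_intx_interrupt(void *opaque)
{
    struct toy_device *dev = opaque;
    dev->irq_count++;
}

struct toy_device *toy_device_add(int fd)
{
    struct toy_device *dev = calloc(1, sizeof(*dev));
    if (dev == NULL) {
        return NULL;
    }
    dev->fd = fd;
    if (set_fd_handler(fd, toy_intx_interrupt, dev) < 0) {
        free(dev);
        return NULL;
    }
    return dev;
}

void toy_device_remove(struct toy_device *dev)
{
    assert(dev != NULL);
    /* Unregister first: once dev is freed, no handler may still hold it. */
    set_fd_handler(dev->fd, NULL, NULL);
    free(dev);
}
```

In this toy model, skipping the `set_fd_handler(dev->fd, NULL, NULL)` call in `toy_device_remove` reproduces the class of error: a later `dispatch_fd` would call `toy_intx_interrupt` with a dangling pointer, which is undefined behavior and may or may not crash on any given run.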

Comment 10 Miroslav Rezanina 2015-01-26 06:52:08 UTC
Fix included in qemu-kvm-rhev-2.1.2-22.el7

Comment 12 FuXiangChun 2015-01-27 12:37:59 UTC
QE tested this bug with the latest qemu-kvm-rhev-2.1.2-22.el7.x86_64. The test scenarios and results follow.

S1. Assign one GPU device (k1 or k2) to the guest, then hot-unplug it.

For a Windows (win7sp1 64-bit) guest: qemu and guest work well.

For a RHEL 7.1 guest (guest kernel 3.10.0-226.el7.x86_64): qemu and guest work well.

S2. Assign two GPU devices (k1 and k2) to the guest, then hot-unplug one of them and restart the guest.

For a Windows (win7sp1 64-bit) guest: qemu and guest work well.

For a RHEL 7.1 guest: the guest kernel panics. I filed a new bug, 1186194, to track it.

Comment 15 errata-xmlrpc 2015-03-05 09:59:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html