Bug 822373

Summary: libvirtd will crash when tight loop of hotplug/unplug PCI device to guest without managed=yes
Product: Red Hat Enterprise Linux 6
Reporter: EricLee <bili>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.3
CC: acathrow, ajia, dallan, dyasny, dyuan, mzhan, rwu, syeghiay, veillard, weizhan, whuang
Target Milestone: rc
Keywords: ZStream
Target Release: 6.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-0.9.13-2.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 07:14:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 836916
Attachments:
Description: libvirtd crash log
Flags: none

Description EricLee 2012-05-17 08:01:58 UTC
Created attachment 585138
libvirtd crash log

Description of problem:
libvirtd crashes when a PCI device is hot-plugged to and unplugged from a guest in a tight loop without managed=yes.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-0.12.1.2-2.292.el6.x86_64
libvirt-0.9.10-20.el6.x86_64
kernel-2.6.32-272.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
Setup

1. Enable VT-d on your host.
2. Edit /boot/grub/grub.conf to add the kernel option 'intel_iommu=on', then reboot the machine.
3. Reload the KVM modules with these commands:
    modprobe -r kvm_intel
    modprobe -r kvm
    modprobe kvm allow_unsafe_assigned_interrupts=1
    modprobe kvm_intel

Actions

1. Start a guest (RHEL6)
2. # lspci |grep Eth
3. Select one PCI device
# lspci -n | grep 02:00.0
4. # virsh nodedev-dettach pci_0000_02_00_0
   Both NIC functions on the same bus must be detached, so also run:
   # virsh nodedev-dettach pci_0000_02_00_1
5. # virsh nodedev-reset pci_0000_02_00_0
6. # virsh nodedev-dumpxml pci_0000_02_00_0
7. Create vtd.xml using the bus, slot, and function numbers reported by the nodedev-dumpxml command:

   <hostdev mode='subsystem' type='pci'>
     <source>
       <address bus='0x02' slot='0' function='0'/>
     </source>
   </hostdev>

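8. Run the following script:
   # cat script.sh
   #!/bin/bash
   # The C-style for loop is a bash feature, so run this with bash.
   # "test" is the name of the running guest.
   for ((i=0; i < 300; i++))
   do
       echo "$i"
       virsh attach-device test vtd.xml
       sleep 5
       virsh detach-device test vtd.xml
       sleep 5
   done
   # bash script.sh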

Actual results:
The first one or two iterations (sometimes more) succeed.
A later iteration, here the third, fails on detach, and libvirtd is gone by the next one:
3 
Device attached successfully

error: Failed to detach device from vtd.xml
error: End of file while reading data: Input/output error

4
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

and the daemon is no longer running:
# service libvirtd status
libvirtd dead but pid file exists


Expected results:
Every hotplug/unplug iteration succeeds and libvirtd stays alive.

Additional info:

Comment 6 Peter Krempa 2012-05-23 08:29:30 UTC
Fixed upstream:

commit db19417fc012416639c2230e5f19717b84245ce5
Author: Peter Krempa <pkrempa>
Date:   Mon May 21 16:31:53 2012 +0200

    qemu_hotplug: Don't free the PCI device structure after hot-unplug
    
    The pciDevice structure corresponding to the device being hot-unplugged
    was freed after it was "stolen" from activeList. The pointer was still
    used for eg-inactive list. This patch removes the free of the structure
    and frees it only if reset fails on the device.
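
For readers unfamiliar with the ownership pattern the commit message describes, here is a minimal, self-contained sketch. It is not libvirt's code: the types Dev and DevList and the functions list_add, list_steal, unplug and demo_cycles are hypothetical stand-ins invented for illustration, and the reset result is simulated by a flag. It only shows the idea that a device "stolen" from the active list must stay allocated while it is moved to the inactive list, and is freed only when the reset fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not libvirt's real types. */
typedef struct {
    unsigned bus, slot, function;
    bool reset_ok;              /* simulated outcome of a device reset */
} Dev;

#define LIST_MAX 8
typedef struct {
    Dev *items[LIST_MAX];
    size_t count;
} DevList;

static bool list_add(DevList *l, Dev *d)
{
    if (l->count == LIST_MAX)
        return false;
    l->items[l->count++] = d;
    return true;
}

/* Remove d from the list WITHOUT freeing it; the caller now owns it. */
static Dev *list_steal(DevList *l, const Dev *d)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->items[i] == d) {
            Dev *found = l->items[i];
            l->items[i] = l->items[--l->count];
            return found;
        }
    }
    return NULL;
}

/* Hot-unplug sketch. Returns true if the device ends up on the inactive
 * list. A device that was stolen but cannot be kept is freed here. */
static bool unplug(DevList *active, DevList *inactive, Dev *d)
{
    assert(active != NULL && inactive != NULL);

    Dev *owned = list_steal(active, d);
    if (!owned)
        return false;           /* was not active; nothing changed */

    if (!owned->reset_ok) {
        free(owned);            /* reset failed: no list refers to it */
        return false;
    }

    /* The buggy pattern would call free(owned) at this point and then
     * still hand the dangling pointer to the inactive list. */
    if (!list_add(inactive, owned)) {
        free(owned);
        return false;
    }
    return true;
}

/* Run n attach/unplug cycles on one device; returns how many succeeded. */
static int demo_cycles(int n)
{
    DevList active = {0}, inactive = {0};
    Dev *d = calloc(1, sizeof *d);
    int ok = 0;

    if (!d)
        return -1;
    d->bus = 0x02;
    d->reset_ok = true;

    for (int i = 0; i < n; i++) {
        list_steal(&inactive, d);       /* attach: leave the inactive list */
        if (!list_add(&active, d))      /* and join the active list */
            break;
        if (!unplug(&active, &inactive, d))
            return ok;                  /* d was active, so unplug freed it */
        ok++;
    }
    free(d);                            /* freed exactly once, at the end */
    return ok;
}
```

Calling demo_cycles(300) from a small main() mirrors the 300-iteration reproducer: because the structure is never freed while a list still points at it, the same pointer can safely be reused on every cycle.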

Comment 15 EricLee 2012-07-05 12:40:18 UTC
Verified as passing with these versions:
# rpm -qa libvirt qemu-kvm kernel
libvirt-0.9.13-2.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-279.el6.x86_64

Followed the steps in the Description.

All 300 attach/detach iterations succeeded.

Setting status to VERIFIED.

Comment 16 errata-xmlrpc 2013-02-21 07:14:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html