Bug 736214

Summary: PCI device cannot be reattached to host automatically if attach-device fails with "managed=yes"
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: weizhang <weizhan>
Assignee: Osier Yang <jyang>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ajia, dallan, dyuan, eblake, jyang, mvadkert, mzhan, rwu, veillard, ydu
Target Milestone: rc
Keywords: Regression
Fixed In Version: libvirt-0.9.4-18.el6
Doc Type: Bug Fix
Bug Blocks: 746355, 748554
Last Closed: 2011-12-06 11:28:33 UTC

Description weizhang 2011-09-07 03:34:18 UTC
Description of problem:
The PCI device is not reattached to the host automatically if attach-device fails with "managed=yes"; it must be reattached manually with # virsh nodedev-reattach pci_0000_00_19_0

Version-Release number of selected component (if applicable):
kernel-2.6.32-192.el6.x86_64
qemu-kvm-0.12.1.2-2.184.el6.x86_64
libvirt-0.9.4-7.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Turn OFF VT-d in bios
2. do # modprobe -r kvm_intel; modprobe -r kvm; modprobe kvm allow_unsafe_assigned_interrupts=1; modprobe kvm_intel
3. Start a guest
4. Attach the PCI device
# cat nodedev.xml
          <hostdev mode='subsystem' type='pci' managed='yes'>
            <source>
              <address bus='0' slot='25' function='0'/>
            </source>
          </hostdev>
# virsh attach-device <guest> nodedev.xml
will report error like
error: Failed to attach device from nodedev.xml
error: internal error unable to execute QEMU command 'device_add': Device 'pci-assign' could not be initialized

5. Check whether the PCI device is automatically reattached to the host


Actual results:
The PCI device is not reattached to the host automatically.

Expected results:
The PCI device is reattached to the host automatically.

Additional info:
On libvirt-0.8.7-18, with the same steps and the same qemu-kvm and kernel versions, the PCI device is reattached automatically, so this is a regression.

Comment 6 Osier Yang 2011-09-28 04:34:26 UTC
Snip from c#14 of https://bugzilla.redhat.com/show_bug.cgi?id=736437

<snip>
As far as I understand, it should not be a problem in libvirt, as we
do the reprobing like:

#  echo 0000\:00\:19.0 >  /sys/bus/pci/drivers_probe

Playing with my own network card, we do indeed reattach the device to the
host and reprobe the driver when detaching a device, destroying a domain,
failing to attach a device, etc. But the result is unpredictable; even
reprobing manually with a command like the above sometimes can't get the
original driver for the device.

So, IMO the problem might be from the kernel, or we
</snip>

Comment 7 Osier Yang 2011-09-28 06:16:20 UTC
Found the problem: qemu-kvm maps the PCI BAR(s) on the host when doing PCI passthrough, but it does not clean up the mapping even more than 20 minutes after the monitor command "device_del" completes successfully. This causes the kernel to fail when probing the original driver for the device. (libvirt uses a command like "# echo 0000:00:19.0 > /sys/bus/pci/drivers_probe" to reprobe the driver; the write succeeds even if the kernel fails to find the original driver, so libvirt also returns success even though the device does not show up on the host.)

Libvirt loops 100 times, sleeping 100 * 1000 ms each iteration, waiting for qemu-kvm to clean up the mapping. However, it looks like qemu-kvm never cleans it up. (seems so ;-))

If the guest is destroyed, the mapping is cleaned up immediately; the driver probe then succeeds and the device can be reattached to the host.
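The wait-then-reprobe behaviour described above can be sketched as follows. This is an illustrative stand-in for what libvirt does internally, not its actual code: the /proc/iomem parsing mirrors the demo output below, and the retry count and per-iteration delay are taken from the comment (the exact units in the libvirt source may differ).

```python
import time

def has_kvm_mapping(iomem_text, bdf):
    """Return True if /proc/iomem output still shows a
    kvm_assigned_device child entry for the given PCI address."""
    lines = iomem_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().endswith(bdf):
            # The indented child entry on the next line names the owner.
            if i + 1 < len(lines) and "kvm_assigned_device" in lines[i + 1]:
                return True
    return False

def wait_for_cleanup(read_iomem, bdf, retries=100, delay_s=0.1):
    """Poll until qemu-kvm releases the BAR mapping, up to `retries`
    iterations. Returns False if the mapping never goes away -- the
    case this bug describes, where the subsequent driver reprobe
    silently binds nothing."""
    for _ in range(retries):
        if not has_kvm_mapping(read_iomem(), bdf):
            return True
        time.sleep(delay_s)
    return False
```

In the scenario above, `wait_for_cleanup` times out because the `kvm_assigned_device` entries persist indefinitely after device_del, yet the drivers_probe write that follows still "succeeds".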

XML of hostdev (NIC):
==================

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
    </source>
  </hostdev>

Demo of the problem:
===================

# cat /proc/iomem | grep kvm -B 2
#

# virsh start test
Domain test started

# cat /proc/iomem | grep kvm -B 2
# 

# virsh attach-device test hostdev.xml 
Device attached successfully

# cat /proc/iomem | grep kvm -B 2
      f2500000-f2501fff : iwlagn
  f2600000-f261ffff : 0000:00:19.0
    f2600000-f261ffff : kvm_assigned_device
--
  f2624000-f2624fff : 0000:00:03.3
  f2625000-f2625fff : 0000:00:19.0
    f2625000-f2625fff : kvm_assigned_device

# virsh detach-device test hostdev.xml 
Device detached successfully

# date
Wed Sep 28 13:41:18 CST 2011

# cat /proc/iomem | grep kvm -B 2
      f2500000-f2501fff : iwlagn
  f2600000-f261ffff : 0000:00:19.0
    f2600000-f261ffff : kvm_assigned_device
--
  f2624000-f2624fff : 0000:00:03.3
  f2625000-f2625fff : 0000:00:19.0
    f2625000-f2625fff : kvm_assigned_device

# date
Wed Sep 28 13:42:32 CST 2011

# cat /proc/iomem | grep kvm -B 2
      f2500000-f2501fff : iwlagn
  f2600000-f261ffff : 0000:00:19.0
    f2600000-f261ffff : kvm_assigned_device
--
  f2624000-f2624fff : 0000:00:03.3
  f2625000-f2625fff : 0000:00:19.0
    f2625000-f2625fff : kvm_assigned_device

# sleep 600; date; cat /proc/iomem | grep kvm -B 2
Wed Sep 28 13:53:20 CST 2011
      f2500000-f2501fff : iwlagn
  f2600000-f261ffff : 0000:00:19.0
    f2600000-f261ffff : kvm_assigned_device
--
  f2624000-f2624fff : 0000:00:03.3
  f2625000-f2625fff : 0000:00:19.0
    f2625000-f2625fff : kvm_assigned_device

# date
Wed Sep 28 14:07:50 CST 2011

# cat /proc/iomem | grep kvm -B 2
      f2500000-f2501fff : iwlagn
  f2600000-f261ffff : 0000:00:19.0
    f2600000-f261ffff : kvm_assigned_device
--
  f2624000-f2624fff : 0000:00:03.3
  f2625000-f2625fff : 0000:00:19.0
    f2625000-f2625fff : kvm_assigned_device

# echo 0000\:00\:19.0 >  /sys/bus/pci/drivers_probe

# ifconfig eth0
eth0: error fetching interface information: Device not found

# virsh destroy test
Domain test destroyed

# cat /proc/iomem | grep kvm -B 2
#

# echo 0000\:00\:19.0 >  /sys/bus/pci/drivers_probe

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr XX:XX:XX:XX:XX:XX  
          inet6 addr: fe80::21f:16ff:fe2e:74c4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:586 (586.0 b)
          Interrupt:20 Memory:f2600000-f2620000

Comment 8 Osier Yang 2011-09-28 06:19:31 UTC
(In reply to comment #7)
> Found the problem, qemu-kvm maps the PCI bar(s) on host when doing PCI
> passthrough, however it doesn't clean up it even more than 20 mins after
> monitor command "device_del" was executed successfully. This cause the kernel
> fails on probing the original driver for the device. (libvirt uses command like
> "# echo 0000:00:19.0 > /sys/bus/pci/drivers_probe" to reprobe the driver, the
> command will be success even if it fails on find the original driver actually.
> So libvirt returns success too with the device doesn't showup on host).
> 
> Libvirt loops 100 times, and each time sleeps 100 * 1000 ms to wait qemu-kvm
> clean up the mapping. 

After that, libvirt forcibly probes the original driver.

> However, it looks like qemu-kvm won't clean up it forever. (seems so ;-))

Comment 9 Osier Yang 2011-09-28 07:19:07 UTC
(In reply to comment #7)
> Found the problem, qemu-kvm maps the PCI bar(s) on host when doing PCI
> passthrough, however it doesn't clean up it even more than 20 mins after
> monitor command "device_del" was executed successfully. This cause the kernel
> fails on probing the original driver for the device. (libvirt uses command like
> "# echo 0000:00:19.0 > /sys/bus/pci/drivers_probe" to reprobe the driver, the
> command will be success even if it fails on find the original driver actually.
> So libvirt returns success too with the device doesn't showup on host).
> 
> Libvirt loops 100 times, and each time sleeps 100 * 1000 ms to wait qemu-kvm
> clean up the mapping. However, it looks like qemu-kvm won't clean up it
> forever. (seems so ;-))
> 
> If the guest is destroyed, the mapping will be cleaned up immediately, thus the
> device driver will be found successfully, and one can reattach the device to
> host.

Disregard the above for this bug; I confused it with bug 736437 and commented in the wrong place.

Patch for this BZ is posted to upstream:

https://www.redhat.com/archives/libvir-list/2011-September/msg01101.html

Comment 10 Osier Yang 2011-10-11 13:51:52 UTC
Patch v2 posted upstream:

http://www.redhat.com/archives/libvir-list/2011-October/msg00359.html

Comment 12 Eric Blake 2011-10-14 21:48:39 UTC
*** Bug 736437 has been marked as a duplicate of this bug. ***

Comment 13 Eric Blake 2011-10-14 21:55:11 UTC
In testing this bug, I found a virt-manager bug.  If you use the virt-manager GUI to attempt the hostdev hotplug, virt-manager manually calls virNodeDeviceDettach before attempting the hotplug, even though it uses managed=yes for the hotplug.  This has the unfortunate result that the device is _already_ bound to pci-stub before the hotplug attempt, so even if the hotplug attempt fails, libvirt faithfully restores the device to its pre-hotplug state (still pci-stub), but virt-manager never calls virNodeDeviceReAttach after a failed hotplug.

Either virt-manager should use virNodeDeviceReAttach on failure, or since it is already relying on managed=yes, it should not be using virNodeDeviceDettach in the first place.  I suspect that the real solution may be somewhere in between: first assume that it is talking to a newer server, and try managed=yes, but if the server is pre-0.6.1 (where managed=yes is ignored but the attempt will fail), then fall back to using virNodeDevice{De,ReA}ttach.
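The fallback strategy suggested above can be sketched like this. The callables are hypothetical stand-ins for the real libvirt calls (virDomainAttachDevice with managed='yes', virNodeDeviceDettach, virNodeDeviceReAttach), and the server-version check is left abstract; this is a sketch of the control flow, not virt-manager's implementation.

```python
def hotplug_hostdev(attach_managed, detach_node, reattach_node,
                    server_supports_managed):
    """Prefer managed='yes'; fall back to explicit detach/reattach
    only on old (pre-0.6.1) servers that ignore the managed flag.
    Returns True if the hotplug succeeded."""
    if server_supports_managed:
        # Newer server: libvirt binds/unbinds pci-stub itself and
        # restores the device after a failed hotplug, so no manual
        # node-device calls are needed at all.
        return attach_managed()
    # Old server: detach manually, then clean up ourselves on failure.
    detach_node()
    try:
        ok = attach_managed()
    except RuntimeError:
        ok = False
    if not ok:
        reattach_node()  # the step virt-manager currently omits
    return ok
```

The key point is the `reattach_node()` call on the failure path: with the current virt-manager behaviour that call never happens, so the device stays bound to pci-stub exactly as described above.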

Comment 17 yanbing du 2011-10-19 06:46:35 UTC
Rpm packages version:
libvirt-0.9.4-18.el6.x86_64
qemu-kvm-0.12.1.2-2.199.el6.x86_64
kernel-2.6.32-211.el6.x86_64

Following the reproduction steps in the bug description, the PCI device is reattached to the host automatically when attach-device fails with "managed=yes", so the bug is verified.

Comment 18 errata-xmlrpc 2011-12-06 11:28:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html