Bug 614811 - Reattach PCI device which is in use by guest to host will cause host restart
Summary: Reattach PCI device which is in use by guest to host will cause host restart
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Osier Yang
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 667609
Blocks:
Reported: 2010-07-15 10:19 UTC by dyuan
Modified: 2011-07-12 03:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-17 18:10:25 UTC
Target Upstream Version:
Embargoed:



Description dyuan 2010-07-15 10:19:54 UTC
Description of problem:

Hotplug (pass through) a VF to a guest, then reattach the VF to the host while the guest is still using it. The host restarts.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-15.el6
kernel-2.6.32-49.el6


How reproducible:
always

Steps to Reproduce:
1. Hotplug (pass through) a VF to the guest:

<hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
       <address bus='0' slot='16' function='0'/>
    </source>
</hostdev>

2. While the guest is running, reattach the VF to the host:
# virsh nodedev-reattach pci_0000_03_10_0
Device pci_0000_03_10_0 re-attached
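
The node device name passed to virsh encodes the PCI address as domain, bus, slot and function in hexadecimal, so pci_0000_03_10_0 is the sysfs device 0000:03:10.0, and hex slot 10 is the decimal 16 written as slot='16' in the XML above. The following is only a sketch of that conversion; the helper names nodedev_to_sysfs and nodedev_slot_decimal are hypothetical and are not part of libvirt.

```shell
#!/bin/sh
# Convert a libvirt node device name such as pci_0000_03_10_0 into the
# sysfs-style address 0000:03:10.0 (domain:bus:slot.function, all hex).
# Prints nothing if the name does not have the expected shape.
nodedev_to_sysfs() {
    echo "$1" | sed -n 's/^pci_\([0-9a-fA-F]\{4\}\)_\([0-9a-fA-F]\{2\}\)_\([0-9a-fA-F]\{2\}\)_\([0-7]\)$/\1:\2:\3.\4/p'
}

# Print the slot number in decimal, the notation used in slot='16' above.
nodedev_slot_decimal() {
    hex=$(echo "$1" | cut -d_ -f4)   # fourth "_"-separated field is the slot
    printf '%d\n' "0x$hex"
}

nodedev_to_sysfs pci_0000_03_10_0
nodedev_slot_decimal pci_0000_03_10_0
```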

  
Actual results:
The host restarts.

Expected results:
The command should fail with an error message such as 'device is in use'.

Additional info:

Comment 2 RHEL Program Management 2010-07-15 14:17:47 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Dave Allan 2010-08-19 19:18:02 UTC
I'm removing SR-IOV from the summary as this behavior is not specific to SR-IOV devices.

Comment 4 Osier Yang 2010-12-08 02:30:02 UTC
Hi dyuan

Could you check whether there are any useful logs in /var/log/messages?

- Osier

Comment 5 dyuan 2010-12-13 08:55:22 UTC
libvirt-0.8.1-27.el6
qemu-kvm-0.12.1.2-2.113.el6
kernel-2.6.32-71.el6

Tested with a normal PCI device: the host didn't reboot. I'll re-test with a VF.

# cat /var/log/messages
Dec 13 16:52:23 dhcp-93-197 kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfc300000-0xfc31ffff]
Dec 13 16:52:23 dhcp-93-197 kernel: e1000e: probe of 0000:00:19.0 failed with error -16

Comment 6 dyuan 2010-12-13 10:58:43 UTC
Tested with a VF: the host rebooted and there was no output in /var/log/messages.

Before reattaching the PCI device, libvirt needs to confirm whether the device is in use by a guest.
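
Until such a check exists, an administrator could approximate it from outside libvirt. The sketch below refuses to reattach a device when the live XML of any running domain references it. It makes several assumptions: the function names are hypothetical, `virsh list --name` is only available in newer libvirt releases than the one reported here, and the live XML is assumed to print the hostdev source address in the form shown in the comment. The check is also racy, since a guest could be started between the check and the reattach.

```shell
#!/bin/sh
# Succeeds if the domain XML on stdin contains a hostdev source address
# matching the given hex bus, slot and function (for example: 03 10 0).
# Assumption: the live XML prints the source address exactly as
#   <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
xml_uses_device() {
    grep -q "<address domain='0x0000' bus='0x$1' slot='0x$2' function='0x$3'/>"
}

# Print the name of each running domain whose live XML references the device.
domains_using_device() {
    for dom in $(virsh list --name); do
        if virsh dumpxml "$dom" | xml_uses_device "$1" "$2" "$3"; then
            echo "$dom"
        fi
    done
}

# Reattach only when no running domain references the device.
# Usage: safe_reattach <nodedev-name> <bus-hex> <slot-hex> <function-hex>
safe_reattach() {
    users=$(domains_using_device "$2" "$3" "$4")
    if [ -n "$users" ]; then
        echo "refusing to reattach $1: in use by $users" >&2
        return 1
    fi
    virsh nodedev-reattach "$1"
}
```

For the device in this report the call would be `safe_reattach pci_0000_03_10_0 03 10 0`.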

-----------------------------------
Additional info for comment 5:

Dec 13 18:54:12 dhcp-93-197 kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfc300000-0xfc31ffff]
Dec 13 18:54:12 dhcp-93-197 kernel: e1000e: probe of 0000:00:19.0 failed with error -16
Dec 13 18:54:13 dhcp-93-197 kernel: Uhhuh. NMI received for unknow reason b1 on CPU 0.
Dec 13 18:54:13 dhcp-93-197 kernel: You have some hardware problem, likely on the PCI bus.
Dec 13 18:54:13 dhcp-93-197 kernel: Dazed and confused, but trying to contunue
Dec 13 18:54:13 dhcp-93-197 kernel: DRHD: handling fault status reg 3
Dec 13 18:54:13 dhcp-93-197 kernel: DMAR:[DMA Write] Request device [00:19.0] fault addr 34b16000
Dec 13 18:54:13 dhcp-93-197 kernel: DMAR:[fault reason 02] Present bit in context entry is clear

The 'driver' link also disappeared from the directory /sys/bus/pci/devices/0000\:00\:19.0/

Comment 7 Osier Yang 2011-01-06 08:27:18 UTC
We can't determine from the libvirt side whether a PCI device is in use.

Even when the device is bound to a driver such as pci-stub, it may not actually be in use. For example: 1) attach the device to a guest, 2) start the guest, 3) destroy the guest without detaching the device. The device is then still bound to the pci-stub driver but is no longer used. libvirt binds the PCI device to pci-stub (for qemu-kvm) or pciback (for Xen) when detaching the device from the host, e.g.:

1) Before and after detaching the device from the host:
[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 root root  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:12 driver -> ../../../../bus/pci/drivers/igbvf

[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 root root  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub


2) After destroying the guest without detaching the device:

[root@dhcp-92-51 ~]# virsh attach-device x86_64 osier/vf.xml
Device attached successfully

[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 qemu qemu  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

[root@dhcp-92-51 ~]# virsh destroy x86_64
Domain x86_64 destroyed

[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 root root  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

So we can't determine whether the device is in use by checking "driver". We also can't determine it from "enable", which is just a counter of how many times the device has been enabled. I can't find any other method.
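
For reference, the "driver" check used in the listings above can be scripted as follows. This is only a sketch: bound_driver is a hypothetical helper name, and SYSFS_PCI_ROOT is an assumed override that exists purely so the function can be tried against a fake directory tree. As the listings show, a result of pci-stub says only that the device is detached from its host driver, not that a guest is using it.

```shell
#!/bin/sh
# Print the name of the driver currently bound to a PCI device, or "none"
# when the device directory has no "driver" symlink.
# SYSFS_PCI_ROOT is a hypothetical override of the sysfs path, for testing.
bound_driver() {
    link="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/pci-stub
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

bound_driver 0000:03:10.0
```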

This is unlike Xen, where we can look up whether the PCI device is in use by a domain via xenstore (http://www.redhat.com/archives/libvir-list/2011-January/msg00131.html; this patch is not ready yet, but it at least demonstrates the method). We can't get any useful information from KVM or sysfs, so there is nothing more we can do on the libvirt side to resolve this problem.

We need something like a "using" attribute in sysfs for PCI devices to indicate whether the device is in use.

So, reassigning to kernel.

Comment 8 Osier Yang 2011-01-06 08:36:00 UTC
Bug created against kernel: https://bugzilla.redhat.com/show_bug.cgi?id=667609

