Bug 768857 - RFE: support <hostdev> <rom bar='on|off'/>
Summary: RFE: support <hostdev> <rom bar='on|off'/>
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: virt-manager
Version: unspecified
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Cole Robinson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-19 07:52 UTC by Stefan Assmann
Modified: 2014-07-06 19:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-10 19:26:03 UTC


Attachments
pci-unbind-reset-bind.sh (261 bytes, text/plain)
2012-01-05 10:40 UTC, Stefan Assmann
dmesg.txt (60.29 KB, text/plain)
2012-01-09 10:49 UTC, Stefan Assmann
virt-manager.jpg (44.36 KB, image/jpeg)
2012-01-13 14:28 UTC, Stefan Assmann

Description Stefan Assmann 2011-12-19 07:52:50 UTC
Description of problem:
dell-pet410-04.lab.bos.redhat.com has 2 Intel NICs (1x82576, 1x82580).
When I assign the 82576 NIC to a KVM guest, the guest fails to boot at all about 80% of the time. Nothing appears on the guest's serial console. However, assigning the 82580 NIC works every time.

Version-Release number of selected component (if applicable):
kernel-2.6.32-131.0.15.el6.x86_64
libvirt-0.8.7-18.el6.x86_64

How reproducible:
often

Steps to Reproduce: dell-pet410-04.lab.bos.redhat.com
1. assign NIC to kvm guest via xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
</hostdev>
2. virsh start guest
  
Actual results:
guest often does not boot

Expected results:
guest boots all the time

Additional info:

Comment 2 Don Dutile (Red Hat) 2012-01-03 22:02:04 UTC
What versions of qemu-kvm and libvirt did you test with?

Comment 3 Stefan Assmann 2012-01-04 08:30:27 UTC
qemu-kvm-0.12.1.2-2.160.el6.x86_64
libvirt-0.8.7-18.el6.x86_64

Comment 4 Stefan Assmann 2012-01-04 12:17:57 UTC
After starting the guest and waiting a while, I found this in syslog:
kvm: 2550: cpu0 guest string pio down

The guest also seems to be paused:
virsh list --all
 Id Name                 State
----------------------------------
  1 rhel5-64-kvm         paused

Comment 5 Stefan Assmann 2012-01-04 13:54:17 UTC
I've manually upgraded the following packages to the 6.2 versions
glibc-2.12-1.47.el6.x86_64.rpm
glibc-common-2.12-1.47.el6.x86_64.rpm
glibc-devel-2.12-1.47.el6.x86_64.rpm
glibc-headers-2.12-1.47.el6.x86_64.rpm
libvirt-0.9.4-23.el6.x86_64.rpm
libvirt-client-0.9.4-23.el6.x86_64.rpm
libvirt-python-0.9.4-23.el6.x86_64.rpm
netcf-libs-0.1.9-2.el6.x86_64.rpm
qemu-img-0.12.1.2-2.209.el6.x86_64.rpm
qemu-kvm-0.12.1.2-2.209.el6.x86_64.rpm
sgabios-bin-0-0.3.20110621svn.el6.noarch.rpm
spice-server-0.8.2-5.el6.x86_64.rpm

Still the same, guest does not start.

Comment 6 Stefan Assmann 2012-01-04 13:56:17 UTC
Also when the guest switches from running to paused I sometimes get
single mode not supported
single mode not supported
level sensitive irq not supported
level sensitive irq not supported
in dmesg

Comment 7 Don Dutile (Red Hat) 2012-01-04 20:51:00 UTC
I logged into the test machine; 04:00.[0,1] are the 82580s, not the 82576s;
the latter are 02:00.[0,1].

root.bos.redhat.com:~> lspci | grep Ethernet
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
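
The mapping from chip model to PCI address can be checked mechanically before editing the hostdev XML. The following is a minimal sketch, not part of the original report: the helper names `find_functions` and `hostdev_address` are hypothetical, and it assumes PCI domain 0000 because the lspci output above does not print a domain.

```python
import re

# Matches lines such as
# "02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)".
LSPCI_LINE = re.compile(
    r"^(?P<bus>[0-9a-f]{2}):(?P<slot>[0-9a-f]{2})\.(?P<func>[0-7])\s+"
    r"(?P<cls>[^:]+):\s+(?P<desc>.*)$"
)


def find_functions(lspci_text, model):
    """Return [(bdf, description)] for each PCI function whose description mentions model."""
    matches = []
    for line in lspci_text.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if m and model in m.group("desc"):
            # Assumption: domain 0000, since plain lspci output omits the domain here.
            bdf = "0000:%s:%s.%s" % (m.group("bus"), m.group("slot"), m.group("func"))
            matches.append((bdf, m.group("desc")))
    return matches


def hostdev_address(bdf):
    """Turn a BDF such as '0000:02:00.0' into <address> attributes for a <hostdev>."""
    domain, bus, rest = bdf.split(":")
    slot, func = rest.split(".")
    return {
        "domain": "0x" + domain,
        "bus": "0x" + bus,
        "slot": "0x" + slot,
        "function": "0x" + func,
    }
```

Feeding the lspci output above to `find_functions(text, "82576")` and passing the first result to `hostdev_address` yields the attribute values to compare against the `<address>` element used in the guest XML.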


So, if the description is correct, then the 82576s work, but the 82580s do not.
I don't have experience with 82580s, but the problem could be reset-related.
To check, run a loop test that unbinds the driver from the 82580, resets the device, and then re-binds the driver to the device.
Do these steps by echoing to the appropriate device-specific sysfs files under /sys/bus/pci/devices/<BDF>/[driver/unbind,reset].
If the device wedges after a number of these loops, then the device has a reset/re-config issue, which is a typical bug found while doing device assignment.
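
A loop of this kind could look roughly like the sketch below. It is an illustration only and is not the script attached later in this bug: the function name `unbind_reset_bind` and the `sysfs_root` parameter are hypothetical (the parameter exists so the logic can be tried against a scratch directory), and re-binding through /sys/bus/pci/drivers/<driver>/bind is an assumption about how the driver is bound back. On a real host it needs root and will take the device offline while it runs.

```python
import os


def _write(path, value):
    # sysfs control files are driven by writing a short string to them.
    with open(path, "w") as handle:
        handle.write(value)


def unbind_reset_bind(bdf, iterations=1, sysfs_root="/sys"):
    """Unbind the driver from a PCI function, reset the function, and re-bind the driver.

    Returns the name of the driver that was cycled.
    """
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", bdf)
    driver_link = os.path.join(dev_dir, "driver")
    if not os.path.islink(driver_link):
        raise RuntimeError("no driver is bound to %s" % bdf)
    # Resolve the driver directory up front: on a real system the "driver"
    # symlink disappears as soon as the driver is unbound.
    driver_dir = os.path.realpath(driver_link)
    driver = os.path.basename(driver_dir)
    bind_path = os.path.join(sysfs_root, "bus", "pci", "drivers", driver, "bind")
    for _ in range(iterations):
        _write(os.path.join(driver_dir, "unbind"), bdf)  # detach the driver
        _write(os.path.join(dev_dir, "reset"), "1")      # function-level reset
        _write(bind_path, bdf)                           # attach the driver again
    return driver
```

If the device stops responding partway through a run such as `unbind_reset_bind("0000:04:00.0", iterations=1000)`, that points at the reset/re-config problem described above.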

Comment 8 Stefan Assmann 2012-01-05 10:40:07 UTC
Created attachment 550878
pci-unbind-reset-bind.sh

Sorry Don, my mistake. The problem here is with the 82576 NIC, not the 82580.
I tried what you suggested and ran an unbind/reset/bind loop with 1000 iterations, and that worked flawlessly.
Attaching the script I used.

To avoid any further misunderstanding: the problem occurs with device 0000:02:00.0.

Comment 9 Stefan Assmann 2012-01-05 11:34:29 UTC
Very weird: after the whole unbind/reset/bind looping, passthrough of both NICs seems to work. I tried several times now, including a reboot of the machine.

Don, should we close this and see if it happens again? I'd like to know why it didn't work before, but I guess there's not much we can do now.

Comment 10 Don Dutile (Red Hat) 2012-01-06 22:40:48 UTC
(In reply to comment #9)
> Very weird, after the whole unbind/reset/bind looping passthrough of both NICs
> seems to work. I tried several times now, including a reboot of the machine.
> 
> Don, should we close this and see if it happens again? I'd like to know why it
> didn't work before but I guess there's not much we can do now.

What does your kernel boot cmdline look like?
Please attach the full boot-up dmesg log.

Comment 11 Stefan Assmann 2012-01-09 10:49:03 UTC
Created attachment 551533
dmesg.txt

root.bos.redhat.com:~> cat /proc/cmdline 
ro root=/dev/mapper/vg_dellpet41004-lv_root rd_LVM_LV=vg_dellpet41004/lv_root rd_LVM_LV=vg_dellpet41004/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us console=ttyS0,115200n81 ignore_loglevel no_console_suspend intel_iommu=on crashkernel=128M

Comment 12 Stefan Assmann 2012-01-09 11:50:27 UTC
I just tried to pass the 82576 through to a guest. The guest seemed stuck again at first but then started after 1-2 minutes. Don, I'll ping you so we can have a look at this together.

In /var/log/libvirt/libvirtd.log I found:
06:27:04.486: 2044: error : daemonStreamEvent:208 : stream had I/O failure
06:27:04.486: 2044: error : virFDStreamUpdateCallback:111 : internal error stream is not open
06:27:05.437: 2044: error : qemuMonitorIO:583 : internal error End of file from monitor
06:35:45.510: 2044: error : daemonStreamHandleAbort:590 : stream aborted at client request
06:36:51.415: 2044: error : daemonStreamHandleAbort:590 : stream aborted at client request
06:47:13.709: 2044: error : daemonStreamEvent:208 : stream had I/O failure
06:47:13.710: 2044: error : virFDStreamUpdateCallback:111 : internal error stream is not open
06:47:14.775: 2044: error : qemuMonitorIO:583 : internal error End of file from monitor

Comment 13 Don Dutile (Red Hat) 2012-01-10 17:15:39 UTC
(In reply to comment #12)
> I just tried to passthrough the 82576 to a guest and the guest first seemed
> stuck again but then started after 1-2 minutes... Don, I'll ping you so we can
> have a look at this together.
> 
> In /var/log/libvirt/libvirtd.log I found
> 06:27:04.486: 2044: error : daemonStreamEvent:208 : stream had I/O failure
> 06:27:04.486: 2044: error : virFDStreamUpdateCallback:111 : internal error
> stream is not open
> 06:27:05.437: 2044: error : qemuMonitorIO:583 : internal error End of file from
> monitor
> 06:35:45.510: 2044: error : daemonStreamHandleAbort:590 : stream aborted at
> client request
> 06:36:51.415: 2044: error : daemonStreamHandleAbort:590 : stream aborted at
> client request
> 06:47:13.709: 2044: error : daemonStreamEvent:208 : stream had I/O failure
> 06:47:13.710: 2044: error : virFDStreamUpdateCallback:111 : internal error
> stream is not open
> 06:47:14.775: 2044: error : qemuMonitorIO:583 : internal error End of file from
> monitor

With a device assigned to the guest, the guest tries to PXE boot
(you can see this if you use virt-manager on the host); without the assigned device,
there is no PXE boot. I don't understand why that happens.
The 1-2 minute delay you are seeing is PXE boot waiting for a selection and then timing out to do a local boot.

Comment 14 Stefan Assmann 2012-01-13 14:25:02 UTC
I think I know what's going on now. The 82576 NIC has an option ROM for PXE boot. When the device is passed to the guest, libvirt or kvm (not sure which) sees this option ROM and decides to boot from it! The idea is not so bad, but I don't see any option in virt-manager to disable this behaviour.
However, I found the <rom bar='off'/> tag, which I added to the xml manually, and now the option ROM just gets ignored and everything works as expected.

I suggest adding an option in virt-manager to disable any PCI option ROM detected for PCI devices that are passed to the guest.
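
Until such an option exists, the manual edit can be scripted. The following is a minimal sketch using only Python's standard library; the function name `set_rom_bar` is hypothetical, and it assumes the <rom> element sits directly inside the <hostdev> element next to <source>. Address attributes are compared as plain strings, so they must be written exactly as they appear in the XML (for example '0x02', not '0x2').

```python
import xml.etree.ElementTree as ET


def set_rom_bar(domain_xml, bus, slot, function, bar="off", domain="0x0000"):
    """Return the domain XML with <rom bar='...'/> set on the matching PCI <hostdev>."""
    root = ET.fromstring(domain_xml)
    wanted = {"domain": domain, "bus": bus, "slot": slot, "function": function}
    changed = 0
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue
        addr = hostdev.find("source/address")
        if addr is None or any(addr.get(k) != v for k, v in wanted.items()):
            continue
        # Reuse an existing <rom> element if present, otherwise add one.
        rom = hostdev.find("rom")
        if rom is None:
            rom = ET.SubElement(hostdev, "rom")
        rom.set("bar", bar)
        changed += 1
    if changed == 0:
        raise LookupError("no matching PCI <hostdev> found")
    return ET.tostring(root, encoding="unicode")
```

The input would typically come from `virsh dumpxml guest`, and the result can be loaded back with `virsh define`; editing the same element by hand through `virsh edit` has the same effect.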

Comment 15 Stefan Assmann 2012-01-13 14:28:14 UTC
Created attachment 555077
virt-manager.jpg

This might be a good place to add the "disable PCI option ROM" option.

Comment 16 Andy Gospodarek 2012-01-13 16:42:52 UTC
Sounds like we should change the component to virt-manager.

Comment 18 Cole Robinson 2014-02-10 19:26:03 UTC
Upstream now:

commit 82754ddc84041ece7a8462e1b14860eda4ea022b
Author: Cole Robinson <crobinso>
Date:   Mon Feb 10 14:24:22 2014 -0500

    Expose hostdev rombar in UI and cli (bz 768857)

