Bug 654208 - [SR-IOV]VF device can not start on 32bit Windows2008 SP2
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.7
Hardware: x86_64 Linux
Priority: low  Severity: medium
Target Milestone: rc
Target Release: 5.8
Assigned To: Alex Williamson
QA Contact: Virtualization Bugs
Keywords: Triaged
Depends On: 613892
Blocks: Rhel5KvmTier2 688932 656751
Reported: 2010-11-17 02:26 EST by Chao Yang
Modified: 2013-03-19 13:27 EDT
Doc Type: Bug Fix
Doc Text:
When using PCI device assignment with a 32bit Microsoft Windows 2008 guest on an AMD-based host system, the assigned device may fail to work if it relies on MSI or MSI-X based interrupts. The reason for this is that the 32bit version of Microsoft Windows 2008 does not enable MSI based interrupts for the family of processor exposed to the guest. To work around this problem, the user may wish to move to a RHEL6 host, use a 64bit version of the guest operating system, or employ a wrapper script to modify the processor family exposed to the guest as follows (Note this is only for 32bit Windows guests):

1) Create wrapper script

$ cat /usr/libexec/qemu-kvm.family16
#!/bin/sh
ARGS=$@
echo $ARGS | grep -q ' -cpu '
if [ $? -eq 0 ]; then
    for model in $(/usr/libexec/qemu-kvm -cpu ? \
            | sed 's|^x86||g' | tr -d [:blank:]); do
        ARGS=$(echo $ARGS | \
            sed "s|-cpu $model|-cpu $model,family=16|g")
    done
else
    ARGS="$ARGS -cpu qemu64,family=16"
fi
echo "$0: exec /usr/libexec/qemu-kvm $ARGS" >&2
exec /usr/libexec/qemu-kvm $ARGS

2) Make script executable

$ chmod 755 /usr/libexec/qemu-kvm.family16

3) Set selinux permissions

$ restorecon /usr/libexec/qemu-kvm.family16

4) Update guest XML to use the new wrapper

$ virsh edit $GUEST

Replace:
    <emulator>/usr/libexec/qemu-kvm</emulator>
With:
    <emulator>/usr/libexec/qemu-kvm.family16</emulator>
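
The argument rewrite performed by the wrapper script in the Doc Text can be sketched as a standalone function, which makes it possible to try the rewrite without starting a guest. This is only a sketch under assumptions: the function name add_family16 is hypothetical, and unlike the wrapper it does not query qemu-kvm for the list of CPU models but simply appends ",family=16" to whatever word follows -cpu. Like the wrapper, it flattens the arguments into one string, so arguments containing spaces are not preserved.

```shell
#!/bin/sh
# Hypothetical helper (not part of the documented wrapper script).
# Appends ",family=16" to the value of an existing -cpu option, or adds
# "-cpu qemu64,family=16" when no -cpu option is present.
# Prints the rewritten argument string on stdout.
add_family16() {
    # Pad with spaces so " -cpu " also matches the first argument.
    args=" $* "
    case "$args" in
        *" -cpu "*)
            # Rewrite the word that follows -cpu, then strip the padding.
            printf '%s\n' "$args" | sed \
                -e 's| -cpu \([^ ]*\)| -cpu \1,family=16|' \
                -e 's|^ ||' \
                -e 's| $||'
            ;;
        *)
            printf '%s\n' "$* -cpu qemu64,family=16"
            ;;
    esac
}

# Usage:
#   add_family16 -m 1024 -cpu qemu64 -net none
```

The output can be compared against the command line that the wrapper prints to stderr before it execs /usr/libexec/qemu-kvm.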
Clone Of: 613892
Clones: 656751
Last Closed: 2011-12-22 13:06:15 EST


Attachments
error message (72.44 KB, image/png)
2010-11-17 03:43 EST, Chao Yang

Description Chao Yang 2010-11-17 02:26:04 EST
I also hit this issue on a win2008-32bit guest. For the error message, please have a look at the attachment.

# rpm -qa|grep kvm
kvm-tools-83-207.el5
etherboot-zroms-kvm-5.4.4-13.el5
etherboot-roms-kvm-5.4.4-13.el5
kmod-kvm-debug-83-207.el5
kvm-debuginfo-83-207.el5
kvm-83-207.el5
kvm-qemu-img-83-207.el5
kmod-kvm-83-207.el5

host kernel:
# uname -r
2.6.18-230.el5

QEMU CLI:
# /usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -usbdevice tablet -startdate now -name windows2008-32 -smp 4 -m 4G -boot c  -drive file=/root/zhangjunyi/win2008_32_virtio.qcow2,media=disk,if=virtio,cache=none,format=qcow2,werror=stop,boot=on -vnc :10  -cpu qemu64 -M rhel5.6.0 -notify all -balloon virtio -monitor stdio -net none -pcidevice host=09:10.5


+++ This bug was initially created as a clone of Bug #613892 +++

Description of problem:
With RHEL6 Beta2, a Kawela VF can be assigned to a 32bit Windows 2008 SP2 guest from the
qemu-kvm command line, but Device Manager shows a "yellow bang" and reports "This device can not start. (Code 10)". The VF device does not work on 32bit Windows 2008 SP2.

Version-Release number of selected component (if applicable):
rhel6-beta2 2.6.32-37.el6.x86_64 


How reproducible:
Always

Steps to Reproduce:
1. Install 32bit Windows 2008 SP2 and assign Kawela VF to it
2. Install driver from
http://downloadcenter.intel.com/T8Clearance.aspx?sType=&agr=Y&ProductID=&DwnldID=18720&url=/18720/a08/PROWin32.exe&PrdMap=&strOSs=&OSFullName=&lang=eng
3. Reboot guest and check network connection


Actual results:
VF can not get IP

Expected results:
VF can work on 32bit Windows 2008 SP2.


Additional info:

--- Additional comment from xudong.hao@intel.com on 2010-07-13 02:29:30 EDT ---

Created attachment 431354 [details]
vf device can not start on guest

--- Additional comment from pm-rhel@redhat.com on 2010-07-13 02:41:19 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from ddutile@redhat.com on 2010-07-13 10:25:20 EDT ---

Please provide:

-- host kernel version

-- guest startup cmdline (if use qemu-kvm directly) or xml spec (if you virsh)

--- Additional comment from alex.williamson@redhat.com on 2010-07-13 15:18:13 EDT ---

Created attachment 431573 [details]
cannot find enough free resources

I get a different error message, see the attached image.  This same error happens both with current rhel6 bits and upstream kvm.  Looking at the log files doesn't seem to suggest a PCI BAR resource issue.  Can we get some help to understand what the driver is looking for that it can't find resources for?

--- Additional comment from ashish.n.shah@intel.com on 2010-07-13 17:07:18 EDT ---

When we debugged here there is no error in the driver that we can see. It seems like the OS is having some issues with getting resources required for the device (such as MSI-X). The same driver is working correctly on Xen Server, so we are not certain what the difference is between the two systems that would cause this.

--- Additional comment from xudong.hao@intel.com on 2010-07-13 21:08:05 EDT ---

(In reply to comment #3)
> -- host kernel version
> 
kernel 2.6.32-37.el6.x86_64.
qemu-kvm-0.12.1.2

> -- guest startup cmdline (if use qemu-kvm directly) or xml spec (if you virsh)    
qemu-kvm command line: 
/usr/libexec/qemu-kvm -m 1024 -smp 2 -net none -hda /var/lib/libvirt/images/win2k8.img  -pcidevice host=01:10.0

--- Additional comment from alex.williamson@redhat.com on 2010-07-14 16:59:34 EDT ---

Can you provide both an 'lspci -vvv' and an 'ls -l
/sys/bus/pci/devices/0000:00:xx.y/' from a linux guest with the device assigned
to it running on xenserver?  Maybe we can spot something different in the
features or config space they're exposing.

--- Additional comment from alex.williamson@redhat.com on 2010-07-25 10:12:57 EDT ---

I've noticed that 32bit Windows on kvm typically does not use MSI interrupts.  However, if I boot the guest with '-cpu host' MSI will be used and the 82576 VF works.  Is Windows looking for specific processor flags to enable MSI interrupt support?

--- Additional comment from alex.williamson@redhat.com on 2010-07-25 11:57:25 EDT ---

Looks like 32bit Windows doesn't enable MSI support until family 6, model 13 processor revisions, so if you boot using -cpu qemu64,model=13 the VF works as expected.
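
One way to confirm which family and model a given -cpu option exposes is to boot a Linux guest with the same option and read the values from /proc/cpuinfo. The following is a minimal sketch under that assumption; the function name cpu_family_model is hypothetical, and it reports only the first CPU listed.

```shell
#!/bin/sh
# Hypothetical helper: print "family:model" for the first CPU found in a
# cpuinfo-style file (default: /proc/cpuinfo on a Linux guest).
cpu_family_model() {
    awk -F': *' '
        /^cpu family/   { fam = $2 }               # e.g. "cpu family : 6"
        /^model[ \t]*:/ { print fam ":" $2; exit } # skips "model name"
    ' "${1:-/proc/cpuinfo}"
}

# Usage inside a Linux guest:
#   cpu_family_model
```

Running it in guests started with and without model=13 shows whether the option changed the exposed model.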

--- Additional comment from yang.z.zhang@intel.com on 2010-07-25 21:50:47 EDT ---

(In reply to comment #9)
> Looks like 32bit Windows doesn't enable MSI support until family 6, model 13
> processor revisions, so if you boot using -cpu qemu64,model=13 the VF works as
> expected.    
yeah, with the arg "-cpu qemu64,model=13", the VF can work well.

--- Additional comment from dlaor@redhat.com on 2010-07-26 08:15:19 EDT ---

Can you retest with one of the following options (most suitable to the host):
-cpu Penryn or Nehalem or Conroe? They should have family=6, model=15

--- Additional comment from dlaor@redhat.com on 2010-07-26 08:16:47 EDT ---

(In reply to comment #11)
> Can you retest with one of the following option (most suitable to the host):
> -cpu Penryn or Nehalem or Conroe? They should have family=6, model =15    

Oops, the AMD models have model=15 while the ones above have model 6.
We should change that.

--- Additional comment from john.cooper@redhat.com on 2010-07-26 15:43:01 EDT ---

These fields inspire headaches.  Currently for the new
models we use the Intel and AMD reviewed values of:

AMD Opteron_G1/G2/G3:

   family = "15"
   model = "6"

Intel Conroe/Penryn/Nehalem + qemu64:

   family = "6"
   model = "2"

So the Intel cpu model CPUID provided "model" fields
and those of qemu64 require the prospective change.
Awaiting input from Intel on this.

--- Additional comment from yang.z.zhang@intel.com on 2010-07-26 22:18:07 EDT ---

(In reply to comment #11)
> Can you retest with one of the following option (most suitable to the host):
> -cpu Penryn or Nehalem or Conroe? They should have family=6, model =15    

I have retested with those args. Unfortunately, the VF does not work with them.

--- Additional comment from john.cooper@redhat.com on 2010-07-27 22:02:10 EDT ---

Feedback from Intel (mail attached for reference).
The recommendation summary for cpuid "model" is:

    Conroe: 15
    Penryn: 23
    Nehalem: 26

Concerning qemu64, empirically the model value needs
to be at least 13, with a rhel5-equivalent legacy
model added for compatibility.

--- Additional comment from john.cooper@redhat.com on 2010-07-27 23:32:31 EDT ---

From: "Dugger, Donald D" <donald.d.dugger@intel.com>
To: john cooper <john.cooper@redhat.com>
CC: Bill Burns <bburns@redhat.com>, "Nakajima, Jun" <jun.nakajima@intel.com>,
        "Yu, Wilfred" <wilfred.yu@intel.com>
Date: Mon, 26 Jul 2010 18:43:33 -0700
Subject: RE: Need Intel input on CPUID model..

John-

Yeah, this whole issue of virtualizing the family/model (which we have to
do) and how it exposes unexpected issues is pretty icky (note, I would
contend that the Windows code is wrong, there's no relationship between
MSI support and the CPU model but the reality is we have to get the Windows
guest to work).

We talked this issue over in our team meeting today and the bottom line
is that using Family 6, Model 13 when identifying a virtual CPU with either
Conroe or Penryn or Nehalem capabilities should work just fine.  Those 3
CPUs all have model numbers greater than 13 (Conroe = 15, Penryn = 23,
Nehalem = 26) so 13 will certainly work as a least common denominator for
them.


--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: john cooper [mailto:john.cooper@redhat.com]
Sent: Monday, July 26, 2010 1:10 PM
To: Dugger, Donald D
Cc: john cooper; Bill Burns
Subject: Need Intel input on CPUID model..

Don,
    We have a bug with SR-IOV where the CPUID
family:model must be at least 6:13 for a 32bit
windows guest to enable MSI.  So we're considering
to make such a change for the new Conroe/Penryn/Nehalem
models we've discussed with you folks.

However currently we're using 6:2 for family:model
as advised by Intel for a least-common-denominator
in the respective classes.  As such we're a bit
hesitant to make a change without feedback either
way from you.

--- Additional comment from yongkang.you@intel.com on 2010-07-29 04:19:14 EDT ---

Remove the NeedInfo request for xudong.

--- Additional comment from ehabkost@redhat.com on 2010-07-29 14:21:04 EDT ---

Patch(es) posted for review and queued. Changing status to POST
http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-July/msg01012.html

--- Additional comment from ehabkost@redhat.com on 2010-08-03 11:09:02 EDT ---


Patches acked:
* Patch: Correct cpuid flags and "model" fields, V2
  (Message-Id: <4C53C46D.5090008@redhat.com>)
  - Acked-by: Juan Quintela <quintela@redhat.com>
  - Acked-by: Alex Williamson <alex.williamson@redhat.com>
  - Acked-by: Jes Sorensen <Jes.Sorensen@redhat.com>

--- Additional comment from ehabkost@redhat.com on 2010-08-03 11:53:46 EDT ---

Fix included on qemu-kvm-0.12.1.2-2.107.el6

--- Additional comment from releng-rhel@redhat.com on 2010-08-05 12:50:39 EDT ---

Fixed in 'qemu-kvm-0.12.1.2-2.107.el6'. 'qemu-kvm-0.12.1.2-2.108.el6' included in compose 'RHEL6.0-20100805.0'.
Moving to ON_QA.

--- Additional comment from yang.z.zhang@intel.com on 2010-08-17 03:11:51 EDT ---

Verified this bug with rhel6 snap10, and PASSED.

libvirt-0.8.1-21.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.108.el6.x86_64
qemu-kvm-0.12.1.2-2.108.el6.x86_64
kernel-2.6.32-59.el6.x86_64

--- Additional comment from kcao@redhat.com on 2010-08-17 03:25:44 EDT ---

(In reply to comment #22)
> Verified this bug with rhel6 snap10, and PASSED.
> 
> libvirt-0.8.1-21.el6.x86_64
> qemu-kvm-tools-0.12.1.2-2.108.el6.x86_64
> qemu-kvm-0.12.1.2-2.108.el6.x86_64
> kernel-2.6.32-59.el6.x86_64

--- Additional comment from releng-rhel@redhat.com on 2010-11-10 16:26:31 EST ---

Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
Comment 1 Chao Yang 2010-11-17 02:56:42 EST
Also, win2008 64bit didn't hit this issue; it can get an IP successfully.
Comment 2 yang 2010-11-17 03:05:05 EST
Did you try with "-cpu qemu64,model=13"?
Comment 3 yang 2010-11-17 03:13:38 EST
What error message do you see?
"This device can not start. (Code 10)" or "cannot find enough free resources"
Comment 4 Chao Yang 2010-11-17 03:41:40 EST
(In reply to comment #2)
> Did you have a try with "-cpu qemu64,model=13"?

Tried with "-cpu qemu64,model=13"; the guest still fails to get an IP.

# /usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -usbdevice tablet -startdate now -name windows2008-32 -smp 4 -m 4G -boot c  -drive file=/root/zhangjunyi/win2008_32_virtio.qcow2,media=disk,if=virtio,cache=none,format=qcow2,werror=stop,boot=on -vnc :10  -cpu qemu64,model=13 -M rhel5.6.0 -notify all -balloon virtio -monitor stdio -net none -pcidevice host=09:10.1
(In reply to comment #3)
> what error message you see?
> "This device can not start. (Code 10)" or "cannot find enough free resources"

I got the "cannot find enough free resources" message. Please see the attachment for details.
Here are the problem details:
Description:
Windows was able to successfully install device driver software, but the driver software encountered a problem when it tried to run. The problem code is 12.
Comment 5 Chao Yang 2010-11-17 03:43:16 EST
Created attachment 461015 [details]
error message
Comment 6 Alex Williamson 2010-11-17 17:20:48 EST
chayang, can you please double check your results with -cpu qemu64,model=13?  I installed a win2k8 sp2 32bit guest on rhel5.6, installed the Intel driver and see the same error code 12 you report.  If I shutdown the VM and restart it with the -cpu qemu64,model=13 option, the VF NIC works correctly.  Adding John, because I'll re-assign this to him if you can verify my results.
Comment 7 Chao Yang 2010-11-18 01:26:09 EST
(In reply to comment #6)
> chayang, can you please double check your results with -cpu qemu64,model=13?  I
> installed a win2k8 sp2 32bit guest on rhel5.6, installed the Intel driver and
> see the same error code 12 you report.  If I shutdown the VM and restart it
> with the -cpu qemu64,model=13 option, the VF NIC works correctly.  Adding John,
> because I'll re-assign this to him if you can verify my results.

Alex, I double checked with -cpu qemu64,model=13 and hit the problem again.
I upgraded my win2K8 sp1 32bit guest to sp2 by installing Windows6.0-KB948465-X86.exe and tried with -cpu qemu64,model=13 again, and still hit it. After shutting down the VM and restarting it with the -cpu qemu64,model=13 option, it still reports error code 12.
Comment 8 Alex Williamson 2010-11-18 12:00:42 EST
(In reply to comment #7)
> 
> Alex, I double checked with -cpu qemu64,model=13, hit again. 
> I upgraded my win2K8 sp1 32bit guest to sp2 by installing
> Windows6.0-KB948465-X86.exe and tried with -cpu qemu64,model=13 again, also
> hit. Shutdown the VM and restart it with -cpu qemu64,model=13 option, it still
> reports error code 12.

I'm not sure how to debug this since it works just fine on my system using the -cpu flags.  Is there anything in dmesg on the host?  Are there any error messages printed from qemu-kvm while the guest is running?  Have you tried uninstalling and reinstalling the latest Intel driver (15.7 iirc)?  Confirm the host is booted with intel_iommu=on and that the PF NICs on the host are both configured up.  Perhaps try a fresh install of windows 2008 server sp2 32bit and a clean install of the Intel driver.
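
The intel_iommu=on check asked for above can be scripted. A minimal sketch, assuming an Intel host; the function name has_intel_iommu_on is hypothetical. It only inspects the kernel command line, so it does not prove that the IOMMU is actually active.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "intel_iommu=on" appears as a separate
# word in a kernel command line file (default: /proc/cmdline).
has_intel_iommu_on() {
    # Split the command line into one option per line, then require an
    # exact whole-line match so "intel_iommu=off" does not count.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'intel_iommu=on'
}

# Usage on the host:
#   has_intel_iommu_on && echo "intel_iommu=on is set"
```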
Comment 9 yang 2010-11-18 12:46:32 EST
It also works well for me with the -cpu flag.
Comment 10 Chao Yang 2010-11-18 23:25:36 EST
(In reply to comment #8)
> (In reply to comment #7)
> > 
> > Alex, I double checked with -cpu qemu64,model=13, hit again. 
> > I upgraded my win2K8 sp1 32bit guest to sp2 by installing
> > Windows6.0-KB948465-X86.exe and tried with -cpu qemu64,model=13 again, also
> > hit. Shutdown the VM and restart it with -cpu qemu64,model=13 option, it still
> > reports error code 12.
> 
> I'm not sure how to debug this since it works just fine on my system using the
> -cpu flags.  Is there anything in dmesg on the host?  Are there any error
> messages printed from qemu-kvm while the guest is running?  Have you tried
> uninstalling and reinstalling the latest Intel driver (15.7 iirc).  Confirm the
> host is booted with intel_iommu=on and that the PF NICs on the host are both
> configured up.  Perhaps try a fresh install of windows 2008 server sp2 32bit
> and a clean install of the Intel driver.

1. Is there anything in dmesg on the host? 
   igb 0000:09:00.1: 0 vfs allocated
igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
igb 0000:09:00.1: eth1: PBA No: e43709-003
igb 0000:09:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
ADDRCONF(NETDEV_UP): eth0: link is not ready
ACPI: PCI interrupt for device 0000:09:00.1 disabled
ACPI: PCI interrupt for device 0000:09:00.0 disabled
Intel(R) Gigabit Ethernet Network Driver - version 2.1.0-k2-1
Copyright (c) 2007-2009 Intel Corporation.
PCI: Enabling device 0000:09:00.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:09:00.0[A] -> Link [LN48] -> GSI 48 (level, high) -> IRQ 51
PCI: Setting latency timer of device 0000:09:00.0 to 64
igb 0000:09:00.0: 4 vfs allocated
Intel(R) Virtual Function Network Driver - version 1.0.0-k0-1
Copyright (c) 2009 Intel Corporation.
igb 0000:09:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f0
igb 0000:09:00.0: eth0: PBA No: e43709-003
igb 0000:09:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
PCI: Enabling device 0000:09:00.1 (0000 -> 0002)
ACPI: PCI Interrupt 0000:09:00.1[B] -> Link [LN49] -> GSI 49 (level, high) -> IRQ 107
PCI: Setting latency timer of device 0000:09:00.1 to 64
WARNING: at drivers/pci/msi.c:1107 pci_enable_msix()

Call Trace:
 [<ffffffff8016dd3e>] pci_enable_msix+0x114/0x3c6
 [<ffffffff802282fa>] pci_conf1_read+0xcc/0xd7
 [<ffffffff80015f2a>] __bitmap_weight+0x65/0x76
 [<ffffffff882bcae8>] :igb:igb_init_interrupt_scheme+0xbd/0x31e
 [<ffffffff882bfb5e>] :igb:igb_probe+0x470/0xc8d
 [<ffffffff8008c850>] __wake_up_common+0x3e/0x68
 [<ffffffff801614e3>] pci_device_probe+0x104/0x184
 [<ffffffff801cb823>] driver_probe_device+0x52/0xaa
 [<ffffffff801cb952>] __driver_attach+0x65/0xb6
 [<ffffffff801cb8ed>] __driver_attach+0x0/0xb6
 [<ffffffff801cb12a>] bus_for_each_dev+0x43/0x6e
 [<ffffffff801cad66>] bus_add_driver+0x76/0x110
 [<ffffffff801617ff>] __pci_register_driver+0x51/0xa6
 [<ffffffff800a8d1e>] sys_init_module+0xaf/0x1f2
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

igb 0000:09:00.1: 4 vfs allocated
igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
igb 0000:09:00.1: eth1: PBA No: e43709-003
igb 0000:09:00.1: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
ADDRCONF(NETDEV_UP): eth0: link is not ready
igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
eth1: no IPv6 routers present
PCI: Enabling device 0000:09:10.1 (0000 -> 0002)
assign device: host bdf = 9:10:1


2. Are there any error
messages printed from qemu-kvm while the guest is running?

#/usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -usbdevice tablet -startdate now -name windows2008-32 -smp 4 -m 4G -boot c  -drive file=/root/zhangjunyi/win2008_32_virtio.qcow2,media=disk,if=virtio,cache=none,format=qcow2,werror=stop,boot=on -vnc :18  -cpu qemu64,model=13 -M rhel5.6.0 -notify all -balloon virtio -monitor stdio -net none -pcidevice host=09:10.1
QEMU 0.9.1 monitor - type 'help' for more information
(qemu) BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)

(qemu) sendkey ctrl-alt-delete
(qemu) 

3. Have you tried uninstalling and reinstalling the latest Intel driver (15.7 iirc)?

Yes, I have tried the latest Intel driver but the same problem still exists.

4. Confirm the host is booted with intel_iommu=on and that the PF NICs on the host are both configured up?
I am using an AMD host running RHEL 5.6; the AMD IOMMU is on by default. After unbinding a VF, I have eth1 configured up.

5. Perhaps try a fresh install of windows 2008 server sp2 32bit
and a clean install of the Intel driver?
Ok, I will try a fresh install of windows 2008 server sp2 32bit and install the latest Intel driver
Comment 11 yang 2010-11-19 00:14:51 EST
(In reply to comment #10)
> I am using AMD host running RHEL 5.6, amd iommu is on by default. After unbind
> a vf, I have eth1 configured up
Uh, can you try on an Intel platform?
Comment 12 Alex Williamson 2010-11-19 00:30:45 EST
(In reply to comment #10)
> 
> 1. Is there anything in dmesg on the host? 
>    igb 0000:09:00.1: 0 vfs allocated
> igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
> igb 0000:09:00.1: eth1: PBA No: e43709-003
> igb 0000:09:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> ACPI: PCI interrupt for device 0000:09:00.1 disabled
> ACPI: PCI interrupt for device 0000:09:00.0 disabled
> Intel(R) Gigabit Ethernet Network Driver - version 2.1.0-k2-1
> Copyright (c) 2007-2009 Intel Corporation.
> PCI: Enabling device 0000:09:00.0 (0000 -> 0002)
> ACPI: PCI Interrupt 0000:09:00.0[A] -> Link [LN48] -> GSI 48 (level, high) ->
> IRQ 51
> PCI: Setting latency timer of device 0000:09:00.0 to 64
> igb 0000:09:00.0: 4 vfs allocated
> Intel(R) Virtual Function Network Driver - version 1.0.0-k0-1
> Copyright (c) 2009 Intel Corporation.
> igb 0000:09:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f0
> igb 0000:09:00.0: eth0: PBA No: e43709-003
> igb 0000:09:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
> PCI: Enabling device 0000:09:00.1 (0000 -> 0002)
> ACPI: PCI Interrupt 0000:09:00.1[B] -> Link [LN49] -> GSI 49 (level, high) ->
> IRQ 107
> PCI: Setting latency timer of device 0000:09:00.1 to 64
> WARNING: at drivers/pci/msi.c:1107 pci_enable_msix()
> 
> Call Trace:
>  [<ffffffff8016dd3e>] pci_enable_msix+0x114/0x3c6
>  [<ffffffff802282fa>] pci_conf1_read+0xcc/0xd7
>  [<ffffffff80015f2a>] __bitmap_weight+0x65/0x76
>  [<ffffffff882bcae8>] :igb:igb_init_interrupt_scheme+0xbd/0x31e
>  [<ffffffff882bfb5e>] :igb:igb_probe+0x470/0xc8d
>  [<ffffffff8008c850>] __wake_up_common+0x3e/0x68
>  [<ffffffff801614e3>] pci_device_probe+0x104/0x184
>  [<ffffffff801cb823>] driver_probe_device+0x52/0xaa
>  [<ffffffff801cb952>] __driver_attach+0x65/0xb6
>  [<ffffffff801cb8ed>] __driver_attach+0x0/0xb6
>  [<ffffffff801cb12a>] bus_for_each_dev+0x43/0x6e
>  [<ffffffff801cad66>] bus_add_driver+0x76/0x110
>  [<ffffffff801617ff>] __pci_register_driver+0x51/0xa6
>  [<ffffffff800a8d1e>] sys_init_module+0xaf/0x1f2
>  [<ffffffff8005d28d>] tracesys+0xd5/0xe0

This doesn't look good.  Above we see function 1 initialized with 0 vfs, function 0 with 4, then something bad happens setting up msix here, and then function 1 is re-initialized with 4 vfs.  Does this card even work on the host?  Please verify that you can make use of the ethX device associated with the VF in the host before trying to assign it to the guest.

> igb 0000:09:00.1: 4 vfs allocated
> igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
> igb 0000:09:00.1: eth1: PBA No: e43709-003
> igb 0000:09:00.1: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> eth1: no IPv6 routers present
> PCI: Enabling device 0000:09:10.1 (0000 -> 0002)
> assign device: host bdf = 9:10:1
> 
> 
> 2. Are there any error
> messages printed from qemu-kvm while the guest is running?
> 
> #/usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -usbdevice tablet -startdate now
> -name windows2008-32 -smp 4 -m 4G -boot c  -drive
> file=/root/zhangjunyi/win2008_32_virtio.qcow2,media=disk,if=virtio,cache=none,format=qcow2,werror=stop,boot=on
> -vnc :18  -cpu qemu64,model=13 -M rhel5.6.0 -notify all -balloon virtio
> -monitor stdio -net none -pcidevice host=09:10.1
> QEMU 0.9.1 monitor - type 'help' for more information
> (qemu) BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
> BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)

These BUGs are annoying, but don't hurt anything.

> (qemu) sendkey ctrl-alt-delete
> (qemu) 
> 
> 3. Have you tried uninstalling and reinstalling the latest Intel driver (15.7
> iirc)?
> 
> Yes, I have tried the latest Intel driver but the same problem still exists.
> 
> 4. Confirm the host is booted with intel_iommu=on and that the PF NICs on the
> host are both configured up?
> I am using AMD host running RHEL 5.6, amd iommu is on by default. After unbind
> a vf, I have eth1 configured up

Do you have the same problem if you have eth0 up and assign one of the even numbered VFs to the guest?

> 5. Perhaps try a fresh install of windows 2008 server sp2 32bit
> and a clean install of the Intel driver?
> Ok, I will try a fresh install of windows 2008 server sp2 32bit and install the
> latest Intel driver
Comment 13 Chao Yang 2010-11-19 01:47:06 EST
Alex, the AMD host has two 82576 ethernet cards, eth0 and eth1; eth0 is not available. I am sure that eth1 has no problem because I performed the same actions with win2k8 64bit and the VF works fine.
I will repeat the steps to assign a VF and give you the device messages for each step.

1. #rmmod igb;modprobe igb max_vfs=4

igb 0000:09:00.1: IOV Disabled
ACPI: PCI interrupt for device 0000:09:00.1 disabled
igb 0000:09:00.0: IOV Disabled
ACPI: PCI interrupt for device 0000:09:00.0 disabled
Intel(R) Gigabit Ethernet Network Driver - version 2.1.0-k2-1
Copyright (c) 2007-2009 Intel Corporation.
PCI: Enabling device 0000:09:00.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:09:00.0[A] -> Link [LN48] -> GSI 48 (level, high) -> IRQ 51
PCI: Setting latency timer of device 0000:09:00.0 to 64
igb 0000:09:00.0: 0 vfs allocated
igb 0000:09:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f0
igb 0000:09:00.0: eth0: PBA No: e43709-003
igb 0000:09:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
PCI: Enabling device 0000:09:00.1 (0000 -> 0002)
ACPI: PCI Interrupt 0000:09:00.1[B] -> Link [LN49] -> GSI 49 (level, high) -> IRQ 107
PCI: Setting latency timer of device 0000:09:00.1 to 64
WARNING: at drivers/pci/msi.c:1107 pci_enable_msix()

Call Trace:
 [<ffffffff8016dd3e>] pci_enable_msix+0x114/0x3c6
 [<ffffffff800ddd9c>] alternate_node_alloc+0x70/0x8c
 [<ffffffff882bcae8>] :igb:igb_init_interrupt_scheme+0xbd/0x31e
 [<ffffffff882bfb5e>] :igb:igb_probe+0x470/0xc8d
 [<ffffffff801614e3>] pci_device_probe+0x104/0x184
 [<ffffffff801cb823>] driver_probe_device+0x52/0xaa
 [<ffffffff801cb952>] __driver_attach+0x65/0xb6
 [<ffffffff801cb8ed>] __driver_attach+0x0/0xb6
 [<ffffffff801cb12a>] bus_for_each_dev+0x43/0x6e
 [<ffffffff801cad66>] bus_add_driver+0x76/0x110
 [<ffffffff801617ff>] __pci_register_driver+0x51/0xa6
 [<ffffffff800a8d1e>] sys_init_module+0xaf/0x1f2
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

igb 0000:09:00.1: 0 vfs allocated
igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
igb 0000:09:00.1: eth1: PBA No: e43709-003
igb 0000:09:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
ADDRCONF(NETDEV_UP): eth0: link is not ready


2. check with #lspci|grep 82576, which didn't output any VF info
lspci|grep 82576
09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

3. run #rmmod igb;modprobe igb max_vfs=4 again

ACPI: PCI interrupt for device 0000:09:00.1 disabled
ACPI: PCI interrupt for device 0000:09:00.0 disabled
Intel(R) Gigabit Ethernet Network Driver - version 2.1.0-k2-1
Copyright (c) 2007-2009 Intel Corporation.
PCI: Enabling device 0000:09:00.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:09:00.0[A] -> Link [LN48] -> GSI 48 (level, high) -> IRQ 51
PCI: Setting latency timer of device 0000:09:00.0 to 64
igb 0000:09:00.0: 4 vfs allocated
Intel(R) Virtual Function Network Driver - version 1.0.0-k0-1
Copyright (c) 2009 Intel Corporation.
igb 0000:09:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f0
igb 0000:09:00.0: eth0: PBA No: e43709-003
igb 0000:09:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
PCI: Enabling device 0000:09:00.1 (0000 -> 0002)
ACPI: PCI Interrupt 0000:09:00.1[B] -> Link [LN49] -> GSI 49 (level, high) -> IRQ 107
PCI: Setting latency timer of device 0000:09:00.1 to 64
WARNING: at drivers/pci/msi.c:1107 pci_enable_msix()

Call Trace:
 [<ffffffff8016dd3e>] pci_enable_msix+0x114/0x3c6
 [<ffffffff802282fa>] pci_conf1_read+0xcc/0xd7
 [<ffffffff80015f2a>] __bitmap_weight+0x65/0x76
 [<ffffffff882bcae8>] :igb:igb_init_interrupt_scheme+0xbd/0x31e
 [<ffffffff882bfb5e>] :igb:igb_probe+0x470/0xc8d
 [<ffffffff8008c850>] __wake_up_common+0x3e/0x68
 [<ffffffff801614e3>] pci_device_probe+0x104/0x184
 [<ffffffff801cb823>] driver_probe_device+0x52/0xaa
 [<ffffffff801cb952>] __driver_attach+0x65/0xb6
 [<ffffffff801cb8ed>] __driver_attach+0x0/0xb6
 [<ffffffff801cb12a>] bus_for_each_dev+0x43/0x6e
 [<ffffffff801cad66>] bus_add_driver+0x76/0x110
 [<ffffffff801617ff>] __pci_register_driver+0x51/0xa6
 [<ffffffff800a8d1e>] sys_init_module+0xaf/0x1f2
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

igb 0000:09:00.1: 4 vfs allocated
igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
igb 0000:09:00.1: eth1: PBA No: e43709-003
igb 0000:09:00.1: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
ADDRCONF(NETDEV_UP): eth0: link is not ready


4. #lspci|grep 82576

lspci|grep 82576
09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
09:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

5. I need to bring eth1 up with #ifconfig eth1 up, because the actions above cause eth1 to disappear on the AMD host. On an Intel host, the actions above do not cause eth1 to disappear.
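
Whether a driver reload actually created the VFs (steps 2 and 4 above) can be checked by counting the Virtual Function entries in the lspci output. A minimal sketch; the function name count_vfs is hypothetical, and it matches on the "Virtual Function" text shown in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: count Virtual Function lines in lspci-style
# output read from stdin and print the count.
count_vfs() {
    grep -c 'Virtual Function'
}

# Usage on the host:
#   lspci | count_vfs
```

With max_vfs=4 on a two-port 82576, the listing in step 4 corresponds to a count of 8, and the listing in step 2 to a count of 0.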
Comment 14 Chao Yang 2010-11-19 02:01:29 EST
Sorry, Alex, my mistake: it is one ethernet card with two ports, not two ethernet cards. Only one port is in use.
Comment 15 Alex Williamson 2010-11-19 13:35:04 EST
(In reply to comment #13)
> Alex, AMD host has two 82576 ethernet cards,eth0 and eth1,eth0 is not
> available.I am sure that eth1 has no problem cause I did the same actions on
> win2k8 64bit, the vf works fine. 
> I will reproduce steps to assign a vf and give you the output of device message
> for each step. 
> 
> 1. #rmmod igb;modprobe igb max_vfs=4
> 
> igb 0000:09:00.1: IOV Disabled
> ACPI: PCI interrupt for device 0000:09:00.1 disabled
> igb 0000:09:00.0: IOV Disabled
> ACPI: PCI interrupt for device 0000:09:00.0 disabled
> Intel(R) Gigabit Ethernet Network Driver - version 2.1.0-k2-1
> Copyright (c) 2007-2009 Intel Corporation.
> PCI: Enabling device 0000:09:00.0 (0000 -> 0002)
> ACPI: PCI Interrupt 0000:09:00.0[A] -> Link [LN48] -> GSI 48 (level, high) ->
> IRQ 51
> PCI: Setting latency timer of device 0000:09:00.0 to 64
> igb 0000:09:00.0: 0 vfs allocated

RHEL5 seems to aggressively reload drivers when they're rmmod'd, so it's probably reloading here with the default max_vfs=0 rather than the value from your modprobe.  It's better to add an /etc/modprobe.d entry to specify the max_vfs option.  The aggressive module reloading might be a separate bug; I don't remember it doing that before.
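
A minimal sketch of such an entry follows. The file name is an assumption chosen for illustration; the option mirrors the max_vfs=4 passed on the modprobe command line above.

```
# /etc/modprobe.d/igb  (file name is illustrative, not from this report)
# Apply max_vfs=4 whenever the igb module is loaded, including reloads
# that are not triggered by an explicit modprobe command line.
options igb max_vfs=4
```

With an entry like this in place, a plain "modprobe igb" should pick up the option, so an automatic reload would no longer fall back to max_vfs=0.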

> 2. check with #lspci|grep 82576, did't output vf infos
> lspci|grep 82576
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
> (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
> (rev 01)
> 
> 3. run #rmmod igb;modprobe igb max_vfs=4 again
> 
> ACPI: PCI interrupt for device 0000:09:00.1 disabled
> ACPI: PCI interrupt for device 0000:09:00.0 disabled
> Intel(R) Gigabit Ethernet Network Driver - version 2.1.0-k2-1
> Copyright (c) 2007-2009 Intel Corporation.
> PCI: Enabling device 0000:09:00.0 (0000 -> 0002)
> ACPI: PCI Interrupt 0000:09:00.0[A] -> Link [LN48] -> GSI 48 (level, high) ->
> IRQ 51
> PCI: Setting latency timer of device 0000:09:00.0 to 64
> igb 0000:09:00.0: 4 vfs allocated
> Intel(R) Virtual Function Network Driver - version 1.0.0-k0-1
> Copyright (c) 2009 Intel Corporation.
> igb 0000:09:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f0
> igb 0000:09:00.0: eth0: PBA No: e43709-003
> igb 0000:09:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
> PCI: Enabling device 0000:09:00.1 (0000 -> 0002)
> ACPI: PCI Interrupt 0000:09:00.1[B] -> Link [LN49] -> GSI 49 (level, high) ->
> IRQ 107
> PCI: Setting latency timer of device 0000:09:00.1 to 64
> WARNING: at drivers/pci/msi.c:1107 pci_enable_msix()
> 
> Call Trace:
>  [<ffffffff8016dd3e>] pci_enable_msix+0x114/0x3c6
>  [<ffffffff802282fa>] pci_conf1_read+0xcc/0xd7
>  [<ffffffff80015f2a>] __bitmap_weight+0x65/0x76
>  [<ffffffff882bcae8>] :igb:igb_init_interrupt_scheme+0xbd/0x31e
>  [<ffffffff882bfb5e>] :igb:igb_probe+0x470/0xc8d
>  [<ffffffff8008c850>] __wake_up_common+0x3e/0x68
>  [<ffffffff801614e3>] pci_device_probe+0x104/0x184
>  [<ffffffff801cb823>] driver_probe_device+0x52/0xaa
>  [<ffffffff801cb952>] __driver_attach+0x65/0xb6
>  [<ffffffff801cb8ed>] __driver_attach+0x0/0xb6
>  [<ffffffff801cb12a>] bus_for_each_dev+0x43/0x6e
>  [<ffffffff801cad66>] bus_add_driver+0x76/0x110
>  [<ffffffff801617ff>] __pci_register_driver+0x51/0xa6
>  [<ffffffff800a8d1e>] sys_init_module+0xaf/0x1f2
>  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> 
> igb 0000:09:00.1: 4 vfs allocated
> igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:36:79:f1
> igb 0000:09:00.1: eth1: PBA No: e43709-003
> igb 0000:09:00.1: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> 
> 
> 4. #lspci|grep 82576
> 
> lspci|grep 82576
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
> (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
> (rev 01)
> 09:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
> 09:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

Use modprobe.d and you shouldn't have these issues.

> 5. I need to configure eth1 up using #ifconfig eth1 up, cause actions above
> result in eth1 disappeared on AMD host.And on Intel host,actions above won't
> cause eth1 disappear

The ethX devices associated with the igb device will certainly go away when the igb module is removed, on either AMD or Intel.  The difference is more likely something in /etc/sysconfig/network-scripts on the Intel host, missing on the AMD host, that causes eth1 to be configured automatically when it reappears.
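
One way to compare the two hosts is to check whether an ifcfg file exists for the interface and what it sets. The helper below is a hypothetical sketch, not something used in this report: the function name and the optional directory argument are assumptions, and ONBOOT and HOTPLUG are simply the usual ifcfg keys to look at first. Which setting actually differs between the hosts is not established here.

```shell
# Hypothetical helper: report whether an ifcfg file exists for an interface
# and, if it does, print its ONBOOT/HOTPLUG lines for comparison.
# The optional second argument overrides the config directory.
ifcfg_summary() {
    iface=$1
    dir=${2:-/etc/sysconfig/network-scripts}
    cfg="$dir/ifcfg-$iface"
    if [ -f "$cfg" ]; then
        echo "$cfg: present"
        # Show only the settings that influence automatic configuration.
        grep -E '^(ONBOOT|HOTPLUG)=' "$cfg"
    else
        echo "$cfg: missing"
    fi
    # Do not propagate grep's "no match" status to the caller.
    return 0
}
```

Running "ifcfg_summary eth1" on both the AMD and the Intel host and comparing the output should show whether the AMD host lacks the file or has different settings.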
Comment 21 Alex Williamson 2010-11-23 13:05:53 EST
OK, so this seems to just need the AMD incantation of -cpu qemu64,model=13, which is -cpu qemu64,family=16.  This makes the processor appear as an Opteron-class processor, for which Windows seems willing to enable MSI support.  I did not do an exhaustive search to find the minimum feature set Windows will accept for enabling MSI.  Re-assigning to John since he's dealt with this problem in RHEL6.
Comment 24 RHEL Product and Program Management 2011-01-11 15:37:08 EST
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 25 RHEL Product and Program Management 2011-01-11 17:56:39 EST
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.
Comment 31 Alex Williamson 2011-11-01 13:32:26 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When using PCI device assignment with a 32-bit Microsoft Windows 2008 guest on an AMD-based host system, the assigned device may fail to work if it relies on MSI or MSI-X based interrupts.  This is because the 32-bit version of Microsoft Windows 2008 does not enable MSI based interrupts for the processor family exposed to the guest.  To work around this problem, the user may wish to move to a RHEL6 host, use a 64-bit version of the guest operating system, or employ a wrapper script to modify the processor family exposed to the guest as follows (note that this applies only to 32-bit Windows guests):

1) Create wrapper script

$ cat /usr/libexec/qemu-kvm.family16
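#!/bin/sh
# Wrapper for qemu-kvm that forces CPU family 16 for 32-bit Windows guests.

# Flatten the arguments into one string (arguments containing spaces are
# not preserved).
ARGS="$*"

# The padding spaces let ' -cpu ' match even when -cpu is the first or
# last argument.
if echo " $ARGS " | grep -q ' -cpu '; then
    # Append family=16 to whichever known CPU model was requested.
    # '?' and '[:blank:]' are quoted so the shell cannot glob-expand them;
    # \> stops a model name from matching a longer one (pentium vs pentium2).
    for model in $(/usr/libexec/qemu-kvm -cpu '?' \
                   | sed 's|^x86||g' | tr -d '[:blank:]'); do
        ARGS=$(echo "$ARGS" | \
               sed "s|-cpu $model\>|-cpu $model,family=16|g")
    done
else
    # No CPU model given: fall back to qemu64 with family 16.
    ARGS="$ARGS -cpu qemu64,family=16"
fi

echo "$0: exec /usr/libexec/qemu-kvm $ARGS" >&2

# $ARGS is intentionally unquoted so it splits back into arguments.
exec /usr/libexec/qemu-kvm $ARGS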

2) Make script executable

$ chmod 755 /usr/libexec/qemu-kvm.family16

3) Restore the SELinux context

$ restorecon /usr/libexec/qemu-kvm.family16

4) Update guest XML to use the new wrapper

$ virsh edit $GUEST

Replace:

<emulator>/usr/libexec/qemu-kvm</emulator>

With:

<emulator>/usr/libexec/qemu-kvm.family16</emulator>
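
To see what the wrapper does to an existing -cpu option, the core sed substitution can be tried on its own. This is a simplified sketch: the argument string is made up for illustration (the -m and -smp values and the +sse2 flag are placeholders, not taken from a real guest), and qemu64 stands in for whichever model name the loop is currently processing.

```shell
# Sample argument string; the values are placeholders for illustration only.
ARGS="-m 1024 -cpu qemu64,+sse2 -smp 2"
model=qemu64

# Same style of substitution the wrapper applies for each known model:
# insert ",family=16" directly after the model name, leaving any
# feature flags that follow it untouched.
NEW_ARGS=$(echo "$ARGS" | sed "s|-cpu $model|-cpu $model,family=16|g")

echo "$NEW_ARGS"
# prints: -m 1024 -cpu qemu64,family=16,+sse2 -smp 2
```

The family override lands between the model name and any existing feature flags, which is why the wrapper needs the list of known model names rather than simply appending to the end of the command line.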
Comment 33 Ronen Hod 2011-11-14 07:39:55 EST
Oops, we forgot to remove the cond-NACK. Actually, I don't know how to remove it.
Anyhow, the issue is resolved by the Technical Note that Alex already filled in.
I returned it to RHEL5.8.
