Bug 652310 - xen hvmloader: fix off-by-one-bit error when initializing PCI devices
Summary: xen hvmloader: fix off-by-one-bit error when initializing PCI devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Novotny
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514500
Reported: 2010-11-11 15:59 UTC by Don Dutile (Red Hat)
Modified: 2014-02-02 22:38 UTC

Fixed In Version: xen-3.0.3-124.el5
Doc Type: Bug Fix
Doc Text:
When initializing PCI devices, due to an off-by-one-bit error in the hvmloader BIOS, fully-virtualized Xen guests with more than 12 PCI devices could not be created. This bug has been fixed, and now up to 28 PCI devices can be attached to a fully-virtualized guest.
Clone Of:
Environment:
Last Closed: 2011-07-21 09:17:28 UTC
Target Upstream Version:
Embargoed:


Attachments
Backport of c/s 22383 (1.23 KB, patch)
2011-01-26 13:26 UTC, Michal Novotny


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1070 0 normal SHIPPED_LIVE xen bug fix and enhancement update 2011-07-21 09:12:56 UTC

Description Don Dutile (Red Hat) 2010-11-11 15:59:57 UTC
Description of problem:
Limited ability to add devices to HVM xen guests due to off-by-1 error 
in xen hvmloader.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Try to create a xen HVM guest with more than 12-13 attached PCI devices
2.
3.
  
Actual results:
Fails around the 12th device (the first 3 PCI devices are consumed by emulated devices).

Expected results:
Can attach up to (approx.) 28 PCI devices to a xen HVM guest

Additional info:
Need to backport this patch from upstream:
diff -r 7188d1e4b0e1 tools/firmware/hvmloader/hvmloader.c
--- a/tools/firmware/hvmloader/hvmloader.c	Tue Nov 09 12:00:05 2010 +0000
+++ b/tools/firmware/hvmloader/hvmloader.c	Wed Nov 10 13:19:59 2010 +0000
@@ -196,7 +196,7 @@ static void pci_setup(void)
     outb(0x4d1, (uint8_t)(PCI_ISA_IRQ_MASK >> 8));
 
     /* Scan the PCI bus and map resources. */
-    for ( devfn = 0; devfn < 128; devfn++ )
+    for ( devfn = 0; devfn < 256; devfn++ )
     {
         class     = pci_readw(devfn, PCI_CLASS_DEVICE);
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
@@ -470,7 +470,7 @@ static int scan_etherboot_nic(uint32_t c
     uint16_t class, vendor_id, device_id;
     int rom_size = 0;
 
-    for ( devfn = 0; (devfn < 128) && !rom_size; devfn++ )
+    for ( devfn = 0; (devfn < 256) && !rom_size; devfn++ )
     {
         class     = pci_readw(devfn, PCI_CLASS_DEVICE);
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
@@ -497,7 +497,7 @@ static int pci_load_option_roms(uint32_t
     uint16_t vendor_id, device_id;
     uint8_t devfn, class;
 
-    for ( devfn = 0; devfn < 128; devfn++ )
+    for ( devfn = 0; devfn < 256; devfn++ )
     {
         class     = pci_readb(devfn, PCI_CLASS_DEVICE + 1);
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
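
For readers unfamiliar with the encoding, here is a minimal sketch of why the loop bound matters. It assumes the standard PCI packing of devfn (slot in the upper five bits, function in the lower three); the helper names are hypothetical and do not come from hvmloader. A bound of 128 is one bit short of the full 8-bit range, so only slots 0-15 are scanned; 256 covers all 32 slots.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from hvmloader. */

/* Slot (device) number: upper five bits of the devfn byte. */
static unsigned int devfn_slot(unsigned int devfn)
{
    return (devfn >> 3) & 0x1f;
}

/* Function number: lower three bits of the devfn byte. */
static unsigned int devfn_func(unsigned int devfn)
{
    return devfn & 0x07;
}

/* Number of distinct slots visited by a scan over devfn = 0 .. limit-1. */
static unsigned int slots_covered(unsigned int limit)
{
    unsigned int devfn, count = 0;

    for (devfn = 0; devfn < limit; devfn++)
        if (devfn_func(devfn) == 0)   /* function 0 marks a new slot */
            count++;
    return count;
}
```

With these definitions, slots_covered(128) gives 16 and slots_covered(256) gives 32, which fits the reported failure once the low slots are taken by emulated devices.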

Comment 4 Michal Novotny 2011-01-26 13:09:51 UTC
(In reply to comment #0)
> 
> Additional info:
> Need to backport this patch from upstream:
> diff -r 7188d1e4b0e1 tools/firmware/hvmloader/hvmloader.c
> --- a/tools/firmware/hvmloader/hvmloader.c Tue Nov 09 12:00:05 2010 +0000
> +++ b/tools/firmware/hvmloader/hvmloader.c Wed Nov 10 13:19:59 2010 +0000
> @@ -196,7 +196,7 @@ static void pci_setup(void)
>      outb(0x4d1, (uint8_t)(PCI_ISA_IRQ_MASK >> 8));
> 
>      /* Scan the PCI bus and map resources. */
> -    for ( devfn = 0; devfn < 128; devfn++ )
> +    for ( devfn = 0; devfn < 256; devfn++ )
>      {
>          class     = pci_readw(devfn, PCI_CLASS_DEVICE);
>          vendor_id = pci_readw(devfn, PCI_VENDOR_ID);

This is the only one of these loops present in our codebase, so only the hunk above from the upstream patch is applicable.

> @@ -470,7 +470,7 @@ static int scan_etherboot_nic(uint32_t c
>      uint16_t class, vendor_id, device_id;
>      int rom_size = 0;
> 
> -    for ( devfn = 0; (devfn < 128) && !rom_size; devfn++ )
> +    for ( devfn = 0; (devfn < 256) && !rom_size; devfn++ )
>      {
>          class     = pci_readw(devfn, PCI_CLASS_DEVICE);
>          vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
> @@ -497,7 +497,7 @@ static int pci_load_option_roms(uint32_t
>      uint16_t vendor_id, device_id;
>      uint8_t devfn, class;
> 
> -    for ( devfn = 0; devfn < 128; devfn++ )
> +    for ( devfn = 0; devfn < 256; devfn++ )
>      {
>          class     = pci_readb(devfn, PCI_CLASS_DEVICE + 1);
>          vendor_id = pci_readw(devfn, PCI_VENDOR_ID);

We don't even have the scan_etherboot_nic()/pci_load_option_roms() functions or anything similar to them. I'm working on a patch to apply at least the hunk above, which we do have.

Michal
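
A general C caveat when adapting loops like the ones quoted above: the pci_load_option_roms() hunk declares devfn as uint8_t, and an 8-bit value is always below 256, so a `devfn < 256` bound can only terminate the loop if the counter is wider than 8 bits. The type of devfn in pci_setup() is not shown in this report, so the following is only a sketch under that assumption, with hypothetical names.

```c
#include <stdint.h>

/* Hypothetical illustration, not hvmloader code. */

/* An 8-bit counter wraps from 255 back to 0 and never reaches 256. */
static uint8_t next_devfn_u8(uint8_t devfn)
{
    return (uint8_t)(devfn + 1);
}

/* A wider counter visits each of the 256 devfn values exactly once. */
static unsigned int count_devfns(void)
{
    unsigned int devfn, visited = 0;

    for (devfn = 0; devfn < 256; devfn++) {
        uint8_t as_byte = (uint8_t)devfn;  /* value an 8-bit API would receive */
        (void)as_byte;
        visited++;
    }
    return visited;
}
```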

Comment 5 Michal Novotny 2011-01-26 13:26:35 UTC
Created attachment 475386
Backport of c/s 22383

This is the backport of c/s 22383 from upstream, limited to the hunk that applies to our codebase.

Unfortunately, I don't have the hardware for proper testing, so Don, could you please test it using the RPMs from:

http://people.redhat.com/minovotn/xen/

Thanks,
Michal

Comment 9 Don Dutile (Red Hat) 2011-03-24 21:54:44 UTC
tested and passed.

Comment 10 Qixiang Wan 2011-04-21 12:56:53 UTC
Hi Don,
Does 'PCI devices' mean 'qemu emulated PCI devices', 'pass-through PCI devices', or both?

With the xen 120 build (which doesn't include the fix), I can boot an HVM guest with 8 qemu emulated rtl8139 NICs + 14 pass-through Intel 82576 VFs + 2 pass-through Broadcom interfaces. So there are 30 PCI devices in the guest in total, and it's not clear to me how to reproduce the issue and verify the fix.

[guest ~] $ lspci

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:04.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:05.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:06.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:07.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:08.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:09.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0a.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0b.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0c.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0d.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0e.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0f.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:11.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:14.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:15.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:16.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:17.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:18.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:19.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)

Comment 11 Paolo Bonzini 2011-04-21 13:51:57 UTC
Do the devices above 00:0f.0 work?

Comment 12 Qixiang Wan 2011-04-21 16:03:13 UTC
(In reply to comment #11)
> Do the devices above 00:0f.0 work?
There is no link attached to the NICs, so I haven't had a chance to try them.

However, I think I have reproduced the defect. It should only affect the RHEL6 pv_ops kernel (RHEL6.0 with xen_pv_hvm=enable, or RHEL6.1 by default), so it can't be reproduced with RHEL5, or with RHEL6.0 without xen_pv_hvm enabled.

Before the fix, booting the guest with 14 VFs (there is no issue with 12 VFs) produces the following call trace:
-----------------------------------------------------------------------
ERST: Table is not found!
xen-platform-pci 0000:00:11.0: can't derive routing for PCI INT A
xen-platform-pci 0000:00:11.0: PCI INT A: no GSI
IRQ handler type mismatch for IRQ 0
current handler: timer
Pid: 1, comm: swapper Not tainted 2.6.32-131.0.1.el6.x86_64 #1
Call Trace:
 [<ffffffff810d79e2>] ? __setup_irq+0x382/0x3c0
 [<ffffffff810d8114>] ? request_threaded_irq+0x154/0x2f0
 [<ffffffff812fa420>] ? do_hvm_evtchn_intr+0x0/0x20
 [<ffffffff814cb295>] ? platform_pci_init+0x1e4/0x325
 [<ffffffff81280ae7>] ? local_pci_probe+0x17/0x20
 [<ffffffff81281cd1>] ? pci_device_probe+0x101/0x120
 [<ffffffff8133b6c2>] ? driver_sysfs_add+0x62/0x90
 [<ffffffff8133b860>] ? driver_probe_device+0xa0/0x2a0
 [<ffffffff8133bb0b>] ? __driver_attach+0xab/0xb0
 [<ffffffff8133ba60>] ? __driver_attach+0x0/0xb0
 [<ffffffff8133aac4>] ? bus_for_each_dev+0x64/0x90
 [<ffffffff8133b5fe>] ? driver_attach+0x1e/0x20
 [<ffffffff8133af00>] ? bus_add_driver+0x200/0x300
 [<ffffffff8133be36>] ? driver_register+0x76/0x140
 [<ffffffff81281f36>] ? __pci_register_driver+0x56/0xd0
 [<ffffffff81bf25f8>] ? platform_pci_module_init+0x0/0x38
 [<ffffffff81bf2616>] ? platform_pci_module_init+0x1e/0x38
 [<ffffffff81bf1135>] ? erst_init+0x0/0x25a
 [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0
 [<ffffffff81bbd884>] ? kernel_init+0x29d/0x2f9
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff81bbd5e7>] ? kernel_init+0x0/0x2f9
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
request_irq failed err=-16
xen-platform-pci 0000:00:11.0: can't derive routing for PCI INT A
xen-platform-pci: probe of 0000:00:11.0 failed with error -16
-----------------------------------------------------------------------

Then I tried the 129 build, and the error disappeared. I also patched the xen-120 build with the patch (xen-fix-off-by-one-error-on-init-pci-device.patch) and confirmed that the fix resolves the above error.

Comment 15 Qixiang Wan 2011-04-22 05:00:31 UTC
change to VERIFIED per comment #12.

Comment 16 Paolo Bonzini 2011-04-22 08:22:54 UTC
Indeed the platform-pci device is one of those allocated above 0000:0f.0 and it doesn't work without the patch.  Great, thanks!
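
As a rough cross-check of this observation, the short-form addresses printed by lspci can be converted to devfn values and compared against the old and new loop bounds. This is only a sketch with hypothetical helper names; it assumes the short "bus:slot.func" form shown in comment 10, not the domain-prefixed form.

```c
#include <stdio.h>

/* Hypothetical helper: convert an lspci-style "bus:slot.func" string
 * (hex fields, no domain prefix) to the 8-bit devfn value.
 * Returns -1 if the string cannot be parsed or is out of range. */
static int bdf_to_devfn(const char *bdf)
{
    unsigned int bus, slot, func;

    if (sscanf(bdf, "%x:%x.%x", &bus, &slot, &func) != 3)
        return -1;
    if (slot > 31 || func > 7)
        return -1;
    return (int)((slot << 3) | func);
}

/* Nonzero if a scan over devfn = 0 .. limit-1 would visit this device. */
static int reached_by_scan(const char *bdf, int limit)
{
    int devfn = bdf_to_devfn(bdf);

    return devfn >= 0 && devfn < limit;
}
```

Under these assumptions, 00:0f.0 maps to devfn 120 and is still reached with the old bound of 128, while the Xen platform device at 00:11.0 maps to devfn 136 and is only reached once the bound is 256.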

Comment 17 Tomas Capek 2011-07-13 13:27:42 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When initializing PCI devices, due to an off-by-one-bit error in the hvmloader utility, HVM guest with more than 12 PCI devices could not be created. This bug has been fixed, and now up to 28 PCI devices can be attached to a HVM guest.

Comment 18 Paolo Bonzini 2011-07-13 14:54:24 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-When initializing PCI devices, due to an off-by-one-bit error in the hvmloader utility, HVM guest with more than 12 PCI devices could not be created. This bug has been fixed, and now up to 28 PCI devices can be attached to a HVM guest.
+When initializing PCI devices, due to an off-by-one-bit error in the hvmloader BIOS, fully-virtualized Xen guests with more than 12 PCI devices could not be created. This bug has been fixed, and now up to 28 PCI devices can be attached to a fully-virtualized guest.

Comment 19 errata-xmlrpc 2011-07-21 09:17:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1070.html


