Bug 599065 - PCI passthrough w/ shared IRQ broken
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.5
Hardware: x86_64 Linux
Priority: low Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Don Dutile
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 514490
Reported: 2010-06-02 11:45 EDT by Tamas Vincze
Modified: 2011-07-29 10:36 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-29 06:35:50 EDT


Attachments
lspci -v (14.84 KB, text/plain)
2010-06-02 11:45 EDT, Tamas Vincze

Description Tamas Vincze 2010-06-02 11:45:47 EDT
Created attachment 419079
lspci -v

I attached a USB controller to a PV domU using PCI passthrough.
Unfortunately the chipset does not support VT-d.
The controller's three functions use IRQs 16, 17 and 18, each of which is shared with dom0 devices.
After a few hours of operation the interrupts get disabled in both dom0 and the domU.

The passed through device:

04:00.0 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
	Subsystem: NEC Corporation Hama USB 2.0 CardBus
	Flags: bus master, medium devsel, latency 32, IRQ 16
	Memory at fc300000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 2

04:00.1 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
	Subsystem: NEC Corporation Hama USB 2.0 CardBus
	Flags: bus master, medium devsel, latency 32, IRQ 17
	Memory at fc301000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 2

04:00.2 USB Controller: NEC Corporation USB 2.0 (rev 04) (prog-if 20 [EHCI])
	Subsystem: NEC Corporation USB 2.0
	Flags: bus master, medium devsel, latency 132, IRQ 18
	Memory at fc302000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [40] Power Management version 2

IRQs 16, 17 and 18 are shared with other devices; see the attached lspci output.
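
As a rough way to check for sharing before assigning a device, the `lspci -v` output can be grouped by IRQ. The sketch below is only an illustration: the function name `shared_irqs` and the sample text are made up for this example, and the `00:1a.0` entry is a hypothetical dom0 device, not taken from the attachment. It assumes the default `lspci -v` layout (bus address at column 0, IRQ on the indented "Flags:" line, no PCI domain prefix).

```python
import re
from collections import defaultdict

# Device header lines start at column 0 with a bus address, e.g. "04:00.0 ...".
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
# The interrupt number appears on the indented "Flags:" line, e.g. "... IRQ 16".
IRQ_RE = re.compile(r"\bIRQ (\d+)\b")


def shared_irqs(lspci_text):
    """Return {irq: [device addresses]} for IRQs used by more than one device."""
    by_irq = defaultdict(list)
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        if current and line.lstrip().startswith("Flags:"):
            irq = IRQ_RE.search(line)
            if irq:
                by_irq[int(irq.group(1))].append(current)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


# Sample input: the first entry is hypothetical, the other two are from this report.
SAMPLE = """\
00:1a.0 USB Controller: Hypothetical dom0 device
\tFlags: bus master, medium devsel, latency 0, IRQ 16
04:00.0 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
\tFlags: bus master, medium devsel, latency 32, IRQ 16
04:00.1 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
\tFlags: bus master, medium devsel, latency 32, IRQ 17
"""

if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print(f"IRQ {irq} is shared by: {', '.join(devices)}")
```

Any IRQ reported here that includes both a passed-through function and a device left in dom0 is a candidate for the problem described in this report.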

=== dom0 dmesg ===

irq 17: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 <IRQ>  [<ffffffff802b3e43>] __report_bad_irq+0x30/0x7d
 [<ffffffff802b407a>] note_interrupt+0x1ea/0x22b
 [<ffffffff802b3572>] __do_IRQ+0xbd/0x103
 [<ffffffff8029043f>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df48>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3ae7>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024b0aa>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

handlers:
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
Disabling IRQ #17

=== domU dmesg ===

irq 18: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 <IRQ>  [<ffffffff802b3e43>] __report_bad_irq+0x30/0x7d
 [<ffffffff802b407a>] note_interrupt+0x1ea/0x22b
 [<ffffffff802b3572>] __do_IRQ+0xbd/0x103
 [<ffffffff8029043f>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df48>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3ae7>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024b0aa>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

handlers:
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
Disabling IRQ #18

irq 16: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 <IRQ>  [<ffffffff802b3e43>] __report_bad_irq+0x30/0x7d
 [<ffffffff802b407a>] note_interrupt+0x1ea/0x22b
 [<ffffffff802b3572>] __do_IRQ+0xbd/0x103
 [<ffffffff8029043f>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df48>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3ae7>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024b0aa>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

handlers:
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
Disabling IRQ #16


Initial Xen IRQ info:
(XEN)     IRQ 16 Vec144: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN)     IRQ 17 Vec152: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN)     IRQ 18 Vec160: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),3(----),

Afterwards:
(XEN)     IRQ 16 Vec144: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),
(XEN)     IRQ 17 Vec152: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=3(----),
(XEN)     IRQ 18 Vec160: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),
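
The two dumps above can be compared mechanically to see which domain dropped off each IRQ. The following is a small sketch, not a Xen tool: the names `bound_domains` and `dropped` are invented here, and the regular expressions assume the exact line layout shown in this report.

```python
import re

# Matches e.g. "(XEN)     IRQ 16 Vec144: ... domain-list=0(----),3(----),"
LINE_RE = re.compile(r"IRQ\s+(\d+)\s+Vec\s*(\d+):.*domain-list=(.*)$")
# Each bound domain appears as "<domid>(<flags>)" in the domain list.
DOMAIN_RE = re.compile(r"(\d+)\(")


def bound_domains(dump):
    """Return {irq: set of domain ids} parsed from a Xen IRQ dump."""
    result = {}
    for line in dump.splitlines():
        match = LINE_RE.search(line)
        if match:
            irq = int(match.group(1))
            result[irq] = {int(d) for d in DOMAIN_RE.findall(match.group(3))}
    return result


def dropped(before, after):
    """Return {irq: sorted domain ids} bound in `before` but missing in `after`."""
    diff = {}
    for irq, domains in before.items():
        gone = domains - after.get(irq, set())
        if gone:
            diff[irq] = sorted(gone)
    return diff


BEFORE = """\
(XEN)     IRQ 16 Vec144: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN)     IRQ 17 Vec152: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN)     IRQ 18 Vec160: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),3(----),
"""

AFTER = """\
(XEN)     IRQ 16 Vec144: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),
(XEN)     IRQ 17 Vec152: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=3(----),
(XEN)     IRQ 18 Vec160: type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0(----),
"""

if __name__ == "__main__":
    for irq, domains in sorted(dropped(bound_domains(BEFORE), bound_domains(AFTER)).items()):
        print(f"IRQ {irq}: no longer bound to domain(s) {domains}")
```

Assuming domain 3 is the domU, the result lines up with the dmesg output: dom0 disabled IRQ 17 and disappears from its list, while the domU disabled IRQs 16 and 18 and disappears from theirs.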


Fortunately the whole system didn't crash this time. Previously, however, the LSI disk controller's IRQ got disabled in dom0, which required a hardware reset.
Comment 1 Tamas Vincze 2010-06-02 11:49:27 EDT
Possible solution?
http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00832.html

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index e138053..923de2e 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -25,7 +25,7 @@ static int xen_pcifront_enable_irq(struct pci_dev *dev)
        if (dev->irq < 0)
                return -EINVAL;
 
-       rc = xen_allocate_pirq(dev->irq, 0, "pcifront");
+       rc = xen_allocate_pirq(dev->irq, 1 /* share */, "pcifront");
        if (rc < 0) {
                dev_warn(&dev->dev, "Xen PCI IRQ: %d, failed to register:%d\n",
                         dev->irq, rc);
Comment 2 Don Dutile 2010-06-16 15:03:37 EDT
More information needed:

(a) What is the guest kernel version?
    Please also provide details of dom0 (kernel version, xen (tools) version).

(b) Which tree is the patch listed in comment 1 from?
    -- arch/x86/pci/xen.c is _not_ in the latest xen tree nor in the latest linux tree.
    -- It appears the file may only exist in Jeremy's xen/master tree,
       and would only be valid for rhel6, _if_ the whole file were backported into
       rhel6.

Cc-ing Intel partner in case they can add more info as well.
Comment 3 Tamas Vincze 2010-06-16 15:55:33 EDT
a) Both dom0 and the guest are 2.6.18-194.3.1.el5xen
b) Haven't checked the patch further.

I added noirqdebug to both the dom0 and domU kernel command lines, which works around the problem: the interrupts no longer get disabled, but they probably still aren't handled properly.
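
For context on why noirqdebug hides the symptom: the "nobody cared" message comes from the kernel's spurious-interrupt check in note_interrupt(), which noirqdebug skips. The following is a simplified Python model of that heuristic as I understand it from mainline kernels of that era (count interrupts in windows of 100,000 and disable the line if more than 99,900 of them were unhandled). The class and function names are made up for illustration, and the exact thresholds in the RHEL 5 source have not been checked here.

```python
WINDOW = 100_000     # interrupts per evaluation window (assumed, from mainline)
THRESHOLD = 99_900   # unhandled interrupts in one window that trip the check


class IrqLine:
    """Toy model of one IRQ line's spurious-interrupt accounting."""

    def __init__(self, noirqdebug=False):
        self.noirqdebug = noirqdebug
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        # With noirqdebug the accounting is skipped entirely.
        if self.noirqdebug or self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < WINDOW:
            return
        # End of window: decide, then reset both counters.
        self.count = 0
        if self.unhandled > THRESHOLD:
            self.disabled = True  # corresponds to "Disabling IRQ #N"
        self.unhandled = 0


def simulate(handled_every, total=100_000, noirqdebug=False):
    """Deliver `total` interrupts; a local handler claims one in every
    `handled_every` (None means no local handler ever claims one).
    Returns True if the line ends up disabled."""
    line = IrqLine(noirqdebug)
    for i in range(total):
        handled = handled_every is not None and i % handled_every == 0
        line.note_interrupt(handled)
    return line.disabled


if __name__ == "__main__":
    print("never handled:         ", simulate(None))
    print("1 in 2000 handled:     ", simulate(2000))
    print("1 in 500 handled:      ", simulate(500))
    print("noirqdebug, unhandled: ", simulate(None, noirqdebug=True))
```

Under this model, a domain that receives a shared interrupt mostly on behalf of the other domain's devices sees nearly every interrupt as unhandled and eventually disables the line. With noirqdebug the check never runs, which matches the observation that the IRQs stay enabled without the underlying delivery being fixed.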
Comment 4 Tamas Vincze 2010-06-16 15:58:04 EDT
dom0 has xen-3.0.3-105.el5_5.2
Comment 10 Laszlo Ersek 2011-07-29 10:36:02 EDT
Justification for the WONTFIX resolution:

Passing through a device that shares an interrupt with other dom0/host devices, or with devices assigned to other guests, is not supported for security reasons. Such configurations are therefore not subject to targeted testing either.

The proposed fix is based on upstream (2.6.3x), whose interrupt dispatching code differs significantly from that of RHEL-5.
