Bug 599065
Summary: | PCI passthrough w/ shared IRQ broken | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Tamas Vincze <tom>
Component: | kernel-xen | Assignee: | Don Dutile (Red Hat) <ddutile>
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | urgent | Docs Contact: |
Priority: | low | |
Version: | 5.5 | CC: | ddugger, ddutile, drjones, lersek, xen-maint
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-07-29 10:35:50 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 514490 | |
Attachments: | | |
Possible solution?
http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00832.html

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index e138053..923de2e 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -25,7 +25,7 @@ static int xen_pcifront_enable_irq(struct pci_dev *dev)
 	if (dev->irq < 0)
 		return -EINVAL;
 
-	rc = xen_allocate_pirq(dev->irq, 0, "pcifront");
+	rc = xen_allocate_pirq(dev->irq, 1 /* share */, "pcifront");
 	if (rc < 0) {
 		dev_warn(&dev->dev, "Xen PCI IRQ: %d, failed to register:%d\n",
 			 dev->irq, rc);

More information needed:
(a) guest kernel version ? ... and pls provide details of dom0 (kernel version, xen(tools) version).
(b) what tree is the patch listed in c#1 from ?
-- arch/x86/pci/xen.c is _not_ in latest xen tree nor in latest linux tree.
-- appears the file may only exist in Jeremy's xen/master tree, and would only be valid for rhel6, _if_ the whole file was backported into rhel6.

cc-ing Intel partner in case they can add more info as well.

a) Both dom0 and the guest are 2.6.18-194.3.1.el5xen
b) Haven't checked the patch further.

I added noirqdebug to both the dom0 and domU kernel command lines and that fixed the problem: the interrupts no longer get disabled, but probably still aren't handled properly.

dom0 has xen-3.0.3-105.el5_5.2

Justification for the WONTFIX resolution:

Passing through a device that shares an interrupt with other dom0/host devices, or with devices assigned to other guests, is not supported for security reasons. Such configurations are therefore not subject to targeted testing either. The proposed fix is based on upstream (2.6.3x), whose interrupt dispatching code differs significantly from that of RHEL-5.
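For background on the noirqdebug observation above: the "nobody cared" message comes from the kernel's spurious-interrupt check. Below is a rough Python model of that check as I recall it from kernel/irq/spurious.c in 2.6.18-era kernels. The thresholds (a window of 100,000 interrupts, line disabled when more than 99,900 of them went unhandled) are quoted from memory and should be verified against the RHEL-5 source. `IrqLine` and `simulate` are illustrative names, not kernel identifiers, and the model ignores everything Xen-specific.

```python
from dataclasses import dataclass

# Assumed values, recalled from 2.6.18-era kernel/irq/spurious.c; verify before relying on them.
WINDOW = 100_000         # interrupts counted per accounting window
BAD_THRESHOLD = 99_900   # more unhandled interrupts than this disables the line


@dataclass
class IrqLine:
    """Toy stand-in for one IRQ descriptor (not a kernel structure)."""
    number: int
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False

    def note_interrupt(self, handled: bool, noirqdebug: bool = False) -> None:
        # With noirqdebug the accounting is skipped entirely, so the line
        # is never disabled, whether or not anyone handles the interrupt.
        if noirqdebug or self.disabled:
            return
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < WINDOW:
            return
        # End of window: disable the line if almost nothing was handled.
        if self.irqs_unhandled > BAD_THRESHOLD:
            self.disabled = True  # the kernel prints "nobody cared" here
        self.irq_count = 0
        self.irqs_unhandled = 0


def simulate(total: int, handled_every: int, noirqdebug: bool = False) -> IrqLine:
    """Deliver `total` interrupts; every `handled_every`-th one is handled (0 = none)."""
    line = IrqLine(number=17)
    for i in range(total):
        handled = handled_every > 0 and i % handled_every == 0
        line.note_interrupt(handled, noirqdebug=noirqdebug)
    return line


if __name__ == "__main__":
    print(simulate(100_000, handled_every=0).disabled)
    print(simulate(100_000, handled_every=0, noirqdebug=True).disabled)
```

If this model is right, it would be consistent with what was reported: on a shared line, a domain whose own device is mostly idle sees nearly every interrupt as unhandled and eventually disables the IRQ, while noirqdebug only switches off the accounting and leaves the underlying sharing problem in place.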
Created attachment 419079 [details]
lspci -v

I have a USB controller that I attached to a PV domU using PCI passthrough. Unfortunately VT-d is not supported by the chipset. It has 3 IRQs that are shared with dom0 devices. After a few hours the interrupts get disabled in dom0 and domU.

The passed through device:

04:00.0 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
	Subsystem: NEC Corporation Hama USB 2.0 CardBus
	Flags: bus master, medium devsel, latency 32, IRQ 16
	Memory at fc300000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 2

04:00.1 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
	Subsystem: NEC Corporation Hama USB 2.0 CardBus
	Flags: bus master, medium devsel, latency 32, IRQ 17
	Memory at fc301000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 2

04:00.2 USB Controller: NEC Corporation USB 2.0 (rev 04) (prog-if 20 [EHCI])
	Subsystem: NEC Corporation USB 2.0
	Flags: bus master, medium devsel, latency 132, IRQ 18
	Memory at fc302000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [40] Power Management version 2

IRQs 16, 17 and 18 are shared, see lspci output.
=== dom0 dmesg ===

irq 17: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 <IRQ>  [<ffffffff802b3e43>] __report_bad_irq+0x30/0x7d
 [<ffffffff802b407a>] note_interrupt+0x1ea/0x22b
 [<ffffffff802b3572>] __do_IRQ+0xbd/0x103
 [<ffffffff8029043f>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df48>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3ae7>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024b0aa>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

handlers:
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
Disabling IRQ #17

=== domU dmesg ===

irq 18: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 <IRQ>  [<ffffffff802b3e43>] __report_bad_irq+0x30/0x7d
 [<ffffffff802b407a>] note_interrupt+0x1ea/0x22b
 [<ffffffff802b3572>] __do_IRQ+0xbd/0x103
 [<ffffffff8029043f>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df48>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3ae7>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024b0aa>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

handlers:
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
Disabling IRQ #18

irq 16: nobody cared (try booting with the "irqpoll" option)

Call Trace:
 <IRQ>  [<ffffffff802b3e43>] __report_bad_irq+0x30/0x7d
 [<ffffffff802b407a>] note_interrupt+0x1ea/0x22b
 [<ffffffff802b3572>] __do_IRQ+0xbd/0x103
 [<ffffffff8029043f>] _local_bh_enable+0x61/0xc5
 [<ffffffff8026df48>] do_IRQ+0xe7/0xf5
 [<ffffffff803b3ae7>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024b0aa>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

handlers:
[<ffffffff803e7cb2>] (usb_hcd_irq+0x0/0x55)
Disabling IRQ #16

Initial Xen IRQ info:

(XEN) IRQ 16 Vec144: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN) IRQ 17 Vec152: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN) IRQ 18 Vec160: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),3(----),

Afterwards:

(XEN) IRQ 16 Vec144: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),
(XEN) IRQ 17 Vec152: type=IO-APIC-level status=00000010 in-flight=0 domain-list=3(----),
(XEN) IRQ 18 Vec160: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),

Fortunately the whole system didn't crash this time, but it happened previously that the LSI disk controller's IRQ got disabled in dom0, which required a hardware reset.
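To make the before/after comparison of the Xen IRQ info easier to repeat, here is a small Python sketch that parses "(XEN) IRQ ..." lines and reports which domains disappeared from each IRQ's domain-list. The regular expressions are fitted only to the six lines quoted above and may not match other Xen versions; `parse_irq_dump` and `dropped_domains` are hypothetical helper names.

```python
import re

# Patterns fitted to the '(XEN) IRQ ...' lines quoted in this report (an assumption
# about the format; other Xen versions may print these lines differently).
IRQ_RE = re.compile(r"\(XEN\)\s+IRQ\s+(\d+)\s+Vec\s*\d+:.*?domain-list=(\S*)")
DOM_RE = re.compile(r"(\d+)\(")

BEFORE = """\
(XEN) IRQ 16 Vec144: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN) IRQ 17 Vec152: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),3(----),
(XEN) IRQ 18 Vec160: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),3(----),
"""

AFTER = """\
(XEN) IRQ 16 Vec144: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),
(XEN) IRQ 17 Vec152: type=IO-APIC-level status=00000010 in-flight=0 domain-list=3(----),
(XEN) IRQ 18 Vec160: type=IO-APIC-level status=00000010 in-flight=0 domain-list=0(----),
"""


def parse_irq_dump(text):
    """Return {irq number: set of domain IDs bound to it}."""
    bindings = {}
    for match in IRQ_RE.finditer(text):
        irq = int(match.group(1))
        bindings[irq] = {int(dom) for dom in DOM_RE.findall(match.group(2))}
    return bindings


def dropped_domains(before, after):
    """Return {irq: domains present in `before` but missing from `after`}."""
    dropped = {}
    for irq, domains in before.items():
        missing = domains - after.get(irq, set())
        if missing:
            dropped[irq] = missing
    return dropped


if __name__ == "__main__":
    result = dropped_domains(parse_irq_dump(BEFORE), parse_irq_dump(AFTER))
    for irq in sorted(result):
        print(f"IRQ {irq}: no longer bound to domain(s) {sorted(result[irq])}")
```

Assuming domain 3 is the PV guest, the result lines up with the dmesg output: dom0 (domain 0) dropped off IRQ 17, and the guest dropped off IRQs 16 and 18.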