Bug 509207

Summary: VT-d BUG() during normal traffic in ixgbe device
Product: Red Hat Enterprise Linux 5
Reporter: Chris Wright <chrisw>
Component: kernel
Assignee: Chris Wright <chrisw>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low
Docs Contact:
Priority: high
Version: 5.4
CC: agospoda, bburns, ddutile, dzickus, jbao, kzhang, mjenner, mwagner, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 08:18:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: iova fix
Flags: none

Description Chris Wright 2009-07-01 18:51:48 UTC
Description of problem:

We're hitting a BUG() in the VT-d code as soon as any amount of real traffic is sent through the ixgbe device (a normal ping is not sufficient to trigger it).

Version-Release number of selected component (if applicable):

kernel-15x (seems to have come with the recent ixgbe update)

How reproducible:

Every time

Steps to Reproduce:
1. boot kernel with intel_iommu=on
2. generate traffic (say netperf TCP_STREAM test) over ixgbe device
3. BUG()
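
The steps above can be sketched as a small pre-flight helper. This is only an illustration, not part of the original report: the helper names are hypothetical, the peer address 192.0.2.1 is a documentation placeholder, and it assumes a netserver is already running on the peer. It checks the kernel command line for intel_iommu=on and builds (but does not run) a netperf TCP_STREAM command line.

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the running kernel's command line as a string."""
    with open(path) as f:
        return f.read()


def iommu_enabled(cmdline):
    """True if intel_iommu=on appears as its own boot parameter.

    Only the exact token is matched; other spellings of the option
    are deliberately not handled in this sketch.
    """
    return "intel_iommu=on" in cmdline.split()


def netperf_command(host, seconds=60):
    """Build a netperf TCP_STREAM invocation (step 2) without running it.

    -H: remote host running netserver
    -t: test name
    -l: test length in seconds
    """
    return ["netperf", "-H", host, "-t", "TCP_STREAM", "-l", str(seconds)]
```

On the test host, `iommu_enabled(read_cmdline())` should return True before traffic is started; the list from `netperf_command("192.0.2.1")` can then be passed to `subprocess.run`. On an affected kernel, expect the host to go down.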
  
Actual results:

Kills the host

Expected results:

Traffic completes, no BUG()

Additional info:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/pci/intel-iommu.c:1521
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
CPU 0 
Modules linked in: ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_intel(U) kvm(U) joydev ixgbe igb i2c_i801 sr_mod cdrom 8021q i2c_core shpchp sg pcspkr serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: G      2.6.18-156.el5 #1
RIP: 0010:[<ffffffff80160f65>]  [<ffffffff80160f65>] domain_page_mapping+0x9a/0xff
RSP: 0018:ffffffff8043dd60  EFLAGS: 00010206
RAX: 00000001b3899002 RBX: 00000000001aacc4 RCX: ffff8101bf9fc80c
RDX: 00000001b3898000 RSI: ffff8101b3898ff0 RDI: ffff8101bf9fc808
RBP: 00000001aacc4000 R08: 0000000000000002 R09: 00000001aacc4000
R10: 0000000000000002 R11: ffff8101c54ea000 R12: 00000000001aacc5
R13: 0000000000000002 R14: ffff8101bf9fc7c0 R15: 00000000001aacc4
FS:  0000000000000000(0000) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000627d768 CR3: 00000001a2436000 CR4: 00000000000026e0
Process swapper (pid: 0, threadinfo ffffffff803f2000, task ffffffff80300ae0)
Stack:  00000000efffe000 0000000000001000 00000001aacc4000 ffff8101bf9fc7c0
 00000000efffe000 ffff8101bae6c980 0000000000000002 ffffffff80162c52
 ffff8101c54ea000 ffff8101bfd80c40 0000000000000000 ffffc20010210fd8
Call Trace:
 <IRQ>  [<ffffffff80162c52>] __intel_map_single+0xf0/0x171
 [<ffffffff8822d2a3>] :ixgbe:ixgbe_alloc_rx_buffers+0x10c/0x23b
 [<ffffffff8822e14e>] :ixgbe:ixgbe_clean_rx_irq+0x736/0x781
 [<ffffffff88231fa2>] :ixgbe:ixgbe_clean_rxonly+0x7e/0x126
 [<ffffffff8000cf00>] net_rx_action+0xaa/0x1e3
 [<ffffffff80012a0c>] __do_softirq+0x89/0x133
 [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006db03>] do_softirq+0x2c/0x85
 [<ffffffff8005ec8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80198419>] acpi_processor_idle+0x274/0x463
 [<ffffffff8019840f>] acpi_processor_idle+0x26a/0x463
 [<ffffffff801981a5>] acpi_processor_idle+0x0/0x463
 [<ffffffff801981a5>] acpi_processor_idle+0x0/0x463
 [<ffffffff800499b7>] cpu_idle+0x95/0xb8
 [<ffffffff803fd801>] start_kernel+0x220/0x225
 [<ffffffff803fd22f>] _sinittext+0x22f/0x236


Code: 0f 0b 68 18 75 2b 80 c2 f1 05 48 89 ea 48 81 e2 00 f0 ff ff 
RIP  [<ffffffff80160f65>] domain_page_mapping+0x9a/0xff
 RSP <ffffffff8043dd60>
 <0>Kernel panic - not syncing: Fatal exception
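
For anyone comparing this trace against another oops, here is a minimal sketch that pulls the frames out of a log in this format. The regular expression is an assumption based only on the lines shown above (`[<addr>] symbol+0xoff/0xsize`, with an optional `:module:` prefix); it is not a general oops parser, and the function name is hypothetical.

```python
import re

# Matches entries such as
#   [<ffffffff80162c52>] __intel_map_single+0xf0/0x171
#   [<ffffffff8822d2a3>] :ixgbe:ixgbe_alloc_rx_buffers+0x10c/0x23b
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per stack frame found in an oops excerpt."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            # Frames without a :module: prefix are labelled "kernel" here.
            "module": m.group("module") or "kernel",
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
        })
    return frames
```

Run on the call trace above, the first two frames come out as `__intel_map_single` (kernel) and `ixgbe_alloc_rx_buffers` (module ixgbe), which shows the path from the ixgbe RX buffer refill into the VT-d mapping code.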

Comment 1 Chris Wright 2009-07-01 21:44:25 UTC
Created attachment 350205
iova fix

Here's the fix; it just went upstream.

Comment 3 RHEL Program Management 2009-07-02 02:52:33 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Don Zickus 2009-07-14 20:58:10 UTC
The patch is in kernel-2.6.18-158.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 18 errata-xmlrpc 2009-09-02 08:18:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html