Bug 713458 - intel-iommu: missing flush prior to removing domains + avoid broken vm/si domain unlinking
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Frantisek Hrbata
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 705441
Blocks:
Reported: 2011-06-15 13:17 UTC by RHEL Program Management
Modified: 2011-07-12 21:13 UTC
CC List: 13 users

Fixed In Version: kernel-2.6.32-131.6.1.el6
Doc Type: Bug Fix
Doc Text:
An earlier update intended to prevent IOMMU (I/O Memory Management Unit) domain exhaustion introduced two regressions. The first was a race in which a domain pointer could be freed while the lazy flush algorithm still held a reference to it, eventually causing a kernel panic. The second was an erroneous reference removal for identity-mapped and VM IOMMU domains, causing I/O errors. Both regressions could only be triggered on Intel-based platforms that support VT-d and were booted with the intel_iommu=on boot option. With this update, the source code of the intel-iommu driver has been modified to resolve both problems. A forced flush is now used to avoid the use-after-free in the lazy flush path, and extra checks have been added to avoid the erroneous reference removal.
Clone Of:
Environment:
Last Closed: 2011-07-12 21:13:18 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2011:0928
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: kernel security and bug fix update
Last Updated: 2011-07-12 21:11:55 UTC

Description RHEL Program Management 2011-06-15 13:17:46 UTC
This bug has been copied from bug #705441 and has been proposed
to be backported to 6.1 z-stream (EUS).

Comment 6 Chao Yang 2011-07-01 07:22:38 UTC
I have reproduced this issue on a Sandy Bridge host using the test case from bz706004. After detaching the USB controller, the host kernel panics.

Host info:
# uname -r
2.6.32-131.0.15.el6.x86_64
# cat /proc/cpuinfo 
...
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Xeon(R) CPU E31280 @ 3.50GHz
...
# lspci | grep -i usb
00:1a.0 USB Controller: Intel Corporation 6 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1d.0 USB Controller: Intel Corporation 6 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
# virsh nodedev-list | grep pci
..
pci_0000_00_1d_0
...
# virsh nodedev-dettach pci_0000_00_1d_0
Device pci_0000_00_1d_0 dettached
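
For reference, the node device name passed to virsh above follows from the lspci address. The helper below is only a sketch of that mapping: the function name pci_to_nodedev is made up for illustration, and it assumes PCI domain 0000 when lspci omits the domain.

```shell
# Hypothetical helper: turn an lspci-style address such as 00:1d.0
# into the libvirt node device name seen above (pci_0000_00_1d_0).
pci_to_nodedev() {
    addr="$1"
    case "$addr" in
        ????:??:??.?) ;;              # domain already present
        *) addr="0000:$addr" ;;       # assumption: default PCI domain 0000
    esac
    # Replace ':' and '.' with '_' and add the pci_ prefix.
    printf 'pci_%s\n' "$(printf '%s' "$addr" | tr ':.' '__')"
}

# Example: pci_to_nodedev 00:1d.0
```

The result can be checked against the output of virsh nodedev-list before detaching anything.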

------------[ cut here ]------------
kernel BUG at mm/slab.c:3067!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu7/topology/thread_siblings
CPU 1 
Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables iptable_filter ipt_REJECT xt_CHECKSUM ip_tables sunrpc cpufreq_ondemand acpi_cpufreq freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap mac]

Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables iptable_filter ipt_REJECT xt_CHECKSUM ip_tables sunrpc cpufreq_ondemand acpi_cpufreq freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap mac]
Pid: 2501, comm: sshd Not tainted 2.6.32-131.0.15.el6.x86_64 #1 Precision T1600
RIP: 0010:[<ffffffff8115a1d4>]  [<ffffffff8115a1d4>] cache_alloc_refill+0x1e4/0x240
RSP: 0018:ffff8802224934a8  EFLAGS: 00010046
RAX: 0000000000000016 RBX: ffff8802205a1f00 RCX: 000000000000003b
RDX: ffff880220596000 RSI: ffff880224307240 RDI: ffff880220194000
RBP: ffff880222493508 R08: ffff880220596000 R09: 0000000000000000
R10: 00000000000fff18 R11: 0000000000000000 R12: ffff88022058d800
R13: ffff880224307240 R14: 0000000000000016 R15: ffff880220194000
FS:  00007f7f9801e7c0(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc728130000 CR3: 00000002224ae000 CR4: 00000000000426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sshd (pid: 2501, threadinfo ffff880222492000, task ffff88021f8e4040)
Stack:
 ffff880222493518 000000001f8e4040 ffff880224307280 00041220205990e0
<0> ffff880224307260 ffff880224307250 ffffffff8100bc8e ffff88021f8e4040
<0> 0000000000000020 ffff8802205a1f00 0000000000000020 0000000000000246
Call Trace:
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8115ac6f>] kmem_cache_alloc+0x15f/0x190
 [<ffffffff81299949>] alloc_iova_mem+0x49/0x60
 [<ffffffff81296e07>] alloc_iova+0x27/0x240
 [<ffffffff81298d85>] intel_alloc_iova+0xb5/0xe0
 [<ffffffff8129ba0e>] __intel_map_single+0xbe/0x210
 [<ffffffff8129bba1>] intel_map_page+0x41/0x50
 [<ffffffffa0218f87>] e1000_xmit_frame+0xa37/0xf30 [e1000e]
 [<ffffffff8141e758>] dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff81439d0a>] sch_direct_xmit+0x15a/0x1c0
 [<ffffffff81423098>] dev_queue_xmit+0x388/0x4d0
 [<ffffffffa044a400>] ? br_dev_queue_push_xmit+0x0/0xa0 [bridge]
 [<ffffffffa044a46c>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
 [<ffffffffa044a4f8>] br_forward_finish+0x58/0x60 [bridge]
 [<ffffffffa044a690>] __br_deliver+0x60/0x70 [bridge]
 [<ffffffff814b256c>] ? packet_rcv+0x5c/0x440
 [<ffffffffa044a6d5>] br_deliver+0x35/0x40 [bridge]
 [<ffffffffa044944c>] br_dev_xmit+0xbc/0x100 [bridge]
 [<ffffffff8141e758>] dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff814230e6>] dev_queue_xmit+0x3d6/0x4d0
 [<ffffffff814583bc>] ip_finish_output+0x13c/0x310
 [<ffffffff81458648>] ip_output+0xb8/0xc0
 [<ffffffff8145790f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff81457945>] ip_local_out+0x25/0x30
 [<ffffffff81457e20>] ip_queue_xmit+0x190/0x420
 [<ffffffff8146cc71>] tcp_transmit_skb+0x3f1/0x790
 [<ffffffff8146efe7>] tcp_write_xmit+0x1e7/0x9e0
 [<ffffffff8146f970>] __tcp_push_pending_frames+0x30/0xe0
 [<ffffffff8145ef4e>] tcp_push+0x6e/0x90
 [<ffffffff8145ff58>] tcp_sendmsg+0x668/0xa30
 [<ffffffff8140e601>] sock_aio_write+0x151/0x160
 [<ffffffff8117241a>] do_sync_write+0xfa/0x140
 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211cff>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727e4>] vfs_write+0x184/0x1a0
 [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173151>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 89 ff e8 a0 8c 11 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31 
RIP  [<ffffffff8115a1d4>] cache_alloc_refill+0x1e4/0x240
 RSP <ffff8802224934a8>
---[ end trace 3da40b0b0e947786 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 2501, comm: sshd Tainted: G      D    ----------------   2.6.32-131.0.15.el6.x86_64 #1
Call Trace:
 [<ffffffff814dac28>] ? panic+0x78/0x143
 [<ffffffff814dec82>] ? oops_end+0xf2/0x100
 [<ffffffff8100f2fb>] ? die+0x5b/0x90
 [<ffffffff814de544>] ? do_trap+0xc4/0x160
 [<ffffffff8100ceb5>] ? do_invalid_op+0x95/0xb0
 [<ffffffff8115a1d4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8100bf5b>] ? invalid_op+0x1b/0x20
 [<ffffffff8115a1d4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8115a14b>] ? cache_alloc_refill+0x15b/0x240
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8115ac6f>] ? kmem_cache_alloc+0x15f/0x190
 [<ffffffff81299949>] ? alloc_iova_mem+0x49/0x60
 [<ffffffff81296e07>] ? alloc_iova+0x27/0x240
 [<ffffffff81298d85>] ? intel_alloc_iova+0xb5/0xe0
 [<ffffffff8129ba0e>] ? __intel_map_single+0xbe/0x210
 [<ffffffff8129bba1>] ? intel_map_page+0x41/0x50
 [<ffffffffa0218f87>] ? e1000_xmit_frame+0xa37/0xf30 [e1000e]
 [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
 [<ffffffffa044a400>] ? br_dev_queue_push_xmit+0x0/0xa0 [bridge]
 [<ffffffffa044a46c>] ? br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
 [<ffffffffa044a4f8>] ? br_forward_finish+0x58/0x60 [bridge]
 [<ffffffffa044a690>] ? __br_deliver+0x60/0x70 [bridge]
 [<ffffffff814b256c>] ? packet_rcv+0x5c/0x440
 [<ffffffffa044a6d5>] ? br_deliver+0x35/0x40 [bridge]
 [<ffffffffa044944c>] ? br_dev_xmit+0xbc/0x100 [bridge]
 [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff814230e6>] ? dev_queue_xmit+0x3d6/0x4d0
 [<ffffffff814583bc>] ? ip_finish_output+0x13c/0x310
 [<ffffffff81458648>] ? ip_output+0xb8/0xc0
 [<ffffffff8145790f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff81457945>] ? ip_local_out+0x25/0x30
 [<ffffffff81457e20>] ? ip_queue_xmit+0x190/0x420
 [<ffffffff8146cc71>] ? tcp_transmit_skb+0x3f1/0x790
 [<ffffffff8146efe7>] ? tcp_write_xmit+0x1e7/0x9e0
 [<ffffffff8146f970>] ? __tcp_push_pending_frames+0x30/0xe0
 [<ffffffff8145ef4e>] ? tcp_push+0x6e/0x90
 [<ffffffff8145ff58>] ? tcp_sendmsg+0x668/0xa30
 [<ffffffff8140e601>] ? sock_aio_write+0x151/0x160
 [<ffffffff8117241a>] ? do_sync_write+0xfa/0x140
 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211cff>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727e4>] ? vfs_write+0x184/0x1a0
 [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173151>] ? sys_write+0x51/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
panic occurred, switching back to text console


---------------------------------------

Verified on the same host using the test cases from bugs 706001 and 706004.
# uname -r
2.6.32-131.8.1.el6.x86_64

1. After detaching the USB controller, the host works well. I tested this 3 times and no panic occurred.
# virsh nodedev-dettach pci_0000_00_1d_0
ehci_hcd 0000:00:1d.0: remove, state 4
usb usb2: USB disconnect, address 1
usb 2-1: USB disconnect, address 2
ehci_hcd 0000:00:1d.0: USB bus 2 deregistered
ehci_hcd 0000:00:1d.0: PCI INT A disabled
pci-stub 0000:00:1d.0: claimed by stub
Device pci_0000_00_1d_0 dettached
# virsh nodedev-reattach  pci_0000_00_1d_0
ehci_hcd 0000:00:1d.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ehci_hcd 0000:00:1d.0: EHCI Host Controller
ehci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.0: debug port 2
ehci_hcd 0000:00:1d.0: irq 17, io mem 0xd8b50000
ehci_hcd 0000:00:1d.0: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.32-131.8.1.el6.x86_64 ehci_hcd
usb usb2: SerialNumber: 0000:00:1d.0
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
Device pci_0000_00_1d_0 re-attached
2. Repeatedly assigned and de-assigned the 82574 over 600 times; neither a kernel panic nor DMAR errors occurred.
# cat attach-detach-in-loop 
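#!/bin/bash
# Attach the 82574 described in attach-detach-82574.xml to guest
# domain 1 and detach it again, up to 1000 times.

for i in $(seq 1000)
do
    echo "Iteration $i:"
    virsh attach-device 1 attach-detach-82574.xml
    sleep 8    # pause before detaching
    virsh detach-device 1 attach-detach-82574.xml
    sleep 8    # pause before the next attach
done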
# cat attach-detach-82574.xml 
<hostdev mode='subsystem' type='pci' managed='yes'>
     <source>
          <address bus='0x03' slot='0x00' function='0x00'/>
     </source>
</hostdev>



Based on the above, this issue has been fixed. Moving to VERIFIED.
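
A check like the one in step 2 can be scripted roughly as follows. This is only a sketch: the function name count_iommu_problems is made up, and the "DMAR:[" pattern is an assumption about how the intel-iommu driver logs DMA faults on this kernel, so adjust it to what the host really prints. The "kernel BUG at" pattern is taken from the panic output above.

```shell
# Hypothetical helper: count log lines on stdin that look like DMAR
# faults or kernel BUG reports. Prints 0 when nothing matches.
count_iommu_problems() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c -E 'DMAR:\[|kernel BUG at' || true
}

# Example: dmesg | count_iommu_problems
```

A count of 0 after the attach/detach loop matches the result reported in step 2.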

Comment 7 Martin Prpič 2011-07-12 11:39:48 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
An earlier update intended to prevent IOMMU (I/O Memory Management Unit) domain exhaustion introduced two regressions. The first was a race in which a domain pointer could be freed while the lazy flush algorithm still held a reference to it, eventually causing a kernel panic. The second was an erroneous reference removal for identity-mapped and VM IOMMU domains, causing I/O errors. Both regressions could only be triggered on Intel-based platforms that support VT-d and were booted with the intel_iommu=on boot option. With this update, the source code of the intel-iommu driver has been modified to resolve both problems. A forced flush is now used to avoid the use-after-free in the lazy flush path, and extra checks have been added to avoid the erroneous reference removal.
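
To judge whether a given host is in the configuration described by the note, something like the sketch below may help. Both function names are made up for illustration. The version check relies on GNU sort -V, which only approximates RPM version ordering, and it assumes an x86_64 uname -r string; the fixed version comes from the "Fixed In Version" field of this bug.

```shell
# Hypothetical helper: succeed if the kernel command line passed as $1
# contains the plain intel_iommu=on option (other forms are not matched).
iommu_enabled() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if the kernel release passed as $1
# (as printed by uname -r) is at or above the fixed version.
kernel_has_fix() {
    fixed="2.6.32-131.6.1.el6"
    release="${1%.x86_64}"            # assumption: x86_64 arch suffix
    [ "$release" = "$fixed" ] && return 0
    # The lower of the two versions sorts first with sort -V.
    lowest=$(printf '%s\n%s\n' "$release" "$fixed" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}

# Example:
#   iommu_enabled "$(cat /proc/cmdline)" && ! kernel_has_fix "$(uname -r)" \
#       && echo "host may be affected"
```

With the kernels from comment 6, 2.6.32-131.0.15.el6.x86_64 is reported as lacking the fix and 2.6.32-131.8.1.el6.x86_64 as having it.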

Comment 8 errata-xmlrpc 2011-07-12 21:13:18 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0928.html

