Bug 713458

Summary:	intel-iommu: missing flush prior to removing domains + avoid broken vm/si domain unlinking
Product:	Red Hat Enterprise Linux 6	Reporter:	RHEL Program Management <pm-rhel>
Component:	kernel	Assignee:	Frantisek Hrbata <fhrbata>
Status:	CLOSED ERRATA	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	high	Docs Contact:
Priority:	urgent
Version:	6.1	CC:	alex.williamson, arozansk, chayang, chrisw, ddutile, dhoward, gcosta, jpirko, juzhang, jwest, kzhang, pm-eus, yang.z.zhang
Target Milestone:	rc	Keywords:	ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	kernel-2.6.32-131.6.1.el6	Doc Type:	Bug Fix
Doc Text:	A previous update intended to prevent IOMMU (I/O Memory Management Unit) domain exhaustion introduced two regressions. The first regression was a race in which a domain pointer could be freed while the lazy flush algorithm still held a reference to it, eventually causing a kernel panic. The second regression was an erroneous reference removal for identity-mapped and VM IOMMU domains, causing I/O errors. Both regressions could only be triggered on Intel-based platforms that support VT-d and were booted with the intel_iommu=on boot option. With this update, the underlying source code of the intel-iommu driver has been modified to resolve both problems. A forced flush is now used to avoid the use-after-free issue in the lazy flush, and extra checks have been added to avoid the erroneous reference removal.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-07-12 21:13:18 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	705441
Bug Blocks:

Description RHEL Program Management 2011-06-15 13:17:46 UTC

This bug has been copied from bug #705441 and has been proposed
to be backported to 6.1 z-stream (EUS).

Comment 6 Chao Yang 2011-07-01 07:22:38 UTC

I have reproduced this issue on a Sandy Bridge host using the test case from bz706004. After detaching the USB controller, the host kernel panics.

Host info:
# uname -r
2.6.32-131.0.15.el6.x86_64
# cat /proc/cpuinfo 
...
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Xeon(R) CPU E31280 @ 3.50GHz
...
# lspci | grep -i usb
00:1a.0 USB Controller: Intel Corporation 6 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1d.0 USB Controller: Intel Corporation 6 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
# virsh nodedev-list | grep pci
..
pci_0000_00_1d_0
...
# virsh nodedev-dettach pci_0000_00_1d_0
Device pci_0000_00_1d_0 dettached

------------[ cut here ]------------
kernel BUG at mm/slab.c:3067!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu7/topology/thread_siblings
CPU 1 
Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables iptable_filter ipt_REJECT xt_CHECKSUM ip_tables sunrpc cpufreq_ondemand acpi_cpufreq freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap mac]

Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables iptable_filter ipt_REJECT xt_CHECKSUM ip_tables sunrpc cpufreq_ondemand acpi_cpufreq freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap mac]
Pid: 2501, comm: sshd Not tainted 2.6.32-131.0.15.el6.x86_64 #1 Precision T1600
RIP: 0010:[<ffffffff8115a1d4>]  [<ffffffff8115a1d4>] cache_alloc_refill+0x1e4/0x240
RSP: 0018:ffff8802224934a8  EFLAGS: 00010046
RAX: 0000000000000016 RBX: ffff8802205a1f00 RCX: 000000000000003b
RDX: ffff880220596000 RSI: ffff880224307240 RDI: ffff880220194000
RBP: ffff880222493508 R08: ffff880220596000 R09: 0000000000000000
R10: 00000000000fff18 R11: 0000000000000000 R12: ffff88022058d800
R13: ffff880224307240 R14: 0000000000000016 R15: ffff880220194000
FS:  00007f7f9801e7c0(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc728130000 CR3: 00000002224ae000 CR4: 00000000000426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sshd (pid: 2501, threadinfo ffff880222492000, task ffff88021f8e4040)
Stack:
 ffff880222493518 000000001f8e4040 ffff880224307280 00041220205990e0
<0> ffff880224307260 ffff880224307250 ffffffff8100bc8e ffff88021f8e4040
<0> 0000000000000020 ffff8802205a1f00 0000000000000020 0000000000000246
Call Trace:
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8115ac6f>] kmem_cache_alloc+0x15f/0x190
 [<ffffffff81299949>] alloc_iova_mem+0x49/0x60
 [<ffffffff81296e07>] alloc_iova+0x27/0x240
 [<ffffffff81298d85>] intel_alloc_iova+0xb5/0xe0
 [<ffffffff8129ba0e>] __intel_map_single+0xbe/0x210
 [<ffffffff8129bba1>] intel_map_page+0x41/0x50
 [<ffffffffa0218f87>] e1000_xmit_frame+0xa37/0xf30 [e1000e]
 [<ffffffff8141e758>] dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff81439d0a>] sch_direct_xmit+0x15a/0x1c0
 [<ffffffff81423098>] dev_queue_xmit+0x388/0x4d0
 [<ffffffffa044a400>] ? br_dev_queue_push_xmit+0x0/0xa0 [bridge]
 [<ffffffffa044a46c>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
 [<ffffffffa044a4f8>] br_forward_finish+0x58/0x60 [bridge]
 [<ffffffffa044a690>] __br_deliver+0x60/0x70 [bridge]
 [<ffffffff814b256c>] ? packet_rcv+0x5c/0x440
 [<ffffffffa044a6d5>] br_deliver+0x35/0x40 [bridge]
 [<ffffffffa044944c>] br_dev_xmit+0xbc/0x100 [bridge]
 [<ffffffff8141e758>] dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff814230e6>] dev_queue_xmit+0x3d6/0x4d0
 [<ffffffff814583bc>] ip_finish_output+0x13c/0x310
 [<ffffffff81458648>] ip_output+0xb8/0xc0
 [<ffffffff8145790f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff81457945>] ip_local_out+0x25/0x30
 [<ffffffff81457e20>] ip_queue_xmit+0x190/0x420
 [<ffffffff8146cc71>] tcp_transmit_skb+0x3f1/0x790
 [<ffffffff8146efe7>] tcp_write_xmit+0x1e7/0x9e0
 [<ffffffff8146f970>] __tcp_push_pending_frames+0x30/0xe0
 [<ffffffff8145ef4e>] tcp_push+0x6e/0x90
 [<ffffffff8145ff58>] tcp_sendmsg+0x668/0xa30
 [<ffffffff8140e601>] sock_aio_write+0x151/0x160
 [<ffffffff8117241a>] do_sync_write+0xfa/0x140
 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211cff>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727e4>] vfs_write+0x184/0x1a0
 [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173151>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 89 ff e8 a0 8c 11 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31 
RIP  [<ffffffff8115a1d4>] cache_alloc_refill+0x1e4/0x240
 RSP <ffff8802224934a8>
---[ end trace 3da40b0b0e947786 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 2501, comm: sshd Tainted: G      D    ----------------   2.6.32-131.0.15.el6.x86_64 #1
Call Trace:
 [<ffffffff814dac28>] ? panic+0x78/0x143
 [<ffffffff814dec82>] ? oops_end+0xf2/0x100
 [<ffffffff8100f2fb>] ? die+0x5b/0x90
 [<ffffffff814de544>] ? do_trap+0xc4/0x160
 [<ffffffff8100ceb5>] ? do_invalid_op+0x95/0xb0
 [<ffffffff8115a1d4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8100bf5b>] ? invalid_op+0x1b/0x20
 [<ffffffff8115a1d4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8115a14b>] ? cache_alloc_refill+0x15b/0x240
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8115ac6f>] ? kmem_cache_alloc+0x15f/0x190
 [<ffffffff81299949>] ? alloc_iova_mem+0x49/0x60
 [<ffffffff81296e07>] ? alloc_iova+0x27/0x240
 [<ffffffff81298d85>] ? intel_alloc_iova+0xb5/0xe0
 [<ffffffff8129ba0e>] ? __intel_map_single+0xbe/0x210
 [<ffffffff8129bba1>] ? intel_map_page+0x41/0x50
 [<ffffffffa0218f87>] ? e1000_xmit_frame+0xa37/0xf30 [e1000e]
 [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
 [<ffffffffa044a400>] ? br_dev_queue_push_xmit+0x0/0xa0 [bridge]
 [<ffffffffa044a46c>] ? br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
 [<ffffffffa044a4f8>] ? br_forward_finish+0x58/0x60 [bridge]
 [<ffffffffa044a690>] ? __br_deliver+0x60/0x70 [bridge]
 [<ffffffff814b256c>] ? packet_rcv+0x5c/0x440
 [<ffffffffa044a6d5>] ? br_deliver+0x35/0x40 [bridge]
 [<ffffffffa044944c>] ? br_dev_xmit+0xbc/0x100 [bridge]
 [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
 [<ffffffff814230e6>] ? dev_queue_xmit+0x3d6/0x4d0
 [<ffffffff814583bc>] ? ip_finish_output+0x13c/0x310
 [<ffffffff81458648>] ? ip_output+0xb8/0xc0
 [<ffffffff8145790f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff81457945>] ? ip_local_out+0x25/0x30
 [<ffffffff81457e20>] ? ip_queue_xmit+0x190/0x420
 [<ffffffff8146cc71>] ? tcp_transmit_skb+0x3f1/0x790
 [<ffffffff8146efe7>] ? tcp_write_xmit+0x1e7/0x9e0
 [<ffffffff8146f970>] ? __tcp_push_pending_frames+0x30/0xe0
 [<ffffffff8145ef4e>] ? tcp_push+0x6e/0x90
 [<ffffffff8145ff58>] ? tcp_sendmsg+0x668/0xa30
 [<ffffffff8140e601>] ? sock_aio_write+0x151/0x160
 [<ffffffff8117241a>] ? do_sync_write+0xfa/0x140
 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211cff>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051a6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727e4>] ? vfs_write+0x184/0x1a0
 [<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173151>] ? sys_write+0x51/0x90
 [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
panic occurred, switching back to text console


---------------------------------------

Verified on the same host using the test cases from bugs 706001 and 706004.
# uname -r
2.6.32-131.8.1.el6.x86_64

1. After detaching the USB controller, the host works well. I tested this 3 times and no panic occurred.
# virsh nodedev-dettach pci_0000_00_1d_0
ehci_hcd 0000:00:1d.0: remove, state 4
usb usb2: USB disconnect, address 1
usb 2-1: USB disconnect, address 2
ehci_hcd 0000:00:1d.0: USB bus 2 deregistered
ehci_hcd 0000:00:1d.0: PCI INT A disabled
pci-stub 0000:00:1d.0: claimed by stub
Device pci_0000_00_1d_0 dettached
# virsh nodedev-reattach  pci_0000_00_1d_0
ehci_hcd 0000:00:1d.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ehci_hcd 0000:00:1d.0: EHCI Host Controller
ehci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.0: debug port 2
ehci_hcd 0000:00:1d.0: irq 17, io mem 0xd8b50000
ehci_hcd 0000:00:1d.0: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.32-131.8.1.el6.x86_64 ehci_hcd
usb usb2: SerialNumber: 0000:00:1d.0
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
Device pci_0000_00_1d_0 re-attached
2. Repeatedly assigned and de-assigned the 82574 more than 600 times; neither a kernel panic nor DMAR errors occurred.
# cat attach-detach-in-loop 
#!/bin/bash

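# Attach the 82574 to the guest with domain ID 1, then detach it again,
# pausing 8 seconds after each operation.
for i in $(seq 1000)
do
    echo "the $i times: "
    virsh attach-device 1 attach-detach-82574.xml
    sleep 8
    virsh detach-device 1 attach-detach-82574.xml
    sleep 8
done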
# cat attach-detach-82574.xml 
<hostdev mode='subsystem' type='pci' managed='yes'>
     <source>
          <address bus='0x03' slot='0x00' function='0x00'/>
     </source>
</hostdev>
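
The two pass criteria used above (no kernel panic, no DMAR errors) could be checked against a saved kernel log roughly as follows. This is a minimal sketch and not part of the original test cases: the function names are hypothetical, and the grep patterns are assumptions based on the "kernel BUG at" and "Kernel panic - not syncing" lines in the trace above plus typical DMAR fault messages. They are not an exhaustive list of what the intel-iommu driver can print.

```shell
#!/bin/bash
# Sketch: scan a saved kernel log for signs of the two regressions.
# All function names and patterns here are illustrative assumptions.

# Print the number of lines that look like DMAR fault reports.
count_dmar_faults() {
    local log_file=$1
    # grep -c prints the count but exits non-zero when the count is 0;
    # "|| true" keeps the function usable under "set -e".
    grep -c -E 'DMAR:.*fault|DRHD: handling fault' "$log_file" || true
}

# Succeed (exit status 0) if the log contains a kernel BUG or panic line.
log_has_panic() {
    local log_file=$1
    grep -q -E 'kernel BUG at|Kernel panic - not syncing' "$log_file"
}

# Print a one-line verdict for a captured log; return 1 on failure.
check_log() {
    local log_file=$1
    local faults
    faults=$(count_dmar_faults "$log_file")
    if log_has_panic "$log_file"; then
        echo "FAIL: panic or BUG found"
        return 1
    fi
    if [ "$faults" -gt 0 ]; then
        echo "FAIL: $faults DMAR fault line(s)"
        return 1
    fi
    echo "PASS: no panic and no DMAR faults"
}
```

A possible use is to save the output of dmesg to a file after the attach/detach loop finishes and pass that file to check_log. A panic normally takes the host down before anything can be saved, so the panic check is mainly useful on serial console captures.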



Based on the above, this issue has been fixed. Moving to VERIFIED.

Comment 7 Martin Prpič 2011-07-12 11:39:48 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A previous update intended to prevent IOMMU (I/O Memory Management Unit) domain exhaustion introduced two regressions. The first regression was a race in which a domain pointer could be freed while the lazy flush algorithm still held a reference to it, eventually causing a kernel panic. The second regression was an erroneous reference removal for identity-mapped and VM IOMMU domains, causing I/O errors. Both regressions could only be triggered on Intel-based platforms that support VT-d and were booted with the intel_iommu=on boot option. With this update, the underlying source code of the intel-iommu driver has been modified to resolve both problems. A forced flush is now used to avoid the use-after-free issue in the lazy flush, and extra checks have been added to avoid the erroneous reference removal.
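
Since the note states that both regressions require booting with intel_iommu=on, one way to tell whether a host meets that condition is to inspect its kernel command line. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and it only recognizes the exact token intel_iommu=on, not other spellings of the option.

```shell
#!/bin/bash
# Sketch: report whether a kernel command line enables the Intel IOMMU.
# The function name is a hypothetical helper, not an existing tool.

# $1: command-line string to inspect; if omitted, /proc/cmdline is read.
cmdline_enables_intel_iommu() {
    local cmdline=${1-$(cat /proc/cmdline)}
    local arg
    # Split the command line on whitespace and look for the exact token.
    for arg in $cmdline; do
        case $arg in
            intel_iommu=on) return 0 ;;
        esac
    done
    return 1
}
```

For example, calling cmdline_enables_intel_iommu with no argument on the running host returns success only when the option is present in /proc/cmdline. It does not check whether the platform actually supports VT-d or which kernel version is installed.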

Comment 8 errata-xmlrpc 2011-07-12 21:13:18 UTC

An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0928.html