Bug 705441 - intel-iommu: missing flush prior to removing domains + avoid broken vm/si domain unlinking
Summary: intel-iommu: missing flush prior to removing domains + avoid broken vm/si domain unlinking
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Assignee: Alex Williamson
QA Contact: WANG Chao
URL:
Whiteboard:
Duplicates: 706001 706004 710382 730604
Depends On:
Blocks: 705455 713458
 
Reported: 2011-05-17 17:17 UTC by Alex Williamson
Modified: 2018-11-14 11:04 UTC
CC: 15 users

Fixed In Version: kernel-2.6.32-160.el6
Doc Type: Bug Fix
Doc Text:
A previous update intended to prevent IOMMU (I/O Memory Management Unit) domain exhaustion introduced two regressions. The first was a race in which a domain pointer could be freed while the lazy flush algorithm still held a reference to it, eventually causing a kernel panic. The second was an erroneous reference removal for identity-mapped and VM IOMMU domains, causing I/O errors. Both regressions could only be triggered on Intel-based platforms supporting VT-d that were booted with the intel_iommu=on boot option. With this update, the intel-iommu driver has been modified to resolve both problems: a forced flush now avoids the lazy use-after-free, and extra checks avoid the erroneous reference removal.
Clone Of:
Clones: 705455
Environment:
Last Closed: 2011-12-06 13:31:52 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1530 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update 2011-12-06 01:45:35 UTC

Description Alex Williamson 2011-05-17 17:17:36 UTC
Description of problem:
bz619455 exposed a race between destroying an iommu domain and performing a lazy flush of DMA unmaps.  This has been shown to potentially cause a kernel oops when devices are unbound from their drivers, such as during a hot unplug.

Version-Release number of selected component (if applicable):
kernel-2.6.32-128.el6

How reproducible:
Unknown, some devices seem to trigger it, others don't.  The problem was not seen by QA during 6.1 testing, which specifically targeted exercising this path for kvm device assignment.

Steps to Reproduce:
1. Unbind a device from its driver on a system with VT-d enabled.
  
Actual results:
A potential kernel oops from delayed DMA unmap flushing after the domain is destroyed.

Expected results:
No issues.

Additional info:
Upstream thread - https://lists.linux-foundation.org/pipermail/iommu/2011-May/002617.html

Comment 2 RHEL Product and Program Management 2011-05-17 17:59:34 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 3 Alex Williamson 2011-05-17 18:10:39 UTC
It looks like the race can be avoided by using the intel_iommu=strict flag, but that also imposes a performance penalty, since unmaps are no longer batched.
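A quick way to check whether a running system is using the strict-flush workaround is to look at the kernel command line (a generic check, not specific to this bug):

```shell
# Print any intel_iommu option the kernel was booted with;
# intel_iommu=strict forces immediate rather than batched flushing.
grep -o 'intel_iommu=[^ ]*' /proc/cmdline || echo "intel_iommu not set"
```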

Comment 5 Alex Williamson 2011-06-02 01:14:27 UTC
*** Bug 706001 has been marked as a duplicate of this bug. ***

Comment 6 Alex Williamson 2011-06-02 01:14:41 UTC
*** Bug 706004 has been marked as a duplicate of this bug. ***

Comment 7 Alex Williamson 2011-06-02 01:17:54 UTC
QA: please use the test cases from the above bugs (706001 & 706004) when validating this.  706001 is most readily hit by assigning and un-assigning an e1000e NIC to a VM.  706004 is easily reproduced by unbinding and re-binding a device from the snd-hda-intel driver.  Both of these require a VT-d capable system booted with intel_iommu=on.
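The snd-hda-intel unbind/rebind case goes through the PCI driver sysfs interface. A dry-run sketch follows; the PCI address 0000:00:1b.0 is an example (find the real one with lspci), and the real commands must be run as root on a VT-d system booted with intel_iommu=on. With DRYRUN=1 the script only prints what it would do.

```shell
DRYRUN=1
DEV=0000:00:1b.0                            # example address, adjust to your system
DRV=/sys/bus/pci/drivers/snd-hda-intel

# Print the command when dry-running, execute it otherwise.
run() { if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run sh -c "echo $DEV > $DRV/unbind"
run sh -c "echo $DEV > $DRV/bind"
```

On an affected kernel, the unbind step is what queues the lazy unmaps that can outlive the domain.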

Comment 8 Alex Williamson 2011-06-03 15:04:07 UTC
*** Bug 710382 has been marked as a duplicate of this bug. ***

Comment 9 Alex Williamson 2011-06-03 15:07:20 UTC
Brew build of posted fix: https://brewweb.devel.redhat.com/taskinfo?taskID=3361081

Comment 12 Aristeu Rozanski 2011-06-27 19:04:07 UTC
Patch(es) available on kernel-2.6.32-160.el6

Comment 14 Martin Prpič 2011-07-12 11:40:05 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A previous update intended to prevent IOMMU (I/O Memory Management Unit) domain exhaustion introduced two regressions. The first was a race in which a domain pointer could be freed while the lazy flush algorithm still held a reference to it, eventually causing a kernel panic. The second was an erroneous reference removal for identity-mapped and VM IOMMU domains, causing I/O errors. Both regressions could only be triggered on Intel-based platforms supporting VT-d that were booted with the intel_iommu=on boot option. With this update, the intel-iommu driver has been modified to resolve both problems: a forced flush now avoids the lazy use-after-free, and extra checks avoid the erroneous reference removal.

Comment 17 Alex Williamson 2011-10-25 14:58:48 UTC
*** Bug 730604 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2011-12-06 13:31:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

Comment 19 Wojciech 2012-08-17 15:55:04 UTC
I am seeing occasional kernel oopses on our Dell C6220s, which seem to be related to this bug:

------------[ cut here ]------------
kernel BUG at mm/slab.c:3067!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
CPU 5
Modules linked in: fuse lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ip6table_filter act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_statistic xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_AUDIT ipt_LOG xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables nfsd exportfs acpi_cpufreq freq_table mperf rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_core(U) ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm wmi microcode dcdbas sg sb_edac edac_core i2c_i801 i2c_core shpchp iTCO_wdt iTCO_vendor_support ioatdma ipv6 sd_mod crc_t10dif ahci igb dca nfs lockd fscache nfs_acl auth_rpcgss sunrpc [last unloaded: scsi_wait_scan]

Pid: 68226, comm: polkit-gnome-au Tainted: G    B   W  ----------------   2.6.32-220.23.1.el6.x86_64 #1 Dell Inc. PowerEdge C6220/0WTH3T
RIP: 0010:[<ffffffff8115e9f4>]  [<ffffffff8115e9f4>] cache_alloc_refill+0x1e4/0x240
RSP: 0018:ffff881789145c90  EFLAGS: 00010046
RAX: 000000000000001c RBX: ffff88207fc50080 RCX: 000000000000005c
RDX: ffff8808f7775000 RSI: ffff88107febaf40 RDI: ffff880dbd3f1000
RBP: ffff881789145cf0 R08: ffff8808f7775000 R09: 0000000000000000
R10: 00000035a6003000 R11: 0000000000000000 R12: ffff88105922c400
R13: ffff88107febaf40 R14: 000000000000001c R15: ffff880dbd3f1000
FS:  00007eff8508f8e0(0000) GS:ffff8800606a0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007eff7bc05000 CR3: 0000001fffa51000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process polkit-gnome-au (pid: 68226, threadinfo ffff881789144000, task ffff881b0f2d2ac0)
Stack:
 00000035a6002000 0000000046da1980 ffff88107febaf80 000412d081138cb6
<0> ffff88107febaf60 ffff88107febaf50 ffff881789145d00 0000000000000000
<0> 00000000000000d0 ffff88207fc50080 00000000000000d0 0000000000000246
Call Trace:
 [<ffffffff8115f90f>] kmem_cache_alloc+0x15f/0x190
 [<ffffffff811450a1>] anon_vma_fork+0x61/0xd0
 [<ffffffff8106758e>] dup_mm+0x23e/0x520
 [<ffffffff810685bc>] copy_process+0xcdc/0x13b0
 [<ffffffff8115f932>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff81068d24>] do_fork+0x94/0x480
 [<ffffffff81193a72>] ? alloc_fd+0x92/0x160
 [<ffffffff81173dc7>] ? fd_install+0x47/0x90
 [<ffffffff8118113f>] ? do_pipe_flags+0xcf/0x130
 [<ffffffff81009598>] sys_clone+0x28/0x30
 [<ffffffff8100b413>] stub_clone+0x13/0x20
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Code: 89 ff e8 00 d5 11 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31
RIP  [<ffffffff8115e9f4>] cache_alloc_refill+0x1e4/0x240
 RSP <ffff881789145c90>
---[ end trace 552278685cb3ea33 ]---
Kernel panic - not syncing: Fatal exception
Pid: 68226, comm: polkit-gnome-au Tainted: G    B D W  ----------------   2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814ecb34>] ? panic+0x78/0x143
 [<ffffffff814f0cd4>] ? oops_end+0xe4/0x100
 [<ffffffff8100f26b>] ? die+0x5b/0x90
 [<ffffffff814f05a4>] ? do_trap+0xc4/0x160
 [<ffffffff8100ce35>] ? do_invalid_op+0x95/0xb0
 [<ffffffff8115e9f4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8114d688>] ? __swap_duplicate+0x68/0x180
 [<ffffffff8114d805>] ? swap_duplicate+0x25/0x60
 [<ffffffff8100bedb>] ? invalid_op+0x1b/0x20
 [<ffffffff8115e9f4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8115f90f>] ? kmem_cache_alloc+0x15f/0x190
 [<ffffffff811450a1>] ? anon_vma_fork+0x61/0xd0
 [<ffffffff8106758e>] ? dup_mm+0x23e/0x520
 [<ffffffff810685bc>] ? copy_process+0xcdc/0x13b0
 [<ffffffff8115f932>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff81068d24>] ? do_fork+0x94/0x480
 [<ffffffff81193a72>] ? alloc_fd+0x92/0x160
 [<ffffffff81173dc7>] ? fd_install+0x47/0x90
 [<ffffffff8118113f>] ? do_pipe_flags+0xcf/0x130
 [<ffffffff81009598>] ? sys_clone+0x28/0x30
 [<ffffffff8100b413>] ? stub_clone+0x13/0x20
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule+0x5c/0x60() (Tainted: G    B D W  ----------------  )
Hardware name: PowerEdge C6220
Modules linked in: fuse lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ip6table_filter act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq bridge stp llc xt_statistic xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_AUDIT ipt_LOG xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables nfsd exportfs acpi_cpufreq freq_table mperf rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_core(U) ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm wmi microcode dcdbas sg sb_edac edac_core i2c_i801 i2c_core shpchp iTCO_wdt iTCO_vendor_support ioatdma ipv6 sd_mod crc_t10dif ahci igb dca nfs lockd fscache nfs_acl auth_rpcgss sunrpc [last unloaded: scsi_wait_scan]
Pid: 68226, comm: polkit-gnome-au Tainted: G    B D W  ----------------   2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81069c97>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81069cea>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff8102a36c>] ? native_smp_send_reschedule+0x5c/0x60
 [<ffffffff8104d9c8>] ? resched_task+0x68/0x80
 [<ffffffff81053190>] ? check_preempt_wakeup+0x1c0/0x260
 [<ffffffff8106196b>] ? enqueue_task_fair+0xfb/0x100
 [<ffffffff8104da7c>] ? check_preempt_curr+0x7c/0x90
 [<ffffffff8105e863>] ? try_to_wake_up+0x213/0x3e0
 [<ffffffff81094e90>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff8105ea85>] ? wake_up_process+0x15/0x20
 [<ffffffff81094eb2>] ? hrtimer_wakeup+0x22/0x30
 [<ffffffff810954ae>] ? __run_hrtimer+0x8e/0x1a0
 [<ffffffff81012b59>] ? read_tsc+0x9/0x20
 [<ffffffff81095856>] ? hrtimer_interrupt+0xe6/0x250
 [<ffffffff814f55fb>] ? smp_apic_timer_interrupt+0x6b/0x9b
 [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff814ecbdc>] ? panic+0x120/0x143
 [<ffffffff814ecb69>] ? panic+0xad/0x143
 [<ffffffff814f0cd4>] ? oops_end+0xe4/0x100
 [<ffffffff8100f26b>] ? die+0x5b/0x90
 [<ffffffff814f05a4>] ? do_trap+0xc4/0x160
 [<ffffffff8100ce35>] ? do_invalid_op+0x95/0xb0
 [<ffffffff8115e9f4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8114d688>] ? __swap_duplicate+0x68/0x180
 [<ffffffff8114d805>] ? swap_duplicate+0x25/0x60
 [<ffffffff8100bedb>] ? invalid_op+0x1b/0x20
 [<ffffffff8115e9f4>] ? cache_alloc_refill+0x1e4/0x240
 [<ffffffff8115f90f>] ? kmem_cache_alloc+0x15f/0x190
 [<ffffffff811450a1>] ? anon_vma_fork+0x61/0xd0
 [<ffffffff8106758e>] ? dup_mm+0x23e/0x520
 [<ffffffff810685bc>] ? copy_process+0xcdc/0x13b0
 [<ffffffff8115f932>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff81068d24>] ? do_fork+0x94/0x480
 [<ffffffff81193a72>] ? alloc_fd+0x92/0x160
 [<ffffffff81173dc7>] ? fd_install+0x47/0x90
 [<ffffffff8118113f>] ? do_pipe_flags+0xcf/0x130
 [<ffffffff81009598>] ? sys_clone+0x28/0x30
 [<ffffffff8100b413>] ? stub_clone+0x13/0x20
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 552278685cb3ea34 ]---
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
Pid: 68225, comm: polkit-gnome-au Tainted: G    B D W  ----------------   2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
 <NMI>  [<ffffffff814ecb34>] ? panic+0x78/0x143
 [<ffffffff810d91fd>] ? watchdog_overflow_callback+0xcd/0xd0
 [<ffffffff8110aaed>] ? __perf_event_overflow+0x9d/0x230
 [<ffffffff8110b0a4>] ? perf_event_overflow+0x14/0x20
 [<ffffffff8101e216>] ? intel_pmu_handle_irq+0x336/0x550
 [<ffffffff814f2716>] ? kprobe_exceptions_notify+0x16/0x430
 [<ffffffff814f11f9>] ? perf_event_nmi_handler+0x39/0xb0
 [<ffffffff814f2d45>] ? notifier_call_chain+0x55/0x80
 [<ffffffff814f2daa>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81096d0e>] ? notify_die+0x2e/0x30
 [<ffffffff814f09c3>] ? do_nmi+0x173/0x2b0
 [<ffffffff814f02d0>] ? nmi+0x20/0x30
 [<ffffffff814efb3e>] ? _spin_lock+0x1e/0x30
 <<EOE>>  [<ffffffff8115e89a>] ? cache_alloc_refill+0x8a/0x240
 [<ffffffff8115e96b>] ? cache_alloc_refill+0x15b/0x240
 [<ffffffff8115f90f>] ? kmem_cache_alloc+0x15f/0x190
 [<ffffffff811450a1>] ? anon_vma_fork+0x61/0xd0
 [<ffffffff8106758e>] ? dup_mm+0x23e/0x520
 [<ffffffff810685bc>] ? copy_process+0xcdc/0x13b0
 [<ffffffff8115f932>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff81068d24>] ? do_fork+0x94/0x480
 [<ffffffff81193a72>] ? alloc_fd+0x92/0x160
 [<ffffffff81173dc7>] ? fd_install+0x47/0x90
 [<ffffffff8118113f>] ? do_pipe_flags+0xcf/0x130
 [<ffffffff81009598>] ? sys_clone+0x28/0x30
 [<ffffffff8100b413>] ? stub_clone+0x13/0x20
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
Pid: 68685, comm: polkit-gnome-au Tainted: G    B D W  ----------------   2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
 <NMI>  [<ffffffff814ecb34>] ? panic+0x78/0x143
 [<ffffffff810d91fd>] ? watchdog_overflow_callback+0xcd/0xd0
 [<ffffffff8110aaed>] ? __perf_event_overflow+0x9d/0x230
 [<ffffffff8110b0a4>] ? perf_event_overflow+0x14/0x20
 [<ffffffff8101e216>] ? intel_pmu_handle_irq+0x336/0x550
 [<ffffffff814f2716>] ? kprobe_exceptions_notify+0x16/0x430
 [<ffffffff814f11f9>] ? perf_event_nmi_handler+0x39/0xb0
 [<ffffffff814f2d45>] ? notifier_call_chain+0x55/0x80
 [<ffffffff814f2daa>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81096d0e>] ? notify_die+0x2e/0x30
 [<ffffffff814f09c3>] ? do_nmi+0x173/0x2b0
 [<ffffffff814f02d0>] ? nmi+0x20/0x30
 [<ffffffff814efb3e>] ? _spin_lock+0x1e/0x30
 <<EOE>>  [<ffffffff8115e89a>] ? cache_alloc_refill+0x8a/0x240
 [<ffffffff8115f90f>] ? kmem_cache_alloc+0x15f/0x190
 [<ffffffff81145232>] ? anon_vma_prepare+0x122/0x160
 [<ffffffff8113c138>] ? handle_pte_fault+0x748/0xb50
 [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
 [<ffffffff8113c724>] ? handle_mm_fault+0x1e4/0x2b0
 [<ffffffff81042c29>] ? __do_page_fault+0x139/0x480
 [<ffffffffa0009890>] ? rpc_free_task+0x50/0x80 [sunrpc]
 [<ffffffffa0009915>] ? rpc_final_put_task+0x55/0x60 [sunrpc]
 [<ffffffffa0009950>] ? rpc_do_put_task+0x30/0x40 [sunrpc]
 [<ffffffffa0009990>] ? rpc_put_task+0x10/0x20 [sunrpc]
 [<ffffffffa0002df8>] ? rpc_call_sync+0x58/0x70 [sunrpc]
 [<ffffffff814f2c8e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff814f0045>] ? page_fault+0x25/0x30
 [<ffffffff81110364>] ? file_read_actor+0x44/0x180
 [<ffffffff81112746>] ? generic_file_aio_read+0x2d6/0x700
 [<ffffffffa00a91da>] ? nfs_file_read+0xca/0x130 [nfs]
 [<ffffffff8117687a>] ? do_sync_read+0xfa/0x140
 [<ffffffff81090d30>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811427ba>] ? do_mmap_pgoff+0x33a/0x380
 [<ffffffff8120c836>] ? security_file_permission+0x16/0x20
 [<ffffffff81177275>] ? vfs_read+0xb5/0x1a0
 [<ffffffff811773b1>] ? sys_read+0x51/0x90
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 6
Pid: 73, comm: events/6 Tainted: G    B D W  ----------------   2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
<NMI>  [<ffffffff814ecb34>] ? panic+0x78/0x143
 [<ffffffff810d91fd>] ? watchdog_overflow_callback+0xcd/0xd0
 [<ffffffff8110aaed>] ? __perf_event_overflow+0x9d/0x230
 [<ffffffff8110b0a4>] ? perf_event_overflow+0x14/0x20
 [<ffffffff8101e216>] ? intel_pmu_handle_irq+0x336/0x550
 [<ffffffff814f2716>] ? kprobe_exceptions_notify+0x16/0x430
 [<ffffffff814f11f9>] ? perf_event_nmi_handler+0x39/0xb0
 [<ffffffff814f2d45>] ? notifier_call_chain+0x55/0x80
 [<ffffffff814f2daa>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81096d0e>] ? notify_die+0x2e/0x30
 [<ffffffff814f09c3>] ? do_nmi+0x173/0x2b0
 [<ffffffff814f02d0>] ? nmi+0x20/0x30
 [<ffffffff814efa63>] ? _spin_lock_irq+0x23/0x40
 <<EOE>>  [<ffffffff81160553>] ? drain_array+0x73/0x100
 [<ffffffff8116156e>] ? cache_reap+0x8e/0x260
 [<ffffffff811614e0>] ? cache_reap+0x0/0x260
 [<ffffffff8108b3f0>] ? worker_thread+0x170/0x2a0
 [<ffffffff81090d30>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108b280>] ? worker_thread+0x0/0x2a0
 [<ffffffff810909c6>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81090930>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Comment 20 Alex Williamson 2012-08-17 16:24:50 UTC
(In reply to comment #19)
> I am seeing occasional kernel oopses on our DELL C6220's which seem to be
> related to this bug

What makes you think it's related to this bug?  Note that this bug has been closed for 8 months, so you're really better off filing a new bug, possibly referencing this bug if you think it's related.  Thanks.

Comment 21 Jiri Olsa 2012-11-15 11:08:11 UTC
(In reply to comment #19)
> I am seeing occasional kernel oopses on our DELL C6220's which seem to be
> related to this bug
> 
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:3067!
> invalid opcode: 0000 [#1] SMP

Do you still see this issue, or do you have steps to reproduce it?

thanks

