Bug 690395

Summary: kernel: BUG: soft lockup - CPU#7 stuck for 10s! [ls:14944]
Product: Red Hat Enterprise Linux 5
Reporter: dushy2010
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 5.8
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-26 15:50:12 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description dushy2010 2011-03-24 08:22:51 UTC
Description of problem:

Our customer has been seeing soft lockups for some time now. They are running a 2.6.18-128 kernel (RHEL 5.1) with patches through 2.6.18-194 on x86_64 servers.

Here's the stack trace:
Mar 18 17:17:54 cu0login3 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [ls:14944]
Mar 18 17:17:54 cu0login3 kernel: CPU 7:
Mar 18 17:17:54 cu0login3 kernel: Modules linked in: ecount(U) blcr(U) blcr_imports(U) eeprom(U) openafs(PU) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) fuse(U) mgc(U) lustre(U) lov(U) mdc(U) osc(U) lquota(U) ko2iblnd(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_ucm(U) ib_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mthca(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ext3(U) jbd(U) dm_mirror(U) dm_log(U) dm_multipath(U) scsi_dh(U) dm_mod(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) nx_nic(U) shpchp(U) mlx4_core(U) i2c_piix4(U) ehci_hcd(U) serio_raw(U) i2c_core(U) ohci_hcd(U) pcspkr(U) nfs(U) nfs_acl(U) fscache(U) lockd(U) sunrpc(U) e1000e(U) bnx2(U) scsi_transport_fc(U) aacraid(U) sata_nv(U) mptscsih(U) mptbase(U) ata_piix(U) usb_storage(U) sata_svw(U) libata(U) cciss(U) sd_mod(U) scsi_mod(U) tg3(U) libphy(U)
Mar 18 17:17:54 cu0login3 kernel: Pid: 14944, comm: ls Tainted: P      2.6.18-128.1.14.el5.8hp.2sp #1
Mar 18 17:17:54 cu0login3 kernel: RIP: 0010:[<ffffffff80065cef>]  [<ffffffff80065cef>] .text.lock.spinlock+0x5/0x30
Mar 18 17:17:54 cu0login3 kernel: RSP: 0018:ffff8101103fdbc0  EFLAGS: 00000282
Mar 18 17:17:54 cu0login3 kernel: RAX: 0000000000000000 RBX: ffff81016f7bdc48 RCX: 000000000000cdfb
Mar 18 17:17:54 cu0login3 kernel: RDX: ffff810160d3c960 RSI: 000000000000cdfb RDI: ffffffff803da380
Mar 18 17:17:54 cu0login3 kernel: RBP: ffff810213991800 R08: ffff810211e7cc48 R09: 0000000000000282
Mar 18 17:17:54 cu0login3 kernel: R10: 00000000deadbeef R11: 0000000000000088 R12: ffff81016f7bdc48
Mar 18 17:17:54 cu0login3 kernel: R13: ffff8101880cc7c0 R14: ffff8101880cc7c0 R15: 0000000000000000
Mar 18 17:17:54 cu0login3 kernel: FS:  00002b58a737ef40(0000) GS:ffff81023f2f13c0(0000) knlGS:00000000557466c0
Mar 18 17:17:54 cu0login3 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 18 17:17:54 cu0login3 kernel: CR2: 00002b58a6c9d3c0 CR3: 0000000198d24000 CR4: 00000000000006e0
Mar 18 17:17:54 cu0login3 kernel:
Mar 18 17:17:54 cu0login3 kernel: Call Trace:
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8002ecaf>] prune_dcache+0xe0/0x149
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8004d960>] shrink_dcache_parent+0x1c/0xe1
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff800e8188>] d_invalidate+0x36/0xc4
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8000d40b>] do_lookup+0x1a0/0x1e6
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8000d52a>] file_read_actor+0x25/0x154
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8000a6d3>] __link_path_walk+0xa01/0xf42
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8000ef13>] link_path_walk+0x5c/0xe5
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8000d0ef>] do_path_lookup+0x270/0x2e8
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff80023a64>] __path_lookup_intent_open+0x56/0x97
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8001b3d6>] open_namei+0x73/0x6d5
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff80067c10>] do_page_fault+0x4fe/0x830
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff800278eb>] do_filp_open+0x1c/0x38
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8001a19b>] do_sys_open+0x44/0xbe
Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
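
For comparing this trace against other occurrences, the syslog lines above can be reduced to a CPU/process header and a list of symbol+offset frames. The following is only a minimal sketch: the helper names (parse_lockup, parse_frames, Frame) are hypothetical, and the regular expressions assume the exact line layout seen in this report, which may differ on other kernels.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    address: int   # return address printed inside [<...>]
    symbol: str    # function name
    offset: int    # offset into the function
    size: int      # function size as printed by the kernel


# Patterns assumed from the log lines in this report only.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_lockup(line: str) -> Optional[dict]:
    """Return CPU, duration, command and PID from a soft lockup line."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group(1)),
        "seconds": int(m.group(2)),
        "comm": m.group(3),
        "pid": int(m.group(4)),
    }


def parse_frames(lines: List[str]) -> List[Frame]:
    """Collect every 'symbol+0xoff/0xsize' frame found in the lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                Frame(
                    int(m.group(1), 16),
                    m.group(2),
                    int(m.group(3), 16),
                    int(m.group(4), 16),
                )
            )
    return frames


# A few lines copied from the trace in this report.
sample = [
    "Mar 18 17:17:54 cu0login3 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [ls:14944]",
    "Mar 18 17:17:54 cu0login3 kernel: Call Trace:",
    "Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8002ecaf>] prune_dcache+0xe0/0x149",
    "Mar 18 17:17:54 cu0login3 kernel: [<ffffffff8004d960>] shrink_dcache_parent+0x1c/0xe1",
    "Mar 18 17:17:54 cu0login3 kernel: [<ffffffff800e8188>] d_invalidate+0x36/0xc4",
]

header = parse_lockup(sample[0])
frames = parse_frames(sample)

if __name__ == "__main__":
    print(header)
    for f in frames:
        print(f"{f.symbol}+{f.offset:#x}/{f.size:#x}")
```

Two traces with the same top few symbols (here prune_dcache, shrink_dcache_parent, d_invalidate) are a reasonable hint, though not proof, that they are the same lockup.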


How reproducible:
Not reproduced

Additional info: 
Is there a patch which can solve this?

Comment 1 Jes Sorensen 2013-02-26 15:50:12 UTC
There is no information in this bug about the system configuration, workload,
etc.

If you see something like this again, please contact your Red Hat Technical
Support representative.