Description of problem: During our internal tests of the HP Ibrix file system and NFS, we found Linux kernel lock contention on dcache_lock under low memory pressure. I can submit the system crash dump if needed - let me know where you want me to upload it. The symptoms are:

    SYSTEM MAP: ./System.map-2.6.18-194.el5
  DEBUG KERNEL: ./vmlinux (2.6.18-194.el5)
      DUMPFILE: ./vmcore
          CPUS: 16
          DATE: Mon Aug 29 10:49:51 2011
        UPTIME: 03:04:02
  LOAD AVERAGE: 25.35, 17.47, 11.32
         TASKS: 854
      NODENAME: ib48-123
       RELEASE: 2.6.18-194.el5
       VERSION: #1 SMP Tue Mar 16 21:52:39 EDT 2010
       MACHINE: x86_64 (2665 Mhz)
        MEMORY: 47.3 GB
         PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
           PID: 903
       COMMAND: "khungtaskd"
          TASK: ffff810c1fb6d040 [THREAD_INFO: ffff81061f0fe000]
           CPU: 8
         STATE: TASK_RUNNING (PANIC)

INFO: task nfsd:11087 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D ffffffff80151248     0 11087      1         11098 11094 (L-TLB)
 ffff810609359900 0000000000000046 ffff810bd1398bc0 ffff81061b420000
 0000000000000000 000000000000000a ffff81060e4080c0 ffff810c1fc36820
 00000a138f8ff7bc 00000000000157eb ffff81060e4082a8 0000000ebb4eb380
Call Trace:
 [<ffffffff800322bb>] ip_output+0x29a/0x2dd
 [<ffffffff80064c6f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff888eff12>] :ibrix:unreference_internal+0x142/0x280
 [<ffffffff80064cb9>] .text.lock.mutex+0xf/0x14
 [<ffffffff888f0095>] :ibrix:unreference+0x45/0x60
 [<ffffffff888d74c4>] :ibrix:setInodeWithDelegInContextByIndexEx+0x7e4/0x1450
 [<ffffffff88889253>] :ibrix:put_delegation+0x643/0x6d0
 [<ffffffff888d8d61>] :ibrix:setInodeWithDelegInContextByIndex+0x61/0xa0
 [<ffffffff88916c80>] :ibrix:ibrix_aio_write_internal+0x330/0x1c40
 [<ffffffff888802ce>] :ibrix:get_ibr_time+0xe/0x30
 [<ffffffff888824f3>] :ibrix:ibr_release_resource+0x5f3/0x670
 [<ffffffff888d5f1c>] :ibrix:removeSegmentFromContext+0x1bc/0x320
 [<ffffffff889ef56d>] :ibrix:j_errorInjector+0x2d/0x3d0
 [<ffffffff889187b1>] :ibrix:ibrix_aio_write+0x221/0x260
 [<ffffffff88924ea0>] :ibrix:ibrix_file_writev+0xe0/0x130
 [<ffffffff8001819f>] do_sync_write+0x0/0x104
 [<ffffffff800e1368>] do_readv_writev+0x172/0x291
 [<ffffffff8001819f>] do_sync_write+0x0/0x104
 [<ffffffff888d6cb2>] :ibrix:release_ds_context_internal+0x202/0x230
 [<ffffffff88941106>] :ibrix:ibrix_open_file+0x756/0x770
 [<ffffffff89cb4711>] :nfsd:nfsd_vfs_write+0xf2/0x2e1
 [<ffffffff889409b0>] :ibrix:ibrix_open_file+0x0/0x770
 [<ffffffff8001e9e2>] __dentry_open+0x101/0x1dc
 [<ffffffff89cb5027>] :nfsd:nfsd_write+0xb5/0xd5
 [<ffffffff89cbbb1a>] :nfsd:nfsd3_proc_write+0xea/0x109
 [<ffffffff89cb11db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff88630651>] :sunrpc:svc_process+0x454/0x71b
 [<ffffffff80065644>] __down_read+0x12/0x92
 [<ffffffff89cb15a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff89cb1746>] :nfsd:nfsd+0x1a5/0x2cb
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff89cb15a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff89cb15a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff8005efa7>] child_rip+0x0/0x11
Kernel panic - not syncing: hung_task: blocked tasks

The system was low on memory, and this thread was blocked on a RW lock, waiting for another thread to finish a write.
That other thread was waiting in prune_dcache() for the spinlock:

--- <NMI exception stack> ---
 #3 [ffff81060275d618] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff81060275d618] prune_dcache at ffffffff8002ea85
 #5 [ffff81060275d648] shrink_dcache_memory at ffffffff800ec75f
 #6 [ffff81060275d658] shrink_slab at ffffffff8003f7aa
 #7 [ffff81060275d698] try_to_free_pages at ffffffff800ccbb7
 #8 [ffff81060275d728] __alloc_pages at ffffffff8000f47f
 #9 [ffff81060275d798] grab_cache_page_write_begin at ffffffff800c6d70
#10 [ffff81060275d7c8] generic_file_buffered_write at ffffffff8000fed3
#11 [ffff81060275d8b8] __generic_file_aio_write_nolock at ffffffff80016641
#12 [ffff81060275d968] generic_file_aio_write_nolock at ffffffff800c75cc
#13 [ffff81060275d998] EsIbx_Write at ffffffff887d4882
#14 [ffff81060275da18] ibrix_aio_write_internal at ffffffff88917c82
#15 [ffff81060275db78] ibrix_aio_write at ffffffff889187b1
#16 [ffff81060275dbc8] ibrix_file_writev at ffffffff88924ea0
#17 [ffff81060275dc08] do_readv_writev at ffffffff800e1368
#18 [ffff81060275dce8] nfsd_vfs_write at ffffffff89cb4711
#19 [ffff81060275dd98] nfsd_write at ffffffff89cb5027
#20 [ffff81060275ddf8] nfsd3_proc_write at ffffffff89cbbb1a
#21 [ffff81060275de48] nfsd_dispatch at ffffffff89cb11db
#22 [ffff81060275de78] svc_process at ffffffff88630651
#23 [ffff81060275dee8] nfsd at ffffffff89cb1746
#24 [ffff81060275df48] kernel_thread at ffffffff8005efb1

crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  12330931       47 GB         ----
      FREE    360301      1.4 GB    2% of TOTAL MEM
      USED  11970630     45.7 GB   97% of TOTAL MEM
    SHARED  10529542     40.2 GB   85% of TOTAL MEM
   BUFFERS  10265044     39.2 GB   83% of TOTAL MEM
    CACHED   1175243      4.5 GB    9% of TOTAL MEM
      SLAB    314137      1.2 GB    2% of TOTAL MEM

TOTAL HIGH         0            0    0% of TOTAL MEM
 FREE HIGH         0            0    0% of TOTAL HIGH
 TOTAL LOW  12330931       47 GB  100% of TOTAL MEM
  FREE LOW    360301      1.4 GB    2% of TOTAL LOW

TOTAL SWAP    524286        2 GB         ----
 SWAP USED       311      1.2 MB    0% of TOTAL SWAP
 SWAP FREE    523975        2 GB   99% of TOTAL SWAP

Several threads were spinning on the dcache_lock spinlock in shrink_slab()->shrink_dcache_memory()->prune_dcache()->_spin_lock():

crash> bt -a |grep prune_dcache
 #4 [ffff81061f549d58] prune_dcache at ffffffff8002ea85
 #4 [ffff8101e0a6ba80] prune_dcache at ffffffff8002ea85
 #4 [ffff8105adef3a80] prune_dcache at ffffffff8002ea85
 #4 [ffff8104d509ba80] prune_dcache at ffffffff8002ea85
 #4 [ffff810609f6fcf8] prune_dcache at ffffffff8002ea85
 #4 [ffff81060c12dd38] prune_dcache at ffffffff8002ea85
 #4 [ffff8101e6aeda80] prune_dcache at ffffffff8002ea85
 #4 [ffff81061f5ddd58] prune_dcache at ffffffff8002ea85
 #4 [ffff81060275d618] prune_dcache at ffffffff8002ea85
 #4 [ffff810b9b443a80] prune_dcache at ffffffff8002ea85
 #4 [ffff8103ea0b5a80] prune_dcache at ffffffff8002ea85

11 processors out of 16 were stuck contending for dcache_lock, like this one:

PID: 6782   TASK: ffff8101e0a6e820  CPU: 15  COMMAND: "mkibfs"
 #0 [ffff810c1fccaf20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff810c1fccaf40] do_nmi at ffffffff8006688a
 #2 [ffff810c1fccaf50] nmi at ffffffff80065eef
    [exception RIP: .text.lock.spinlock+2]
    RIP: ffffffff80065bfc  RSP: ffff8103ea0b5a80  RFLAGS: 00000282
    RAX: 00000000ffffffff  RBX: ffff810c1fffcac0  RCX: 0000000000000064
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffffffff803f2480
    RBP: 00000000ffc6c470   R8: 0000000000000246   R9: 0000000000000030
    R10: ffff81063511d490  R11: 00000000ffc6c470  R12: 0000000000000000
    R13: 0000000000000080  R14: 0000000000000000  R15: 0000000000002b80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff8103ea0b5a80] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff8103ea0b5a80] prune_dcache at ffffffff8002ea85
 #5 [ffff8103ea0b5ab0] shrink_dcache_memory at ffffffff800ec75f
 #6 [ffff8103ea0b5ac0] shrink_slab at ffffffff8003f7aa
 #7 [ffff8103ea0b5b00] try_to_free_pages at ffffffff800ccbb7
 #8 [ffff8103ea0b5b90] __alloc_pages at ffffffff8000f47f
 #9 [ffff8103ea0b5c00] grab_cache_page_write_begin at ffffffff800c6d70
#10 [ffff8103ea0b5c30] generic_file_buffered_write at ffffffff8000fed3
#11 [ffff8103ea0b5d20] __generic_file_aio_write_nolock at ffffffff80016641
#12 [ffff8103ea0b5dd0] generic_file_aio_write_nolock at ffffffff800c75cc
#13 [ffff8103ea0b5e00] generic_file_write_nolock at ffffffff800c79a3
#14 [ffff8103ea0b5ef0] blkdev_file_write at ffffffff800e560d
#15 [ffff8103ea0b5f10] vfs_write at ffffffff80016a49
#16 [ffff8103ea0b5f40] sys_write at ffffffff80017316
#17 [ffff8103ea0b5f80] tracesys at ffffffff8005e28d (via system_call)

The problem is similar to what has been described in Red Hat bug 526612 (https://bugzilla.redhat.com/show_bug.cgi?id=526612). According to http://rhn.redhat.com/errata/RHSA-2010-0178.html it is supposed to be fixed in kernel 2.6.18-194, which is the version we have installed. So this looks like an incomplete fix for bug 526612.
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX, at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We have carefully evaluated the request, but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).