Bug 734228 - dcache spinlock contention in 2.6.18-194.el5 kernel
Summary: dcache spinlock contention in 2.6.18-194.el5 kernel
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-29 19:14 UTC by Dmitry Zogin
Modified: 2015-11-16 16:03 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-02 13:25:59 UTC


Attachments: none

Description Dmitry Zogin 2011-08-29 19:14:03 UTC
Description of problem:

During internal testing of the HP Ibrix file system with NFS, we found Linux kernel lock contention on dcache_lock under low memory pressure.
I can submit the system crash dump if needed - let me know where you want me to upload it.
The symptoms are:

  SYSTEM MAP: ./System.map-2.6.18-194.el5                              
DEBUG KERNEL: ./vmlinux (2.6.18-194.el5)
    DUMPFILE: ./vmcore
        CPUS: 16
        DATE: Mon Aug 29 10:49:51 2011
      UPTIME: 03:04:02
LOAD AVERAGE: 25.35, 17.47, 11.32
       TASKS: 854
    NODENAME: ib48-123
     RELEASE: 2.6.18-194.el5
     VERSION: #1 SMP Tue Mar 16 21:52:39 EDT 2010
     MACHINE: x86_64  (2665 Mhz)
      MEMORY: 47.3 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
         PID: 903
     COMMAND: "khungtaskd"
        TASK: ffff810c1fb6d040  [THREAD_INFO: ffff81061f0fe000]
         CPU: 8
       STATE: TASK_RUNNING (PANIC)

INFO: task nfsd:11087 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd          D ffffffff80151248     0 11087      1         11098 11094 (L-TLB)
 ffff810609359900 0000000000000046 ffff810bd1398bc0 ffff81061b420000
 0000000000000000 000000000000000a ffff81060e4080c0 ffff810c1fc36820
 00000a138f8ff7bc 00000000000157eb ffff81060e4082a8 0000000ebb4eb380
Call Trace:
 [<ffffffff800322bb>] ip_output+0x29a/0x2dd
 [<ffffffff80064c6f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff888eff12>] :ibrix:unreference_internal+0x142/0x280
 [<ffffffff80064cb9>] .text.lock.mutex+0xf/0x14
 [<ffffffff888f0095>] :ibrix:unreference+0x45/0x60
 [<ffffffff888d74c4>] :ibrix:setInodeWithDelegInContextByIndexEx+0x7e4/0x1450
 [<ffffffff88889253>] :ibrix:put_delegation+0x643/0x6d0
 [<ffffffff888d8d61>] :ibrix:setInodeWithDelegInContextByIndex+0x61/0xa0
 [<ffffffff88916c80>] :ibrix:ibrix_aio_write_internal+0x330/0x1c40
 [<ffffffff888802ce>] :ibrix:get_ibr_time+0xe/0x30
 [<ffffffff888824f3>] :ibrix:ibr_release_resource+0x5f3/0x670
 [<ffffffff888d5f1c>] :ibrix:removeSegmentFromContext+0x1bc/0x320
 [<ffffffff889ef56d>] :ibrix:j_errorInjector+0x2d/0x3d0
 [<ffffffff889187b1>] :ibrix:ibrix_aio_write+0x221/0x260
 [<ffffffff88924ea0>] :ibrix:ibrix_file_writev+0xe0/0x130
 [<ffffffff8001819f>] do_sync_write+0x0/0x104
 [<ffffffff800e1368>] do_readv_writev+0x172/0x291
 [<ffffffff8001819f>] do_sync_write+0x0/0x104
 [<ffffffff888d6cb2>] :ibrix:release_ds_context_internal+0x202/0x230
 [<ffffffff88941106>] :ibrix:ibrix_open_file+0x756/0x770
 [<ffffffff89cb4711>] :nfsd:nfsd_vfs_write+0xf2/0x2e1
 [<ffffffff889409b0>] :ibrix:ibrix_open_file+0x0/0x770
 [<ffffffff8001e9e2>] __dentry_open+0x101/0x1dc
 [<ffffffff89cb5027>] :nfsd:nfsd_write+0xb5/0xd5
 [<ffffffff89cbbb1a>] :nfsd:nfsd3_proc_write+0xea/0x109
 [<ffffffff89cb11db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff88630651>] :sunrpc:svc_process+0x454/0x71b
 [<ffffffff80065644>] __down_read+0x12/0x92
 [<ffffffff89cb15a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff89cb1746>] :nfsd:nfsd+0x1a5/0x2cb
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff89cb15a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff89cb15a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff8005efa7>] child_rip+0x0/0x11

Kernel panic - not syncing: hung_task: blocked tasks
The system was low on memory, and a thread was blocked on an RW lock
waiting for another thread to finish a write. That other thread was waiting in prune_dcache() for the dcache_lock spinlock:

--- <NMI exception stack> ---
 #3 [ffff81060275d618] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff81060275d618] prune_dcache at ffffffff8002ea85
 #5 [ffff81060275d648] shrink_dcache_memory at ffffffff800ec75f
 #6 [ffff81060275d658] shrink_slab at ffffffff8003f7aa
 #7 [ffff81060275d698] try_to_free_pages at ffffffff800ccbb7
 #8 [ffff81060275d728] __alloc_pages at ffffffff8000f47f
 #9 [ffff81060275d798] grab_cache_page_write_begin at ffffffff800c6d70
#10 [ffff81060275d7c8] generic_file_buffered_write at ffffffff8000fed3
#11 [ffff81060275d8b8] __generic_file_aio_write_nolock at ffffffff80016641
#12 [ffff81060275d968] generic_file_aio_write_nolock at ffffffff800c75cc
#13 [ffff81060275d998] EsIbx_Write at ffffffff887d4882
#14 [ffff81060275da18] ibrix_aio_write_internal at ffffffff88917c82
#15 [ffff81060275db78] ibrix_aio_write at ffffffff889187b1
#16 [ffff81060275dbc8] ibrix_file_writev at ffffffff88924ea0
#17 [ffff81060275dc08] do_readv_writev at ffffffff800e1368
#18 [ffff81060275dce8] nfsd_vfs_write at ffffffff89cb4711
#19 [ffff81060275dd98] nfsd_write at ffffffff89cb5027
#20 [ffff81060275ddf8] nfsd3_proc_write at ffffffff89cbbb1a
#21 [ffff81060275de48] nfsd_dispatch at ffffffff89cb11db
#22 [ffff81060275de78] svc_process at ffffffff88630651
#23 [ffff81060275dee8] nfsd at ffffffff89cb1746
#24 [ffff81060275df48] kernel_thread at ffffffff8005efb1

crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  12330931        47 GB         ----
      FREE   360301       1.4 GB    2% of TOTAL MEM
      USED  11970630      45.7 GB   97% of TOTAL MEM
    SHARED  10529542      40.2 GB   85% of TOTAL MEM
   BUFFERS  10265044      39.2 GB   83% of TOTAL MEM
    CACHED  1175243       4.5 GB    9% of TOTAL MEM
      SLAB   314137       1.2 GB    2% of TOTAL MEM

TOTAL HIGH        0            0    0% of TOTAL MEM
 FREE HIGH        0            0    0% of TOTAL HIGH
 TOTAL LOW  12330931        47 GB  100% of TOTAL MEM
  FREE LOW   360301       1.4 GB    2% of TOTAL LOW

TOTAL SWAP   524286         2 GB         ----
 SWAP USED      311       1.2 MB    0% of TOTAL SWAP
 SWAP FREE   523975         2 GB   99% of TOTAL SWAP


Several threads were spinning on the dcache_lock spinlock in
shrink_slab()->shrink_dcache_memory()->prune_dcache()->_spin_lock():

crash> bt -a |grep prune_dcache
 #4 [ffff81061f549d58] prune_dcache at ffffffff8002ea85
 #4 [ffff8101e0a6ba80] prune_dcache at ffffffff8002ea85
 #4 [ffff8105adef3a80] prune_dcache at ffffffff8002ea85
 #4 [ffff8104d509ba80] prune_dcache at ffffffff8002ea85
 #4 [ffff810609f6fcf8] prune_dcache at ffffffff8002ea85
 #4 [ffff81060c12dd38] prune_dcache at ffffffff8002ea85
 #4 [ffff8101e6aeda80] prune_dcache at ffffffff8002ea85
 #4 [ffff81061f5ddd58] prune_dcache at ffffffff8002ea85
 #4 [ffff81060275d618] prune_dcache at ffffffff8002ea85
 #4 [ffff810b9b443a80] prune_dcache at ffffffff8002ea85
 #4 [ffff8103ea0b5a80] prune_dcache at ffffffff8002ea85

11 of the 16 processors were stuck contending for dcache_lock, like this one:

PID: 6782   TASK: ffff8101e0a6e820  CPU: 15  COMMAND: "mkibfs"
 #0 [ffff810c1fccaf20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff810c1fccaf40] do_nmi at ffffffff8006688a
 #2 [ffff810c1fccaf50] nmi at ffffffff80065eef
    [exception RIP: .text.lock.spinlock+2]
    RIP: ffffffff80065bfc  RSP: ffff8103ea0b5a80  RFLAGS: 00000282
    RAX: 00000000ffffffff  RBX: ffff810c1fffcac0  RCX: 0000000000000064
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffffffff803f2480
    RBP: 00000000ffc6c470   R8: 0000000000000246   R9: 0000000000000030
    R10: ffff81063511d490  R11: 00000000ffc6c470  R12: 0000000000000000
    R13: 0000000000000080  R14: 0000000000000000  R15: 0000000000002b80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff8103ea0b5a80] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff8103ea0b5a80] prune_dcache at ffffffff8002ea85
 #5 [ffff8103ea0b5ab0] shrink_dcache_memory at ffffffff800ec75f
 #6 [ffff8103ea0b5ac0] shrink_slab at ffffffff8003f7aa
 #7 [ffff8103ea0b5b00] try_to_free_pages at ffffffff800ccbb7
 #8 [ffff8103ea0b5b90] __alloc_pages at ffffffff8000f47f
 #9 [ffff8103ea0b5c00] grab_cache_page_write_begin at ffffffff800c6d70
#10 [ffff8103ea0b5c30] generic_file_buffered_write at ffffffff8000fed3
#11 [ffff8103ea0b5d20] __generic_file_aio_write_nolock at ffffffff80016641
#12 [ffff8103ea0b5dd0] generic_file_aio_write_nolock at ffffffff800c75cc
#13 [ffff8103ea0b5e00] generic_file_write_nolock at ffffffff800c79a3
#14 [ffff8103ea0b5ef0] blkdev_file_write at ffffffff800e560d
#15 [ffff8103ea0b5f10] vfs_write at ffffffff80016a49
#16 [ffff8103ea0b5f40] sys_write at ffffffff80017316
#17 [ffff8103ea0b5f80] tracesys at ffffffff8005e28d (via system_call)

The problem is similar to what is described in Red Hat bug 526612
(https://bugzilla.redhat.com/show_bug.cgi?id=526612).
According to http://rhn.redhat.com/errata/RHSA-2010-0178.html it is
supposed to be fixed in kernel 2.6.18-194, which is the version we have installed.

Indeed, this looks like an incomplete fix for bug 526612.

Comment 1 RHEL Product and Program Management 2014-03-07 13:57:50 UTC
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX, at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 2 RHEL Product and Program Management 2014-06-02 13:25:59 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We have carefully evaluated the request, but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

