Bug 669418

Summary: khugepaged blocking on page locks
Product: Red Hat Enterprise Linux 6
Reporter: Jeremy Agee <jagee>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
QA Contact: Chao Ye <cye>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.0
CC: czhang, kbanerje, lit-cs-sysadmin, qcai, syeghiay
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: kernel-2.6.32-104.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-23 20:37:35 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jeremy Agee 2011-01-13 16:13:18 UTC
khugepaged blocking on page locks


vmcore info:

      KERNEL: /cores/20110110101743/work/vmlinux
    DUMPFILE: /cores/20110110101743/work/vmcore_debug_kernel_Iointerceptor_crash  [PARTIAL DUMP]
        CPUS: 24
        DATE: Thu Jan  6 22:13:43 2011
      UPTIME: 00:59:10
LOAD AVERAGE: 13.13, 13.58, 16.83
       TASKS: 1071
    NODENAME: node3
     RELEASE: 2.6.32-71.7.1.el6.x86_64.debug
     VERSION: #1 SMP Wed Oct 27 03:01:55 EDT 2010
     MACHINE: x86_64  (2800 Mhz)
      MEMORY: 48 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)

Current analysis by Lachlan McIlroy:

The khugepaged thread is the one causing all the problems.  In the following stack trace, khugepaged holds the mmap_sem semaphore and is trying to allocate and compact pages.  It has found a page that is currently locked for I/O, and it is waiting for the I/O to complete and the page to be unlocked.  Why has it been blocked for more than two minutes?  There could be severe I/O congestion holding up the page, or the page could be locked by another process.
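The problematic pattern, heavily simplified (pseudocode inferred from the stack trace below, not the actual RHEL6 source):

```
/* Simplified pseudocode of the khugepaged collapse path -- not the
 * actual kernel source; call names follow the stack trace below. */
khugepaged_scan_mm(mm)
{
        down_write(&mm->mmap_sem);        /* exclusive: blocks mmap/munmap and
                                             readers such as page-fault paths  */
        new_page = alloc_pages_vma(...);  /* order-9 (2 MiB) allocation         */
        /*   -> try_to_compact_pages()                                          */
        /*     -> compact_zone() -> migrate_pages()                             */
        /*       -> lock_page(page);  sleeps if the page is locked for I/O,     */
        /*          while mmap_sem is still held -- every other task needing    */
        /*          this mm's mmap_sem piles up behind khugepaged               */
        ...
        up_write(&mm->mmap_sem);
}
```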

Note that a hugepage is not just one page but 512 pages, and waiting for that many pages to be unlocked means the mmap_sem can be held for a long time.  That holds up other processes, which then report being blocked too.

INFO: task khugepaged:212 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
khugepaged    D 0000000000000000  4296   212      2 0x00000000
 ffff880c0386f8e0 0000000000000046 0000000000000000 0000000000000007
 0000000000000006 ffffffff81120a30 ffff8800581d6fd8 0000000100221784
 ffff880c03880cc0 ffff880c0386ffd8 0000000000010608 ffff880c03880cc0
Call Trace:
 [<ffffffff81120a30>] ? sync_page+0x0/0x60
 [<ffffffff81120a30>] ? sync_page+0x0/0x60
 [<ffffffff814f8d63>] io_schedule+0x73/0xc0
 [<ffffffff81120a74>] sync_page+0x44/0x60
 [<ffffffff814f94ea>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff81120a07>] __lock_page+0x67/0x70
 [<ffffffff81096e50>] ? wake_bit_function+0x0/0x50
 [<ffffffff811751a3>] lock_page+0x43/0x50
 [<ffffffff81175938>] migrate_pages+0x6b8/0x6e0
 [<ffffffff81169920>] ? compaction_alloc+0x0/0x370
 [<ffffffff8116938d>] compact_zone+0x4cd/0x600
 [<ffffffff810ae43d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8116974e>] compact_zone_order+0x7e/0xb0
 [<ffffffff811698a1>] try_to_compact_pages+0x121/0x1a0
 [<ffffffff811345d7>] __alloc_pages_nodemask+0x5a7/0x8d0
 [<ffffffff8128cf91>] ? debug_object_free+0xc1/0x140
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff8109d7cf>] ? cpu_clock+0x6f/0x80
 [<ffffffff81167d84>] alloc_pages_vma+0x84/0x110
 [<ffffffff814fa31a>] ? down_write+0x9a/0xb0
 [<ffffffff811811a9>] ? khugepaged+0x9d9/0x1320
 [<ffffffff811813a8>] khugepaged+0xbd8/0x1320
 [<ffffffff814fb8c0>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811807d0>] ? khugepaged+0x0/0x1320
 [<ffffffff81096ac6>] kthread+0x96/0xa0
 [<ffffffff810142ca>] child_rip+0xa/0x20
 [<ffffffff81013c10>] ? restore_args+0x0/0x30
 [<ffffffff81096a30>] ? kthread+0x0/0xa0
 [<ffffffff810142c0>] ? child_rip+0x0/0x20
1 lock held by khugepaged/212:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811811a9>] khugepaged+0x9d9/0x1320

This IOInterceptor process is trying to acquire the mmap_sem semaphore, so it is probably stuck behind khugepaged:

INFO: task IOInterceptor.u:6499 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 00000000ffffffff  3872  6499   6461 0x00000080
 ffff880c065ffde8 0000000000000046 0000000000000000 ffff880c04bdc240
 0000000000000000 0000000000000007 ffff8800593d6fd8 0000000100221799
 ffff880c04bdc800 ffff880c065fffd8 0000000000010608 ffff880c04bdc800
Call Trace:
 [<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
 [<ffffffff814fb273>] rwsem_down_write_failed+0x23/0x30
 [<ffffffff811811a9>] ? khugepaged+0x9d9/0x1320
 [<ffffffff812875d3>] call_rwsem_down_write_failed+0x13/0x20
 [<ffffffff814fa30e>] ? down_write+0x8e/0xb0
 [<ffffffff811427fc>] ? sys_mmap_pgoff+0x5c/0x2a0
 [<ffffffff811427fc>] sys_mmap_pgoff+0x5c/0x2a0
 [<ffffffff814fb3d2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810182b9>] sys_mmap+0x29/0x30
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/6499:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811427fc>] sys_mmap_pgoff+0x5c/0x2a0

These three IOInterceptor processes are also trying to acquire the mmap_sem semaphore, so they are probably also stuck behind khugepaged:

INFO: task IOInterceptor.u:20167 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 0000000000000000  4880 20167   6461 0x00000080
 ffff880acc8b7b60 0000000000000046 0000000000000000 ffff880102481100
 0000000000000000 0000000000000007 ffff8800587d6fd8 0000000100221ba4
 ffff8801024816c0 ffff880acc8b7fd8 0000000000010608 ffff8801024816c0
Call Trace:
 [<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
 [<ffffffffa0344438>] ? fuse_copy_fill+0x98/0x1f0 [fuse]
 [<ffffffff814fb97b>] ? _spin_unlock+0x2b/0x40
 [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]
 [<ffffffffa03445cf>] fuse_copy_one+0x3f/0x70 [fuse]
 [<ffffffffa034552e>] fuse_dev_read+0x23e/0x320 [fuse]
 [<ffffffff81188a7a>] do_sync_read+0xfa/0x140
 [<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff810aa7ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff8121f7c6>] ? security_file_permission+0x16/0x20
 [<ffffffff811894a5>] vfs_read+0xb5/0x1a0
 [<ffffffff81189ff6>] ? fget_light+0x66/0x100
 [<ffffffff811895e1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/20167:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]

INFO: task IOInterceptor.u:20201 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 00000000ffffffff  5648 20201   6461 0x00000080
 ffff880120927b60 0000000000000046 0000000000000000 ffff8804c6c248c0
 0000000000000000 0000000000000007 ffff88005a3d6fd8 0000000100221b92
 ffff8804c6c24e80 ffff880120927fd8 0000000000010608 ffff8804c6c24e80
Call Trace:
 [<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
 [<ffffffffa0344438>] ? fuse_copy_fill+0x98/0x1f0 [fuse]
 [<ffffffff814fb97b>] ? _spin_unlock+0x2b/0x40
 [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]
 [<ffffffffa03445cf>] fuse_copy_one+0x3f/0x70 [fuse]
 [<ffffffffa034552e>] fuse_dev_read+0x23e/0x320 [fuse]
 [<ffffffff81188a7a>] do_sync_read+0xfa/0x140
 [<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff810aa7ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff8121f7c6>] ? security_file_permission+0x16/0x20
 [<ffffffff811894a5>] vfs_read+0xb5/0x1a0
 [<ffffffff81189ff6>] ? fget_light+0x66/0x100
 [<ffffffff811895e1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/20201:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]

INFO: task IOInterceptor.u:20377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 00000000ffffffff  3920 20377   6461 0x00000080
 ffff8806f94b3b60 0000000000000046 0000000000000000 ffff8806fcb7c140
 0000000000000000 0000000000000007 ffff880058dd6fd8 0000000100221ba6
 ffff8806fcb7c700 ffff8806f94b3fd8 0000000000010608 ffff8806fcb7c700
Call Trace:
 [<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
 [<ffffffffa0344438>] ? fuse_copy_fill+0x98/0x1f0 [fuse]
 [<ffffffff814fb97b>] ? _spin_unlock+0x2b/0x40
 [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]
 [<ffffffffa03445cf>] fuse_copy_one+0x3f/0x70 [fuse]
 [<ffffffffa034552e>] fuse_dev_read+0x23e/0x320 [fuse]
 [<ffffffff81188a7a>] do_sync_read+0xfa/0x140
 [<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff810aa7ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff8121f7c6>] ? security_file_permission+0x16/0x20
 [<ffffffff811894a5>] vfs_read+0xb5/0x1a0
 [<ffffffff81189ff6>] ? fget_light+0x66/0x100
 [<ffffffff811895e1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/20377:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]

These two python processes are trying to acquire the mmap_sem semaphore, so they are probably stuck behind khugepaged:

INFO: task python:7585 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
python        D 0000000000000000  4608  7585   7576 0x00000080
 ffff880c06e5bca0 0000000000000046 0000000000000000 ffff880bc534d640
 0000000000000000 0000000000000007 ffff880058bd6fd8 00000001002233be
 ffff880bc534dc00 ffff880c06e5bfd8 0000000000010608 ffff880bc534dc00
Call Trace:
 [<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
 [<ffffffff8114d35d>] ? access_process_vm+0x4d/0x200
 [<ffffffff8114d35d>] access_process_vm+0x4d/0x200
 [<ffffffff811eef7d>] proc_pid_cmdline+0x6d/0x120
 [<ffffffff81167cb7>] ? alloc_pages_current+0x87/0xd0
 [<ffffffff811efdad>] proc_info_read+0xad/0xf0
 [<ffffffff811894a5>] vfs_read+0xb5/0x1a0
 [<ffffffff81189ff6>] ? fget_light+0x66/0x100
 [<ffffffff811895e1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by python/7585:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114d35d>] access_process_vm+0x4d/0x200

INFO: task python:7599 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
python        D 0000000000000000  4608  7599   7576 0x00000080
 ffff880bc18bbca0 0000000000000046 0000000000000000 ffff880c00850180
 0000000000000000 0000000000000007 ffff880058bd6fd8 0000000100230158
 ffff880c00850740 ffff880bc18bbfd8 0000000000010608 ffff880c00850740
Call Trace:
 [<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
 [<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
 [<ffffffff8114d35d>] ? access_process_vm+0x4d/0x200
 [<ffffffff8114d35d>] access_process_vm+0x4d/0x200
 [<ffffffff811eef7d>] proc_pid_cmdline+0x6d/0x120
 [<ffffffff81167cb7>] ? alloc_pages_current+0x87/0xd0
 [<ffffffff811efdad>] proc_info_read+0xad/0xf0
 [<ffffffff811894a5>] vfs_read+0xb5/0x1a0
 [<ffffffff81189ff6>] ? fget_light+0x66/0x100
 [<ffffffff811895e1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by python/7599:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114d35d>] access_process_vm+0x4d/0x200

These three postgres processes are all blocked on an inode's mutex while trying to lseek on a file in a fuse filesystem:

INFO: task postgres:11939 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres      D 0000000000000246  4304 11939  11672 0x00000084
 ffff880ab924de08 0000000000000046 ffff880ab924dd68 ffffffff8101a739
 ffff880ab924dd88 ffffffff8109d71d ffff880ab924dd88 ffff880b8d91c800
 ffff880b8d91cdc0 ffff880ab924dfd8 0000000000010608 ffff880b8d91cdc0
Call Trace:
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff814f9cad>] ? __mutex_lock_common+0x24d/0x400
 [<ffffffff814f9bff>] __mutex_lock_common+0x19f/0x400
 [<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff814f9f68>] mutex_lock_nested+0x48/0x60
 [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff811877fa>] vfs_llseek+0x3a/0x40
 [<ffffffff81188fa6>] sys_lseek+0x66/0x80
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by postgres/11939:
 #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]

INFO: task postgres:11944 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres      D 00000000ffffffff  4032 11944  11652 0x00000084
 ffff880b9df4de08 0000000000000046 0000000000000000 ffffffff8101a739
 ffff880b9df4dd88 ffffffff8109d71d ffff880059dd6fd8 000000010022170b
 ffff880b95e68f80 ffff880b9df4dfd8 0000000000010608 ffff880b95e68f80
Call Trace:
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff814f9bff>] __mutex_lock_common+0x19f/0x400
 [<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff8109d7cf>] ? cpu_clock+0x6f/0x80
 [<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff814f9f68>] mutex_lock_nested+0x48/0x60
 [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff811877fa>] vfs_llseek+0x3a/0x40
 [<ffffffff81188fa6>] sys_lseek+0x66/0x80
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by postgres/11944:
 #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]

INFO: task postgres:11946 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres      D 0000000000000246  4304 11946  11639 0x00000084
 ffff880a3b6b5e08 0000000000000046 ffff880a3b6b5d68 ffffffff8101a739
 ffff880a3b6b5d88 ffffffff8109d71d ffff880a3b6b5d88 ffff880b8dcb4440
 ffff880b8dcb4a00 ffff880a3b6b5fd8 0000000000010608 ffff880b8dcb4a00
Call Trace:
 [<ffffffff8101a739>] ? sched_clock+0x9/0x10
 [<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff814f9cad>] ? __mutex_lock_common+0x24d/0x400
 [<ffffffff814f9bff>] __mutex_lock_common+0x19f/0x400
 [<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff814f9f68>] mutex_lock_nested+0x48/0x60
 [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]
 [<ffffffff811877fa>] vfs_llseek+0x3a/0x40
 [<ffffffff81188fa6>] sys_lseek+0x66/0x80
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by postgres/11946:
 #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]

It's not clear why these three postgres processes are stuck on inode mutexes.  I think the first one is stuck behind this process:

PID: 11714  TASK: ffff880ad9631240  CPU: 15  COMMAND: "postgres"
 #0 [ffff880903841cc8] schedule at ffffffff814f852d
 #1 [ffff880903841d90] wait_answer_interruptible at ffffffffa0344cb1
 #2 [ffff880903841e00] fuse_request_send at ffffffffa0344eeb
 #3 [ffff880903841e70] fuse_fsync_common at ffffffffa034be6a
 #4 [ffff880903841ed0] fuse_fsync at ffffffffa034bed0
 #5 [ffff880903841ee0] vfs_fsync_range at ffffffff811b7fa3
 #6 [ffff880903841f30] vfs_fsync at ffffffff811b804d
 #7 [ffff880903841f40] do_fsync at ffffffff811b808e
 #8 [ffff880903841f70] sys_fsync at ffffffff811b80e0
 #9 [ffff880903841f80] system_call_fastpath at ffffffff81013172
    RIP: 00007f136424ea30  RSP: 00007fff30f9c608  RFLAGS: 00000246
    RAX: 000000000000004a  RBX: ffffffff81013172  RCX: 0000000000000000
    RDX: 0000000002302000  RSI: 00000000000000ae  RDI: 0000000000000038
    RBP: 00000000022e6178   R8: 0000000000000000   R9: 000000004ce4f9ab
    R10: 00000000bca7d16a  R11: 0000000000000246  R12: ffffffff811b80e0
    R13: ffff880903841f78  R14: 0000000000000fad  R15: 0000000000000000
    ORIG_RAX: 000000000000004a  CS: 0033  SS: 002b

That process appears to be waiting for a fuse-related reply and, judging by the last run times, it has been waiting quite a while.

So it definitely looks like we have a problem with khugepaged blocking on page locks when it shouldn't be and holding up many other processes.  We may also have a secondary problem with fuse not responding but that could be a further consequence of the khugepaged problem.


Discussion about the issue:
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/465007e8e5bfee15

Possible patch:
http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commitdiff;h=06e5d52f1815848da1647c8021150db037cf366e
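The general direction discussed upstream is to have asynchronous compaction/migration skip pages it cannot lock immediately rather than sleep on them with mmap_sem held.  Roughly (a simplified sketch of that idea, not the literal patch above):

```
/* Sketch only: asynchronous migration uses trylock_page() and skips
 * busy pages; only synchronous callers may sleep on the page lock. */
if (!trylock_page(page)) {
        if (!sync)           /* async compaction, e.g. for a THP allocation */
                goto skip;   /* don't block while holding mmap_sem; move on */
        lock_page(page);     /* synchronous path may still wait             */
}
```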

Comment 4 RHEL Program Management 2011-01-14 04:31:16 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 9 errata-xmlrpc 2011-05-23 20:37:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

Comment 10 Linda Wang 2016-08-26 01:48:18 UTC
*** Bug 790862 has been marked as a duplicate of this bug. ***