Bug 669418
| Summary: | khugepaged blocking on page locks | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeremy Agee <jagee> |
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Ye <cye> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.0 | CC: | czhang, kbanerje, lit-cs-sysadmin, qcai, syeghiay |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.32-104.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-05-23 20:37:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html

*** Bug 790862 has been marked as a duplicate of this bug. ***
khugepaged blocking on page locks

vmcore info:

KERNEL: /cores/20110110101743/work/vmlinux
DUMPFILE: /cores/20110110101743/work/vmcore_debug_kernel_Iointerceptor_crash [PARTIAL DUMP]
CPUS: 24
DATE: Thu Jan 6 22:13:43 2011
UPTIME: 00:59:10
LOAD AVERAGE: 13.13, 13.58, 16.83
TASKS: 1071
NODENAME: node3
RELEASE: 2.6.32-71.7.1.el6.x86_64.debug
VERSION: #1 SMP Wed Oct 27 03:01:55 EDT 2010
MACHINE: x86_64 (2800 Mhz)
MEMORY: 48 GB
PANIC: "Oops: 0002 [#1] SMP " (check log for details)

Current analysis by Lachlan McIlroy:

The khugepaged thread is the one causing all the problems. In the following stack trace khugepaged has the mmap_sem semaphore locked and is currently trying to allocate and compact pages. It has found a page that is currently locked due to I/O and khugepaged is waiting for the I/O to complete and the page to be unlocked. Why has it been blocked for 2 minutes? There could be severe I/O congestion holding up the page or it could be locked by another process. Note that a hugepage is not just one page but 512 pages, and waiting for that many pages to be unlocked means that the mmap_sem can be held for a long time. That can hold up other processes, and then they'll report being blocked too.

INFO: task khugepaged:212 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
khugepaged D 0000000000000000 4296 212 2 0x00000000
ffff880c0386f8e0 0000000000000046 0000000000000000 0000000000000007
0000000000000006 ffffffff81120a30 ffff8800581d6fd8 0000000100221784
ffff880c03880cc0 ffff880c0386ffd8 0000000000010608 ffff880c03880cc0
Call Trace:
[<ffffffff81120a30>] ? sync_page+0x0/0x60
[<ffffffff81120a30>] ? sync_page+0x0/0x60
[<ffffffff814f8d63>] io_schedule+0x73/0xc0
[<ffffffff81120a74>] sync_page+0x44/0x60
[<ffffffff814f94ea>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81120a07>] __lock_page+0x67/0x70
[<ffffffff81096e50>] ? wake_bit_function+0x0/0x50
[<ffffffff811751a3>] lock_page+0x43/0x50
[<ffffffff81175938>] migrate_pages+0x6b8/0x6e0
[<ffffffff81169920>] ? compaction_alloc+0x0/0x370
[<ffffffff8116938d>] compact_zone+0x4cd/0x600
[<ffffffff810ae43d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8116974e>] compact_zone_order+0x7e/0xb0
[<ffffffff811698a1>] try_to_compact_pages+0x121/0x1a0
[<ffffffff811345d7>] __alloc_pages_nodemask+0x5a7/0x8d0
[<ffffffff8128cf91>] ? debug_object_free+0xc1/0x140
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff8109d7cf>] ? cpu_clock+0x6f/0x80
[<ffffffff81167d84>] alloc_pages_vma+0x84/0x110
[<ffffffff814fa31a>] ? down_write+0x9a/0xb0
[<ffffffff811811a9>] ? khugepaged+0x9d9/0x1320
[<ffffffff811813a8>] khugepaged+0xbd8/0x1320
[<ffffffff814fb8c0>] ? _spin_unlock_irq+0x30/0x40
[<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811807d0>] ? khugepaged+0x0/0x1320
[<ffffffff81096ac6>] kthread+0x96/0xa0
[<ffffffff810142ca>] child_rip+0xa/0x20
[<ffffffff81013c10>] ? restore_args+0x0/0x30
[<ffffffff81096a30>] ? kthread+0x0/0xa0
[<ffffffff810142c0>] ? child_rip+0x0/0x20
1 lock held by khugepaged/212:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff811811a9>] khugepaged+0x9d9/0x1320

This IOInterceptor process is trying to acquire the mmap_sem semaphore so is probably stuck behind khugepaged:

INFO: task IOInterceptor.u:6499 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 00000000ffffffff 3872 6499 6461 0x00000080
ffff880c065ffde8 0000000000000046 0000000000000000 ffff880c04bdc240
0000000000000000 0000000000000007 ffff8800593d6fd8 0000000100221799
ffff880c04bdc800 ffff880c065fffd8 0000000000010608 ffff880c04bdc800
Call Trace:
[<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
[<ffffffff814fb273>] rwsem_down_write_failed+0x23/0x30
[<ffffffff811811a9>] ? khugepaged+0x9d9/0x1320
[<ffffffff812875d3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814fa30e>] ? down_write+0x8e/0xb0
[<ffffffff811427fc>] ? sys_mmap_pgoff+0x5c/0x2a0
[<ffffffff811427fc>] sys_mmap_pgoff+0x5c/0x2a0
[<ffffffff814fb3d2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810182b9>] sys_mmap+0x29/0x30
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/6499:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff811427fc>] sys_mmap_pgoff+0x5c/0x2a0

These three IOInterceptor processes are also trying to acquire the mmap_sem semaphore so are probably also stuck behind khugepaged:

INFO: task IOInterceptor.u:20167 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 0000000000000000 4880 20167 6461 0x00000080
ffff880acc8b7b60 0000000000000046 0000000000000000 ffff880102481100
0000000000000000 0000000000000007 ffff8800587d6fd8 0000000100221ba4
ffff8801024816c0 ffff880acc8b7fd8 0000000000010608 ffff8801024816c0
Call Trace:
[<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
[<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
[<ffffffffa0344438>] ? fuse_copy_fill+0x98/0x1f0 [fuse]
[<ffffffff814fb97b>] ? _spin_unlock+0x2b/0x40
[<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]
[<ffffffffa03445cf>] fuse_copy_one+0x3f/0x70 [fuse]
[<ffffffffa034552e>] fuse_dev_read+0x23e/0x320 [fuse]
[<ffffffff81188a7a>] do_sync_read+0xfa/0x140
[<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
[<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810aa7ad>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
[<ffffffff8121f7c6>] ? security_file_permission+0x16/0x20
[<ffffffff811894a5>] vfs_read+0xb5/0x1a0
[<ffffffff81189ff6>] ? fget_light+0x66/0x100
[<ffffffff811895e1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/20167:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]

INFO: task IOInterceptor.u:20201 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 00000000ffffffff 5648 20201 6461 0x00000080
ffff880120927b60 0000000000000046 0000000000000000 ffff8804c6c248c0
0000000000000000 0000000000000007 ffff88005a3d6fd8 0000000100221b92
ffff8804c6c24e80 ffff880120927fd8 0000000000010608 ffff8804c6c24e80
Call Trace:
[<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
[<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
[<ffffffffa0344438>] ? fuse_copy_fill+0x98/0x1f0 [fuse]
[<ffffffff814fb97b>] ? _spin_unlock+0x2b/0x40
[<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]
[<ffffffffa03445cf>] fuse_copy_one+0x3f/0x70 [fuse]
[<ffffffffa034552e>] fuse_dev_read+0x23e/0x320 [fuse]
[<ffffffff81188a7a>] do_sync_read+0xfa/0x140
[<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
[<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810aa7ad>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
[<ffffffff8121f7c6>] ? security_file_permission+0x16/0x20
[<ffffffff811894a5>] vfs_read+0xb5/0x1a0
[<ffffffff81189ff6>] ? fget_light+0x66/0x100
[<ffffffff811895e1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/20201:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]

INFO: task IOInterceptor.u:20377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IOInterceptor D 00000000ffffffff 3920 20377 6461 0x00000080
ffff8806f94b3b60 0000000000000046 0000000000000000 ffff8806fcb7c140
0000000000000000 0000000000000007 ffff880058dd6fd8 0000000100221ba6
ffff8806fcb7c700 ffff8806f94b3fd8 0000000000010608 ffff8806fcb7c700
Call Trace:
[<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
[<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
[<ffffffffa0344438>] ? fuse_copy_fill+0x98/0x1f0 [fuse]
[<ffffffff814fb97b>] ? _spin_unlock+0x2b/0x40
[<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]
[<ffffffffa03445cf>] fuse_copy_one+0x3f/0x70 [fuse]
[<ffffffffa034552e>] fuse_dev_read+0x23e/0x320 [fuse]
[<ffffffff81188a7a>] do_sync_read+0xfa/0x140
[<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
[<ffffffff81096e10>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810aa7ad>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff810adaed>] ? lock_release_holdtime+0x3d/0x190
[<ffffffff8121f7c6>] ? security_file_permission+0x16/0x20
[<ffffffff811894a5>] vfs_read+0xb5/0x1a0
[<ffffffff81189ff6>] ? fget_light+0x66/0x100
[<ffffffff811895e1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by IOInterceptor.u/20377:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffffa0344438>] fuse_copy_fill+0x98/0x1f0 [fuse]

These two python processes are trying to acquire the mmap_sem semaphore so are probably stuck behind khugepaged:

INFO: task python:7585 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
python D 0000000000000000 4608 7585 7576 0x00000080
ffff880c06e5bca0 0000000000000046 0000000000000000 ffff880bc534d640
0000000000000000 0000000000000007 ffff880058bd6fd8 00000001002233be
ffff880bc534dc00 ffff880c06e5bfd8 0000000000010608 ffff880bc534dc00
Call Trace:
[<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
[<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
[<ffffffff8114d35d>] ? access_process_vm+0x4d/0x200
[<ffffffff8114d35d>] access_process_vm+0x4d/0x200
[<ffffffff811eef7d>] proc_pid_cmdline+0x6d/0x120
[<ffffffff81167cb7>] ? alloc_pages_current+0x87/0xd0
[<ffffffff811efdad>] proc_info_read+0xad/0xf0
[<ffffffff811894a5>] vfs_read+0xb5/0x1a0
[<ffffffff81189ff6>] ? fget_light+0x66/0x100
[<ffffffff811895e1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by python/7585:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114d35d>] access_process_vm+0x4d/0x200

INFO: task python:7599 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
python D 0000000000000000 4608 7599 7576 0x00000080
ffff880bc18bbca0 0000000000000046 0000000000000000 ffff880c00850180
0000000000000000 0000000000000007 ffff880058bd6fd8 0000000100230158
ffff880c00850740 ffff880bc18bbfd8 0000000000010608 ffff880c00850740
Call Trace:
[<ffffffff814fb10d>] rwsem_down_failed_common+0x8d/0x1d0
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8101b145>] ? native_sched_clock+0x15/0x70
[<ffffffff814fb2a6>] rwsem_down_read_failed+0x26/0x30
[<ffffffff812875a4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff814fa3ca>] ? down_read+0x9a/0xa0
[<ffffffff8114d35d>] ? access_process_vm+0x4d/0x200
[<ffffffff8114d35d>] access_process_vm+0x4d/0x200
[<ffffffff811eef7d>] proc_pid_cmdline+0x6d/0x120
[<ffffffff81167cb7>] ? alloc_pages_current+0x87/0xd0
[<ffffffff811efdad>] proc_info_read+0xad/0xf0
[<ffffffff811894a5>] vfs_read+0xb5/0x1a0
[<ffffffff81189ff6>] ? fget_light+0x66/0x100
[<ffffffff811895e1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by python/7599:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114d35d>] access_process_vm+0x4d/0x200

These three postgres processes are all blocked on an inode's mutex while trying to lseek on a file in a fuse filesystem:

INFO: task postgres:11939 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres D 0000000000000246 4304 11939 11672 0x00000084
ffff880ab924de08 0000000000000046 ffff880ab924dd68 ffffffff8101a739
ffff880ab924dd88 ffffffff8109d71d ffff880ab924dd88 ffff880b8d91c800
ffff880b8d91cdc0 ffff880ab924dfd8 0000000000010608 ffff880b8d91cdc0
Call Trace:
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff814f9cad>] ? __mutex_lock_common+0x24d/0x400
[<ffffffff814f9bff>] __mutex_lock_common+0x19f/0x400
[<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff814f9f68>] mutex_lock_nested+0x48/0x60
[<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff811877fa>] vfs_llseek+0x3a/0x40
[<ffffffff81188fa6>] sys_lseek+0x66/0x80
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by postgres/11939:
#0: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]

INFO: task postgres:11944 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres D 00000000ffffffff 4032 11944 11652 0x00000084
ffff880b9df4de08 0000000000000046 0000000000000000 ffffffff8101a739
ffff880b9df4dd88 ffffffff8109d71d ffff880059dd6fd8 000000010022170b
ffff880b95e68f80 ffff880b9df4dfd8 0000000000010608 ffff880b95e68f80
Call Trace:
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff814f9bff>] __mutex_lock_common+0x19f/0x400
[<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff8109d7cf>] ? cpu_clock+0x6f/0x80
[<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff814f9f68>] mutex_lock_nested+0x48/0x60
[<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff811877fa>] vfs_llseek+0x3a/0x40
[<ffffffff81188fa6>] sys_lseek+0x66/0x80
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by postgres/11944:
#0: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]

INFO: task postgres:11946 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postgres D 0000000000000246 4304 11946 11639 0x00000084
ffff880a3b6b5e08 0000000000000046 ffff880a3b6b5d68 ffffffff8101a739
ffff880a3b6b5d88 ffffffff8109d71d ffff880a3b6b5d88 ffff880b8dcb4440
ffff880b8dcb4a00 ffff880a3b6b5fd8 0000000000010608 ffff880b8dcb4a00
Call Trace:
[<ffffffff8101a739>] ? sched_clock+0x9/0x10
[<ffffffff8109d71d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff814f9cad>] ? __mutex_lock_common+0x24d/0x400
[<ffffffff814f9bff>] __mutex_lock_common+0x19f/0x400
[<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffffa034aca3>] ? fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff814f9f68>] mutex_lock_nested+0x48/0x60
[<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]
[<ffffffff811877fa>] vfs_llseek+0x3a/0x40
[<ffffffff81188fa6>] sys_lseek+0x66/0x80
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
1 lock held by postgres/11946:
#0: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa034aca3>] fuse_file_llseek+0x43/0xe0 [fuse]

It's not clear why these three postgres processes are stuck on inode mutexes. I think the first one is stuck behind this process:

PID: 11714 TASK: ffff880ad9631240 CPU: 15 COMMAND: "postgres"
#0 [ffff880903841cc8] schedule at ffffffff814f852d
#1 [ffff880903841d90] wait_answer_interruptible at ffffffffa0344cb1
#2 [ffff880903841e00] fuse_request_send at ffffffffa0344eeb
#3 [ffff880903841e70] fuse_fsync_common at ffffffffa034be6a
#4 [ffff880903841ed0] fuse_fsync at ffffffffa034bed0
#5 [ffff880903841ee0] vfs_fsync_range at ffffffff811b7fa3
#6 [ffff880903841f30] vfs_fsync at ffffffff811b804d
#7 [ffff880903841f40] do_fsync at ffffffff811b808e
#8 [ffff880903841f70] sys_fsync at ffffffff811b80e0
#9 [ffff880903841f80] system_call_fastpath at ffffffff81013172
RIP: 00007f136424ea30 RSP: 00007fff30f9c608 RFLAGS: 00000246
RAX: 000000000000004a RBX: ffffffff81013172 RCX: 0000000000000000
RDX: 0000000002302000 RSI: 00000000000000ae RDI: 0000000000000038
RBP: 00000000022e6178 R8: 0000000000000000 R9: 000000004ce4f9ab
R10: 00000000bca7d16a R11: 0000000000000246 R12: ffffffff811b80e0
R13: ffff880903841f78 R14: 0000000000000fad R15: 0000000000000000
ORIG_RAX: 000000000000004a CS: 0033 SS: 002b

That process appears to be waiting for something fuse-related to respond and, based on the last run times, it has been waiting quite a while.

So it definitely looks like we have a problem with khugepaged blocking on page locks when it shouldn't be, and holding up many other processes. We may also have a secondary problem with fuse not responding, but that could be a further consequence of the khugepaged problem.

Discussion about the issue:
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/465007e8e5bfee15

Possible patch:
http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commitdiff;h=06e5d52f1815848da1647c8021150db037cf366e
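
When triaging a log like the one above, it can help to group the hung-task reports by the lock named in each "lock held by" section, which makes it easy to see how many tasks are queued on mmap_sem versus the fuse inode mutex. The following is a minimal Python sketch and not part of the original analysis: the function name `summarize_hung_tasks` and both regular expressions are assumptions derived only from the message format quoted in this report. Note that, as the traces here show, a task still waiting for a lock is also listed under "lock held by".

```python
import re
from collections import defaultdict

# Hung-task header, e.g.
# "INFO: task khugepaged:212 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds\.")

# Lock line, e.g.
# "#0: (&mm->mmap_sem){++++++}, at: [<ffffffff811811a9>] khugepaged+0x9d9/0x1320"
LOCK_RE = re.compile(r"#\d+:\s+\((\S+?)\)\{[^}]*\}, at: \[<[0-9a-f]+>\] (\S+)")


def summarize_hung_tasks(log_text):
    """Group blocked tasks by the lock class named in their report.

    Returns {lock_name: [(task_name, pid, call_site), ...]}.
    Works whether the log is on one line or split across many.
    """
    by_lock = defaultdict(list)
    headers = list(TASK_RE.finditer(log_text))
    for i, header in enumerate(headers):
        # A report runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        report = log_text[header.end():end]
        task, pid = header.group(1), int(header.group(2))
        for lock in LOCK_RE.finditer(report):
            by_lock[lock.group(1)].append((task, pid, lock.group(2)))
    return dict(by_lock)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py < hung_tasks.log
    for lock_name, tasks in summarize_hung_tasks(sys.stdin.read()).items():
        print(f"{lock_name}: {len(tasks)} task(s)")
        for task, pid, site in tasks:
            print(f"  {task}/{pid} at {site}")
```

The sketch only reads the textual reports; it cannot tell which task actually owns a lock and which are waiting, so the call traces still have to be read to identify the holder.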