Bug 764964 (GLUSTER-3232)
Summary: deadlock related to transparent hugepage migration in kernels >= 2.6.32

Product: [Community] GlusterFS
Component: fuse
Version: mainline
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: medium
Keywords: Triaged
Reporter: Joe Julian <joe>
Assignee: Anand Avati <aavati>
CC: amarts, chrisw, chyd96, csaba, daniel.ortiz, fharshav, gluster-bugs, jdarcy, pasteur, toracat
Mount Type: fuse
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Documentation: DP
Bug Blocks: 848331 (view as bug list)
Last Closed: 2013-07-24 17:22:06 UTC
Description (Joe Julian, 2011-07-22 19:37:22 UTC)
These are the server dumpfiles. The client won't dump anything on SIGUSR1; it just creates an empty file.

http://joejulian.name/scratch/serverdumps.tar.bz2

Full loglevel=TRACE log: http://joejulian.name/scratch/home-log-level-trace.log.bz2

I'm encountering a lockup problem when reading large numbers of files. I cannot break out of the race in gdb, a ps locks up when it tries to read that process's data, and df (of course) locks up. No kill signals have any effect. The only way out of it is umount -f. Could this be bug 764743?

Need 'volume info' and the type of operation being done on the volume; that would help to re-create the issue. -Amar

Joe, is the issue happening to you on the latest release-3.1 branch? Should I be closing the bug?

release-3.1 has not been tested directly. I have cherry-picked patches into 3.1.5 as suggested by yourself and Anand Avati. We have isolated the lockup to something in io-cache, but not fixed it. I currently have io-cache disabled in production. Avati has logged on to the system with the lockup to see the problem first hand.

Joe, is this happening now? Let me know if the latest release has worked fine for you. I would like to close the bug if this issue is not happening anymore.

I won't be able to test it until Tuesday night. I've been running with io-cache off ever since (per avati), and the only patches to that translator have been to change the license.

Yes, with 3.1.7 I still get the same hard freeze of the process if io-cache is enabled.

Created attachment 704
Created attachment 705

I'm going to go with the assumption that I've found a new way to duplicate the problem. This happens regardless of the translators used; I'm suspicious that disabling io-cache simply changes some unrelated dynamic.

I've upgraded one of my VM hosts to CentOS 6.0. Without any specific sequence of events, I'm finding that the client for my vm volume keeps hanging. Killing with USR1 still produces a 0-length state file, and touching the process in any way hangs the shell to the point where it cannot be interrupted (ps, cat'ing anything under the pid in /proc, ls of the mountpoint). The glusterfs process is, as near as I can tell, in the D state. Attached is an strace going back as far as I had screen buffer; the very last line happened when I did an ls of the mountpoint after a VM image had frozen. When frozen, the glusterfs process will not respond to any signal but USR1. The attached backtrace may or may not have anything useful: I hit ctrl-c in gdb when the hang happened, but it wouldn't interrupt. Only when I did umount -f on the volume was I able to break out and get this backtrace.

Volume Name: vmimages
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: ewcs2:/var/spool/glusterfs/a_vmimages
Brick2: ewcs4:/var/spool/glusterfs/a_vmimages
Brick3: ewcs7:/var/spool/glusterfs/a_vmimages
Brick4: ewcs2:/var/spool/glusterfs/b_vmimages
Brick5: ewcs4:/var/spool/glusterfs/b_vmimages
Brick6: ewcs7:/var/spool/glusterfs/b_vmimages
Brick7: ewcs2:/var/spool/glusterfs/c_vmimages
Brick8: ewcs4:/var/spool/glusterfs/c_vmimages
Brick9: ewcs7:/var/spool/glusterfs/c_vmimages
Brick10: ewcs2:/var/spool/glusterfs/d_vmimages
Brick11: ewcs4:/var/spool/glusterfs/d_vmimages
Brick12: ewcs7:/var/spool/glusterfs/d_vmimages

dmesg had this:

INFO: task glusterfs:7811 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
glusterfs     D ffffffff8110c330     0  7811   7810 0x00000081
 ffff8802239c39b8 0000000000000086 ffff8802239c3948 ffffffff81447a6c
 ffff8801bdf2f600 ffffea00063a5c00 000000000000000e ffff8800283138c0
 ffff8800bd8705f8 ffff8802239c3fd8 0000000000010518 ffff8800bd8705f8
Call Trace:
 [<ffffffff81447a6c>] ? ip_finish_output+0x13c/0x310
 [<ffffffff8109bba9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8110c330>] ? sync_page+0x0/0x50
 [<ffffffff814c9a53>] io_schedule+0x73/0xc0
 [<ffffffff8110c36d>] sync_page+0x3d/0x50
 [<ffffffff814ca17a>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff8110c307>] __lock_page+0x67/0x70
 [<ffffffff81091ee0>] ? wake_bit_function+0x0/0x50
 [<ffffffff81122781>] ? lru_cache_add_lru+0x21/0x40
 [<ffffffff8115bf10>] lock_page+0x30/0x40
 [<ffffffff8115c58d>] migrate_pages+0x59d/0x5d0
 [<ffffffff811226d7>] ? ____pagevec_lru_add+0x167/0x180
 [<ffffffff81152b20>] ? compaction_alloc+0x0/0x370
 [<ffffffff811525cc>] compact_zone+0x4cc/0x600
 [<ffffffff8111cffc>] ? get_page_from_freelist+0x15c/0x820
 [<ffffffff8115297e>] compact_zone_order+0x7e/0xb0
 [<ffffffff81152ab9>] try_to_compact_pages+0x109/0x170
 [<ffffffff8111e99d>] __alloc_pages_nodemask+0x5ed/0x850
 [<ffffffff810c6b88>] ? start_callback+0xb8/0xd0
 [<ffffffff810c6a35>] ? finish_callback+0xa5/0x140
 [<ffffffff810c8058>] ? finish_report+0x78/0xe0
 [<ffffffff81150db3>] alloc_pages_vma+0x93/0x150
 [<ffffffff81167f15>] do_huge_pmd_anonymous_page+0x135/0x340
 [<ffffffff810c71dc>] ? utrace_stop+0x12c/0x1e0
 [<ffffffff811367c5>] handle_mm_fault+0x245/0x2b0
 [<ffffffff814ce503>] do_page_fault+0x123/0x3a0
 [<ffffffff814cbf75>] page_fault+0x25/0x30

I've found the problem. There's an issue with khugepaged and its interaction with userspace filesystems:

http://kerneltrap.org/mailarchive/linux-kernel/2010/11/4/4641128/thread

I was able to work around it by disabling THP entirely:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

Someone else on the IRC channel said he was able to get by with just disabling hugepage defrag, though:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

Apparently Debian has a different default setting, madvise, for /sys/kernel/mm/transparent_hugepage/enabled (note it is also a different pathname, if you're planning around my next suggestion). I'm not even sure this is something you can program for. If not, maybe add a check for THP and issue a warning?

Also related:
https://bugzilla.redhat.com/show_bug.cgi?id=647849
https://bugzilla.redhat.com/show_bug.cgi?id=669418

Currently, as the scope of the fix is outside the GlusterFS code, marking it with 'enhancement' severity.

Joe, reducing the priority of the bug to 'medium', as this issue is not seen in higher versions of the kernel.

(In reply to comment #17)
> Joe, reducing the priority of the bug to 'medium', as this issue is not seen
> in higher versions of the kernel.

Which kernel? I'm still seeing it with the latest RHEL6.

I'm seeing it in CentOS 6.0 (2.6.32-71.el6.x86_64).

Is this issue still valid, Joe?

Yes, it still occurs with the latest kernel in CentOS 6.2.

Do you mean it still happens without the workaround, or even with the workaround in place?

kernel-2.6.32-220.17.1.el6.x86_64 with transparent hugepages enabled (specifically the transparent ones) still causes lock races with any fuse-based filesystem. Disabling THP is still a valid workaround.

Joe, is there a reproducible case for this, like any sort of special application?
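For anyone following along on another distribution, here is a minimal consolidated sketch of the workaround described above. It is illustrative, not from the original report: it assumes root and simply tries both sysfs layouts named in the thread (RHEL's redhat_transparent_hugepage path and the upstream transparent_hugepage path).

#!/bin/sh
# Workaround sketch: disable THP entirely, or only its defrag, on
# whichever sysfs layout this kernel exposes (requires root).
for base in /sys/kernel/mm/redhat_transparent_hugepage \
            /sys/kernel/mm/transparent_hugepage; do
    [ -d "$base" ] || continue
    echo never > "$base/enabled"
    # Per the IRC report above, disabling only defrag may be enough:
    # echo never > "$base/defrag"
    # echo no > "$base/khugepaged/defrag"
done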
Odd, I thought I had posted an update to this quite some time ago. At least as of 2.6.32-220.23.1.el6.x86_64 it is no longer a problem.

RHEL 6.4 has the following fixes, quite possibly fixing the THP issue from RHEL 6.3:

- [mm] compaction: clear PG_migrate_skip based on compaction and reclaim activity (Rik van Riel) [713546 783248]
- [mm] compaction: fix bit ranges in {get,clear,set}_pageblock_skip() (Rik van Riel) [713546 783248]
- [mm] compaction: Restart compaction from near where it left off (Rik van Riel) [713546 783248]
- [mm] compaction: Cache if a pageblock was scanned and no pages were isolated (Rik van Riel) [713546 783248]
- [mm] compaction: Abort compaction loop if lock is contended or run too long (Rik van Riel) [713546 783248]
- [mm] compaction: Abort async compaction if locks are contended or taking too long (Rik van Riel) [713546 783248]
- [mm] compaction: introduce sync-light migration for use by compaction (Rik van Riel) [713546 783248]
- [mm] compaction: allow compaction to isolate dirty pages (Rik van Riel) [713546 783248]
- [mm] compaction: make isolate_lru_page() filter-aware again (Rik van Riel) [713546 783248]
- [mm] compaction: make isolate_lru_page() filter-aware (Rik van Riel) [713546 783248]
- [mm] compaction: determine if dirty pages can be migrated without blocking within ->migratepage (Rik van Riel) [713546 783248]
- [mm] compaction: use synchronous compaction for /proc/sys/vm/compact_memory (Rik van Riel) [713546 783248]

There is a KB solution article related to this bug report:
https://access.redhat.com/site/solutions/362804
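Joe's earlier suggestion that the client could "add a check for THP and issue a warning" might look something like the hypothetical sketch below. It is not part of GlusterFS; it relies only on the two sysfs paths named in the thread and on the convention that these files mark the active setting in brackets.

#!/bin/sh
# Hypothetical pre-mount check: warn if THP is active, since kernels
# before 2.6.32-220.23.1.el6 can deadlock fuse clients during page
# migration. The active setting appears in brackets, e.g. "[always]".
for base in /sys/kernel/mm/redhat_transparent_hugepage \
            /sys/kernel/mm/transparent_hugepage; do
    [ -r "$base/enabled" ] || continue
    if grep -q '\[always\]' "$base/enabled"; then
        echo "warning: transparent hugepages are enabled ($base);" >&2
        echo "         consider the THP workaround from bug 764964" >&2
    fi
done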