| Summary: | self-heal hangs | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Raghavendra Bhat <rabhat> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.2.0 | CC: | vbellur |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Raghavendra Bhat
2011-05-26 12:05:29 UTC
This is the issue reported on community.gluster.com:
I've been able to duplicate it a few times now. I'm running four v3.2.0-1 servers in a distributed-replicated setup. I'm testing from two clients: one is one of the gluster servers, and the other is another host on the same network.
Here's my basic setup.
host1:/var/gluster/data <-> host2:/var/gluster/data in one replicated pair.
host3:/var/gluster/data <-> host4:/var/gluster/data in one replicated pair.
Mount point on the clients = /mnt/gluster
Clients: host1, test1
Client type: GlusterFS native client
OS: Debian Linux
Kernel: 2.6.32 on all hosts
To test:
* On host1: umount /mnt/gluster, stop its glusterd, and kill its glusterfs processes.
* rm -rf /var/gluster/data/test_dir
* Start glusterd again and mount -t glusterfs host1:/vol /mnt/gluster.
* From host1, run find /mnt/gluster -type f to trigger auto-heal.
* At the same time, run the same find from test1, because in production multiple clients may be performing concurrent finds, even while self-healing is in progress.
Usually this isn't a problem, but 3 times in the past 5 tests I've seen the finds hang. There are no obvious error messages in the logs; the clients just seem to be waiting for something. Neither uptime nor top shows a load spike. It looks to me like the entire host is simply blocked and waiting.
To recover, on host1 I run umount /mnt/gluster -f. Once this is done, the find on host1 dies and the find on test1 continues. I actually need to umount a second time on host1 to really unmount it; if I don't, df -h shows transport not connected.
Here is strace output from the glusterfs process on host1 when the client hangs:
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254D\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\0\0\0\v"..., 24}], 1) = 24
writev(7, [{"\200\0\0\354"..., 4}, {"\0\16\254F\0\0\0\0\0\0\0\2\0\23\320\5\0\0\0016\0\0\0\35\0\0\0\5\0\0\0X\0"..., 128}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0\1\0\0\0\0\177"..., 108}], 3) = 240
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254E\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254G\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254H\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254I\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3,
From mnt-gluster.log:
[2011-05-06 19:09:33.316525] W [dict.c:437:dict_ref] (-->/usr/lib/glusterfs/3.2.0/xlator/protocol/client.so(client3_1_fstat_cbk+0xf3) [0x7f8b7a2d7a93] (-->/usr/lib/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f8b7a08529d] (-->/usr/lib/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_sh_data_fix+0x228) [0x7f8b7a084f68]))) 0-dict: dict is NULL
[2011-05-06 19:09:33.318185] I [afr-common.c:581:afr_lookup_collect_xattr] 0-data-replicate-1: data self-heal is pending for /test2/0/test_0.log.gz.
[2011-05-06 19:09:33.318351] I [dht-common.c:739:dht_lookup_everywhere_cbk] 0-data-dht: deleting stale linkfile /test2/0/test_0.log.gz on data-replicate-1
[2011-05-06 19:09:33.322903] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-data-client-3: remote operation failed: No such file or directory
[2011-05-06 19:09:33.322958] I [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-data-replicate-1: /test2/0/test_0.log.gz: unlock failed No such file or directory
[2011-05-06 19:09:33.325567] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-data-client-2: remote operation failed: No such file or directory
[2011-05-06 19:09:33.325607] I [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-data-replicate-1: /test2/0/test_0.log.gz: unlock failed No such file or directory
[2011-05-06 19:09:33.326709] I [client3_1-fops.c:1640:client3_1_setattr_cbk] 0-data-client-3: remote operation failed: No such file or directory
[2011-05-06 19:09:33.326750] I [afr-self-heal-data.c:106:afr_sh_data_flush_cbk] 0-data-replicate-1: flush or setattr failed on /test2/0/test_0.log.gz on subvolume data-client-3: No such file or directory
[2011-05-06 19:09:33.327193] I [client3_1-fops.c:1640:client3_1_setattr_cbk] 0-data-client-2: remote operation failed: No such file or directory
[2011-05-06 19:09:33.327225] I [afr-self-heal-data.c:106:afr_sh_data_flush_cbk] 0-data-replicate-1: flush or setattr failed on /test2/0/test_0.log.gz on subvolume data-client-2: No such file or directory
Strace output on the client machine when it hangs:
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\200\1\0\210"..., 4}], 1) = 4
readv(11, [{"\0g\372\244\0\0\0\1"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\0\0\0\0\0\0\0"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\1\0\0\0\0\0\0\213\"]D\247-MH\237\345\245\2015-r\340\0\0\0\0\4\223\300\n\0"..., 112}], 1) = 112
readv(11, [{"'\355(lG\311\30\333w\243<\353]O\245\337H\360\v\363\257-?\331i89\5\201\267\362\\\263"..., 65536}], 1) = 38240
readv(11, 0xb26a88, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\3232~B\37\6=rq\31\377\256Q/\241\212\37\0329\213[i\315hnA\1\232M\351\323\250\251"..., 27296}], 1) = 8688
readv(11, 0xb26a88, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\364\303y\271\265S\263\2\251.\226m-\245\253\217\177\337\226\223q=\17>\247N\257\224\250J[\302W"..., 18608}], 1) = 18608
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\200\1\0\210"..., 4}], 1) = 4
readv(11, [{"\0g\372\246\0\0\0\1"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\0\0\0\0\0\0\0"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\1\0\0\0\0\0\0\213\"]D\247-MH\237\345\245\2015-r\340\0\0\0\0\4\223\300\n\0"..., 112}], 1) = 112
readv(11, [{"U'i^\5\335\36\2778\356\235\34\5/\333\234\347ij2i#A\3008\216vK\300\251\307\371g"..., 65536}], 1) = 17452
readv(11, 0xb26a88, 1) = -1 EAGAIN (Resource temporarily unavailable)
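The EAGAIN results in the trace above are not errors. Below is a minimal standalone C sketch (plain sockets, not GlusterFS's transport code; all names are illustrative) of the same epoll_wait/readv pattern: readv on a non-blocking descriptor returning EAGAIN just means the socket buffer has been drained, after which the process parks in epoll_wait again. That matches the first trace, which ends blocked inside epoll_wait; the client is waiting for a reply that never arrives rather than failing a syscall.

```c
/* Sketch of the epoll_wait/readv pattern seen in the traces above.
 * Error checks are omitted for brevity; this is not GlusterFS code. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);   /* sv[0] plays the client side */
    fcntl(sv[0], F_SETFL, O_NONBLOCK);         /* non-blocking, as in the trace */

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sv[0] };
    epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev);

    write(sv[1], "reply", 5);                  /* pretend the server answered */

    struct epoll_event out;
    epoll_wait(epfd, &out, 1, -1);             /* with no reply, we block here forever */

    char buf[64];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    ssize_t n;
    while ((n = readv(out.data.fd, &iov, 1)) > 0)
        printf("read %zd bytes\n", n);
    if (n < 0 && errno == EAGAIN)              /* normal: socket drained, not a failure */
        printf("socket drained, back to epoll_wait()\n");

    close(sv[0]); close(sv[1]); close(epfd);
    return 0;
}
```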
This has also been reproduced in-house using the steps described above.
PATCH: http://patches.gluster.com/patch/7305 in master (cluster/afr: Send Non-blocking lock in non-blocking entrylk)
PATCH: http://patches.gluster.com/patch/7306 in master (features/locks: Unwind with SUCCESS on non-blocking entrylk success)
PATCH: http://patches.gluster.com/patch/7318 in release-3.1 (cluster/afr: Send Non-blocking lock in non-blocking entrylk)
PATCH: http://patches.gluster.com/patch/7319 in release-3.1 (features/locks: Unwind with SUCCESS on non-blocking entrylk success)
PATCH: http://patches.gluster.com/patch/7314 in release-3.2 (cluster/afr: Send Non-blocking lock in non-blocking entrylk)
PATCH: http://patches.gluster.com/patch/7315 in release-3.2 (features/locks: Unwind with SUCCESS on non-blocking entrylk success)
The bug is caused by a deadlock between blocking lock attempts. The blocking-lock APIs in AFR avoid this situation by taking the lock on each brick one at a time, in order from 0 to child_count. Only non-blocking locks are attempted on all nodes in parallel, and because they do not block, they cannot deadlock. This deadlock occurred because the non-blocking entrylk API was issuing a blocking lock internally. The fix is to issue a non-blocking lock inside the non-blocking entrylk. With the fix, running find on two GlusterFS clients no longer hangs.
PATCH: http://patches.gluster.com/patch/7269 in master (print the owner and the transport for the blocked locks in server statedump)
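To make the locking argument concrete, here is a minimal standalone C sketch of the two acquisition strategies described above. It is not GlusterFS source; brick_lock, lock_blocking, and lock_nonblocking are hypothetical names, and each brick's entry lock is modelled as a pthread mutex. The blocking path takes locks strictly in order, so two clients cannot deadlock; the non-blocking path uses trylock everywhere and backs off completely on any failure.

```c
/* Illustration only: two "bricks" modelled as mutexes, compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define BRICK_COUNT 2

static pthread_mutex_t brick_lock[BRICK_COUNT] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Blocking path: lock bricks one by one, always in the same 0..N-1 order,
 * so concurrent clients queue up instead of deadlocking. */
static void lock_blocking(void)
{
    for (int i = 0; i < BRICK_COUNT; i++)
        pthread_mutex_lock(&brick_lock[i]);
}

/* Non-blocking path: trylock on every brick; on any failure, release what
 * we hold and report failure so the caller can fall back to the blocking path. */
static int lock_nonblocking(void)
{
    for (int i = 0; i < BRICK_COUNT; i++) {
        if (pthread_mutex_trylock(&brick_lock[i]) != 0) {
            while (--i >= 0)
                pthread_mutex_unlock(&brick_lock[i]);
            return -1;
        }
    }
    return 0;
}

static void unlock_all(void)
{
    for (int i = BRICK_COUNT - 1; i >= 0; i--)
        pthread_mutex_unlock(&brick_lock[i]);
}

int main(void)
{
    if (lock_nonblocking() != 0)
        lock_blocking();        /* ordered fallback, cannot deadlock */
    printf("locks held on all bricks\n");
    unlock_all();
    return 0;
}
```

In the buggy path, the non-blocking entrylk behaved like pthread_mutex_lock() instead of pthread_mutex_trylock(): two clients that each held the entry lock on one brick could then wait forever for the other brick's lock, which is the hang seen during concurrent self-heal.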