This is due to the blocked-locks issue. Since there are two clients, one of the replicate clients gets the lock on one of the servers and gets blocked on the other. The same happens with the other client: it gets the lock on the second server and gets blocked on the first. The statedump output below, taken from the two servers, shows the blocked locks.

[xlator.protocol.server.conn.1.bound_xl./export/mirror.active.1]
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.1.gfid=92e77d0e-a64e-493b-9754-f6640e05711f
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.1.nlookup=7
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.1.ref=3
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.1.ino=1687888
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.1.ia_type=2
xlator.feature.locks.inode.1687888.mandatory=0
xlator.feature.locks.entrylk-count.1687888.entrylk-count=2
xlator.feature.locks.lock-dump.domain.domain=mirror-replicate-0
xlator.feature.locks.lock-dump.domain.entrylk.entrylk[0](ACTIVE)= ENTRYLK_WRLCK on (null) owner=47614754272372, transport=0xc1ced00
xlator.feature.locks.lock-dump.domain.entrylk.entrylk[1](BLOCKED)= ENTRYLK_WRLCK on (null) owner=47940836762464, transport=0xc1cd220, state = Blocked

[xlator.protocol.server.conn.1.bound_xl./export/mirror.active.8]
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.8.gfid=92e77d0e-a64e-493b-9754-f6640e05711f
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.8.nlookup=7
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.8.ref=7
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.8.ino=2326709
xlator.protocol.server.conn.1.bound_xl./export/mirror.active.8.ia_type=2
xlator.feature.locks.inode.2326709.mandatory=0
xlator.feature.locks.entrylk-count.2326709.entrylk-count=2
xlator.feature.locks.lock-dump.domain.domain=mirror-replicate-0
xlator.feature.locks.lock-dump.domain.entrylk.entrylk[0](ACTIVE)= ENTRYLK_WRLCK on (null) owner=47940836762464, transport=0x1ad5b0a0
xlator.feature.locks.lock-dump.domain.entrylk.entrylk[1](BLOCKED)= ENTRYLK_WRLCK on (null) owner=47614754272372, transport=0x1adb6900, state = Blocked
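To make the pattern in the statedump easier to see, here is a minimal sketch of the same ABBA deadlock in plain C. This is not GlusterFS code: the two bricks are modelled as pthread mutexes and the client functions are invented; each "client" takes a blocking lock on one brick, then blocks forever waiting for the brick the other client already holds.

/* Minimal sketch (not GlusterFS code) of the ABBA deadlock shown above.
 * Compile with: cc -pthread abba.c -o abba ; running it hangs by design. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t brick1 = PTHREAD_MUTEX_INITIALIZER; /* stands in for server 1 */
static pthread_mutex_t brick2 = PTHREAD_MUTEX_INITIALIZER; /* stands in for server 2 */

static void *client_a(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&brick1);          /* ACTIVE on server 1 */
    sleep(1);                             /* let the other client win brick2 */
    pthread_mutex_lock(&brick2);          /* BLOCKED: held by client_b */
    printf("client_a got both locks\n");  /* never reached */
    return NULL;
}

static void *client_b(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&brick2);          /* ACTIVE on server 2 */
    sleep(1);
    pthread_mutex_lock(&brick1);          /* BLOCKED: held by client_a */
    printf("client_b got both locks\n");  /* never reached */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, client_a, NULL);
    pthread_create(&b, NULL, client_b, NULL);
    pthread_join(a, NULL);                /* hangs: classic ABBA deadlock */
    pthread_join(b, NULL);
    return 0;
}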
This is the one reported on community.gluster.com:

I've been able to duplicate it a few times now. I'm running 4 v3.2.0-1 servers in a distributed-replicated setup. I'm testing from 2 clients. One is one of the gluster servers, and the other is another host on the same network.

Here's my basic setup:
host1:/var/gluster/data <-> host2:/var/gluster/data in one replicated pair.
host3:/var/gluster/data <-> host4:/var/gluster/data in one replicated pair.
Mount point on the clients = /mnt/gluster
Clients: host1, test1
Client type: gluster native client
OS: debian linux
kernel: 2.6.32 for all hosts

To test:
* umount /mnt/gluster on host1, stop its glusterd, kill its glusterfs procs.
* rm -rf /var/gluster/data/test_dir
* start glusterd, mount -t glusterfs host1:/vol /mnt/gluster
* Then from host1, perform a find /mnt/gluster -type f to trigger autoheal.
* At the same time, from test1, perform the same find command, because in production multiple clients may be performing concurrent finds, even while self-healing.

Usually this isn't a problem, but 3 times now in the past 5 of testing I've seen the finds hang. There are no obvious error messages in the logs; the clients just seem to be waiting for something. Uptime and top both show no spikes in load. It looks to me like the entire host is just blocked and waiting.

To recover from this, on host1 run umount /mnt/gluster -f. Once this is done, the find on host1 dies and the find on test1 continues. I actually need to umount a second time on host1 to really unmount it, because if I don't, df -h shows "transport not connected".

From host1, strace output on the glusterfs proc when a client hangs:

epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254D\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\0\0\0\v"..., 24}], 1) = 24
writev(7, [{"\200\0\0\354"..., 4}, {"\0\16\254F\0\0\0\0\0\0\0\2\0\23\320\5\0\0\0016\0\0\0\35\0\0\0\5\0\0\0X\0"..., 128}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0\1\0\0\0\0\177"..., 108}], 3) = 240
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254E\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254G\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254H\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3, {{EPOLLIN, {u32=7, u64=17179869191}}}, 257, 4294967295) = 1
readv(7, [{"\200\0\0 "..., 4}], 1) = 4
readv(7, [{"\0\16\254I\0\0\0\1"..., 8}], 1) = 8
readv(7, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 24}], 1) = 24
epoll_wait(3,

From mnt-gluster.log:

[2011-05-06 19:09:33.316525] W [dict.c:437:dict_ref] (-->/usr/lib/glusterfs/3.2.0/xlator/protocol/client.so(client3_1_fstat_cbk+0xf3) [0x7f8b7a2d7a93] (-->/usr/lib/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d) [0x7f8b7a08529d] (-->/usr/lib/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_sh_data_fix+0x228) [0x7f8b7a084f68]))) 0-dict: dict is NULL
[2011-05-06 19:09:33.318185] I [afr-common.c:581:afr_lookup_collect_xattr] 0-data-replicate-1: data self-heal is pending for /test2/0/test_0.log.gz.
[2011-05-06 19:09:33.318351] I [dht-common.c:739:dht_lookup_everywhere_cbk] 0-data-dht: deleting stale linkfile /test2/0/test_0.log.gz on data-replicate-1
[2011-05-06 19:09:33.322903] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-data-client-3: remote operation failed: No such file or directory
[2011-05-06 19:09:33.322958] I [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-data-replicate-1: /test2/0/test_0.log.gz: unlock failed No such file or directory
[2011-05-06 19:09:33.325567] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-data-client-2: remote operation failed: No such file or directory
[2011-05-06 19:09:33.325607] I [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-data-replicate-1: /test2/0/test_0.log.gz: unlock failed No such file or directory
[2011-05-06 19:09:33.326709] I [client3_1-fops.c:1640:client3_1_setattr_cbk] 0-data-client-3: remote operation failed: No such file or directory
[2011-05-06 19:09:33.326750] I [afr-self-heal-data.c:106:afr_sh_data_flush_cbk] 0-data-replicate-1: flush or setattr failed on /test2/0/test_0.log.gz on subvolume data-client-3: No such file or directory
[2011-05-06 19:09:33.327193] I [client3_1-fops.c:1640:client3_1_setattr_cbk] 0-data-client-2: remote operation failed: No such file or directory
[2011-05-06 19:09:33.327225] I [afr-self-heal-data.c:106:afr_sh_data_flush_cbk] 0-data-replicate-1: flush or setattr failed on /test2/0/test_0.log.gz on subvolume data-client-2: No such file or directory

strace on the client machine when it hangs:

epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\200\1\0\210"..., 4}], 1) = 4
readv(11, [{"\0g\372\244\0\0\0\1"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\0\0\0\0\0\0\0"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\1\0\0\0\0\0\0\213\"]D\247-MH\237\345\245\2015-r\340\0\0\0\0\4\223\300\n\0"..., 112}], 1) = 112
readv(11, [{"'\355(lG\311\30\333w\243<\353]O\245\337H\360\v\363\257-?\331i89\5\201\267\362\\\263"..., 65536}], 1) = 38240
readv(11, 0xb26a88, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\3232~B\37\6=rq\31\377\256Q/\241\212\37\0329\213[i\315hnA\1\232M\351\323\250\251"..., 27296}], 1) = 8688
readv(11, 0xb26a88, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\364\303y\271\265S\263\2\251.\226m-\245\253\217\177\337\226\223q=\17>\247N\257\224\250J[\302W"..., 18608}], 1) = 18608
epoll_wait(3, {{EPOLLIN, {u32=11, u64=11}}}, 257, 4294967295) = 1
readv(11, [{"\200\1\0\210"..., 4}], 1) = 4
readv(11, [{"\0g\372\246\0\0\0\1"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\0\0\0\0\0\0\0"..., 8}], 1) = 8
readv(11, [{"\0\0\0\0"..., 4}], 1) = 4
readv(11, [{"\0\1\0\0\0\0\0\0\213\"]D\247-MH\237\345\245\2015-r\340\0\0\0\0\4\223\300\n\0"..., 112}], 1) = 112
readv(11, [{"U'i^\5\335\36\2778\356\235\34\5/\333\234\347ij2i#A\3008\216vK\300\251\307\371g"..., 65536}], 1) = 17452
readv(11, 0xb26a88, 1) = -1 EAGAIN (Resource temporarily unavailable)

It has been reproduced in house also with the methods mentioned in the report.
PATCH: http://patches.gluster.com/patch/7305 in master (cluster/afr: Send Non-blocking lock in non-blocking entrylk)
PATCH: http://patches.gluster.com/patch/7306 in master (features/locks: Unwind with SUCCESS on non-blocking entrylk success)
PATCH: http://patches.gluster.com/patch/7318 in release-3.1 (cluster/afr: Send Non-blocking lock in non-blocking entrylk)
PATCH: http://patches.gluster.com/patch/7319 in release-3.1 (features/locks: Unwind with SUCCESS on non-blocking entrylk success)
PATCH: http://patches.gluster.com/patch/7314 in release-3.2 (cluster/afr: Send Non-blocking lock in non-blocking entrylk)
PATCH: http://patches.gluster.com/patch/7315 in release-3.2 (features/locks: Unwind with SUCCESS on non-blocking entrylk success)
The bug is observed because of a dead-lock between blocking lock attempts. The blocking lock APIs in afr avoid this situation by taking the lock on each brick one by one, in order from 0 to child_count, so every client acquires the locks in the same order. Only non-blocking locks are attempted on all nodes in parallel; because they do not block, a dead-lock cannot happen. This dead-lock is observed because the non-blocking entrylk path was issuing a blocking lock internally. The fix is to issue a non-blocking lock inside the non-blocking entrylk.
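A rough illustration of the two code paths, again using pthread mutexes as stand-ins for per-brick entry locks rather than the actual afr code (the function names and signatures here are invented for the sketch): the blocking path locks bricks serially in a fixed order, while the non-blocking path must use a trylock on every brick and back off on failure, so no client ever waits while already holding a lock.

/* Sketch only, not afr code: bricks are modelled as pthread mutexes. */
#include <pthread.h>
#include <stddef.h>

/* Non-blocking path: try every brick without blocking. Returns 0 on
 * success, -1 if any brick was already locked (the caller can then fall
 * back to the serial blocking path). */
int nonblocking_entrylk(pthread_mutex_t **bricks, size_t child_count)
{
    for (size_t i = 0; i < child_count; i++) {
        if (pthread_mutex_trylock(bricks[i]) != 0) {
            /* The buggy behaviour was equivalent to blocking here while
             * still holding bricks[0..i-1], which allows the ABBA
             * dead-lock. Instead, release what was acquired and bail out. */
            while (i > 0)
                pthread_mutex_unlock(bricks[--i]);
            return -1;
        }
    }
    return 0;
}

/* Blocking path: lock bricks one by one in the same 0..child_count order
 * on every client, which rules out the ABBA dead-lock. */
void blocking_entrylk(pthread_mutex_t **bricks, size_t child_count)
{
    for (size_t i = 0; i < child_count; i++)
        pthread_mutex_lock(bricks[i]);
}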
Now the find command run from 2 glusterfs clients no longer hangs.
PATCH: http://patches.gluster.com/patch/7269 in master (print the owner and the transport for the blocked locks in server statedump)