Description of problem:
=======================
There were two bricks down and I started creating files in the below fashion:
c1: under dir1, touch f{1..10000}
c2: under /, touch x{1..1000}
c3: after the c2 action completed, started touch x{1001..10000}

I then brought up all the bricks which were down (2 in each of the two subvols). Then, while c1 and c3 were still creating files, on c2 I did a move of all the x* files into dir1. At this point the mv hung forever.

After all files were written and healing completed, I did an ls on / from c1 and c3; they are also not responding.

Version-Release number of selected component (if applicable):
=============================================================
3.8.4-5

[root@dhcp35-37 ~]# gluster v info disperse

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: 9620f72c-f8bb-4e05-98e4-d1a2132db8a7
Status: Started
Snapshot Count: 2
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
features.barrier: disable
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

[root@dhcp35-37 ~]# gluster v status disperse
Status of volume: disperse
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse      49153     0          Y       32685
Brick 10.70.35.116:/rhs/brick2/disperse     49152     0          Y       13804
Brick 10.70.35.239:/rhs/brick2/disperse     49152     0          Y       6677
Brick 10.70.35.135:/rhs/brick2/disperse     49153     0          Y       6698
Brick 10.70.35.8:/rhs/brick2/disperse       49153     0          Y       6617
Brick 10.70.35.196:/rhs/brick2/disperse     49153     0          Y       6700
Brick 10.70.35.37:/rhs/brick3/disperse      49154     0          Y       32704
Brick 10.70.35.116:/rhs/brick3/disperse     49154     0          Y       13823
Brick 10.70.35.239:/rhs/brick3/disperse     49154     0          Y       6696
Brick 10.70.35.135:/rhs/brick3/disperse     49154     0          Y       6717
Brick 10.70.35.8:/rhs/brick3/disperse       49154     0          Y       6636
Brick 10.70.35.196:/rhs/brick3/disperse     49154     0          Y       6719
Snapshot Daemon on localhost                49155     0          Y       13153
Self-heal Daemon on localhost               N/A       N/A        Y       23366
Quota Daemon on localhost                   N/A       N/A        Y       13090
Snapshot Daemon on 10.70.35.8               49155     0          Y       20959
Self-heal Daemon on 10.70.35.8              N/A       N/A        Y       32233
Quota Daemon on 10.70.35.8                  N/A       N/A        Y       20907
Snapshot Daemon on 10.70.35.116             49155     0          Y       26181
Self-heal Daemon on 10.70.35.116            N/A       N/A        Y       2799
Quota Daemon on 10.70.35.116                N/A       N/A        Y       26127
Snapshot Daemon on 10.70.35.239             49155     0          Y       21107
Self-heal Daemon on 10.70.35.239            N/A       N/A        Y       32373
Quota Daemon on 10.70.35.239                N/A       N/A        Y       21056
Snapshot Daemon on 10.70.35.196             49155     0          Y       21087
Self-heal Daemon on 10.70.35.196            N/A       N/A        Y       32376
Quota Daemon on 10.70.35.196                N/A       N/A        Y       21036
Snapshot Daemon on 10.70.35.135             49155     0          Y       21068
Self-heal Daemon on 10.70.35.135            N/A       N/A        Y       32324
Quota Daemon on 10.70.35.135                N/A       N/A        Y       21018

Task Status of Volume disperse
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : e401b7fd-1ef6-4d1f-a818-228cba36a128
Status               : completed

[root@dhcp35-37 ~]# kill -9 32685
[root@dhcp35-37 ~]# kill -9 32704
[root@dhcp35-37 ~]# gluster v status|grep disp|grep 37
Volume halo is not started
Brick 10.70.35.37:/rhs/brick2/disperse      N/A       N/A        N       N/A
Brick 10.70.35.37:/rhs/brick3/disperse      N/A       N/A        N       N/A
[root@dhcp35-37 ~]# gluster v heal disperse info|grep ntries
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
[root@dhcp35-37 ~]#
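For reference, a minimal shell sketch of this reproducer, assuming the volume is FUSE-mounted at /mnt/c1, /mnt/c2 and /mnt/c3 on the three clients (the mount paths are hypothetical, not taken from the report):

# c1: create files under dir1 (two bricks per subvolume already killed)
mkdir -p /mnt/c1/dir1
( cd /mnt/c1/dir1 && touch f{1..10000} ) &

# c2: create the first batch of files under /
( cd /mnt/c2 && touch x{1..1000} )

# c3: once c2 finishes, create the next batch under /
( cd /mnt/c3 && touch x{1001..10000} ) &

# server node: bring the killed bricks back online
gluster volume start disperse force

# c2: while c1 and c3 are still running, move the x* files into dir1
( cd /mnt/c2 && mv x* dir1/ )        # this is the mv that hung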
The client log where the mv hung showed the following:

[2016-11-30 12:18:40.514765] W [MSGID: 122015] [ec-common.c:1774:ec_unlocked] 3-disperse-disperse-0: entry/inode unlocking failed (RENAME) [Invalid argument]
[2016-11-30 12:18:40.534720] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 3-disperse-disperse-1: Operation failed on some subvolumes (up=3F, mask=2E, remaining=0, good=2E, bad=11)
[2016-11-30 12:18:40.538990] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 3-disperse-disperse-0: Operation failed on some subvolumes (up=3F, mask=2E, remaining=0, good=2E, bad=11)
[2016-11-30 12:18:32.135232] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 3-disperse-disperse-1: Heal failed [Invalid argument]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 3-disperse-disperse-0: Heal failed [Invalid argument]" repeated 3 times between [2016-11-30 12:18:32.108297] and [2016-11-30 12:18:32.135236]
sosreports:

[qe@rhsqe-repo nchilaka]$ pwd
/var/www/html/sosreports/nchilaka
[qe@rhsqe-repo nchilaka]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo nchilaka]$ cd bug.1400093
[qe@rhsqe-repo bug.1400093]$ ls
clients  servers
[qe@rhsqe-repo bug.1400093]$
This bug is easily reproducible on a plain dispersed volume too, without killing any brick.

[root@apandey gluster]# gvi

Volume Name: vol
Type: Disperse
Volume ID: c3e903e0-e7b5-42a3-9e75-798c4e3268a0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: apandey:/brick/gluster/vol-1
Brick2: apandey:/brick/gluster/vol-2
Brick3: apandey:/brick/gluster/vol-3
Brick4: apandey:/brick/gluster/vol-4
Brick5: apandey:/brick/gluster/vol-5
Brick6: apandey:/brick/gluster/vol-6
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Steps:
1 - Mount this volume on 3 mount points: c1, c2, and c3.
2 - On c1: mkdir /c1/dir ; cd dir
3 - On c2: touch 4000 files on the mount point, i.e. "/".
4 - After step 3 completes, start touching the next 4000 files on c2 on the mount point, i.e. "/".
5 - On c1: start touching 10000 files in /dir/.
6 - On c3: start moving the 4000 files created in step 3 from the mount point to /dir/.
7 - On c3, from a different console, start ls in a loop.

All the clients hang after some time (within 3-4 minutes).

I took a statedump and found the following inode lock contention:

lock-dump.domain.domain=dht.file.migrate
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 12892, owner=c4582de9117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, granted at 2016-12-05 07:37:05
lock-dump.domain.domain=vol-disperse-0

[conn.0.bound_xl./brick/gluster/vol-1.active.2]
gfid=6a7c71ba-9814-41ad-9975-3a7ef4e97d60
nlookup=2528
fd-count=0
ref=9179
ia_type=2
[xlator.features.locks.vol-locks.inode]
path=/dir
mandatory=0
inodelk-count=2
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 12892, owner=6c20b6d5117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, blocked at 2016-12-05 07:37:05, granted at 2016-12-05 07:37:05
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 12894, owner=e870234b277f0000, client=0x7f0c800ac5f0, connection-id=apandey-29029-2016/12/05-07:32:51:253616-vol-client-0-0-0, blocked at 2016-12-05 07:37:05

[conn.0.bound_xl./brick/gluster/vol-1.active.3]
gfid=00000000-0000-0000-0000-000000000001
nlookup=1
fd-count=1
ref=1
ia_type=2
[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0:self-heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 12885, owner=6c20b6d5117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, blocked at 2016-12-05 07:37:05, granted at 2016-12-05 07:37:05
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 12890, owner=ac4c67f4257f0000, client=0x7f0c800a87b0, connection-id=apandey-28922-2016/12/05-07:32:35:356027-vol-client-0-0-0, blocked at 2016-12-05 07:37:05
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 12892, owner=6825b6d5117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, blocked at 2016-12-05 07:37:05

[conn.0.bound_xl./brick/gluster/vol-1.lru.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=6
fd-count=0
ref=0
ia_type=2

In the first brick's log I see the following failed unlock request:

[2016-12-05 07:32:58.401408] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-vol-server: accepted client from apandey-29029-2016/12/05-07:32:51:253616-vol-client-0-0-0 (version: 3.8.5)
[2016-12-05 07:33:10.774116] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 0b9b19e0-5b6f-4401-8e1e-53fb21a2fd9d
[2016-12-05 07:33:10.774138] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-vol-server: accepted client from apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0 (version: 3.8.5)
[2016-12-05 07:37:05.212110] E [inodelk.c:404:__inode_unlock_lock] 0-vol-locks: Matching lock not found for unlock 0-9223372036854775807, by 6825b6d5117f0000 on 0x7f0c800b0430
[2016-12-05 07:37:05.212146] E [MSGID: 115053] [server-rpc-fops.c:278:server_inodelk_cbk] 0-vol-server: 45050: INODELK / (00000000-0000-0000-0000-000000000001), client: apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, error-xlator: vol-locks [Invalid argument]

Only in c3's client log did I find this error:

[2016-12-05 07:37:05.212211] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-1: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212268] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-2: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212315] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-3: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212345] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-0: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212381] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-5: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212387] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-4: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212414] W [MSGID: 122015] [ec-common.c:1749:ec_unlocked] 0-vol-disperse-0: entry/inode unlocking failed (RENAME) [Invalid argument]
[2016-12-05 07:37:05.222999] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol-dht: renaming /a-2501 (hash=vol-disperse-0/cache=vol-disperse-0) => /dir/a-2501 (hash=vol-disperse-0/cache=<nul>)
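For anyone re-running this, the inode lock contention shown above can be collected with a brick statedump; a rough shell sketch follows (the dump directory /var/run/gluster is the usual default, adjust to your installation):

# server node: trigger a statedump for every brick of the volume
gluster volume statedump vol

# the dumps land in the statedump directory, one file per brick process
ls /var/run/gluster/*.dump.*

# look for blocked inode locks in the disperse lock domain
grep -A3 'lock-dump.domain.domain=vol-disperse-0' /var/run/gluster/*.dump.* | grep BLOCKED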
Created attachment 1228478 [details]
possible fix

I found one race which could cause this bug: the lk-owner changes just before the inodelk unlock.

Rename takes two locks:
1) On the parent directory of the source file
2) On the parent directory of the destination file/location

The lk-owner of these two locks is different, i.e. the address of the ctx->inode_lock member. After the rename, at unlock time it sets the lk-owner on the same rename fop frame and unwinds. But it can so happen that, just before ec_inodelk is called, the owner is changed to that of the second parent inode's inode_lock. This can cause the problem.

I am a bit confused about the steps; I will take your help, Ashish, tomorrow at 10 AM.
(In reply to Pranith Kumar K from comment #5)
> Created attachment 1228478 [details]
> possible fix
> 
> I found one race which could cause this bug:
> i.e. lk-owner changes just before inodelk unlock.
> Rename takes two locks 1) On parent directory of the source file 2) On
> parent directory of the destination file/location. Lk-owner of these two
> locks is different i.e. the address of the ctx->inode_lock member. After
> rename, at the time of unlock it sets the lk-owner on the same rename fop
> frame and unwinds. But it can so happen that just before ec_inodelk is
> called the owner can be changed to that of the second parent inode's
> inode_lock. This can cause this problem.
> 
> I am a bit confused about the steps I will take your help Ashish tomorrow at
> 10AM

Ashish found that this patch leads to a crash. I just checked, and the fix is a bit more involved than anticipated.
Verified the below on 3.8.4-9 and didn't find any hang, hence moving to verified.

TC#1: Two bricks were down and I started creating files in the below fashion:
c1: under dir1, touch f{1..10000}
c2: under /, touch x{1..1000}
c3: after the c2 action completed, started touch x{1001..10000}

I then brought up all the bricks which were down (2 in each of the two subvols). Then, while c1 and c3 were still creating files, on c2 I did a move of all the x* files into dir1.

Behavior: ls and mv didn't hang.

TC#2: No bricks down
1 - Mount this volume on 3 mount points: c1, c2, and c3.
2 - On c1: mkdir /c1/dir ; cd dir
3 - On c2: touch 4000 files on the mount point, i.e. "/".
4 - After step 3 completes, start touching the next 4000 files on c2 on the mount point, i.e. "/".
5 - On c1: start touching 10000 files in /dir/.
6 - On c3: start moving the 4000 files created in step 3 from the mount point to /dir/.
7 - On c3, from a different console, start ls in a loop.

Behavior: ls and mv didn't hang.
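A simple way to script the "didn't hang" check during these runs (a sketch, assuming the c3 mount is at /mnt/c3; the mount path and timeout value are arbitrary, not taken from the test setup):

# run on c3 while the touch/mv workloads from TC#1 or TC#2 are in progress
while true; do
    if ! timeout 120 ls /mnt/c3 > /dev/null; then
        echo "ls did not complete within 120s - possible hang"
        break
    fi
    sleep 5
done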
Test cases added to Polarion:
RHG3-11786 - BZ#1400093 - TC#1: simultaneous issue of lookup and write (ls and move) should not hang on disperse volume (All bricks up)
RHG3-11787 - BZ#1400093 - TC#2: simultaneous issue of lookup and write (ls and move) should not hang on disperse volume (few bricks down)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html