Description of problem:
=======================
In my systemic setup, I have a 4x2 volume with I/O being driven from multiple clients. From two of the clients I issued creates of the same directory structure in a loop, as below:

for i in {1..100};do for j in {1..100};do for k in {1..100} ;do for l in {1..100} ;do for m in {1..100} ;do echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;done;done;done;done;done

While the directory creations appear to be proceeding smoothly, I see the same brick error logs repeatedly, as below:

[2016-09-30 06:23:40.633923] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.13 -> /rhs/brick1/distrepvol/.glusterfs/0c/75/0c75cd8e-07e3-4b39-a132-49b0742ae851 failed [File exists]
[2016-09-30 06:23:40.659436] E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks: Matching lock not found for unlock 0-9223372036854775807, by 3c67d9277a7f0000 on 0x7f21884fd2c0
[2016-09-30 06:23:40.659500] E [MSGID: 115053] [server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38032968: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.13 (0c75cd8e-07e3-4b39-a132-49b0742ae851) ==> (Invalid argument) [Invalid argument]
[2016-09-30 06:23:48.639445] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.40 -> /rhs/brick1/distrepvol/.glusterfs/5b/7c/5b7c7df0-8982-4456-9e42-3fe9df2acf09 failed [File exists]
[2016-09-30 06:23:50.276463] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.49 -> /rhs/brick1/distrepvol/.glusterfs/65/8a/658ab78d-ccaa-4f5b-bf5d-141d809e97fb failed [File exists]
[2016-09-30 06:24:02.289286] E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks: Matching lock not found for unlock 0-9223372036854775807, by d0b6d9277a7f0000 on 0x7f21884fd2c0
[2016-09-30 06:24:02.289442] E [MSGID: 115053] [server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38046191: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.88 (28261f34-344b-4cf5-9596-1bff50ffbf3b) ==> (Invalid argument) [Invalid argument]
[2016-09-30 06:24:12.249850] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../d1/2d/d12da57e-3f31-4d11-9f74-f7ba0c5cf246/level5.27 -> /rhs/brick1/distrepvol/.glusterfs/dd/6a/dd6a0956-d3cb-4dc1-9efd-a6e56515877a failed [File exists]
[2016-09-30 06:24:23.276703] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../d1/2d/d12da57e-3f31-4d11-9f74-f7ba0c5cf246/level5.73 -> /rhs/brick1/distrepvol/.glusterfs/4f/bd/4fbd0000-ee14-47f4-a32f-123839cd2d4d failed [File exists]

Version-Release number of selected component (if applicable):
====================
[root@dhcp37-187 dir_samenames]# rpm -qa|grep gluster
glusterfs-api-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-events-3.8.4-1.el7rhgs.x86_64
[root@dhcp37-187 dir_samenames]#

Client logs:
=========
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-09-30 06:34:58.949023] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
[2016-09-30 06:34:59.178135] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13 (gfid = 6bd93a82-7c5e-47d4-9f7d-5e703a1225d6). Holes=1 overlaps=0
[2016-09-30 06:35:01.301329] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 27400471: MKDIR() /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 => -1 (File exists)
[2016-09-30 06:35:01.371991] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 (gfid = 310d4874-bcc5-442f-a378-265004540333). Holes=1 overlaps=0

Systemic testing details:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885

Steps to Reproduce:
1. Create the same directory structure from two different clients.
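For readability, the same reproducer can also be written as a multi-line script. The following is only a cleaned-up sketch of the one-liner above (the echo/date/tee logging is dropped; the directory names and loop bounds are unchanged), to be run concurrently from two clients in the same directory on the mount:

#!/bin/bash
# Sketch of the mkdir reproducer: build the same five-level directory tree.
# Running this simultaneously from two fuse clients produces the
# conflicting mkdirs described above.
for i in {1..100}; do
  for j in {1..100}; do
    for k in {1..100}; do
      for l in {1..100}; do
        for m in {1..100}; do
          mkdir -p "level1.$i/level2.$j/level3.$k/level4.$l/level5.$m"
        done
      done
    done
  done
done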
Hi Nag,
I tried to re-create the issue on a 2-node 1x2 volume with the mkdir loop given in the BZ description, from 4 parallel fuse mounts, but did not hit any of the error logs. Could you see if you are able to recreate this issue on rhgs-3.3.0? If yes, please give me the setup for debugging.

Some points about the various errors described in the BZ:

1. 0-dict: !this || !value for key=link-count [Invalid argument]
^^ This is fixed in BZ 1380710.

2. E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks: Matching lock not found for unlock
[server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38046191: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.88 (28261f34-344b-4cf5-9596-1bff50ffbf3b) ==> (Invalid argument)
[client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
^^ These usually occur when there is a client disconnect.

3. [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.40 -> /rhs/brick1/distrepvol/.glusterfs/5b/7c/5b7c7df0-8982-4456-9e42-3fe9df2acf09 failed [File exists]
^^ This should not occur (a way to inspect the on-brick handle is sketched below).

I'm trying to find out whether 2 and 3 still occur in the latest version without any network issues or pending self-heals.
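To illustrate point 3, here is a rough sketch of how the on-brick handle could be inspected for one of the directories from those warnings. The brick path and gfid below are copied from the log lines in the description and are only examples; adjust them for the actual brick:

# Read the gfid xattr of the affected directory directly on the brick.
getfattr -n trusted.gfid -e hex \
    /rhs/brick1/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.13

# For directories, the posix xlator keeps a symlink handle at
# .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>; list it to see whether it
# already exists and which parent-gfid/basename it points to.
ls -l /rhs/brick1/distrepvol/.glusterfs/0c/75/0c75cd8e-07e3-4b39-a132-49b0742ae851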
Ravi,
Kindly look into the above logs. Clearing my needinfo, as Upasana has provided the information requested at comment#6.

Below is the I/O that was performed.

From 4 clients, small-file creations were issued in parallel, but each client wrote to its own dedicated directory:

for i in {1..10};do smallfile/smallfile_cli.py --operation create --threads 4 --file-size 64 --files 1000000 --top /mnt/replicate/$HOSTNAME/ ;done

From 2 clients, rename of the same directories (which would conflict) was triggered simultaneously:

for i in {1..100};do for j in {1..100};do for k in {1..100} ;do for l in {1..100} ;do for m in {1..100} ;do echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;done;done;done;done;done
(In reply to nchilaka from comment #7)
> From 2 clients, rename of the same directories (which would conflict) was
> triggered simultaneously:

Sorry, it was not rename but creation of the same directory structure from 2 clients simultaneously.
Upasana,
Could you see if the logs appear in a plain-distribute volume as well? Thanks.
(In reply to Ravishankar N from comment #9)
> Upasana,
> Could you see if the logs appear in a plain-distribute volume as well?
> Thanks.

Specifically the "posix_handle_soft" messages. The MKDIR failures with "File exists" in the client log are fine, because multiple clients are attempting the same mkdir.
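For reference, a minimal sketch of what such a check could look like; the volume name, hostnames and brick paths below are placeholders and not taken from this setup:

# Create and start a plain distribute volume (no replication), then mount it
# on the clients and re-run the concurrent mkdir loop against the mount.
gluster volume create distvol server1:/bricks/brick1/distvol server2:/bricks/brick1/distvol
gluster volume start distvol
mount -t glusterfs server1:/distvol /mnt/distvol

# While the loop runs, watch the brick log (the file name follows the brick
# path) for the posix_handle_soft "File exists" warnings.
tail -f /var/log/glusterfs/bricks/bricks-brick1-distvol.log | grep posix_handle_soft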
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days