Bug 1380699

Summary: Errors seen in brick logs when same directory paths are created from two different clients: inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks: Matching lock not found for unlock

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: replicate
Version: rhgs-3.2
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Keywords: ZStream
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Ravishankar N <ravishankar>
QA Contact: Nag Pavan Chilakam <nchilaka>
CC: nbalacha, nchilaka, ravishankar, rhs-bugs, sheggodu, storage-qa-internal, ubansal
Doc Type: If docs needed, set a value
Last Closed: 2017-11-21 11:49:56 UTC
Type: Bug

Description Nag Pavan Chilakam 2016-09-30 11:26:20 UTC
Description of problem:
=======================
In my systemic-testing setup, I have a 4x2 volume with I/O running from multiple clients.
From two of those clients, I issued creates of the same directory structure in a loop, as below:


for i in {1..100}; do for j in {1..100}; do for k in {1..100}; do for l in {1..100}; do for m in {1..100}; do
    echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
    mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log
    echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
    mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log
    echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
    mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log
    echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
    mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
    echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
    mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
    mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log
    echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
done; done; done; done; done


While the directory creations appear to proceed smoothly, I see the same brick error logs repeated, as below:


[2016-09-30 06:23:40.633923] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.13 -> /rhs/brick1/distrepvol/.glusterfs/0c/75/0c75cd8e-07e3-4b39-a132-49b0742ae851 failed [File exists]
[2016-09-30 06:23:40.659436] E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 3c67d9277a7f0000 on 0x7f21884fd2c0
[2016-09-30 06:23:40.659500] E [MSGID: 115053] [server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38032968: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.13 (0c75cd8e-07e3-4b39-a132-49b0742ae851) ==> (Invalid argument) [Invalid argument]
[2016-09-30 06:23:48.639445] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.40 -> /rhs/brick1/distrepvol/.glusterfs/5b/7c/5b7c7df0-8982-4456-9e42-3fe9df2acf09 failed [File exists]
[2016-09-30 06:23:50.276463] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.49 -> /rhs/brick1/distrepvol/.glusterfs/65/8a/658ab78d-ccaa-4f5b-bf5d-141d809e97fb failed [File exists]
[2016-09-30 06:24:02.289286] E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by d0b6d9277a7f0000 on 0x7f21884fd2c0
[2016-09-30 06:24:02.289442] E [MSGID: 115053] [server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38046191: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.88 (28261f34-344b-4cf5-9596-1bff50ffbf3b) ==> (Invalid argument) [Invalid argument]
[2016-09-30 06:24:12.249850] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../d1/2d/d12da57e-3f31-4d11-9f74-f7ba0c5cf246/level5.27 -> /rhs/brick1/distrepvol/.glusterfs/dd/6a/dd6a0956-d3cb-4dc1-9efd-a6e56515877a failed [File exists]
[2016-09-30 06:24:23.276703] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../d1/2d/d12da57e-3f31-4d11-9f74-f7ba0c5cf246/level5.73 -> /rhs/brick1/distrepvol/.glusterfs/4f/bd/4fbd0000-ee14-47f4-a32f-123839cd2d4d failed [File exists]
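
While the loop runs, both error patterns can be watched for across the bricks with something like the following (a sketch, assuming the default brick log location under /var/log/glusterfs/bricks/):

    # count occurrences of the two error patterns per brick log on a server
    grep -cE 'Matching lock not found for unlock|posix_handle_soft.*File exists' /var/log/glusterfs/bricks/*.log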




Version-Release number of selected component (if applicable):
====================
[root@dhcp37-187 dir_samenames]# rpm -qa|grep gluster
glusterfs-api-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-events-3.8.4-1.el7rhgs.x86_64
[root@dhcp37-187 dir_samenames]# 



Client logs:
============
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-09-30 06:34:58.949023] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
[2016-09-30 06:34:59.178135] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13 (gfid = 6bd93a82-7c5e-47d4-9f7d-5e703a1225d6). Holes=1 overlaps=0
[2016-09-30 06:35:01.301329] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 27400471: MKDIR() /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 => -1 (File exists)
[2016-09-30 06:35:01.371991] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 (gfid = 310d4874-bcc5-442f-a378-265004540333). Holes=1 overlaps=0

Systemic testing details:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885


Steps to Reproduce:
1. Create the same directory structure from two different clients (a minimal sketch follows).
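
A minimal sketch of the race (hypothetical mount point /mnt/distrepvol; each client runs the same loop at the same time on its own fuse mount of the volume):

    # run on client 1 and client 2 simultaneously
    cd /mnt/distrepvol
    for i in {1..100}; do mkdir -p level1.$i/level2.$i/level3.$i; done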

Comment 4 Ravishankar N 2017-10-10 01:46:05 UTC
Hi Nag,

I tried to re-create the issue on a 2-node 1x2 volume, running the mkdir loop given in the BZ description from 4 parallel fuse mounts, but did not hit any of the error logs. Could you check whether you are able to recreate this issue on rhgs-3.3.0? If yes, please give me the setup for debugging.
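
For reference, a sketch of that kind of test setup (hypothetical host and brick names; the gluster CLI calls are the standard ones):

    # on the servers: a 2-node 1x2 replica volume
    gluster volume create testvol replica 2 server1:/bricks/b1 server2:/bricks/b1
    gluster volume start testvol
    # on each of 4 clients: mount, then run the mkdir loop from the description
    mount -t glusterfs server1:/testvol /mnt/testvol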

Some points about the various errors described in the BZ:

1. 0-dict: !this || !value for key=link-count [Invalid argument].
^^
This is fixed in BZ 1380710.

2. E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks:  Matching lock not found for unlock
server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38046191: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.88 (28261f34-344b-4cf5-9596-1bff50ffbf3b) ==> (Invalid argument)
[client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
^^
These usually occur when there is a client disconnect.
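
One quick way to check for that is to look for disconnect messages around the error timestamps (a sketch, assuming the default log locations):

    grep -i disconnect /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log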

3. [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.40 -> /rhs/brick1/distrepvol/.glusterfs/5b/7c/5b7c7df0-8982-4456-9e42-3fe9df2acf09 failed [File exists]
^^
This should not occur.
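
To see what the conflicting handle actually points at on the brick, the handle symlink and the directory's on-disk GFID can be compared (a sketch using the level5.13 paths from the logs above; trusted.gfid is the xattr Gluster stores the GFID in):

    # where the existing handle symlink for the level5.13 GFID points
    ls -l /rhs/brick1/distrepvol/.glusterfs/0c/75/0c75cd8e-07e3-4b39-a132-49b0742ae851
    # GFID stored on the directory itself, for comparison
    getfattr -n trusted.gfid -e hex /rhs/brick1/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.13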

I'm trying to find out whether 2 and 3 still occur in the latest version in the absence of any network issues or pending self-heals.
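
Pending self-heals can be ruled out on the servers with the standard heal-info query (volume name taken from the logs above):

    gluster volume heal distrepvol info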

Comment 7 Nag Pavan Chilakam 2018-07-12 05:54:26 UTC
Ravi,
Kindly look into the above logs. I am clearing my needinfo, as Upasana has provided the information requested in comment #6.


Below is the I/O that was performed.

From 4 clients, small-file creations were issued in parallel, each to its own dedicated directory:

 for i in {1..10};do smallfile/smallfile_cli.py  --operation create --threads 4  --file-size 64 --files 1000000  --top /mnt/replicate/$HOSTNAME/ ;done


From 2 clients, renames of the same directories (which would conflict) were
triggered simultaneously, using the same five-level mkdir -p loop shown in full in the description above.

Comment 8 Nag Pavan Chilakam 2018-07-12 09:22:36 UTC
(In reply to Nag Pavan Chilakam from comment #7)
> From 2 clients, renames of the same directories (which would conflict) were
> triggered simultaneously

Sorry, it was not rename; it was creation of the same directories from 2 clients simultaneously.


Comment 9 Ravishankar N 2018-07-12 09:31:19 UTC
Upasana, 
Could you see if the logs appear in a plain-distribute volume as well?
Thanks.

Comment 10 Ravishankar N 2018-07-12 09:38:48 UTC
(In reply to Ravishankar N from comment #9)
> Upasana, 
> Could you see if the logs appear in a plain-distribute volume as well?
> Thanks.

Specifically the "posix_handle_soft" messages. The MKDIR failures with EEXIST in the client log are fine, because multiple clients are attempting the same mkdir (see the sketch below).
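
As an illustration of why the client-side EEXIST is expected (a sketch; /mnt/c1 and /mnt/c2 are hypothetical fuse mounts of the same volume on two different clients):

    # both clients race to create the same directory at roughly the same time
    mkdir /mnt/c1/level1.1/level2.1   # one client wins
    mkdir /mnt/c2/level1.1/level2.1   # the loser gets: mkdir: cannot create directory ...: File exists
    # with mkdir -p (as in the test loop) the error is suppressed client-side,
    # but the fuse log still records the failed MKDIR, as seen in the client logs above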

Comment 14 Red Hat Bugzilla 2023-09-14 03:31:40 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days