Bug 1380699 - Errors seen in brick logs when same directory paths are created from two different clients inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks: Matching lock not found for unlock
Summary: Errors seen in brick logs when same directory paths are created from two different clients
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-30 11:26 UTC by Nag Pavan Chilakam
Modified: 2023-09-14 03:31 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-21 11:49:56 UTC
Embargoed:



Description Nag Pavan Chilakam 2016-09-30 11:26:20 UTC
Description of problem:
=======================
In my systemic setup, I have a 4x2 volume with I/O being driven from multiple clients.
From two of those clients I created the same directory structure in a loop, as below:


for i in {1..100};do for j in {1..100};do for k in {1..100} ;do for l in {1..100} ;do for m in {1..100} ;do echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;done;done;done;done;done
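For readability, here is a scaled-down, runnable equivalent of the one-liner above ({1..2} instead of {1..100} so it finishes quickly; the `banner` helper is my own shorthand for the repeated echo/date/separator lines, not part of the original loop):

```shell
#!/bin/bash
# Scaled-down equivalent of the mkdir loop ({1..2} instead of {1..100}).
# "banner" is a hypothetical helper that factors out the repeated
# echo/date/"###" lines; it appends to the same per-host log file.
log="dir.${HOSTNAME:-localhost}.log"
banner() { { echo "THIS IS LOOP $*"; date; echo "###############################"; } 2>&1 | tee -a "$log" >/dev/null; }
for i in {1..2}; do for j in {1..2}; do for k in {1..2}; do
  for l in {1..2}; do for m in {1..2}; do
    banner "$i $j $k $l $m"
    # mkdir -p creates each missing ancestor, so one call covers all 5 levels
    mkdir -p "level1.$i/level2.$j/level3.$k/level4.$l/level5.$m" 2>&1 | tee -a "$log" >/dev/null
  done; done
done; done; done
```

Running the same loop concurrently from two mount points gives the racing mkdirs described in this report.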


While the directory creations appear to proceed smoothly, I see the same brick error logs repeatedly, as below:


[2016-09-30 06:23:40.633923] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.13 -> /rhs/brick1/distrepvol/.glusterfs/0c/75/0c75cd8e-07e3-4b39-a132-49b0742ae851 failed [File exists]
[2016-09-30 06:23:40.659436] E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 3c67d9277a7f0000 on 0x7f21884fd2c0
[2016-09-30 06:23:40.659500] E [MSGID: 115053] [server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38032968: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.13 (0c75cd8e-07e3-4b39-a132-49b0742ae851) ==> (Invalid argument) [Invalid argument]
[2016-09-30 06:23:48.639445] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.40 -> /rhs/brick1/distrepvol/.glusterfs/5b/7c/5b7c7df0-8982-4456-9e42-3fe9df2acf09 failed [File exists]
[2016-09-30 06:23:50.276463] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.49 -> /rhs/brick1/distrepvol/.glusterfs/65/8a/658ab78d-ccaa-4f5b-bf5d-141d809e97fb failed [File exists]
[2016-09-30 06:24:02.289286] E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by d0b6d9277a7f0000 on 0x7f21884fd2c0
[2016-09-30 06:24:02.289442] E [MSGID: 115053] [server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38046191: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.88 (28261f34-344b-4cf5-9596-1bff50ffbf3b) ==> (Invalid argument) [Invalid argument]
[2016-09-30 06:24:12.249850] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../d1/2d/d12da57e-3f31-4d11-9f74-f7ba0c5cf246/level5.27 -> /rhs/brick1/distrepvol/.glusterfs/dd/6a/dd6a0956-d3cb-4dc1-9efd-a6e56515877a failed [File exists]
[2016-09-30 06:24:23.276703] W [MSGID: 113096] [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../d1/2d/d12da57e-3f31-4d11-9f74-f7ba0c5cf246/level5.73 -> /rhs/brick1/distrepvol/.glusterfs/4f/bd/4fbd0000-ee14-47f4-a32f-123839cd2d4d failed [File exists]




Version-Release number of selected component (if applicable):
====================
[root@dhcp37-187 dir_samenames]# rpm -qa|grep gluster
glusterfs-api-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-events-3.8.4-1.el7rhgs.x86_64
[root@dhcp37-187 dir_samenames]# 



client logs:
=========
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-09-30 06:34:58.949023] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
[2016-09-30 06:34:59.178135] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13 (gfid = 6bd93a82-7c5e-47d4-9f7d-5e703a1225d6). Holes=1 overlaps=0
[2016-09-30 06:35:01.301329] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 27400471: MKDIR() /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 => -1 (File exists)
[2016-09-30 06:35:01.371991] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 (gfid = 310d4874-bcc5-442f-a378-265004540333). Holes=1 overlaps=0

Systemic testing details:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885


Steps to Reproduce:
1. Create the same directory structure from two different clients.

Comment 4 Ravishankar N 2017-10-10 01:46:05 UTC
Hi Nag,

I tried to re-create the issue on a 2-node 1x2 volume with the mkdir loop given in the BZ description, running from 4 parallel fuse mounts, but did not hit any of the error logs. Could you see if you are able to recreate this issue on rhgs-3.3.0? If yes, please give me the setup for debugging.

Some points about the various errors described in the BZ:

1. 0-dict: !this || !value for key=link-count [Invalid argument].
^^
This is fixed in BZ 1380710.

2. E [inodelk.c:304:__inode_unlock_lock] 0-distrepvol-locks:  Matching lock not found for unlock
server-rpc-fops.c:275:server_inodelk_cbk] 0-distrepvol-server: 38046191: INODELK /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.20/level4.94/level5.88 (28261f34-344b-4cf5-9596-1bff50ffbf3b) ==> (Invalid argument)
[client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
^^
These usually occur when there is a client disconnect.

3. [posix-handle.c:828:posix_handle_soft] 0-distrepvol-posix: symlink ../../42/49/42499791-a12d-4e2e-a6d4-e1dc79e99b79/level5.40 -> /rhs/brick1/distrepvol/.glusterfs/5b/7c/5b7c7df0-8982-4456-9e42-3fe9df2acf09 failed [File exists]
^^
This should not occur.
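For context on the [File exists] in point 3: symlink(2) is atomic, so when two racing callers try to create the same link name, exactly one succeeds and the other gets EEXIST. A minimal local illustration (the names below are made up, not the brick's .glusterfs layout):

```shell
#!/bin/bash
# Illustration only: of two racing creators of the same symlink, exactly one
# succeeds; the loser fails with EEXIST ("File exists").
# "handle-link" and "some-gfid-dir" are hypothetical names.
ln -s some-gfid-dir handle-link            # first creator wins
if ! ln -s some-gfid-dir handle-link 2>err.txt; then
  echo "second creator failed as expected"
fi
```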

I'm trying to find out whether 2 and 3 still occur in the latest version in the absence of network issues or pending self-heals.

Comment 7 Nag Pavan Chilakam 2018-07-12 05:54:26 UTC
Ravi,
Kindly look into the above logs. I am clearing my needinfo, as Upasana has provided the information requested in comment #6.


Below is the I/O that was performed.

From 4 clients, small-file creations were issued in parallel, but each to its own dedicated directory:

 for i in {1..10};do smallfile/smallfile_cli.py  --operation create --threads 4  --file-size 64 --files 1000000  --top /mnt/replicate/$HOSTNAME/ ;done


From 2 clients, renames of the same directories (which would conflict) were triggered simultaneously:

for i in {1..100};do for j in {1..100};do for k in {1..100} ;do for l in {1..100} ;do for m in {1..100} ;do echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log; mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log;echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log;date |& tee -a dir.$HOSTNAME.log;echo "###############################" |& tee -a dir.$HOSTNAME.log;done;done;done;done;done

Comment 8 Nag Pavan Chilakam 2018-07-12 09:22:36 UTC
(In reply to nchilaka from comment #7)
> Ravi,
> Kindly look into above logs, clearing my needinfo as Upasana has provided
> the information requested at comment#6 
> 
> 
> below was the IOs performed
> 
> from 4 clients: small file creations were issued parallely, but to their own
> dedicated directories
> 
>  for i in {1..10};do smallfile/smallfile_cli.py  --operation create
> --threads 4  --file-size 64 --files 1000000  --top /mnt/replicate/$HOSTNAME/
> ;done
> 
> 
> from 2 clients, rename of same directories (which would conflict) was
> triggered simultaneously

sorry, it was not rename, but same dir creations from 2 clients simultaneously.

> [mkdir loop from comment #7 snipped]

Comment 9 Ravishankar N 2018-07-12 09:31:19 UTC
Upasana, 
Could you see if the logs appear in a plain-distribute volume as well?
Thanks.

Comment 10 Ravishankar N 2018-07-12 09:38:48 UTC
(In reply to Ravishankar N from comment #9)
> Upasana, 
> Could you see if the logs appear in a plain-distribute volume as well?
> Thanks.

Specifically the "posix_handle_soft" messages. The mkdir-failed-with-EEXIST messages in the client log are fine, because multiple clients are attempting the same mkdir.
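To illustrate why those client-side failures are benign, here is a local sketch (plain directories, no GlusterFS involved): mkdir(2) on an existing directory fails with EEXIST, but `mkdir -p` treats an existing directory as success, so the racing clients' loops keep running even though "File exists" shows up in the logs.

```shell
#!/bin/bash
# "demo-dir" is a made-up name for illustration.
mkdir -p demo-dir                          # creates it (or succeeds if present)
if ! mkdir demo-dir 2>plain.err; then      # plain mkdir sees EEXIST
  echo "plain mkdir: got EEXIST"
fi
mkdir -p demo-dir && echo "mkdir -p: exit 0 even though it already exists"
```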

Comment 14 Red Hat Bugzilla 2023-09-14 03:31:40 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

