Bug 1380710

Summary: invalid argument warning messages seen in fuse client logs [2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) 0-dict: !this || !value for key=link-count [Invalid argument]
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Nag Pavan Chilakam <nchilaka>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: asrivast, nbalacha, rhinduja, rhs-bugs, storage-qa-internal, tdesala
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-3 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1385104 (view as bug list) Environment:
Last Closed: 2017-03-23 06:07:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528, 1385104, 1385236, 1385442    

Description Nag Pavan Chilakam 2016-09-30 11:58:08 UTC
Description of problem:
=======================
In my systemic setup, I have a 4x2 volume with I/Os being done from multiple clients.
From two of the clients, I issued creation of the same directory structure in a loop (the exact command is shown further below).
I am seeing Invalid argument messages in the client log, such as:
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]


for i in {1..100}; do for j in {1..100}; do for k in {1..100}; do for l in {1..100}; do for m in {1..100}; do
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
done; done; done; done; done


While the directory creations seem to be proceeding smoothly, I see the same brick error logs repeated, for which BZ#1380699 has been raised.
However, on the client too I see the below messages:

client Logs:
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-09-30 06:34:58.949023] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
[2016-09-30 06:34:59.178135] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13 (gfid = 6bd93a82-7c5e-47d4-9f7d-5e703a1225d6). Holes=1 overlaps=0
[2016-09-30 06:35:01.301329] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 27400471: MKDIR() /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 => -1 (File exists)
[2016-09-30 06:35:01.371991] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 (gfid = 310d4874-bcc5-442f-a378-265004540333). Holes=1 overlaps=0
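
To gauge how often this warning is hitting the fuse client, the mount log can be grepped directly. This is only a quick check, assuming a hypothetical mount point of /mnt/distrepvol; the fuse client log file is named after the mount path with slashes replaced by dashes:

# Count the dict warning occurrences (hypothetical mount point /mnt/distrepvol)
grep -c "key=link-count" /var/log/glusterfs/mnt-distrepvol.log
# Show the warnings together with the related remote-operation failures
grep -E "key=link-count|remote operation failed" /var/log/glusterfs/mnt-distrepvol.log | tail -n 20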

Systemic testing details:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885


Steps to Reproduce:
1. Create the same directory structure from two different clients
Version-Release number of selected component (if applicable):
====================
[root@dhcp37-187 dir_samenames]# rpm -qa|grep gluster
glusterfs-api-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-events-3.8.4-1.el7rhgs.x86_64
[root@dhcp37-187 dir_samenames]#

Comment 2 Nithya Balachandran 2016-09-30 14:10:45 UTC
Steps to reproduce this:
1. Create a 2x2 volume. 
2. Fuse mount the volume and create dir1
3. Unmount volume
4. Delete dir1 manually on both bricks of any one replica set.
5. Mount the volume and do a lookup. DHT should see that the directory is missing and trigger a heal, causing this message to be logged.
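
A rough shell transcript of the above steps is sketched below. The host names, brick paths, volume name, and mount point are hypothetical; only the gluster and mount commands themselves are standard:

# 1. Create and start a 2x2 distributed-replicate volume (hypothetical hosts and bricks)
gluster volume create testvol replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server3:/bricks/b2 server4:/bricks/b2
gluster volume start testvol

# 2. Fuse mount the volume and create dir1
mount -t glusterfs server1:/testvol /mnt/testvol
mkdir /mnt/testvol/dir1

# 3. Unmount the volume
umount /mnt/testvol

# 4. Delete dir1 manually on both bricks of one replica set
ssh server1 rm -rf /bricks/b1/dir1
ssh server2 rm -rf /bricks/b1/dir1

# 5. Mount again and do a lookup; DHT sees the missing directory,
#    triggers a heal, and the warning shows up in the client log
mount -t glusterfs server1:/testvol /mnt/testvol
ls /mnt/testvol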

Comment 3 Prasad Desala 2016-10-06 12:01:26 UTC
Glusterfs version: 3.8.4-2.el7rhgs.x86_64

Seeing similar warning messages in the rebalance logs as well:

[2016-10-06 10:09:11.181450] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x4b320) [0x7efdb3b7d320] -->/lib64/libglusterfs.so.0(dict_set_str+0x2c) [0x7efdc5bce32c] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7efdc5bcc1e6] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-10-06 10:09:11.184983] I [dht-rebalance.c:2902:gf_defrag_process_dir] 0-distrep-dht: Migration operation on dir /manual/sticky/d3263 took 0.08 secs
[2016-10-06 10:09:11.191802] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x4b320) [0x7efdb3b7d320] -->/lib64/libglusterfs.so.0(dict_set_str+0x2c) [0x7efdc5bce32c] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7efdc5bcc1e6] ) 0-dict: !this || !value for key=link-count [Invalid argument]

Updated this BZ as the warning messages observed in both the fuse client and rebalance logs look similar. If not, please let me know and I will open a new BZ for the warning messages seen in the rebalance logs.

Steps that were performed:
==========================
1) Create a distributed replica volume and start it.
2) FUSE mount the volume and create files and directories.
3) Add few bricks to the volume.
4) Trigger rebalance.
5) Monitor the rebalance logs for the above warning messages: /var/log/glusterfs/<volname>-rebalance.log
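
A minimal sketch of that sequence follows, assuming hypothetical server names, brick paths, a volume called distrep, and a mount point of /mnt/distrep:

# 1. Create a distributed-replicate volume and start it
gluster volume create distrep replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server3:/bricks/b2 server4:/bricks/b2
gluster volume start distrep

# 2. FUSE mount the volume and create files and directories
mount -t glusterfs server1:/distrep /mnt/distrep
mkdir -p /mnt/distrep/dir{1..50}
for d in /mnt/distrep/dir*; do touch "$d"/file{1..10}; done

# 3. Add a few bricks (one more replica pair here)
gluster volume add-brick distrep server1:/bricks/b3 server2:/bricks/b3

# 4. Trigger rebalance
gluster volume rebalance distrep start
gluster volume rebalance distrep status

# 5. Monitor the rebalance log for the warning
grep "key=link-count" /var/log/glusterfs/distrep-rebalance.log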

Comment 4 Nithya Balachandran 2016-10-07 04:14:30 UTC
These are two separate test cases that trigger the same condition - healing of directories that are missing on some bricks. QE needs to decide whether the same BZ can be used to verify both scenarios.

Comment 5 Pranith Kumar K 2016-10-14 19:05:52 UTC
http://review.gluster.org/15646

Comment 9 Nag Pavan Chilakam 2016-11-07 06:32:20 UTC
QATP:
=====
Have rerun the cases with the fixed-in build and did not see any of the warnings in any of the below cases. Hence moving to verified:

TC#1:
====
1. Create the same directory structure from two different clients
Result:not seeing the warning

TC#2:
====
1) Create a distributed replica volume and start it.
2) FUSE mount the volume and create files and directories.
3) Add few bricks to the volume.
4) Trigger rebalance.
5) Monitor the rebalance logs for the above warning messages: /var/log/glusterfs/<volname>-rebalance.log

Not seeing the warnings anymore


TC#3:
====
1. Create a 2x2 volume. 
2. Fuse mount the volume and create dir1
3. Unmount volume
4. Delete dir1 manually on both bricks of any one replica set.
5. Mount the volume and do a lookup. DHT should see that the directory is missing and trigger a heal, causing this message to be logged.

Not seeing warnings anymore


Hence moving to verified
[root@dhcp35-86 glusterfs]# rpm -qa|grep gluster
glusterfs-3.8.4-3.el7rhgs.x86_64
glusterfs-server-3.8.4-3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64

Comment 11 errata-xmlrpc 2017-03-23 06:07:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html