Bug 1380710 - invalid argument warning messages seen in fuse client logs [2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) 0-dict: !this || !value for key=link-count [Invalid argument]
Summary: invalid argument warning messages seen in fuse client logs 2016-09-30 06:34:5...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1385104 1385236 1385442
 
Reported: 2016-09-30 11:58 UTC by Nag Pavan Chilakam
Modified: 2018-11-30 05:39 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.8.4-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1385104 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:07:09 UTC
Embargoed:


Attachments


Links
System ID: Red Hat Product Errata RHSA-2017:0486
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 09:18:45 UTC

Description Nag Pavan Chilakam 2016-09-30 11:58:08 UTC
Description of problem:
=======================
In my systemic setup, I have a 4x2 volume with I/O being driven from multiple clients.
From two of those clients I issued the same directory-structure creates in a loop (see the command further below).
I am seeing "Invalid argument" warning messages in the client log, such as:
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]


for i in {1..100}; do for j in {1..100}; do for k in {1..100}; do for l in {1..100}; do for m in {1..100}; do
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
  mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log
  echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log; date |& tee -a dir.$HOSTNAME.log; echo "###############################" |& tee -a dir.$HOSTNAME.log
done; done; done; done; done


While the directory creations themselves seem to be going smoothly, I see the same brick error logs repeated, for which BZ#1380699 has been raised.
However, on the client too I see the messages below:

client Logs:
[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-09-30 06:34:58.949023] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
[2016-09-30 06:34:59.178135] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13 (gfid = 6bd93a82-7c5e-47d4-9f7d-5e703a1225d6). Holes=1 overlaps=0
[2016-09-30 06:35:01.301329] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 27400471: MKDIR() /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 => -1 (File exists)
[2016-09-30 06:35:01.371991] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 (gfid = 310d4874-bcc5-442f-a378-265004540333). Holes=1 overlaps=0
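(Editorial note: the "Holes=1" anomalies indicate that DHT found a brick with no layout range for that directory, typically because the directory is missing or freshly created on that brick, and the resulting directory heal is where the dict warning is logged. One way to confirm the on-brick state is to check the directory and its layout xattr directly on the bricks; the brick host and export paths below are assumptions for illustration only, not taken from this setup.)

# Run on each brick server; adjust the brick export paths to the real volume.
DIR=rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13
for brick in /bricks/brick{0..7}/distrepvol; do
    echo "== $brick =="
    ls -ld "$brick/$DIR"
    # A directory (or layout xattr) missing on a brick shows up as a DHT "hole".
    getfattr -n trusted.glusterfs.dht -e hex "$brick/$DIR"
done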

Systemic testing details:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885


Steps to Reproduce:
1. Create the same directory structure from two different clients.
Version-Release number of selected component (if applicable):
====================
[root@dhcp37-187 dir_samenames]# rpm -qa|grep gluster
glusterfs-api-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-events-3.8.4-1.el7rhgs.x86_64
[root@dhcp37-187 dir_samenames]#

Comment 2 Nithya Balachandran 2016-09-30 14:10:45 UTC
Steps to reproduce this:
1. Create a 2x2 volume. 
2. Fuse mount the volume and create dir1
3. Unmount volume
4. Delete dir1 manually on both bricks of any one replica set.
5. Mount the volume and do a lookup. DHT should see that the directory is missing and trigger a heal, causing this message to be logged.
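A minimal shell sketch of the above steps; the server name, brick paths, volume name and mount point are assumptions:

# 2x2 volume on a single server ("force" because replica bricks share the host).
gluster volume create testvol replica 2 server1:/bricks/b{1..4}/testvol force
gluster volume start testvol
mount -t glusterfs server1:/testvol /mnt/testvol
mkdir /mnt/testvol/dir1
umount /mnt/testvol
# On the brick server: delete dir1 from both bricks of one replica pair.
rm -rf /bricks/b1/testvol/dir1 /bricks/b2/testvol/dir1
mount -t glusterfs server1:/testvol /mnt/testvol
ls /mnt/testvol/dir1    # the lookup sees the missing directory and triggers the heal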

Comment 3 Prasad Desala 2016-10-06 12:01:26 UTC
Glusterfs version: 3.8.4-2.el7rhgs.x86_64

Seeing similar warning messages in the rebalance logs as well.

[2016-10-06 10:09:11.181450] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x4b320) [0x7efdb3b7d320] -->/lib64/libglusterfs.so.0(dict_set_str+0x2c) [0x7efdc5bce32c] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7efdc5bcc1e6] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-10-06 10:09:11.184983] I [dht-rebalance.c:2902:gf_defrag_process_dir] 0-distrep-dht: Migration operation on dir /manual/sticky/d3263 took 0.08 secs
[2016-10-06 10:09:11.191802] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x4b320) [0x7efdb3b7d320] -->/lib64/libglusterfs.so.0(dict_set_str+0x2c) [0x7efdc5bce32c] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7efdc5bcc1e6] ) 0-dict: !this || !value for key=link-count [Invalid argument]

Updated this BZ as the warning messages observed in the fuse client and rebalance logs look similar. If not, please let me know and I will open a new BZ for the warning messages seen in the rebalance logs.

Steps that were performed:
==========================
1) Create a distributed replica volume and start it.
2) FUSE mount the volume and create files and directories.
3) Add few bricks to the volume.
4) Trigger rebalance.
5) Monitor the rebalance log (/var/log/glusterfs/<volname>-rebalance.log) for the above warning messages.
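A hedged sketch of the same flow using the gluster CLI; the volume name, brick paths, server and mount point are assumptions:

# 2x2 distributed-replicate volume on one server ("force" because replica bricks share the host).
gluster volume create distrep replica 2 server1:/bricks/b{1..4}/distrep force
gluster volume start distrep
mount -t glusterfs server1:/distrep /mnt/distrep
# Create some directories and files through the mount.
for i in {1..50}; do mkdir -p /mnt/distrep/dir.$i && touch /mnt/distrep/dir.$i/file.{1..10}; done
# Expand the volume by one replica pair and rebalance.
gluster volume add-brick distrep server1:/bricks/b5/distrep server1:/bricks/b6/distrep force
gluster volume rebalance distrep start
gluster volume rebalance distrep status
# Watch the rebalance log for the warning.
grep 'key=link-count' /var/log/glusterfs/distrep-rebalance.log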

Comment 4 Nithya Balachandran 2016-10-07 04:14:30 UTC
These are two separate test cases that trigger the same condition - healing of directories that are missing on some bricks. QE needs to decide whether the same BZ can be used to verify both scenarios.

Comment 5 Pranith Kumar K 2016-10-14 19:05:52 UTC
http://review.gluster.org/15646

Comment 9 Nag Pavan Chilakam 2016-11-07 06:32:20 UTC
QATP:
=====
Have rerun the cases with the fixed-in build and did not see the warnings in any of the cases below; hence moving to Verified:

TC#1:
====
1. Create the same directory structure from two different clients.
Result: not seeing the warning

TC#2:
====
1) Create a distributed replica volume and start it.
2) FUSE mount the volume and create files and directories.
3) Add few bricks to the volume.
4) Trigger rebalance.
5) Monitor the rebalance log (/var/log/glusterfs/<volname>-rebalance.log) for the above warning messages.

Not seeing the warnings anymore


TC#3:
====
1. Create a 2x2 volume. 
2. Fuse mount the volume and create dir1
3. Unmount volume
4. Delete dir1 manually on both bricks of any one replica set.
5. Mount the volume and do a lookup. DHT should see that the directory is missing and trigger a heal, causing this message to be logged.

Not seeing warnings anymore


Hence moving to verified
[root@dhcp35-86 glusterfs]# rpm -qa|grep gluster
glusterfs-3.8.4-3.el7rhgs.x86_64
glusterfs-server-3.8.4-3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64

Comment 11 errata-xmlrpc 2017-03-23 06:07:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

