Description of problem:
=======================
In my systemic setup, I have a 4x2 volume with IOs being done from multiple clients. From two of the clients I issued creates of the same directory structure in a loop (shown below), and I am seeing "Invalid argument" messages in the client log:

[2016-09-30 06:34:58.938667] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) [0x7f7a1c50f722] -->/usr/lib64/libglusterfs.so.0(dict_set_str+0x3c) [0x7f7a2a3d178c] -->/usr/lib64/libglusterfs.so.0(dict_set+0x113) [0x7f7a2a3d0bc3] ) 0-dict: !this || !value for key=link-count [Invalid argument]

The loop run from each client:

for i in {1..100}; do
  for j in {1..100}; do
    for k in {1..100}; do
      for l in {1..100}; do
        for m in {1..100}; do
          echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log
          date |& tee -a dir.$HOSTNAME.log
          echo "###############################" |& tee -a dir.$HOSTNAME.log
          mkdir -p level1.$i |& tee -a dir.$HOSTNAME.log
          echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log
          date |& tee -a dir.$HOSTNAME.log
          echo "###############################" |& tee -a dir.$HOSTNAME.log
          mkdir -p level1.$i/level2.$j |& tee -a dir.$HOSTNAME.log
          echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log
          date |& tee -a dir.$HOSTNAME.log
          echo "###############################" |& tee -a dir.$HOSTNAME.log
          mkdir -p level1.$i/level2.$j/level3.$k |& tee -a dir.$HOSTNAME.log
          echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log
          date |& tee -a dir.$HOSTNAME.log
          echo "###############################" |& tee -a dir.$HOSTNAME.log
          mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
          echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log
          date |& tee -a dir.$HOSTNAME.log
          echo "###############################" |& tee -a dir.$HOSTNAME.log
          mkdir -p level1.$i/level2.$j/level3.$k/level4.$l |& tee -a dir.$HOSTNAME.log
          mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m |& tee -a dir.$HOSTNAME.log
          echo "THIS IS LOOP $i $j $k $l $m" |& tee -a dir.$HOSTNAME.log
          date |& tee -a dir.$HOSTNAME.log
          echo "###############################" |& tee -a dir.$HOSTNAME.log
done; done; done; done; done

While the directory creations seem to be going smoothly, I see the same brick error logs repeated, for which BZ#1380699 has been raised. However, on the client too I see the messages below.

client logs:

[2016-09-30 06:34:58.949023] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-distrepvol-client-7: remote operation failed [Invalid argument]
[2016-09-30 06:34:59.178135] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.13 (gfid = 6bd93a82-7c5e-47d4-9f7d-5e703a1225d6). Holes=1 overlaps=0
[2016-09-30 06:35:01.301329] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 27400471: MKDIR() /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 => -1 (File exists)
[2016-09-30 06:35:01.371991] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-distrepvol-dht: Found anomalies in /rootdir1/renames/dir_samenames/level1.1/level2.1/level3.21/level4.17/level5.24 (gfid = 310d4874-bcc5-442f-a378-265004540333). Holes=1 overlaps=0

Systemic testing details:
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885

Steps to Reproduce:
1. Create the same directory structure from two different clients.

Version-Release number of selected component (if applicable):
====================
[root@dhcp37-187 dir_samenames]# rpm -qa | grep gluster
glusterfs-api-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-events-3.8.4-1.el7rhgs.x86_64
[root@dhcp37-187 dir_samenames]#
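The race above can be simulated locally as a minimal sketch (no Gluster mount involved; the temp directory, loop counts, and tree depth here are illustrative, not the original reproducer): two concurrent processes `mkdir -p`-ing the same nested tree.

```shell
#!/bin/sh
# Minimal local simulation of two clients racing to create the same
# directory tree. On Gluster the two writers would be separate FUSE
# mounts on different clients; here they are two background shells
# writing to one local tmpdir.
WORKDIR=$(mktemp -d)

make_tree() {
  # Same nested-level structure as the reproducer, with tiny counts.
  for i in 1 2; do
    for j in 1 2; do
      for k in 1 2; do
        mkdir -p "$WORKDIR/level1.$i/level2.$j/level3.$k"
      done
    done
  done
}

make_tree &   # "client 1"
make_tree &   # "client 2"
wait

# Both writers target identical paths, so the tree converges to
# exactly 2*2*2 = 8 leaf directories regardless of interleaving.
find "$WORKDIR" -mindepth 3 -type d | wc -l
```

On a Gluster FUSE mount, the same interleaving is what drove DHT to heal layouts concurrently and log the link-count warning.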
Steps to reproduce this:
1. Create a 2x2 volume.
2. Fuse mount the volume and create dir1.
3. Unmount the volume.
4. Delete dir1 manually on both bricks of any one replica set.
5. Mount the volume and do a lookup. DHT should see that the directory is missing and trigger a heal, causing this message to be logged.
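The steps above can be sketched as gluster CLI commands. The hostnames, volume name, brick paths, and mount point (server1..server4, testvol, /bricks/..., /mnt/testvol) are placeholders I am assuming, not values from this report; the function is only defined here, to be run by hand on a disposable test cluster.

```shell
# Sketch of the reproduction, assuming a 4-node test cluster with
# placeholder names. Defined as a function so nothing runs by accident.
reproduce_missing_dir_heal() {
  # Step 1: 2x2 distributed-replicate volume.
  gluster volume create testvol replica 2 \
      server1:/bricks/b1 server2:/bricks/b2 \
      server3:/bricks/b3 server4:/bricks/b4
  gluster volume start testvol

  # Step 2: fuse mount and create dir1.
  mount -t glusterfs server1:/testvol /mnt/testvol
  mkdir /mnt/testvol/dir1

  # Step 3: unmount.
  umount /mnt/testvol

  # Step 4: remove dir1 directly on both bricks of one replica set.
  ssh server1 rm -rf /bricks/b1/dir1
  ssh server2 rm -rf /bricks/b2/dir1

  # Step 5: remount and look up; DHT sees the layout hole and triggers
  # a directory heal, which used to emit the link-count warning.
  mount -t glusterfs server1:/testvol /mnt/testvol
  ls /mnt/testvol/dir1
}
```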
Glusterfs version: 3.8.4-2.el7rhgs.x86_64

Seeing similar warning messages in the rebalance logs as well during rebalance:

[2016-10-06 10:09:11.181450] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x4b320) [0x7efdb3b7d320] -->/lib64/libglusterfs.so.0(dict_set_str+0x2c) [0x7efdc5bce32c] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7efdc5bcc1e6] ) 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-10-06 10:09:11.184983] I [dht-rebalance.c:2902:gf_defrag_process_dir] 0-distrep-dht: Migration operation on dir /manual/sticky/d3263 took 0.08 secs
[2016-10-06 10:09:11.191802] W [dict.c:418:dict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x4b320) [0x7efdb3b7d320] -->/lib64/libglusterfs.so.0(dict_set_str+0x2c) [0x7efdc5bce32c] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7efdc5bcc1e6] ) 0-dict: !this || !value for key=link-count [Invalid argument]

Updated this BZ as the warning messages observed in both the fuse client and rebalance logs look similar. If not, please let me know and I will open a new BZ for the warning messages seen in the rebalance logs.

Steps that were performed:
==========================
1) Create a distributed replica volume and start it.
2) FUSE mount the volume and create files and directories.
3) Add a few bricks to the volume.
4) Trigger rebalance.
5) Monitor the rebalance log for the above warning messages: /var/log/glusterfs/<volname>-rebalance.log
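For step 5, counting the link-count warnings can be done with a simple grep. As a self-contained sketch, the log content is inlined here from the excerpt above (backtraces trimmed); on an affected system, point LOG at the real /var/log/glusterfs/<volname>-rebalance.log instead.

```shell
#!/bin/sh
# Count occurrences of the link-count dict_set warning in a log file.
# The sample log is inlined for illustration; replace LOG with the
# actual rebalance log path when checking a live system.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2016-10-06 10:09:11.181450] W [dict.c:418:dict_set] 0-dict: !this || !value for key=link-count [Invalid argument]
[2016-10-06 10:09:11.184983] I [dht-rebalance.c:2902:gf_defrag_process_dir] 0-distrep-dht: Migration operation on dir /manual/sticky/d3263 took 0.08 secs
[2016-10-06 10:09:11.191802] W [dict.c:418:dict_set] 0-dict: !this || !value for key=link-count [Invalid argument]
EOF

# Two warning lines in the sample, so this prints 2; on a fixed build
# against a real rebalance log it should print 0.
grep -c 'key=link-count' "$LOG"
```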
These are two separate test cases that trigger the same condition - healing of directories that are missing on some bricks. QE needs to decide whether the same BZ can be used to verify both scenarios.
http://review.gluster.org/15646
QATP:
=====
Have rerun the cases with the fixed-in build and did not see the warnings in any of the cases below. Hence moving to verified.

TC#1:
====
1. Create the same directory structure from two different clients.
Result: not seeing the warning.

TC#2:
====
1) Create a distributed replica volume and start it.
2) FUSE mount the volume and create files and directories.
3) Add a few bricks to the volume.
4) Trigger rebalance.
5) Monitor the rebalance log for the above warning messages: /var/log/glusterfs/<volname>-rebalance.log
Result: not seeing the warnings anymore.

TC#3:
====
1. Create a 2x2 volume.
2. Fuse mount the volume and create dir1.
3. Unmount the volume.
4. Delete dir1 manually on both bricks of any one replica set.
5. Mount the volume and do a lookup. DHT should see that the directory is missing and trigger a heal, causing this message to be logged.
Result: not seeing the warnings anymore.

Hence moving to verified.

[root@dhcp35-86 glusterfs]# rpm -qa | grep gluster
glusterfs-3.8.4-3.el7rhgs.x86_64
glusterfs-server-3.8.4-3.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html