Description of problem:
On a CIFS mount of a 2x2 distributed-replicate volume, creating files/directories and then running ll or arequal-checksum fills the client logs with "Mismatching layouts for /.., gfid" and "Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0" messages.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.27-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 volume and mount it via CIFS.
2. Create a directory/file.
3. Run ll on the mount point.
4. Run arequal-checksum on the mount point.
5. Check the client logs.

Actual results:
As soon as ll or arequal-checksum is executed, a large number of messages appear in the CIFS client logs:
[2014-08-07 09:14:10.534787] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.534912] I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.535024] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.536122] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2014-08-07 09:14:10.541898] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.542055] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.543073] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
The message "I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001" repeated 3 times between [2014-08-07 09:14:10.534912] and [2014-08-07 09:14:10.542148]

Expected results:
There should be no such messages about anomalies found or DHT layout mismatches.

Additional info:
While running an add-brick test, I saw the following error:

[root@dhcp159-197 /]# gluster vol add-brick afr-vol 10.16.159.197:/rhs/brick1/afr-vol/b10 10.16.159.210:/rhs/brick1/afr-vol/b11
volume add-brick: failed: parent directory /rhs/brick1/afr-vol is already part of a volume

The error suggests that a gluster xattr has been set on the parent directory, so add-brick is refused.
Workaround: remove the xattr from the parent directory; add-brick then succeeds.
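The workaround can be scripted. Below is a minimal sketch in Python (Linux only, run as root). It assumes the stray attributes are in the trusted.glusterfs.* namespace or are named trusted.gfid. The report does not say which xattr was set, so inspect the directory first, for example with `getfattr -d -m . -e hex DIR`. The function names are hypothetical. The sketch defaults to a dry run and should only be pointed at a directory that is not itself a brick:

```python
import os

# Assumption: the stray attributes are GlusterFS-owned, i.e. they live in the
# trusted.glusterfs.* namespace or are trusted.gfid. Verify before removing.
GLUSTER_PREFIXES = ("trusted.glusterfs.", "trusted.gfid")


def gluster_xattrs(names):
    """Filter a list of xattr names down to the GlusterFS-owned ones."""
    return [name for name in names if name.startswith(GLUSTER_PREFIXES)]


def clear_parent_xattrs(parent_dir, dry_run=True):
    """List, and unless dry_run is True also remove, GlusterFS xattrs on parent_dir.

    Reading and removing trusted.* attributes requires root on Linux.
    Returns the names that were found (and removed when dry_run is False).
    """
    found = gluster_xattrs(os.listxattr(parent_dir))
    if not dry_run:
        for name in found:
            os.removexattr(parent_dir, name)
    return found
```

For the error above, `clear_parent_xattrs("/rhs/brick1/afr-vol")` only lists the candidate attributes. Calling it again with `dry_run=False` removes them.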
Root Cause:
When a lookup on the path "/.." is sent for a gluster volume through gfapi, it is passed all the way down to the posix xlator, so on each brick it resolves to the parent directory of the brick path. When dht finds that this directory (the parent of the brick path) has no xattrs set, it initiates a self-heal and sets the xattrs on it.
Example: for /bricks/brick1, brick1 is the brick path and /bricks is its parent, so /bricks now has gluster xattrs set. As a result:
1. Any later add-brick with a brick path under /bricks, such as /bricks/brick2, fails because the parent directory has gluster xattrs set.
2. Add-brick does not fail for a brick path such as /newbricks/brick2, whose parent was never healed.
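The effect on add-brick can be modelled as a walk over the ancestors of the new brick path. Below is a simplified sketch in Python, not glusterd's actual code. `tagged_dirs` is a hypothetical stand-in for "directories carrying gluster xattrs"; a real check would read each directory's xattrs instead:

```python
import posixpath


def ancestors(brick_path):
    """Yield each proper ancestor of an absolute brick path, nearest first."""
    if not posixpath.isabs(brick_path):
        raise ValueError("brick paths must be absolute: %r" % brick_path)
    path = posixpath.normpath(brick_path)
    parent = posixpath.dirname(path)
    # dirname("/") == "/", so the walk stops once it reaches the root.
    while parent != path:
        yield parent
        path, parent = parent, posixpath.dirname(parent)


def blocking_ancestor(brick_path, tagged_dirs):
    """Return the nearest ancestor of brick_path found in tagged_dirs, or None.

    tagged_dirs models the set of directories that carry gluster xattrs.
    """
    for parent in ancestors(brick_path):
        if parent in tagged_dirs:
            return parent
    return None
```

With `tagged_dirs = {"/bricks/brick1", "/bricks"}` (the state after the stray heal), /bricks/brick2 is blocked by /bricks while /newbricks/brick2 is not. The xattrs on brick1 itself do not matter here, because brick1 is not an ancestor of brick2.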
Patch posted at https://code.engineering.redhat.com/gerrit/#/c/36685/
Verified the BZ with glusterfs-3.6.0.33-1.el6rhs.x86_64. Running arequal-checksum on the CIFS mount no longer logs errors in the client logs. Also ran add-brick and remove-brick operations to check whether xattrs get set on the parent of the brick root; add-brick succeeds and no xattr is set on the parent. Performed a sanity test and verified basic ACLs as well. Moving the BZ to VERIFIED.
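The client-log check used in this verification can be scripted. Below is a minimal sketch in Python, assuming a plain-text glusterfs client log. The function names are hypothetical. The matched fragments are copied from the log excerpt in the original report, so other message variants would not be counted:

```python
from collections import Counter

# Fragments copied from the /.. messages in this report's client log excerpt.
PATTERNS = {
    "layout_mismatch": "Mismatching layouts for /..",
    "disk_layout_missing": "/..: Disk layout missing",
    "anomalies": "Found anomalies in /..",
}


def count_dht_noise(lines):
    """Return a Counter of how many lines match each /.. DHT message."""
    counts = Counter()
    for line in lines:
        for name, fragment in PATTERNS.items():
            if fragment in line:
                counts[name] += 1
    return counts


def count_dht_noise_in_file(path):
    """Open a client log file and count the /.. DHT messages in it."""
    with open(path, errors="replace") as log:
        return count_dht_noise(log)
```

After running ll and arequal-checksum on the mount, `sum(count_dht_noise_in_file(path).values()) == 0` indicates that the symptom is gone. The client log location depends on the setup and is not given in this report.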
Raghavendra, please review the edited doc text and sign off.
The doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0038.html