Bug 1127658

Summary: SMB: On a CIFS mount, creating files and then listing them or running arequal-checksum fills the client log with "Found anomalies in /.. (gfid ...)" messages
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: samba
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED ERRATA
QA Contact: surabhi <sbhaloth>
Severity: urgent
Priority: urgent
Version: rhgs-3.0
CC: divya, nlevinki, pgurusid, rtalur, sharne, spalai, surs, vagarwal
Keywords: Patch, ZStream
Target Release: RHGS 3.0.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.6.0.33-1
Doc Type: Bug Fix
Doc Text:
Previously, when a gluster volume was accessed through libgfapi, extended attributes (xattrs) were set on the parent of the brick directories. This led to add-brick failures if new bricks were placed under the same parent directory. With this fix, xattrs are no longer set on the parent directory. However, existing xattrs on the parent directory remain, and users must remove them manually if any add-brick failures are encountered.
Clone Of:
: 1128648
Last Closed: 2015-01-15 13:39:05 UTC
Type: Bug
Bug Depends On: 1128648
Bug Blocks: 1087818, 1162694

Description surabhi 2014-08-07 10:08:52 UTC
Description of problem:
On a 2x2 replicate volume mounted via CIFS, creating files or directories and then running ll or arequal-checksum fills the client logs with "Mismatching layouts for /.., gfid" messages and entries such as:
Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.27-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 volume and mount it via CIFS
2. Create a directory or file
3. Run ll on the mount point
4. Execute arequal-checksum on the mount point
5. Check the client logs

Actual results:
As soon as ll or arequal-checksum is executed, a large number of messages appear in the CIFS client logs:
[2014-08-07 09:14:10.534787] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.534912] I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.535024] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.536122] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2014-08-07 09:14:10.541898] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.542055] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.543073] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
The message "I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001" repeated 3 times between [2014-08-07 09:14:10.534912] and [2014-08-07 09:14:10.542148]
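
To gauge how noisy the log is, the entries above can be tallied per emitting function. The following is a minimal sketch, not part of the original report: the regular expression is inferred from the log lines shown here, and the names LOG_LINE, count_by_function and SAMPLE are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the client log lines quoted in this report:
# [timestamp] LEVEL [optional MSGID] [file:line:function] xlator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)


def count_by_function(lines):
    """Count log entries that mention the path "/..", keyed by function name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not follow the pattern (e.g. "The message ... repeated")
        # are skipped rather than guessed at.
        if match and "/.." in match.group("msg"):
            counts[match.group("func")] += 1
    return counts


# A few of the entries from this report, used as sample input.
SAMPLE = [
    "[2014-08-07 09:14:10.534787] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001",
    "[2014-08-07 09:14:10.534912] I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001",
    "[2014-08-07 09:14:10.535024] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001",
    "[2014-08-07 09:14:10.536122] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0",
]

if __name__ == "__main__":
    for func, count in count_by_function(SAMPLE).most_common():
        print(func, count)
```

To run it against a real client log, pass an open file object instead of SAMPLE; the log file location depends on the Samba and gluster configuration and is not stated in this report.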



Expected results:
There should be no such messages about anomalies or DHT layout mismatches.

Additional info:

Comment 3 surabhi 2014-09-16 07:26:30 UTC
While running an add-brick test, I saw the following error:

[root@dhcp159-197 /]# gluster vol add-brick afr-vol 10.16.159.197:/rhs/brick1/afr-vol/b10 10.16.159.210:/rhs/brick1/afr-vol/b11
volume add-brick: failed: parent directory /rhs/brick1/afr-vol is already part of a volume
The error suggests that an xattr has been set on the parent directory, which prevents the add-brick.

Workaround: Remove the xattr from the parent directory; add-brick then succeeds.
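
The workaround can be sketched roughly as follows. This is an illustrative sketch only: the report does not say which xattr was present on the parent directory, so the prefixes below (trusted.gfid and trusted.glusterfs.*) are an assumption, and the helper names are hypothetical. It uses os.listxattr and os.removexattr, which are Linux-only, and trusted.* attributes are visible only to root. Removal is off by default (dry_run=True) so the attributes can be reviewed first.

```python
import os

# Assumption: gluster-related xattrs start with one of these prefixes.
# The report does not name the xattr that blocked add-brick.
GLUSTER_XATTR_PREFIXES = ("trusted.gfid", "trusted.glusterfs.")


def select_gluster_xattrs(names):
    """Return the xattr names that look gluster-related, sorted."""
    return sorted(n for n in names if n.startswith(GLUSTER_XATTR_PREFIXES))


def gluster_xattrs_on(path):
    """List gluster-related xattrs on path (Linux only; run as root)."""
    return select_gluster_xattrs(os.listxattr(path))


def remove_gluster_xattrs(path, dry_run=True):
    """Remove gluster-related xattrs from path; only report them if dry_run."""
    names = gluster_xattrs_on(path)
    for name in names:
        if not dry_run:
            os.removexattr(path, name)
    return names
```

For the failure above, this would be run against the parent directory /rhs/brick1/afr-vol on each affected node, and never against a directory that is itself a brick in use.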

Comment 4 Raghavendra Talur 2014-09-16 10:00:44 UTC
Root Cause:

When a lookup is sent on the path "/.." for a gluster volume through gfapi,
the lookup is passed all the way down to the posix xlator.

When dht discovers that the directory (the parent of the brick path) does not
have any xattrs set, it initiates a heal and sets the xattrs.

Example: /bricks/brick1.
Here brick1 is the brick directory and bricks is its parent directory.

After the heal, the parent directory /bricks has xattrs set. As a result:

1. Any new add-brick operation whose brick path has /bricks as its parent directory will fail, because the parent directory has gluster xattrs set. Example: /bricks/brick2.

2. add-brick will not fail for a brick path such as /newbricks/brick2.
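
The two cases above can be modelled with a small path check. This is a simplified sketch, not glusterd's actual validation code: it assumes add-brick is rejected whenever an ancestor of the brick path carries gluster xattrs, and it takes the set of such directories as input instead of reading xattrs from disk. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def blocked_by_parent_xattrs(brick_path, dirs_with_xattrs):
    """Return the nearest ancestor of brick_path that has xattrs set, else None."""
    marked = {PurePosixPath(d) for d in dirs_with_xattrs}
    # .parents yields ancestors from the immediate parent up to "/".
    for ancestor in PurePosixPath(brick_path).parents:
        if ancestor in marked:
            return str(ancestor)
    return None


if __name__ == "__main__":
    marked_dirs = ["/bricks"]  # parent directory that received xattrs
    for candidate in ("/bricks/brick2", "/newbricks/brick2"):
        print(candidate, blocked_by_parent_xattrs(candidate, marked_dirs))
```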

Comment 6 Raghavendra Talur 2014-11-14 11:30:59 UTC
Patch posted at https://code.engineering.redhat.com/gerrit/#/c/36685/

Comment 7 surabhi 2014-12-02 07:08:52 UTC
Verified the BZ with glusterfs-3.6.0.33-1.el6rhs.x86_64.
Executing arequal-checksum on the CIFS mount no longer logs these errors in the client logs.
Also executed add-brick and remove-brick operations to verify whether xattrs are set on the parent of the root. The add-brick succeeds without setting the xattr on the parent.
Performed a sanity test and verified basic ACLs as well.
Moving the BZ to verified.

Comment 8 Divya 2015-01-07 10:54:45 UTC
Raghavendra,

Please review the edited doc text and sign off.

Comment 9 Raghavendra Talur 2015-01-13 09:02:31 UTC
The doc text looks good to me.

Comment 11 errata-xmlrpc 2015-01-15 13:39:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html