Bug 1127658 - SMB:On Cifs mount creating files and doing list or running arequal checksum fills client log with "found anomalies for /.. gfid" issue
Summary: SMB:On Cifs mount creating files and doing list or running arequal checksum f...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: samba
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Raghavendra Talur
QA Contact: surabhi
URL:
Whiteboard:
Depends On: 1128648
Blocks: 1087818 1162694
 
Reported: 2014-08-07 10:08 UTC by surabhi
Modified: 2015-05-13 17:41 UTC (History)

Fixed In Version: glusterfs-3.6.0.33-1
Doc Type: Bug Fix
Doc Text:
Previously, when the gluster volume was accessed through libgfapi, xattrs were set on the parent of the brick directories. This led to add-brick failures if new bricks were placed under the same parent directory. With this fix, xattrs are no longer set on the parent directory. However, existing xattrs on the parent directory remain, and users must remove them manually if any add-brick failures are encountered.
Clone Of:
Clones: 1128648
Environment:
Last Closed: 2015-01-15 13:39:05 UTC
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2015:0038 | 0 | normal | SHIPPED_LIVE | Red Hat Storage 3.0 enhancement and bug fix update #3 | 2015-01-15 18:35:28 UTC

Description surabhi 2014-08-07 10:08:52 UTC
Description of problem:
On a 2x2 replicate volume mounted over CIFS, creating files or directories and then running ll or arequal-checksum fills the client logs with "Mismatching layouts for /.., gfid" messages, along with entries such as:
Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.27-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 volume and mount it via CIFS.
2. Create a directory or file.
3. Run ll on the mount point.
4. Run arequal-checksum on the mount point.
5. Check the client logs.

Actual results:
As soon as ll or arequal-checksum is executed, a large number of messages like the following appear in the CIFS client logs:
[2014-08-07 09:14:10.534787] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.534912] I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.535024] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.536122] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2014-08-07 09:14:10.541898] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.542055] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-newbug-dht: /..: Disk layout missing, gfid = 00000000-0000-0000-0000-000000000001
[2014-08-07 09:14:10.543073] I [dht-layout.c:663:dht_layout_normalize] 0-newbug-dht: Found anomalies in /.. (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
The message "I [MSGID: 109018] [dht-common.c:696:dht_revalidate_cbk] 0-newbug-dht: Mismatching layouts for /.., gfid = 00000000-0000-0000-0000-000000000001" repeated 3 times between [2014-08-07 09:14:10.534912] and [2014-08-07 09:14:10.542148]



Expected results:
There should be no such messages about layout anomalies or DHT layout mismatches.

Additional info:

Comment 3 surabhi 2014-09-16 07:26:30 UTC
While running an add-brick test, I saw the following error:

[root@dhcp159-197 /]# gluster vol add-brick afr-vol 10.16.159.197:/rhs/brick1/afr-vol/b10 10.16.159.210:/rhs/brick1/afr-vol/b11
volume add-brick: failed: parent directory /rhs/brick1/afr-vol is already part of a volume
It looks like an xattr has been set on the parent directory, which is why add-brick is refused.

Workaround: remove the xattr from the parent directory; add-brick then succeeds.

Comment 4 Raghavendra Talur 2014-09-16 10:00:44 UTC
Root Cause:

When a lookup is sent on the path "/.." for a gluster volume through gfapi,
the lookup is passed all the way down to the posix xlator.

When dht discovers that this directory (the parent of the brick path) does not
have any xattrs set, it initiates a heal and sets the xattrs.

Example: /bricks/brick1.
Here brick1 is the brick path and the bricks dir is the parent of brick1.

As a result, the parent dir (bricks) now has gluster xattrs set.
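The path resolution behind this can be illustrated with a small sketch. It is only an illustration of the description above, not the actual patch (which this report does not describe). The helper names backend_path and clamp_to_volume_root are hypothetical, and the naive string join is an assumption about how a volume path ends up on the backend filesystem.

```python
import posixpath


def backend_path(brick_root, volume_path):
    """Naively map a volume-relative path onto the brick's backend directory.

    Hypothetical helper: assumes the backend path is brick_root + volume_path.
    """
    return posixpath.normpath(brick_root + volume_path)


def clamp_to_volume_root(volume_path):
    """Resolve '..' inside the volume namespace so it never climbs above '/'."""
    return posixpath.normpath("/" + volume_path.lstrip("/"))


brick = "/bricks/brick1"

# Unclamped: "/.." escapes the brick and lands on its parent directory.
print(backend_path(brick, "/.."))

# Clamped first: "/.." collapses to "/", so the lookup stays inside the brick.
print(backend_path(brick, clamp_to_volume_root("/..")))
```

With the unclamped mapping, a lookup on "/.." reaches /bricks, which is the directory that then receives the xattrs.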


1. Any new add-brick operation whose brick path has bricks as its parent dir will fail, because the parent dir has gluster xattrs set. Example: /bricks/brick2.

2. Add-brick won't fail for a brick path such as /newbricks/brick2.
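To see whether a planned brick path is affected, one can walk its ancestor directories and look for gluster xattrs. The sketch below is a rough check under stated assumptions: the report does not name the xattrs involved, so the names in GLUSTER_XATTRS are an assumption based on xattrs GlusterFS commonly sets, and the function ancestors_with_gluster_xattrs is hypothetical. It only reports; it does not remove anything.

```python
import os
from pathlib import PurePosixPath

# Assumed xattr names; the bug report does not say which ones were set.
GLUSTER_XATTRS = (
    "trusted.gfid",
    "trusted.glusterfs.volume-id",
    "trusted.glusterfs.dht",
)


def ancestors_with_gluster_xattrs(brick_path, list_xattrs=None):
    """Return [(directory, [xattr names])] for ancestors carrying gluster xattrs.

    list_xattrs is injectable for testing; by default os.listxattr is used
    (Linux only).
    """
    if list_xattrs is None:
        list_xattrs = os.listxattr
    hits = []
    for parent in PurePosixPath(brick_path).parents:
        try:
            names = list_xattrs(str(parent))
        except OSError:
            # Directory missing or xattrs unsupported: skip it.
            continue
        found = [n for n in names if n in GLUSTER_XATTRS]
        if found:
            hits.append((str(parent), found))
    return hits


if __name__ == "__main__":
    for directory, names in ancestors_with_gluster_xattrs("/rhs/brick1/afr-vol/b10"):
        print(directory, names)
```

Reading xattrs in the trusted namespace generally requires root, so an unprivileged run may report nothing even on an affected directory.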

Comment 6 Raghavendra Talur 2014-11-14 11:30:59 UTC
Patch posted at https://code.engineering.redhat.com/gerrit/#/c/36685/

Comment 7 surabhi 2014-12-02 07:08:52 UTC
Verified the BZ with glusterfs-3.6.0.33-1.el6rhs.x86_64.
Executing arequal-checksum on the CIFS mount no longer logs these errors in the client logs.
Also executed add-brick and remove-brick operations to verify whether xattrs are set on the parent of the brick root. The add-brick succeeds without setting the xattr on the parent.
Performed a sanity test and verified basic ACLs as well.
Moving the BZ to verified.

Comment 8 Divya 2015-01-07 10:54:45 UTC
Raghavendra,

Please review the edited doc text and sign-off.

Comment 9 Raghavendra Talur 2015-01-13 09:02:31 UTC
The doc text looks good to me.

Comment 11 errata-xmlrpc 2015-01-15 13:39:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

