Bug 1138386 - directory ownership shows root as owner when the directories are created in parallel on two different mounts
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: unclassified
Version: 3.5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1196033
 
Reported: 2014-09-04 17:14 UTC by Pranith Kumar K
Modified: 2016-06-17 16:24 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1196033 (view as bug list)
Environment:
Last Closed: 2016-06-17 16:24:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2014-09-04 17:14:53 UTC
Description of problem:
Mail from Peter:
I have a replicated Gluster setup: 2 servers (fs-1 and fs-2) x 1 brick. I have two clients (also on fs-1 and fs-2) which mount the Gluster volume at /mnt/gfs (/mnt/gfs type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)). These clients run scripts which perform various file operations. One operation looks like this (note this is pseudocode; the actual script is PHP):

1. @mkdir('/mnt/gfs/somedir', 0550);
2. chown('/mnt/gfs/somedir', 1234);
3. chgrp('/mnt/gfs/somedir', 1234);

Note that line 1 may fail on either client because the directory may already have been created by the other client. These errors are suppressed/ignored. When this operation is performed simultaneously on both clients, it usually succeeds in creating a directory with the expected permissions and ownership. Intermittently, however, we see that these directories are not owned by the expected user and group.

I've created a PHP script which can be run on two clients simultaneously to reproduce the error: https://gist.github.com/pdrakeweb/ae046b4c70a42309be43

The only log entry I can find that appears to be related is from fs-1's mnt-gfs.log file:

[2014-08-22 12:27:57.661778] I [dht-layout.c:640:dht_layout_normalize] 0-test-fs-cluster-1-dht: found anomalies in /test-target/test1408710477.7. holes=1 overlaps=0

This occurs in both Gluster 3.4.1 and 3.5.2 (the only two versions I have tested for this).  I am unable to reproduce the problem on a local (non-gluster) filesystem.  I'd appreciate any insight people might have into what is going on here and whether this is a bug in Gluster.


Comment 1 Pranith Kumar K 2014-09-04 17:18:43 UTC
I am able to reproduce the bug consistently. Disabling stat-prefetch reduced how often the errors occur, but it did not eliminate the issue.

The strace output was interesting. The problem always seems to be that the uid does not match:
stat("/mnt/fuse1/test-target/test1409848960.3", {st_dev=makedev(0, 41), st_ino=12165775161408537538, st_mode=S_IFDIR|0550, st_nlink=2, *st_uid=0*, st_gid=9999, st_blksize=131072, st_blocks=1, st_size=6, st_atime=2014/09/04-22:12:40, st_mtime=2014/09/04-22:12:40, st_ctime=2014/09/04-22:12:40}) = 0

The uid comes back as 0 and the gid as 9999. If we do a stat after the run is over, it shows the ownership correctly.

Comment 2 Pranith Kumar K 2014-09-04 17:27:57 UTC
In my tests the issue does not occur on plain distribute, or on replicate with no distribute in the graph. Not sure why it only happens with dht+afr. Will update the bug once I find out more.

Comment 3 Pranith Kumar K 2014-09-05 10:17:44 UTC
RCA for the bug:
Mount-1: Creates a new directory; uid:gid is 0:0
Mount-2: Tries to create a new directory, fails with EEXIST
Mount-2: Does chown with uid 9999; uid:gid at the end is 9999:0
Mount-1: Needs to set the dht layout, so triggers self-heal; as part of that it sets the uid:gid back to 0:0
Mount-2: Does chown with gid 9999; uid:gid at the end is 0:9999
Mount-2: Gets uid:gid and sees 0:9999 instead of 9999:9999
Mount-1: Does chown with uid 9999; uid:gid at the end is 9999:9999
Mount-1: Does chown with gid 9999; uid:gid at the end is 9999:9999

I am not sure what exactly needs to be fixed in dht.
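
The interleaving above can be replayed with a toy model of the directory's ownership record (illustrative only; none of these names exist in GlusterFS):

```python
# Toy replay of the RCA: one shared (uid, gid) record, mutated in the
# problematic order. The dht layout self-heal acts like a setattr back
# to the healer's own credentials (root).
owner = {"uid": 0, "gid": 0}          # mount-1 created the directory as 0:0

def chown_uid(uid): owner["uid"] = uid
def chown_gid(gid): owner["gid"] = gid
def dht_selfheal_setattr():           # layout self-heal resets ownership
    owner["uid"], owner["gid"] = 0, 0

chown_uid(9999)           # mount-2: chown uid -> 9999:0
dht_selfheal_setattr()    # mount-1: self-heal -> 0:0, clobbering the chown
chown_gid(9999)           # mount-2: chgrp gid -> 0:9999
observed = (owner["uid"], owner["gid"])   # mount-2 stats the dir here
```

At this point the stat returns 0:9999, matching the st_uid=0, st_gid=9999 seen in the strace output; mount-1's later chown and chgrp eventually repair the directory to 9999:9999, which is why a stat after the run looks correct.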

Comment 4 Pranith Kumar K 2014-09-05 10:18:59 UTC
(In reply to Pranith Kumar K from comment #3)
> RCA for the bug:
> Mount-1: Creates a new directory; uid:gid is 0:0
Mount-2: Tries to create the same directory as above and fails with EEXIST
All the following operations happen on this same directory from here on
> Mount-2: Does chown with uid 9999; uid:gid at the end is 9999:0
> Mount-1: Needs to set the dht layout, so triggers self-heal; as part of that it sets the uid:gid back to 0:0
> Mount-2: Does chown with gid 9999; uid:gid at the end is 0:9999
> Mount-2: Gets uid:gid and sees 0:9999 instead of 9999:9999
> Mount-1: Does chown with uid 9999; uid:gid at the end is 9999:9999
> Mount-1: Does chown with gid 9999; uid:gid at the end is 9999:9999
>
> I am not sure what exactly needs to be fixed in dht.

Comment 5 Anand Avati 2015-04-20 20:57:40 UTC
REVIEW: http://review.gluster.org/10306 (dht: fix racy setattr(chown) behavior) posted (#1) for review on master by Jeff Darcy (jdarcy)
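
Without claiming to describe the actual patch in that review, the general direction of such a fix can be sketched: when dht self-heals a directory, carry over the ownership just observed on the subvolumes instead of writing the healer's own (root) credentials, so a concurrent chown is not silently undone. A minimal sketch with entirely hypothetical names:

```python
def heal_dir_layout(observed_attrs, write_layout_xattrs, setattr):
    """Hypothetical shape of a non-racy directory self-heal (not the
    actual GlusterFS code): fix only the layout xattrs, and if
    attributes must be rewritten, reuse the uid/gid that was just read
    rather than the healer's identity (root)."""
    write_layout_xattrs()
    setattr(uid=observed_attrs["uid"], gid=observed_attrs["gid"])
```

With this shape, if mount-2's chown has already set uid 9999, a subsequent heal re-applies 9999 instead of resetting the directory to root.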

Comment 6 Niels de Vos 2016-06-17 16:24:39 UTC
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bug fixes if you still face this issue in a more current release.

