Bug 1196033 - directory ownership shows root as owner when the directories are created in parallel on two different mounts
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mohit Agrawal
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: dht-dir-attr-xattr-heal, dht-perms
Depends On: 1138386
Blocks:
Reported: 2015-02-25 06:55 UTC by Rachana Patel
Modified: 2018-04-16 18:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1138386
Environment:
Last Closed: 2018-04-16 18:00:30 UTC
Embargoed:



Description Rachana Patel 2015-02-25 06:55:31 UTC
Version: glusterfs-3.6.0.45-1


+++ This bug was initially created as a clone of Bug #1138386 +++

Description of problem:
Mail from Peter:
I have a replicated Gluster setup, 2 servers (fs-1 and fs-2) x 1 brick.  I have two clients (also on fs-1 and fs-2) which mount the Gluster volume at /mnt/gfs (/mnt/gfs type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)).  These clients have scripts which perform various file operations.  One operation they perform looks like this (note this is pseudocode, the actual script is PHP):

1. @mkdir('/mnt/gfs/somedir', 0550);
2. chown('/mnt/gfs/somedir', 1234);
3. chgrp('/mnt/gfs/somedir', 1234);

Note that line 1 may fail on either client because the directory may have been created on the other client.  These errors are suppressed/ignored.  When this operation is performed simultaneously on both clients, it usually succeeds in creating a directory with the expected permissions and ownership.  Intermittently however, we see that these directories are not owned by the expected user and group.
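
The three steps above can be sketched in Python as follows. This is only an illustration of the pattern, not the reporter's PHP script: the function name ensure_owned_dir is hypothetical, and the caller supplies the path, uid and gid. Changing the owner to another user normally requires root.

```python
import os


def ensure_owned_dir(path, uid, gid, mode=0o550):
    """Create path if it is missing, then set owner and group separately.

    Hypothetical helper mirroring the mkdir / chown / chgrp steps above.
    """
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # The other client may have created the directory first.
        # The error is ignored, like the "@" suppression in the PHP call.
        pass
    os.chown(path, uid, -1)  # step 2: change the owner only
    os.chown(path, -1, gid)  # step 3: change the group only
    st = os.stat(path)
    return st.st_uid, st.st_gid
```

On a local filesystem the returned pair should always equal the requested uid and gid. The report is that on the Gluster mount it intermittently does not when two clients run this at the same time.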

I've created a PHP script which can be run on two clients simultaneously to reproduce the error: https://gist.github.com/pdrakeweb/ae046b4c70a42309be43

The only log entry I can find that appears to be related is from fs-1's mnt-gfs.log file:

[2014-08-22 12:27:57.661778] I [dht-layout.c:640:dht_layout_normalize] 0-test-fs-cluster-1-dht: found anomalies in /test-target/test1408710477.7. holes=1 overlaps=0

This occurs in both Gluster 3.4.1 and 3.5.2 (the only two versions I have tested for this).  I am unable to reproduce the problem on a local (non-gluster) filesystem.  I'd appreciate any insight people might have into what is going on here and whether this is a bug in Gluster.

--- Additional comment from Pranith Kumar K on 2014-09-04 13:18:43 EDT ---

I am able to reproduce the bug consistently. Disabling stat-prefetch reduced how often the errors occur, but it did not eliminate the issue.

The strace output was interesting. The problem always seems to be that the uid does not match:
stat("/mnt/fuse1/test-target/test1409848960.3", {st_dev=makedev(0, 41), st_ino=12165775161408537538, st_mode=S_IFDIR|0550, st_nlink=2, *st_uid=0*, st_gid=9999, st_blksize=131072, st_blocks=1, st_size=6, st_atime=2014/09/04-22:12:40, st_mtime=2014/09/04-22:12:40, st_ctime=2014/09/04-22:12:40}) = 0

The uid comes back as 0 while the gid is 9999. A stat done after the run is over shows the correct values.
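
A check of this kind can be sketched in Python as below. The helper name owner_mismatches is hypothetical and this is not the script used in the test run. It only compares what stat reports against the expected uid and gid.

```python
import os


def owner_mismatches(path, expected_uid, expected_gid):
    """Return a list describing which of uid/gid differ from what was expected."""
    st = os.stat(path)
    problems = []
    if st.st_uid != expected_uid:
        problems.append(f"uid={st.st_uid}, expected {expected_uid}")
    if st.st_gid != expected_gid:
        problems.append(f"gid={st.st_gid}, expected {expected_gid}")
    return problems
```

Calling it right after the chown/chgrp and again once the run has finished would show the difference described above: a uid mismatch during the run and an empty list afterwards.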

--- Additional comment from Pranith Kumar K on 2014-09-04 13:27:57 EDT ---

In my tests the issue does not happen on plain distribute, or on replicate with no distribute in the graph. Not sure why it only happens with dht+afr. Will update the bug once I find out more.

--- Additional comment from Pranith Kumar K on 2014-09-05 06:17:44 EDT ---

RCA for the bug:
Mount-1: Creates a new directory; uid:gid is 0:0.
Mount-2: Tries to create the directory; fails with EEXIST.
Mount-2: Does chown with uid 9999; uid:gid at the end is 9999:0.
Mount-1: Needs to set the dht layout, so it triggers self-heal, which sets uid:gid back to 0:0.
Mount-2: Does chown with gid 9999; uid:gid at the end is 0:9999.
Mount-2: Gets uid:gid and sees 0:9999 instead of 9999:9999.
Mount-1: Does chown with uid 9999; uid:gid at the end is 9999:9999.
Mount-1: Does chown with gid 9999; uid:gid at the end is 9999:9999.

I am not sure what exactly needs to be fixed in dht.
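
The interleaving in the RCA above can be replayed with a small Python model. This is a toy simulation only and contains no Gluster code. It assumes, as the RCA states, that the layout self-heal writes the creation-time ownership (0:0) back. The names TIMELINE and replay are made up for illustration.

```python
ROOT = 0
TARGET = 9999

# (mount, operation) pairs in the order given in the RCA.
TIMELINE = [
    ("mount-1", "mkdir"),
    ("mount-2", "mkdir"),        # fails with EEXIST, nothing changes
    ("mount-2", "chown"),
    ("mount-1", "layout-heal"),  # assumed to reset ownership to 0:0
    ("mount-2", "chgrp"),
    ("mount-2", "stat"),
    ("mount-1", "chown"),
    ("mount-1", "chgrp"),
]


def replay(timeline):
    """Apply each operation to one shared directory and record every stat."""
    exists = False
    uid = gid = None
    observations = []
    for mount, op in timeline:
        if op == "mkdir":
            if not exists:
                exists, uid, gid = True, ROOT, ROOT
        elif op == "chown":
            uid = TARGET
        elif op == "chgrp":
            gid = TARGET
        elif op == "layout-heal":
            uid, gid = ROOT, ROOT
        elif op == "stat":
            observations.append((mount, uid, gid))
    return observations, (uid, gid)


observations, final = replay(TIMELINE)
print(observations)  # what mount-2 saw during the run
print(final)         # ownership once both mounts have finished
```

In this model mount-2 observes 0:9999 during the run while the final state is 9999:9999, which matches both the strace output and the correct stat after the run.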

--- Additional comment from Pranith Kumar K on 2014-09-05 06:18:59 EDT ---

(In reply to Pranith Kumar K from comment #3)
> RCA for the bug:
> Mount-1: Creates a new directory uid:gid is 0:0
Mount-2: Tries to create the same directory as above and fails with EEXIST.
All the following operations happen on this same directory.
> Mount-2: Does chown with uid as 9999 uid:gid at the end is 9999:0
> Mount-1: Needs to set dht layout so triggers self-heal as part of that it
> sets the uid:gid back to 0:0
> mount-2: Does chown with gid as 9999 uid:gid at the end is 0:9999
> mount-2: Gets uid:gid and gets 0:9999 instead of 9999:9999
> mount-1: Does chown with uid as 9999 uid:gid at the end is 9999:9999
> mount-1: Does chown with gid as 9999 uid:gid at the end is 9999:9999
> 
> I am not sure what exactly needs to be fixed in dht.

