Bug 1382261

Summary: nonroot user getting permission denied error inconsistently when trying to access a directory (bricks are down)
Product: Red Hat Gluster Storage Reporter: Nag Pavan Chilakam <nchilaka>
Component: distributeAssignee: Susant Kumar Palai <spalai>
Status: CLOSED CURRENTRELEASE QA Contact: Prasad Desala <tdesala>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.2CC: amukherj, nbalacha, nchilaka, rhinduja, rhs-bugs, sheggodu, spalai, storage-qa-internal
Target Milestone: ---Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: dht-perms
Fixed In Version: glusterfs-3.12.2-14 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-26 10:09:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Nag Pavan Chilakam 2016-10-06 07:18:49 UTC
Description of problem:
========================
While running systemic testing 
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609



I was glancing through the logs when I saw this error on one of the client logs as below:
[2016-10-05 14:41:04.302381] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-2: changing port to 49154 (from 0)
[2016-10-05 14:41:04.308277] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-2: connection to 10.70.35.3:49154 failed (Connection refused)
[2016-10-05 14:41:04.322875] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-7: changing port to 49155 (from 0)
[2016-10-05 14:41:04.329144] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-7: connection to 10.70.37.150:49155 failed (Connection refused)
[2016-10-05 14:41:05.339299] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 14:41:05.345626] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 14:41:05.583683] E [MSGID: 114031] [client-rpc-fops.c:2847:client3_3_opendir_cbk] 0-distrepvol-client-3: remote operation failed. Path: /rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) [Permission denied]
[2016-10-05 14:41:05.598765] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 0-distrepvol-client-3: remote operation failed [Permission denied]
[2016-10-05 14:41:06.322790] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-5: changing port to 49155 (from 0)
[2016-10-05 14:41:06.328507] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-5: connection to 10.70.37.187:49155 failed (Connection refused)
[2016-10-05 14:41:07.330769] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-2: changing port to 49154 (from 0)


Note that some of the bricks(one brick in each replica pair ) are brought down.

(As a non-root user)Hence I went to the mount point and tried to access the directory , and when i tried to access the directory, I see that sometimes I get permission denied error, while sometimes i get the access as below

See In below 5 attempts I was denied twice and allowed the next three times:
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73: Permission denied
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73: Permission denied
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
[nchilaka@rhs-client12 glusterfs]$ 
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73


Following is the permission for the directory:
drwxr-xr-x. 2 root root 4096 Oct  3 10:25 level5.73


Note that I have 4 el6.x clients from which I have triggered a parallel lookup, as a non root user(running under a discrete screen session)
This client is one of them


I am observing same issue from another client too as non-root user:
[nchilaka@rhs-client11 glusterfs]$ ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
[nchilaka@rhs-client11 glusterfs]$ ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied



As a root user, I triggered a lookup in a loop( ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/), But i didn't see any permission denied error even once

I reran the same case on a nonroot user, where for the first ~20 times the lookup succeeeded, but the remaining ~80 times it threw permission denied erro


[nchilaka@rhs-client12 ~]$ for i in {1..100};do  ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/ ; done
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied


Version-Release number of selected component (if applicable):
===================
[nchilaka@rhs-client12 ~]$ rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-1.el6.x86_64
cglusterfs-fuse-3.8.4-1.el6.x86_64
atglusterfs-api-3.8.4-1.el6.x86_64
 glusterfs-3.8.4-1.el6.x86_64
/eglusterfs-libs-3.8.4-1.el6.x86_64
glusterfs-debuginfo-3.8.4-1.el6.x86_64
[nchilaka@rhs-client12 ~]$ cat /etc/redhat-*
cat: /etc/redhat-lsb: Is a directory
Red Hat Enterprise Linux Server release 6.6 (Santiago)

(note :seen this error on other el6.x clients too like el6.7 and el6.8

Comment 2 Nag Pavan Chilakam 2016-10-06 07:48:14 UTC
An update, seeing this problem even on rhel 7.2 client

Comment 3 Nag Pavan Chilakam 2016-10-06 10:10:48 UTC
 Your comment was:

    One More observation:

    Following Logs are seen on only one brick in the 8x2 volume:: the first brick on 4th node...Which is up and running 
    i.e Brick 10.70.37.150:/rhs/brick1/distrepvol   49154     0          Y       3918 

    (could be the brick to which the dir is hashing to by default)


    [2016-10-06 09:01:56.539646] E [MSGID: 115056] [server-rpc-fops.c:667:server_opendir_cbk] 0-distrepvol-server: 14115751: OPENDIR /rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) ==> (Permission denied) [Permission denied]
    [2016-10-06 09:01:56.542680] E [MSGID: 115056] [server-rpc-fops.c:1806:server_readdirp_cbk] 0-distrepvol-server: 14115755: READDIRP -2 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) ==> (Permission denied) [Permission denied]





    [root@dhcp35-191 ~]# gluster v status
    Status of volume: distrepvol
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.35.191:/rhs/brick1/distrepvol   N/A       N/A        N       N/A  
    Brick 10.70.37.187:/rhs/brick1/distrepvol   49154     0          Y       3867 
    Brick 10.70.35.3:/rhs/brick1/distrepvol     N/A       N/A        N       N/A  
    Brick 10.70.37.150:/rhs/brick1/distrepvol   49154     0          Y       3918 
    Brick 10.70.35.191:/rhs/brick2/distrepvol   49155     0          Y       7568 
    Brick 10.70.37.187:/rhs/brick2/distrepvol   N/A       N/A        N       N/A  
    Brick 10.70.35.3:/rhs/brick2/distrepvol     49155     0          Y       5341 
    Brick 10.70.37.150:/rhs/brick2/distrepvol   N/A       N/A        N       N/A  
    Snapshot Daemon on localhost                49152     0          Y       8358 
    Self-heal Daemon on localhost               N/A       N/A        Y       7588 
    Quota Daemon on localhost                   N/A       N/A        Y       8211 
    Snapshot Daemon on 10.70.37.187             49152     0          Y       4428 
    Self-heal Daemon on 10.70.37.187            N/A       N/A        Y       3907 
    Quota Daemon on 10.70.37.187                N/A       N/A        Y       4330 
    Snapshot Daemon on 10.70.37.150             49152     0          Y       4477 
    Self-heal Daemon on 10.70.37.150            N/A       N/A        Y       3957 
    Quota Daemon on 10.70.37.150                N/A       N/A        Y       4380 
    Snapshot Daemon on 10.70.35.3               49152     0          Y       5858 
    Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       5361 
    Quota Daemon on 10.70.35.3                  N/A       N/A        Y       5762 
     
    Task Status of Volume distrepvol
    ------------------------------------------------------------------------------
    There are no active volume tasks

Comment 4 Nithya Balachandran 2016-10-27 09:50:30 UTC
This sounds like the permissions for this directory are different on this brick but I cannot confirm as the brick is no longer available due to BZ#1385606.

Comment 5 Atin Mukherjee 2016-11-07 13:21:36 UTC
Nag,

Is there a way this can be recreated? With out that this BZ can not be further looked at as there is no other relevant information available in the bug.

Comment 6 Nag Pavan Chilakam 2016-11-22 15:19:52 UTC
I am able to recreate this.
You can log on rhs-client24 machine and as a noon-root user nchilaka(pswd:redhat)
and do a ls -lRt of the whole mount. It will take a lot of time, so better to o/p error to a file and run in screen
I see lot of permision denied error.
For eg do ls /mnt/sysvol//test-arena/kernel-untar/dir.26/linux-4.8.6/Documentation/devicetree/bindings/timer
 and check the fuse mount log. you can see permission denied error

Comment 7 Susant Kumar Palai 2016-11-23 09:27:05 UTC
(In reply to nchilaka from comment #6)
> I am able to recreate this.
> You can log on rhs-client24 machine and as a noon-root user
> nchilaka(pswd:redhat)
> and do a ls -lRt of the whole mount. It will take a lot of time, so better
> to o/p error to a file and run in screen
> I see lot of permision denied error.
> For eg do ls
> /mnt/sysvol//test-arena/kernel-untar/dir.26/linux-4.8.6/Documentation/
> devicetree/bindings/timer
>  and check the fuse mount log. you can see permission denied error

Had a look at Nag's system. There is a permission split brain among different subvols. 

Subvol replicate-3 (10.70.35.3:/rhs/brick2/sysvol  10.70.37.66:/rhs/brick2/sysvol) has permission "drwxrwxr-x".

While replicate-1 (10.70.35.3:/rhs/brick1/sysvol 10.70.37.66:/rhs/brick1/sysvol) has permission "drwx------".

The permission of the directories were changed most likely when one of the subvol was down. Hence, permission error is seen "sometimes". If the permission update happens from replicate-3, then there will be no problem, but if the permission update happens from replicate-1, user will see a permission error.

Comment 8 Susant Kumar Palai 2016-11-24 09:35:00 UTC
The fix for the above scenario depends on the ctime of the directories. Now which can change with different factors e.g afr selfheal or quotad etc. If the ctime of a directory with bad attr is bumped up as a consequence of above reasons, we may heal a wrong attr.

Given the fix would be intrusive, would like to move this bug out of 3.2.0 scope.

Comment 10 Nag Pavan Chilakam 2017-02-23 13:38:50 UTC
seeing this issue even when all bricks were up on final regression run of systemic testing on 3.8.4-14

Comment 17 Susant Kumar Palai 2018-11-26 10:02:56 UTC
Moving this to ON_QA as I found patch: https://review.gluster.org/#/c/glusterfs/+/20108/ should be fixing this. 

Tested locally.