Bug 1382261 - nonroot user getting permission denied error inconsistently when trying to access a directory (bricks are down)
Summary: nonroot user getting permission denied error inconsistently when trying to a...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard: dht-perms
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-10-06 07:18 UTC by Nag Pavan Chilakam
Modified: 2018-11-26 10:09 UTC (History)
8 users (show)

Fixed In Version: glusterfs-3.12.2-14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-26 10:09:48 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Nag Pavan Chilakam 2016-10-06 07:18:49 UTC
Description of problem:
========================
While running systemic testing 
https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609



I was glancing through the logs when I saw this error on one of the client logs as below:
[2016-10-05 14:41:04.302381] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-2: changing port to 49154 (from 0)
[2016-10-05 14:41:04.308277] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-2: connection to 10.70.35.3:49154 failed (Connection refused)
[2016-10-05 14:41:04.322875] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-7: changing port to 49155 (from 0)
[2016-10-05 14:41:04.329144] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-7: connection to 10.70.37.150:49155 failed (Connection refused)
[2016-10-05 14:41:05.339299] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 14:41:05.345626] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 14:41:05.583683] E [MSGID: 114031] [client-rpc-fops.c:2847:client3_3_opendir_cbk] 0-distrepvol-client-3: remote operation failed. Path: /rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) [Permission denied]
[2016-10-05 14:41:05.598765] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 0-distrepvol-client-3: remote operation failed [Permission denied]
[2016-10-05 14:41:06.322790] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-5: changing port to 49155 (from 0)
[2016-10-05 14:41:06.328507] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-5: connection to 10.70.37.187:49155 failed (Connection refused)
[2016-10-05 14:41:07.330769] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-2: changing port to 49154 (from 0)


Note that some of the bricks(one brick in each replica pair ) are brought down.

(As a non-root user)Hence I went to the mount point and tried to access the directory , and when i tried to access the directory, I see that sometimes I get permission denied error, while sometimes i get the access as below

See In below 5 attempts I was denied twice and allowed the next three times:
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73: Permission denied
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73: Permission denied
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
[nchilaka@rhs-client12 glusterfs]$ 
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73
[nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73


Following is the permission for the directory:
drwxr-xr-x. 2 root root 4096 Oct  3 10:25 level5.73


Note that I have 4 el6.x clients from which I have triggered a parallel lookup, as a non root user(running under a discrete screen session)
This client is one of them


I am observing same issue from another client too as non-root user:
[nchilaka@rhs-client11 glusterfs]$ ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
[nchilaka@rhs-client11 glusterfs]$ ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied



As a root user, I triggered a lookup in a loop( ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/), But i didn't see any permission denied error even once

I reran the same case on a nonroot user, where for the first ~20 times the lookup succeeeded, but the remaining ~80 times it threw permission denied erro


[nchilaka@rhs-client12 ~]$ for i in {1..100};do  ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/ ; done
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
/mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/:
total 0
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied
ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied


Version-Release number of selected component (if applicable):
===================
[nchilaka@rhs-client12 ~]$ rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-1.el6.x86_64
cglusterfs-fuse-3.8.4-1.el6.x86_64
atglusterfs-api-3.8.4-1.el6.x86_64
 glusterfs-3.8.4-1.el6.x86_64
/eglusterfs-libs-3.8.4-1.el6.x86_64
glusterfs-debuginfo-3.8.4-1.el6.x86_64
[nchilaka@rhs-client12 ~]$ cat /etc/redhat-*
cat: /etc/redhat-lsb: Is a directory
Red Hat Enterprise Linux Server release 6.6 (Santiago)

(note :seen this error on other el6.x clients too like el6.7 and el6.8

Comment 2 Nag Pavan Chilakam 2016-10-06 07:48:14 UTC
An update, seeing this problem even on rhel 7.2 client

Comment 3 Nag Pavan Chilakam 2016-10-06 10:10:48 UTC
 Your comment was:

    One More observation:

    Following Logs are seen on only one brick in the 8x2 volume:: the first brick on 4th node...Which is up and running 
    i.e Brick 10.70.37.150:/rhs/brick1/distrepvol   49154     0          Y       3918 

    (could be the brick to which the dir is hashing to by default)


    [2016-10-06 09:01:56.539646] E [MSGID: 115056] [server-rpc-fops.c:667:server_opendir_cbk] 0-distrepvol-server: 14115751: OPENDIR /rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) ==> (Permission denied) [Permission denied]
    [2016-10-06 09:01:56.542680] E [MSGID: 115056] [server-rpc-fops.c:1806:server_readdirp_cbk] 0-distrepvol-server: 14115755: READDIRP -2 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) ==> (Permission denied) [Permission denied]





    [root@dhcp35-191 ~]# gluster v status
    Status of volume: distrepvol
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick 10.70.35.191:/rhs/brick1/distrepvol   N/A       N/A        N       N/A  
    Brick 10.70.37.187:/rhs/brick1/distrepvol   49154     0          Y       3867 
    Brick 10.70.35.3:/rhs/brick1/distrepvol     N/A       N/A        N       N/A  
    Brick 10.70.37.150:/rhs/brick1/distrepvol   49154     0          Y       3918 
    Brick 10.70.35.191:/rhs/brick2/distrepvol   49155     0          Y       7568 
    Brick 10.70.37.187:/rhs/brick2/distrepvol   N/A       N/A        N       N/A  
    Brick 10.70.35.3:/rhs/brick2/distrepvol     49155     0          Y       5341 
    Brick 10.70.37.150:/rhs/brick2/distrepvol   N/A       N/A        N       N/A  
    Snapshot Daemon on localhost                49152     0          Y       8358 
    Self-heal Daemon on localhost               N/A       N/A        Y       7588 
    Quota Daemon on localhost                   N/A       N/A        Y       8211 
    Snapshot Daemon on 10.70.37.187             49152     0          Y       4428 
    Self-heal Daemon on 10.70.37.187            N/A       N/A        Y       3907 
    Quota Daemon on 10.70.37.187                N/A       N/A        Y       4330 
    Snapshot Daemon on 10.70.37.150             49152     0          Y       4477 
    Self-heal Daemon on 10.70.37.150            N/A       N/A        Y       3957 
    Quota Daemon on 10.70.37.150                N/A       N/A        Y       4380 
    Snapshot Daemon on 10.70.35.3               49152     0          Y       5858 
    Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       5361 
    Quota Daemon on 10.70.35.3                  N/A       N/A        Y       5762 
     
    Task Status of Volume distrepvol
    ------------------------------------------------------------------------------
    There are no active volume tasks

Comment 4 Nithya Balachandran 2016-10-27 09:50:30 UTC
This sounds like the permissions for this directory are different on this brick but I cannot confirm as the brick is no longer available due to BZ#1385606.

Comment 5 Atin Mukherjee 2016-11-07 13:21:36 UTC
Nag,

Is there a way this can be recreated? With out that this BZ can not be further looked at as there is no other relevant information available in the bug.

Comment 6 Nag Pavan Chilakam 2016-11-22 15:19:52 UTC
I am able to recreate this.
You can log on rhs-client24 machine and as a noon-root user nchilaka(pswd:redhat)
and do a ls -lRt of the whole mount. It will take a lot of time, so better to o/p error to a file and run in screen
I see lot of permision denied error.
For eg do ls /mnt/sysvol//test-arena/kernel-untar/dir.26/linux-4.8.6/Documentation/devicetree/bindings/timer
 and check the fuse mount log. you can see permission denied error

Comment 7 Susant Kumar Palai 2016-11-23 09:27:05 UTC
(In reply to nchilaka from comment #6)
> I am able to recreate this.
> You can log on rhs-client24 machine and as a noon-root user
> nchilaka(pswd:redhat)
> and do a ls -lRt of the whole mount. It will take a lot of time, so better
> to o/p error to a file and run in screen
> I see lot of permision denied error.
> For eg do ls
> /mnt/sysvol//test-arena/kernel-untar/dir.26/linux-4.8.6/Documentation/
> devicetree/bindings/timer
>  and check the fuse mount log. you can see permission denied error

Had a look at Nag's system. There is a permission split brain among different subvols. 

Subvol replicate-3 (10.70.35.3:/rhs/brick2/sysvol  10.70.37.66:/rhs/brick2/sysvol) has permission "drwxrwxr-x".

While replicate-1 (10.70.35.3:/rhs/brick1/sysvol 10.70.37.66:/rhs/brick1/sysvol) has permission "drwx------".

The permission of the directories were changed most likely when one of the subvol was down. Hence, permission error is seen "sometimes". If the permission update happens from replicate-3, then there will be no problem, but if the permission update happens from replicate-1, user will see a permission error.

Comment 8 Susant Kumar Palai 2016-11-24 09:35:00 UTC
The fix for the above scenario depends on the ctime of the directories. Now which can change with different factors e.g afr selfheal or quotad etc. If the ctime of a directory with bad attr is bumped up as a consequence of above reasons, we may heal a wrong attr.

Given the fix would be intrusive, would like to move this bug out of 3.2.0 scope.

Comment 10 Nag Pavan Chilakam 2017-02-23 13:38:50 UTC
seeing this issue even when all bricks were up on final regression run of systemic testing on 3.8.4-14

Comment 17 Susant Kumar Palai 2018-11-26 10:02:56 UTC
Moving this to ON_QA as I found patch: https://review.gluster.org/#/c/glusterfs/+/20108/ should be fixing this. 

Tested locally.


Note You need to log in before you can comment on or make changes to this bug.