Hide Forgot
Description of problem: ======================== While running systemic testing https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609 I was glancing through the logs when I saw this error on one of the client logs as below: [2016-10-05 14:41:04.302381] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-2: changing port to 49154 (from 0) [2016-10-05 14:41:04.308277] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-2: connection to 10.70.35.3:49154 failed (Connection refused) [2016-10-05 14:41:04.322875] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-7: changing port to 49155 (from 0) [2016-10-05 14:41:04.329144] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-7: connection to 10.70.37.150:49155 failed (Connection refused) [2016-10-05 14:41:05.339299] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0) [2016-10-05 14:41:05.345626] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused) [2016-10-05 14:41:05.583683] E [MSGID: 114031] [client-rpc-fops.c:2847:client3_3_opendir_cbk] 0-distrepvol-client-3: remote operation failed. Path: /rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) [Permission denied] [2016-10-05 14:41:05.598765] W [MSGID: 114031] [client-rpc-fops.c:2648:client3_3_readdirp_cbk] 0-distrepvol-client-3: remote operation failed [Permission denied] [2016-10-05 14:41:06.322790] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-5: changing port to 49155 (from 0) [2016-10-05 14:41:06.328507] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-5: connection to 10.70.37.187:49155 failed (Connection refused) [2016-10-05 14:41:07.330769] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-2: changing port to 49154 (from 0) Note that some of the bricks(one brick in each replica pair ) are brought down. (As a non-root user)Hence I went to the mount point and tried to access the directory , and when i tried to access the directory, I see that sometimes I get permission denied error, while sometimes i get the access as below See In below 5 attempts I was denied twice and allowed the next three times: [nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73: Permission denied [nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73: Permission denied [nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 [nchilaka@rhs-client12 glusterfs]$ [nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 [nchilaka@rhs-client12 glusterfs]$ ls /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 Following is the permission for the directory: drwxr-xr-x. 2 root root 4096 Oct 3 10:25 level5.73 Note that I have 4 el6.x clients from which I have triggered a parallel lookup, as a non root user(running under a discrete screen session) This client is one of them I am observing same issue from another client too as non-root user: [nchilaka@rhs-client11 glusterfs]$ ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/ /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 [nchilaka@rhs-client11 glusterfs]$ ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/ ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied As a root user, I triggered a lookup in a loop( ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/), But i didn't see any permission denied error even once I reran the same case on a nonroot user, where for the first ~20 times the lookup succeeeded, but the remaining ~80 times it threw permission denied erro [nchilaka@rhs-client12 ~]$ for i in {1..100};do ls -lR /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/ ; done /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: total 0 ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied ls: cannot open directory /mnt/distrepvol/rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73/: Permission denied Version-Release number of selected component (if applicable): =================== [nchilaka@rhs-client12 ~]$ rpm -qa|grep gluster glusterfs-client-xlators-3.8.4-1.el6.x86_64 cglusterfs-fuse-3.8.4-1.el6.x86_64 atglusterfs-api-3.8.4-1.el6.x86_64 glusterfs-3.8.4-1.el6.x86_64 /eglusterfs-libs-3.8.4-1.el6.x86_64 glusterfs-debuginfo-3.8.4-1.el6.x86_64 [nchilaka@rhs-client12 ~]$ cat /etc/redhat-* cat: /etc/redhat-lsb: Is a directory Red Hat Enterprise Linux Server release 6.6 (Santiago) (note :seen this error on other el6.x clients too like el6.7 and el6.8
An update, seeing this problem even on rhel 7.2 client
Your comment was: One More observation: Following Logs are seen on only one brick in the 8x2 volume:: the first brick on 4th node...Which is up and running i.e Brick 10.70.37.150:/rhs/brick1/distrepvol 49154 0 Y 3918 (could be the brick to which the dir is hashing to by default) [2016-10-06 09:01:56.539646] E [MSGID: 115056] [server-rpc-fops.c:667:server_opendir_cbk] 0-distrepvol-server: 14115751: OPENDIR /rootdir1/renames/dir_samenames/level1.1/level2.2/level3.14/level4.62/level5.73 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) ==> (Permission denied) [Permission denied] [2016-10-06 09:01:56.542680] E [MSGID: 115056] [server-rpc-fops.c:1806:server_readdirp_cbk] 0-distrepvol-server: 14115755: READDIRP -2 (8aa88cb3-d299-49f6-a821-a097c5d4c97a) ==> (Permission denied) [Permission denied] [root@dhcp35-191 ~]# gluster v status Status of volume: distrepvol Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.35.191:/rhs/brick1/distrepvol N/A N/A N N/A Brick 10.70.37.187:/rhs/brick1/distrepvol 49154 0 Y 3867 Brick 10.70.35.3:/rhs/brick1/distrepvol N/A N/A N N/A Brick 10.70.37.150:/rhs/brick1/distrepvol 49154 0 Y 3918 Brick 10.70.35.191:/rhs/brick2/distrepvol 49155 0 Y 7568 Brick 10.70.37.187:/rhs/brick2/distrepvol N/A N/A N N/A Brick 10.70.35.3:/rhs/brick2/distrepvol 49155 0 Y 5341 Brick 10.70.37.150:/rhs/brick2/distrepvol N/A N/A N N/A Snapshot Daemon on localhost 49152 0 Y 8358 Self-heal Daemon on localhost N/A N/A Y 7588 Quota Daemon on localhost N/A N/A Y 8211 Snapshot Daemon on 10.70.37.187 49152 0 Y 4428 Self-heal Daemon on 10.70.37.187 N/A N/A Y 3907 Quota Daemon on 10.70.37.187 N/A N/A Y 4330 Snapshot Daemon on 10.70.37.150 49152 0 Y 4477 Self-heal Daemon on 10.70.37.150 N/A N/A Y 3957 Quota Daemon on 10.70.37.150 N/A N/A Y 4380 Snapshot Daemon on 10.70.35.3 49152 0 Y 5858 Self-heal Daemon on 10.70.35.3 N/A N/A Y 5361 Quota Daemon on 10.70.35.3 N/A N/A Y 5762 Task Status of Volume distrepvol ------------------------------------------------------------------------------ There are no active volume tasks
This sounds like the permissions for this directory are different on this brick but I cannot confirm as the brick is no longer available due to BZ#1385606.
Nag, Is there a way this can be recreated? With out that this BZ can not be further looked at as there is no other relevant information available in the bug.
I am able to recreate this. You can log on rhs-client24 machine and as a noon-root user nchilaka(pswd:redhat) and do a ls -lRt of the whole mount. It will take a lot of time, so better to o/p error to a file and run in screen I see lot of permision denied error. For eg do ls /mnt/sysvol//test-arena/kernel-untar/dir.26/linux-4.8.6/Documentation/devicetree/bindings/timer and check the fuse mount log. you can see permission denied error
(In reply to nchilaka from comment #6) > I am able to recreate this. > You can log on rhs-client24 machine and as a noon-root user > nchilaka(pswd:redhat) > and do a ls -lRt of the whole mount. It will take a lot of time, so better > to o/p error to a file and run in screen > I see lot of permision denied error. > For eg do ls > /mnt/sysvol//test-arena/kernel-untar/dir.26/linux-4.8.6/Documentation/ > devicetree/bindings/timer > and check the fuse mount log. you can see permission denied error Had a look at Nag's system. There is a permission split brain among different subvols. Subvol replicate-3 (10.70.35.3:/rhs/brick2/sysvol 10.70.37.66:/rhs/brick2/sysvol) has permission "drwxrwxr-x". While replicate-1 (10.70.35.3:/rhs/brick1/sysvol 10.70.37.66:/rhs/brick1/sysvol) has permission "drwx------". The permission of the directories were changed most likely when one of the subvol was down. Hence, permission error is seen "sometimes". If the permission update happens from replicate-3, then there will be no problem, but if the permission update happens from replicate-1, user will see a permission error.
The fix for the above scenario depends on the ctime of the directories. Now which can change with different factors e.g afr selfheal or quotad etc. If the ctime of a directory with bad attr is bumped up as a consequence of above reasons, we may heal a wrong attr. Given the fix would be intrusive, would like to move this bug out of 3.2.0 scope.
seeing this issue even when all bricks were up on final regression run of systemic testing on 3.8.4-14
Moving this to ON_QA as I found patch: https://review.gluster.org/#/c/glusterfs/+/20108/ should be fixing this. Tested locally.