Bug 1794019 - Mount fails after reboot of 1/3 gluster nodes
Summary: Mount fails after reboot of 1/3 gluster nodes
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1793035 1793852 1794020 1804512
Blocks: 1788913
Reported: 2020-01-22 14:07 UTC by Mohit Agrawal
Modified: 2020-02-19 03:20 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1793852
Environment:
Last Closed: 2020-02-10 08:03:34 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Links
System: Gluster.org Gerrit
ID: 24060
Status: Merged
Summary: server: Mount fails after reboot 1/3 gluster nodes
Last Updated: 2020-02-10 08:03:33 UTC

Description Mohit Agrawal 2020-01-22 14:07:07 UTC
+++ This bug was initially created as a clone of Bug #1793852 +++

+++ This bug was initially created as a clone of Bug #1793035 +++

This bug was initially created as a copy of Bug #1788913

I am copying this bug because: 



Description of problem:
App pods backed by glusterfs storage are going into CrashLoopBackOff because the liveness probe failed or errored.

In the liveness probe, I am verifying that we are able to write data on the mounted glusterfs volumes.
Script used for creating the app pods: https://raw.githubusercontent.com/iamniting/ocs/master/scripts/cirros-create-with-io.sh
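A minimal sketch of the kind of check such a liveness probe performs, assuming a placeholder mount point /mnt/gluster (illustrative only, not the script linked above):

#!/bin/sh
# Hypothetical liveness check: write a small file on the glusterfs-backed
# mount and fail if the write does not complete. MOUNT is a placeholder.
MOUNT=${1:-/mnt/gluster}
STAMP="$MOUNT/.liveness-probe"

if ! echo "alive $(date +%s)" > "$STAMP" 2>/dev/null; then
    echo "liveness: write to $MOUNT failed" >&2
    exit 1
fi
rm -f "$STAMP"
exit 0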

Version-Release number of selected component (if applicable):
openshift v3.11.157
kubernetes v1.11.0+d4cacc0
docker-1.13.1-104.git4ef4b30.el7.x86_64
glusterfs-6.0-25.el7.x86_64
rhgs3-rhgs-server-rhel7:3.11.5-2

How reproducible:


Steps to Reproduce:
1. Create 1800 volumes.
2. Create app pods in such a way that they use all brick processes (e.g., there will be 1800/250 brick processes running; create 100 app pods per brick process).
3. Restart glusterfs pod.
4. All pods consuming the first 3 brick processes are running. From the 751st volume onward, most of the app pods are in CrashLoopBackOff state.

Actual results:
$ oc get pods | grep cirros | wc -l
600
$ oc get pods | grep cirros | grep 0/1 | wc -l
205

Expected results:


Additional info:
I am using AWS for the testing. I have 19 nodes. The gluster nodes are tuned to host 2000 volumes.

--- Additional comment from Raghavendra Talur on 2020-01-20 14:25:39 UTC ---

Test case:

Create more than 250 (around 300) 1x3 volumes and have brick mux on.
Mount all volumes on a client machine.
Reboot any one of the 3 gluster nodes.
All the mounts should continue to work (a shell sketch of this flow follows).
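A minimal shell sketch of that flow, assuming placeholder server names node1/node2/node3, brick paths under /bricks, and 300 volumes named test1..test300 (the exact commands and counts in the original test may differ):

#!/bin/bash
# Sketch only: node1/node2/node3, /bricks and the volume count are placeholders.

# On a gluster node: enable brick multiplexing cluster-wide.
gluster volume set all cluster.brick-multiplex on

# Create and start 300 1x3 replicated volumes.
for i in $(seq 1 300); do
    gluster volume create test$i replica 3 \
        node1:/bricks/test$i node2:/bricks/test$i node3:/bricks/test$i force
    gluster volume start test$i
done

# On the client: mount every volume.
for i in $(seq 1 300); do
    mkdir -p /mnt/test$i
    mount -t glusterfs node1:/test$i /mnt/test$i
done

# Reboot one gluster node (e.g. node2), wait for it to come back, then verify
# every mount still works and no client log shows an authentication failure.
for i in $(seq 1 300); do
    touch /mnt/test$i/.probe || echo "mount /mnt/test$i is broken"
done
grep -il "Authentication failed" /var/log/glusterfs/mnt-test*.log \
    || echo "no authentication failures in client logs"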

Observation:
Some of the mounts go into a failed state, and mounting even fails with this error:
[2020-01-14 17:27:54.009038] E [MSGID: 114058] [client-handshake.c:1449:client_query_portmap_cbk] 0-ocs_glusterfs_claim0735_e7683536-362a-11ea-901b-068049869906-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2020-01-14 17:28:34.607674] E [MSGID: 114044] [client-handshake.c:1031:client_setvolume_cbk] 0-ocs_glusterfs_claim0735_e7683536-362a-11ea-901b-068049869906-client-2: SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2020-01-14 17:28:34.607731] E [fuse-bridge.c:6292:notify] 0-fuse: Server authenication failed. Shutting down.

--- Additional comment from Sunil Kumar Acharya on 2020-01-20 14:44:42 UTC ---

How reproducible is this bug?

>glusterfs-6.0-25.el7.x86_64

Can we try with the latest release of glusterfs?

--- Additional comment from Mohit Agrawal on 2020-01-20 15:27:55 UTC ---

Hi,

I have tried to reproduce it in a non-OCS environment, but I am not able to reproduce it.

1) Set up 300 volumes, enable brick mux, and start all the volumes
2) for i in {1..300}; do mkdir /mnt$i; mount -t glusterfs <gluster-1>:/test$i /mnt$i; done
3) Reboot gluster-1 node
4) grep -il "permission" /var/log/glusterfs/mnt*
   I am not getting any error specific to permission denied.

Regards,
Mohit Agrawal

--- Additional comment from Raghavendra Talur on 2020-01-20 16:53:14 UTC ---

Prasanth,

It would be great if Nitin can help someone in RHGS QE to reproduce the issue with an RHGS setup. It should be pretty easy, based on how frequently we were able to do that in OCS.
Without a reproducer, we will not be able to make much progress.

--- Additional comment from Prasanth on 2020-01-21 08:07:41 UTC ---

(In reply to Raghavendra Talur from comment #4)
> Prasanth,
> 
> It would be great if Nitin can help someone in RHGS QE to reproduce the
> issue with an RHGS setup. It should be pretty easy, based on how frequently
> we were able to do that in OCS.
> Without a reproducer, we will not be able to make much progress.

Talur, there is no separate RHGS QE anymore in the Gluster world, as the teams were merged a few months back!
I'll assign this to one of the members of the Functional team to try reproducing this issue and provide you with the required details.

--- Additional comment from Prasanth on 2020-01-21 09:06:30 UTC ---

Nitin, since you've been working on reproducing the original bug, can you help Talur reproduce the same in an RHGS setup (see #Comment 1) and share the test results?

--- Additional comment from nchilaka on 2020-01-21 12:43:28 UTC ---

I was able to reproduce this problem on Standalone RHGS on 6.0.28
(clearing needinfo on Nitin)

1) created 3 node cluster(rhsqa10/11/12), bmux enabled
2) created 500 volumes, either arbiter or 1x3
3) mounted all 500 volumes on 3 clients(rhsqa6/7/8)
4) started linux untar on 4 volumes in parallel on each of the 3 clients (basically, 4 screen sessions on each client, with linux untar being done sequentially on a set of 25 vols in each session)
5) did node reboot of rhsqa11
6) post reboot, checked all clients for vol mount sanity
rhsqa6->all vols mounted
rhsqa7-> arb_b68n0p9myx2id failed to mount
[root@rhsqa7 glusterfs]# grep -r "failed: Authentication failed"
mnt-arb_b68n0p9myx2id.log:[2020-01-21 12:32:06.281545] E [MSGID: 114044] [client-handshake.c:1031:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: SETVOLUME on remote-host failed: Authentication failed [Permission denied]

[root@rhsqa7 glusterfs]# df -h|grep arb_b68n0p9myx2id
[root@rhsqa7 glusterfs]# 


rhsqa8: arb_paqdf1yn54t4p failed to mount


[root@rhsqa8 glusterfs]# grep -r "failed: Authentication failed"
mnt-arb_paqdf1yn54t4p.log:[2020-01-21 12:32:06.309217] E [MSGID: 114044] [client-handshake.c:1031:client_setvolume_cbk] 0-arb_paqdf1yn54t4p-client-2: SETVOLUME on remote-host failed: Authentication failed 
[root@rhsqa8 glusterfs]# df -h|grep arb_paqdf1yn54t4p
[root@rhsqa8 glusterfs]# 


all machines available for debugging (IO still in progress)
rhsqa6.lab.eng.blr.redhat.com
root/redhat

--- Additional comment from Mohit Agrawal on 2020-01-21 15:37:35 UTC ---

Hi,

@Nag
Thanks for reproducing the issue. Would it be possible for you to test the patch if I share a build with you?

Below are the client logs emitted at the time client_setvolume_cbk failed.
As we can see, the client is getting the error only for one brick (client-1), not for the other clients,
which means the other clients are already connected. Here the client receives an AUTH_FAILED event,
and when fuse gets AUTH_FAILED it calls fini, so the client is unmounted.

>>>>>>>>>>>>>>>.

[2020-01-21 11:43:01.806402] I [fuse-bridge.c:5840:fuse_graph_sync] 0-fuse: switched to graph 0
[2020-01-21 12:26:48.626004] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-arb_b68n0p9myx2id-client-1: disconnected from arb_b68n0p9myx2id-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2020-01-21 12:31:56.095015] E [MSGID: 114058] [client-handshake.c:1449:client_query_portmap_cbk] 0-arb_b68n0p9myx2id-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2020-01-21 12:31:56.095094] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-arb_b68n0p9myx2id-client-1: disconnected from arb_b68n0p9myx2id-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2020-01-21 12:32:06.071586] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-arb_b68n0p9myx2id-client-1: changing port to 49152 (from 0)
[2020-01-21 12:32:06.281470] W [MSGID: 114043] [client-handshake.c:997:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: failed to set the volume [Permission denied]
[2020-01-21 12:32:06.281528] W [MSGID: 114007] [client-handshake.c:1026:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: failed to get 'process-uuid' from reply dict [Invalid argument]
[2020-01-21 12:32:06.281545] E [MSGID: 114044] [client-handshake.c:1031:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2020-01-21 12:32:06.281558] I [MSGID: 114049] [client-handshake.c:1115:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: sending AUTH_FAILED event
[2020-01-21 12:32:06.281596] E [fuse-bridge.c:6358:notify] 0-fuse: Server authenication failed. Shutting down.
[2020-01-21 12:32:06.281609] I [fuse-bridge.c:6900:fini] 0-fuse: Unmounting '/mnt/arb_b68n0p9myx2id'.
[2020-01-21 12:32:06.309745] I [fuse-bridge.c:6106:fuse_thread_proc] 0-fuse: initating unmount of /mnt/arb_b68n0p9myx2id
[2020-01-21 12:32:06.309916] I [fuse-bridge.c:6905:fini] 0-fuse: Closing fuse connection to '/mnt/arb_b68n0p9myx2id'.
[2020-01-21 12:32:06.311119] W [glusterfsd.c:1581:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7ea5) [0x7f3a88e0cea5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55a28f6002b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55a28f60011b] ) 0-: received signum (15), shutting down

>>>>>>>>>>>>>>>>>>>>

The client was getting Permission denied because the brick was not attached at that moment; server_setvolume executes the code below before authenticating a client request.
Here we can see that if get_xlator_by_name returns NULL we fall back to assigning this to xl, i.e. the brick process assumes that if no xlator (volname) is found in the graph it should connect to the already running brick, but gf_authenticate then fails and returns EPERM.

    LOCK(&ctx->volfile_lock);
    {
        xl = get_xlator_by_name(this, name);
        if (!xl)
            xl = this;
    }
    UNLOCK(&ctx->volfile_lock);


We need to correct this condition to avoid the issue. This code was changed by this patch (https://review.gluster.org/#/c/glusterfs/+/18048/).
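For illustration only, a standalone sketch that models the corrected decision: if the requested brick xlator is not yet in the graph, reply with a retryable error instead of falling back to this and letting authentication fail. The types and names below are simplified stand-ins, not the glusterfs APIs or the merged patch:

/* Hypothetical model of the server_setvolume decision. In the old behaviour,
 * a missing xlator silently fell back to `this`, and gf_authenticate then
 * failed with EPERM; here the lookup failure is surfaced as a retryable error
 * so the client keeps retrying until the brick is attached. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct xlator { const char *name; struct xlator *next; };

static struct xlator *find_xlator(struct xlator *graph, const char *name)
{
    for (struct xlator *xl = graph; xl; xl = xl->next)
        if (strcmp(xl->name, name) == 0)
            return xl;
    return NULL;
}

/* Returns 0 when the brick is attached, -EAGAIN when the client should retry. */
static int setvolume_check(struct xlator *graph, const char *volname)
{
    struct xlator *xl = find_xlator(graph, volname);
    if (!xl) {
        fprintf(stderr, "%s: brick not attached yet, asking client to retry\n",
                volname);
        return -EAGAIN;
    }
    return 0;
}

int main(void)
{
    struct xlator vol2 = { "test2-server", NULL };
    struct xlator vol1 = { "test1-server", &vol2 };

    /* test3-server is not attached yet, e.g. right after a node reboot. */
    printf("test1: %d\n", setvolume_check(&vol1, "test1-server"));
    printf("test3: %d\n", setvolume_check(&vol1, "test3-server"));
    return 0;
}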

Thanks,
Mohit Agrawal

--- Additional comment from Mohit Agrawal on 2020-01-22 02:54:28 UTC ---

Please find test build at the link
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=25931817

--- Additional comment from Mohit Agrawal on 2020-01-22 04:41:03 UTC ---

I have prepared a new build; please test this one:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=25932471

--- Additional comment from Mohit Agrawal on 2020-01-22 04:41:37 UTC ---

An upstream patch to resolve the same
https://review.gluster.org/#/c/glusterfs/+/24053/

--- Additional comment from Mohit Agrawal on 2020-01-22 05:04:17 UTC ---

Reproducer:
1) created 3 node cluster(host10/11/12), bmux enabled
2) created 500 volumes, either arbiter or 1x3
3) mounted all 500 volumes on 3 clients(rhsqa6/7/8)
4) started linux untar on 4 volumes in parallel on each of the 3 clients (basically, 4 screen sessions on each client, with linux untar being done sequentially on a set of 25 vols in each session)
5) did node reboot of rhsqa11
6) post reboot, checked all clients for vol mount sanity
client_host->all vols mounted
client_host-> arb_b68n0p9myx2id failed to mount
[root@client_host glusterfs]# grep -r "failed: Authentication failed"
mnt-arb_b68n0p9myx2id.log:[2020-01-21 12:32:06.281545] E [MSGID: 114044] [client-handshake.c:1031:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: SETVOLUME on remote-host failed: Authentication failed [Permission denied]

RCA:
Below are the client logs emitted at the time client_setvolume_cbk failed.
As we can see, the client is getting the error only for one brick (client-1), not for the other clients,
which means the other clients are already connected. Here the client receives an AUTH_FAILED event,
and when fuse gets AUTH_FAILED it calls fini, so the client is unmounted.

>>>>>>>>>>>>>>>.

[2020-01-21 11:43:01.806402] I [fuse-bridge.c:5840:fuse_graph_sync] 0-fuse: switched to graph 0
[2020-01-21 12:26:48.626004] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-arb_b68n0p9myx2id-client-1: disconnected from arb_b68n0p9myx2id-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2020-01-21 12:31:56.095015] E [MSGID: 114058] [client-handshake.c:1449:client_query_portmap_cbk] 0-arb_b68n0p9myx2id-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2020-01-21 12:31:56.095094] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-arb_b68n0p9myx2id-client-1: disconnected from arb_b68n0p9myx2id-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2020-01-21 12:32:06.071586] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-arb_b68n0p9myx2id-client-1: changing port to 49152 (from 0)
[2020-01-21 12:32:06.281470] W [MSGID: 114043] [client-handshake.c:997:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: failed to set the volume [Permission denied]
[2020-01-21 12:32:06.281528] W [MSGID: 114007] [client-handshake.c:1026:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: failed to get 'process-uuid' from reply dict [Invalid argument]
[2020-01-21 12:32:06.281545] E [MSGID: 114044] [client-handshake.c:1031:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2020-01-21 12:32:06.281558] I [MSGID: 114049] [client-handshake.c:1115:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1: sending AUTH_FAILED event
[2020-01-21 12:32:06.281596] E [fuse-bridge.c:6358:notify] 0-fuse: Server authenication failed. Shutting down.
[2020-01-21 12:32:06.281609] I [fuse-bridge.c:6900:fini] 0-fuse: Unmounting '/mnt/arb_b68n0p9myx2id'.
[2020-01-21 12:32:06.309745] I [fuse-bridge.c:6106:fuse_thread_proc] 0-fuse: initating unmount of /mnt/arb_b68n0p9myx2id
[2020-01-21 12:32:06.309916] I [fuse-bridge.c:6905:fini] 0-fuse: Closing fuse connection to '/mnt/arb_b68n0p9myx2id'.
[2020-01-21 12:32:06.311119] W [glusterfsd.c:1581:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7ea5) [0x7f3a88e0cea5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55a28f6002b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55a28f60011b] ) 0-: received signum (15), shutting down

>>>>>>>>>>>>>>>>>>>>

The client was getting Permission denied because the brick was not attached at that moment; server_setvolume executes the code below before authenticating a client request.
Here we can see that if get_xlator_by_name returns NULL we fall back to assigning this to xl, i.e. the brick process assumes that if no xlator (volname) is found in the graph it should connect to the already running brick, but gf_authenticate then fails and returns EPERM.

    LOCK(&ctx->volfile_lock);
    {
        xl = get_xlator_by_name(this, name);
        if (!xl)
            xl = this;
    }
    UNLOCK(&ctx->volfile_lock);


We need to correct this condition to avoid the issue. This code was changed by this patch (https://review.gluster.org/#/c/glusterfs/+/18048/).


Thanks,
Mohit Agrawal

--- Additional comment from Mohit Agrawal on 2020-01-22 05:04:55 UTC ---

A patch has been posted to resolve the same:
https://review.gluster.org/24053

--- Additional comment from Worker Ant on 2020-01-22 14:05:28 UTC ---

REVIEW: https://review.gluster.org/24053 (server: Mount fails after reboot 1/3 gluster nodes) merged (#6) on master by MOHIT AGRAWAL

Comment 1 Worker Ant 2020-01-22 14:12:24 UTC
REVIEW: https://review.gluster.org/24060 (server: Mount fails after reboot 1/3 gluster nodes) posted (#1) for review on release-7 by MOHIT AGRAWAL

Comment 2 Worker Ant 2020-02-10 08:03:34 UTC
REVIEW: https://review.gluster.org/24060 (server: Mount fails after reboot 1/3 gluster nodes) merged (#2) on release-7 by Rinku Kothiya

