Bug 1464072 - cns-brick-multiplexing: brick process fails to restart after gluster pod failure
Summary: cns-brick-multiplexing: brick process fails to restart after gluster pod failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1463221 1491059 1491060
 
Reported: 2017-06-22 11:32 UTC by Mohit Agrawal
Modified: 2017-09-14 07:23 UTC
8 users

Fixed In Version: glusterfs-3.12.0
Clone Of: 1463221
Environment:
Last Closed: 2017-09-05 17:34:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Mohit Agrawal 2017-06-22 11:32:43 UTC
+++ This bug was initially created as a clone of Bug #1463221 +++

Description of problem:

After a gluster pod was deleted, a new gluster pod was created, but the brick process does not seem to come up.

This issue is seen with 100 volumes running and mounted on 100 different application pods. It is not seen with a single volume mapped to a single pod.

sh-4.2# gluster vol status vol_2bd7e27c196eddac0127b182c7749be9 
Status of volume: vol_2bd7e27c196eddac0127b182c7749be9
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.190:/var/lib/heketi/mounts/v
g_91bf878e839259d5983c1b8116da8911/brick_34
ec5d5f91f442aecf5be89875517595/brick        49152     0          Y       1590 
Brick 10.70.47.59:/var/lib/heketi/mounts/vg
_a5597e6aa7af855a39d96b64a47787ca/brick_a22
1df1e052866d65e30204d7a682ecf/brick         N/A       N/A        N       N/A  
Brick 10.70.46.65:/var/lib/heketi/mounts/vg
_cb53b766a8f3325469c78bf558f2ae4e/brick_0e3
86e7bb5235cc7e7989e66eca431d1/brick         49152     0          Y       1559 
Self-heal Daemon on localhost               N/A       N/A        Y       1478 
Self-heal Daemon on 10.70.46.65             N/A       N/A        Y       40193
Self-heal Daemon on 10.70.46.190            N/A       N/A        Y       7260 
 
Task Status of Volume vol_2bd7e27c196eddac0127b182c7749be9
------------------------------------------------------------------------------
There are no active volume tasks


Version-Release number of selected component (if applicable):
glusterfs-server-3.8.4-28.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. On a 3-node cns cluster, create 100 application pods and mount 100 cns volumes.
2. Run some IO.
3. Delete a gluster pod.
4. A new gluster pod gets created as expected and glusterd starts.
5. Check the volume status.

Actual results:
The brick process on the node whose pod was deleted (node rebooted) did not start. As a result, self-heal doesn't kick in.

Expected results:
The brick process should start automatically and self-heal should complete.

Additional info:
Logs will be attached shortly.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-06-20 07:57:51 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from krishnaram Karthick on 2017-06-20 08:10:14 EDT ---

snippet of glusterd logs:

[2017-06-20 12:09:15.098669] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-vol_f8e169486633be98ac020b65e6e9daa4-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-06-20 12:09:15.098884] W [MSGID: 114043] [client-handshake.c:1104:client_setvolume_cbk] 0-vol_f8e169486633be98ac020b65e6e9daa4-client-0: failed to set the volume [No such file or directory]
[2017-06-20 12:09:15.098908] W [MSGID: 114007] [client-handshake.c:1133:client_setvolume_cbk] 0-vol_f8e169486633be98ac020b65e6e9daa4-client-0: failed to get 'process-uuid' from reply dict [Invalid argument]
[2017-06-20 12:09:15.098918] E [MSGID: 114044] [client-handshake.c:1139:client_setvolume_cbk] 0-vol_f8e169486633be98ac020b65e6e9daa4-client-0: SETVOLUME on remote-host failed [No such file or directory]
[2017-06-20 12:09:15.098926] I [MSGID: 114051] [client-handshake.c:1259:client_setvolume_cbk] 0-vol_f8e169486633be98ac020b65e6e9daa4-client-0: sending CHILD_CONNECTING event
[2017-06-20 12:09:15.099002] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_f8e169486633be98ac020b65e6e9daa4-client-0: disconnected from vol_f8e169486633be98ac020b65e6e9daa4-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-06-20 12:09:15.101888] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-vol_f365965e923c11e497fcb4ab5a49ccdd-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-06-20 12:09:15.102173] W [MSGID: 114043] [client-handshake.c:1104:client_setvolume_cbk] 0-vol_f365965e923c11e497fcb4ab5a49ccdd-client-2: failed to set the volume [No such file or directory]
[2017-06-20 12:09:15.102210] W [MSGID: 114007] [client-handshake.c:1133:client_setvolume_cbk] 0-vol_f365965e923c11e497fcb4ab5a49ccdd-client-2: failed to get 'process-uuid' from reply dict [Invalid argument]
[2017-06-20 12:09:15.102226] E [MSGID: 114044] [client-handshake.c:1139:client_setvolume_cbk] 0-vol_f365965e923c11e497fcb4ab5a49ccdd-client-2: SETVOLUME on remote-host failed [No such file or directory]
[2017-06-20 12:09:15.102237] I [MSGID: 114051] [client-handshake.c:1259:client_setvolume_cbk] 0-vol_f365965e923c11e497fcb4ab5a49ccdd-client-2: sending CHILD_CONNECTING event
[2017-06-20 12:09:15.102364] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol_f365965e923c11e497fcb4ab5a49ccdd-client-2: disconnected from vol_f365965e923c11e497fcb4ab5a49ccdd-client-2. Client process will keep trying to connect to glusterd until brick's port is available

--- Additional comment from krishnaram Karthick on 2017-06-20 08:37:54 EDT ---

Logs are available here -> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1463221/

--- Additional comment from krishnaram Karthick on 2017-06-20 08:39:45 EDT ---

setup details:

root/aplo are the credentials for all of these nodes, in case dev needs to have a look at the live setup.

[root@dhcp46-161 ~]# oc get nodes
NAME                                STATUS                     AGE       VERSION
dhcp46-161.lab.eng.blr.redhat.com   Ready,SchedulingDisabled   22d       v1.6.1+5115d708d7
dhcp46-190.lab.eng.blr.redhat.com   Ready                      22d       v1.6.1+5115d708d7
dhcp46-65.lab.eng.blr.redhat.com    Ready                      22d       v1.6.1+5115d708d7
dhcp47-59.lab.eng.blr.redhat.com    Ready                      22d       v1.6.1+5115d708d7

The gluster pod running on dhcp47-59.lab.eng.blr.redhat.com was deleted, so logs from there would be helpful.

--- Additional comment from Atin Mukherjee on 2017-06-20 11:40:12 EDT ---

Karthick,

Please reserve this setup for some time. After a quick glance I don't see anything abnormal in the logs. I'd like to have a look at the setup tomorrow.

--- Additional comment from Atin Mukherjee on 2017-06-21 04:30:02 EDT ---

Looking at the setup and the logs, here is my theory of what has happened:

1. Brick processes were running with a pid (say 1000) before the pod was deleted.
2. When the new pod came up, a non-gluster process picked up the same pid 1000 after the very first brick was brought up with pid 1001. Subsequent brick start attempts then tried to connect to the brick, assuming it was still running with pid 1000. The following log sequence confirms that the brick went for a connect instead of an attach:

[2017-06-20 11:14:51.444692] I [glusterd-utils.c:5443:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_15ef1d866be79abfea2446a7c887d192/brick
[2017-06-20 11:14:51.444731] I [MSGID: 106143] [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_15ef1d866be79abfea2446a7c887d192/brick on port 49152
[2017-06-20 11:14:51.444775] I [rpc-clnt.c:1060:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-06-20 11:14:51.444877] I [glusterd-utils.c:5443:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_d523daa447130512618fd72da50cad72/brick
[2017-06-20 11:14:51.444901] I [MSGID: 106143] [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_a5597e6aa7af855a39d96b64a47787ca/brick_d523daa447130512618fd72da50cad72/brick on port 49152


3. The theory becomes more concrete when we see that glusterd gets disconnect notifications because of this:
[2017-06-20 11:14:51.507841] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-20 11:14:51.509668] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.59:/var/lib/heketi/mounts/vg_61f1e5206bf507bc54aef6984e91157b/brick_08a4ad42cdf24e5e3b56bd940c6b243b/brick has disconnected from glusterd.
[2017-06-20 11:14:51.511046] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-20 11:14:51.512293] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.59:/var/lib/heketi/mounts/vg_e863cbc857b5d64d58d31afd42383234/brick_1146fd5fda435fedf6241a0f3c84a3e6/brick has disconnected from glusterd.
[2017-06-20 11:14:51.513633] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-06-20 11:14:51.516157] I [MSGID: 106005] [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.47.59:/var/lib/heketi/mounts/vg_e863cbc857b5d64d58d31afd42383234/brick_7e1cfecf225612a67464e891b391ebc5/brick has disconnected from glusterd.


There are really two problems here:

1. The pidfiles shouldn't live under /var/lib/glusterd, as these files shouldn't be persisted across a node reboot.

2. is_gf_pid_running should be enhanced so that we have a stricter check of whether the pid actually belongs to a gluster process (a sketch of such a check follows below).

Doing 2 would actually solve the problem for now. However, we should try to finish the ongoing work in https://review.gluster.org/#/c/13580/ .
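
For illustration, here is a minimal stand-alone sketch of what such a stricter check could look like, assuming the check reads /proc/<pid>/cmdline and verifies the process name. This is not the actual glusterfs code; the helper name pid_is_glusterfsd is hypothetical.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: return true only if <pid> is alive AND its
 * argv[0] looks like a gluster brick process (glusterfsd). */
static bool
pid_is_glusterfsd (pid_t pid)
{
        char    path[64]  = {0};
        char    cmd[4096] = {0};
        FILE   *fp        = NULL;
        size_t  nread     = 0;

        snprintf (path, sizeof (path), "/proc/%d/cmdline", (int) pid);
        fp = fopen (path, "r");
        if (!fp)
                return false;        /* pid is not running at all */

        nread = fread (cmd, 1, sizeof (cmd) - 1, fp);
        fclose (fp);
        if (nread == 0)
                return false;

        /* /proc/<pid>/cmdline is NUL-separated, so strstr() only
         * inspects argv[0], which is exactly what we want here. */
        return (strstr (cmd, "glusterfsd") != NULL);
}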

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-06-22 03:53:05 EDT ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs-3.3.0', the current release of Red Hat Gluster Storage 3 under active development, having been appropriately marked for the release and having been provided ACK from Development and QE.

--- Additional comment from Atin Mukherjee on 2017-06-22 04:03:48 EDT ---

I disagree with the severity level being set to high. As per my analysis and testing, I was never able to reproduce this issue. So until QE confirms that this issue is reproducible every time, I'd like to set the severity to medium.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-06-22 04:03:53 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'

--- Additional comment from Mohit Agrawal on 2017-06-22 06:28:22 EDT ---

I think we can resolve this without changing is_gf_pid_running.
Below is the code path in glusterd_brick_start in which we try to connect to the brick if it is running.

I think that before calling glusterd_brick_connect here we can validate socketpath; if the glusterfs pid is not running, the socketpath value will not be valid and we can move out (goto) of this scope instead of calling glusterd_brick_connect.

>>>>>>>>>>>>>

                        if (is_brick_mx_enabled ())
                                glusterd_get_sock_from_brick_pid (pid, socketpath,
                                                                  sizeof (socketpath));
                        else
                                glusterd_set_brick_socket_filepath (volinfo, brickinfo,
                                                                    socketpath,
                                                                    sizeof (socketpath));
                        gf_log (this->name, GF_LOG_DEBUG,
                                "Using %s as sockfile for brick %s of volume %s ",
                                socketpath, brickinfo->path, volinfo->volname);
                        (void) glusterd_brick_connect (volinfo, brickinfo,
                                                       socketpath);
                }
                return 0;
        }

>>>>>>>>>>>>>>>>>
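
For illustration, a minimal stand-alone sketch of the guard proposed above, assuming the validation is simply "the derived socket path is non-empty and the socket file exists". This is not the actual patch; the helper name brick_socket_is_valid and the run_brick label are hypothetical.

#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: consider socketpath usable only if we could
 * derive one at all and the unix socket file is actually present. */
static bool
brick_socket_is_valid (const char *socketpath)
{
        if (!socketpath || socketpath[0] == '\0')
                return false;
        return (access (socketpath, F_OK) == 0);
}

/* Sketch of the call site (pseudo-context, not the real function body):
 *
 *         if (!brick_socket_is_valid (socketpath))
 *                 goto run_brick;    // fall through and (re)start the brick
 *         (void) glusterd_brick_connect (volinfo, brickinfo, socketpath);
 */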

  

Regards
Mohit Agrawal

--- Additional comment from Atin Mukherjee on 2017-06-22 06:33:21 EDT ---

(In reply to Mohit Agrawal from comment #10)
> I think we can resolve this without changing is_gf_pid_running.
> Below is the code path in glusterd_brick_start in which we try to
> connect to the brick if it is running.
> 
> I think that before calling glusterd_brick_connect here we can validate
> socketpath; if the glusterfs pid is not running, the socketpath value
> will not be valid and we can move out (goto) of this scope instead of
> calling glusterd_brick_connect.
> 
> >>>>>>>>>>>>>
> 
>                         if (is_brick_mx_enabled ())
>                                 glusterd_get_sock_from_brick_pid (pid, socketpath,
>                                                                   sizeof (socketpath));
>                         else
>                                 glusterd_set_brick_socket_filepath (volinfo, brickinfo,
>                                                                     socketpath,
>                                                                     sizeof (socketpath));
>                         gf_log (this->name, GF_LOG_DEBUG,
>                                 "Using %s as sockfile for brick %s of volume %s ",
>                                 socketpath, brickinfo->path, volinfo->volname);
>                         (void) glusterd_brick_connect (volinfo, brickinfo,
>                                                        socketpath);
>                 }
>                 return 0;
>         }
> 
> >>>>>>>>>>>>>>>>>

We can't go to out, as we are actually supposed to start the brick; you'd basically need to come out of the if block and proceed further. Otherwise, I like the idea.


--- Additional comment from Mohit Agrawal on 2017-06-22 06:39:17 EDT ---

I mean that control will come out of the gf_is_service_running scope and start executing the code from the point below:

        ret = _mk_rundir_p (volinfo);
        if (ret)
                goto out;

        other_brick = find_compatible_brick (conf, volinfo, brickinfo,
                                             &other_vol);
        if (other_brick) {
                ret = attach_brick (this, brickinfo, other_brick,
                                    volinfo, other_vol);
                if (ret == 0) {
                        goto out;
                }
        }

Regards
Mohit Agrawal

Comment 1 Worker Ant 2017-06-22 11:35:09 UTC
REVIEW: https://review.gluster.org/17601 (glusterd: brick process fails to restart after gluster pod failure) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 2 Worker Ant 2017-06-22 12:20:58 UTC
REVIEW: https://review.gluster.org/17601 (glusterd: brick process fails to restart after gluster pod failure) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2017-06-27 04:11:49 UTC
REVIEW: https://review.gluster.org/17601 (glusterd: brick process fails to restart after gluster pod failure) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 4 Worker Ant 2017-06-27 10:57:59 UTC
COMMIT: https://review.gluster.org/17601 committed in master by Atin Mukherjee (amukherj) 
------
commit b71059960f8c67d9a058244d2a1c748be4fe1323
Author: Mohit Agrawal <moagrawa>
Date:   Thu Jun 22 16:57:04 2017 +0530

    glusterd: brick process fails to restart after gluster pod failure
    
    Problem: In a container environment, sometimes after deleting the gluster
             pod and creating a new gluster pod, the brick process doesn't
             come up.

    Solution: Based on the logs, glusterd is trying to attach to a
              non-glusterfs process. Change the code of the function
              glusterd_get_sock_from_brick_pid to fetch the socketpath from
              the arguments of the running brick process.
    
    BUG: 1464072
    Change-Id: Ida6af00066341b683bbb4440d7a0d8042581656a
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17601
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
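
For illustration, a simplified stand-alone sketch of the approach the commit message describes: glusterfsd bricks are started with a '-S <socketpath>' argument, so walking the NUL-separated /proc/<pid>/cmdline of the pid recorded in the pidfile and returning the value that follows '-S' yields the socket path for that brick, and fails for a non-gluster process. This is not the committed code; the function below is illustrative only.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Illustrative helper: scan the NUL-separated /proc/<pid>/cmdline of a
 * running brick and copy the argument following "-S" into sockpath. */
static int
get_sock_from_brick_pid (pid_t pid, char *sockpath, size_t len)
{
        char    path[64]      = {0};
        char    cmdline[8192] = {0};
        FILE   *fp            = NULL;
        size_t  nread         = 0;
        size_t  off           = 0;

        snprintf (path, sizeof (path), "/proc/%d/cmdline", (int) pid);
        fp = fopen (path, "r");
        if (!fp)
                return -1;
        nread = fread (cmdline, 1, sizeof (cmdline) - 1, fp);
        fclose (fp);

        while (off < nread) {
                char *arg = cmdline + off;
                off += strlen (arg) + 1;
                if (strcmp (arg, "-S") == 0 && off < nread) {
                        snprintf (sockpath, len, "%s", cmdline + off);
                        return 0;
                }
        }
        return -1;      /* no -S argument: not a gluster brick process */
}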

Comment 5 Shyamsundar 2017-09-05 17:34:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

