Bug 1336332
| Summary: | glusterfs processes doesn't stop after invoking stop-all-gluster-processes.sh | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prasanna Kumar Kalever <prasanna.kalever> |
| Component: | core | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.1 | CC: | amukherj, knarra, mchangir, prasanna.kalever, rcyriac, rgowdapp, rhinduja, rhs-bugs, sabose, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.1.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.9-6 | Doc Type: | Bug Fix |
| Doc Text: | Previously, the stop-all-gluster-processes.sh hook script did not terminate mount processes. This meant that when hosts were put into maintenance mode using the stop-all-gluster-processes.sh hook script, glusterfs mount services on the node were not stopped. This resulted in upgrade failure. Additionally, a bug in waitpid meant that the unmount exit status was not being reported correctly, so volumes erroneously appeared to unmount successfully. These issues have now been corrected, and the stop-all-gluster-processes.sh hook script works as expected. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-06-23 05:23:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1331938, 1334750 | | |
| Bug Blocks: | 1258386, 1311817 | | |
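The Doc Text above describes two fixes: terminating glusterfs mount processes and correctly reporting the unmount exit status. The following is a minimal shell sketch of that behaviour, not the shipped stop-all-gluster-processes.sh; the mount point /mnt/glustervol and the pgrep pattern are illustrative assumptions.

```sh
#!/bin/sh
# Illustrative sketch only -- not the actual stop-all-gluster-processes.sh.

# Send SIGTERM to glusterfs client (mount) processes; the pgrep pattern
# is an assumption for illustration.
for pid in $(pgrep -f '/usr/sbin/glusterfs '); do
    echo "sending SIGTERM to mount process with pid: $pid"
    kill -TERM "$pid"
done

# Unmount in the background and wait on the pid so the real exit status
# is propagated (the reported waitpid bug caused this status to be lost).
umount /mnt/glustervol &      # /mnt/glustervol is an assumed mount point
umount_pid=$!
wait "$umount_pid"
status=$?
if [ "$status" -ne 0 ]; then
    echo "unmount failed with status $status"
    exit "$status"
fi
echo "unmount succeeded"
```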
Description
Prasanna Kumar Kalever
2016-05-16 08:16:44 UTC
Sahina, could you add the reason for the blocker proposal? ~Atin

(In reply to Atin Mukherjee from comment #2)
> Sahina,
>
> Could you add the reason of the blocker proposal?
>
> ~Atin

During the upgrade process from within RHEV, the host is moved to maintenance, where unmount of the gluster volumes is called and the stop-all-gluster-processes.sh script is invoked to stop the gluster processes. Due to this bug, the gluster processes are not stopped and the upgrade fails. See Bug 1330975.

Do we have a QE agreement here to consider it as a blocker?

Yes, QE considers this to be a blocker because when a host is moved to maintenance from the UI, unmount of the gluster volumes is called and the stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does not kill the glusterfs process. Since glusterfs is still running on the node, upgrading gluster fails.

(In reply to RamaKasturi from comment #6)
> Yes, QE considers this to be a blocker because when a host is moved to
> maintenance from the UI, unmount of the gluster volumes is called and the
> stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does
> not kill the glusterfs process. Since glusterfs is still running on the node,
> upgrading gluster fails.

Please ack it.

BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA

Verified with the build:
glusterfs-3.7.9-6.el7rhgs.x86_64
It kills the glusterfs and gsync processes but does not kill the glusterd process, as shown below:
[root@dhcp37-162 ~]# ps aux | grep glusterfs | wc -l
16
[root@dhcp37-162 ~]#
[root@dhcp37-162 ~]#
[root@dhcp37-162 ~]# /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
sending SIGTERM to mount process with pid: 1880
sending SIGTERM to pid: 2198
sending SIGTERM to pid: 2206
sending SIGTERM to pid: 2217
sending SIGTERM to pid: 13253
sending SIGTERM to pid: 1354
sending SIGTERM to pid: 1374
sending SIGTERM to pid: 1537
sending SIGTERM to pid: 1739
sending SIGTERM to geo-rep gsync process 13253
13298
13299
13304
13305
13311
13325
13334
13346
sending SIGKILL to pid: 13253
[root@dhcp37-162 ~]# ps aux | grep glusterfs
root 14655 0.0 0.0 112648 968 pts/0 S+ 17:37 0:00 grep --color=auto glusterfs
[root@dhcp37-162 ~]# ps aux | grep gluster
root 13262 0.0 0.3 604200 30968 ? Ssl May27 0:09 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 14657 0.0 0.0 112648 972 pts/0 S+ 17:37 0:00 grep --color=auto gluster
[root@dhcp37-162 ~]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2016-05-27 10:38:49 UTC; 1 day 6h ago
Process: 13261 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 13262 (glusterd)
CGroup: /system.slice/glusterd.service
├─ 2270 /sbin/rpc.statd
└─13262 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[root@dhcp37-162 ~]#
Rahul,

After talking to the RHSC team, here is the conclusion I arrived at: the script is written in a way that, during the maintenance phase, GlusterD will be brought down separately (as it is a systemd service), and all other Gluster processes are to be brought down by executing the script.

HTH,
Atin

Based on comment 12, comment 13 and comment 14, moving this bug to verified state. The original issue of the glusterfs process not being killed via the stop-all-gluster-processes.sh script is resolved.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
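Based on Atin's conclusion above, a minimal sketch of the maintenance-time sequence on a node might look like the following; the client mount point and the exact ordering are assumptions for illustration, not prescribed by the shipped tooling.

```sh
# Hypothetical node-maintenance sequence (assumed mount point and ordering).
umount /rhgs/client-mount                                   # assumed gluster client mount point
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh  # stops bricks, mounts, gsync processes
systemctl stop glusterd                                     # glusterd is stopped separately as a systemd service
```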