Bug 1336332 - glusterfs processes don't stop after invoking stop-all-gluster-processes.sh
Summary: glusterfs processes don't stop after invoking stop-all-gluster-processes.sh
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Prasanna Kumar Kalever
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On: 1331938 1334750
Blocks: Gluster-HC-1 1311817
 
Reported: 2016-05-16 08:16 UTC by Prasanna Kumar Kalever
Modified: 2016-09-17 14:37 UTC
10 users

Fixed In Version: glusterfs-3.7.9-6
Doc Type: Bug Fix
Doc Text:
Previously, the stop-all-gluster-processes.sh hook script did not terminate mount processes. This meant that when hosts were put into maintenance mode using the stop-all-gluster-processes.sh hook script, glusterfs mount services on the node were not stopped. This resulted in upgrade failure. Additionally, a bug in waitpid meant that unmount exit status was not being reported correctly, so volumes erroneously appeared to unmount successfully. These issues have now been corrected, and the stop-all-gluster-processes.sh hook script works as expected.
Clone Of:
Environment:
Last Closed: 2016-06-23 05:23:34 UTC
Embargoed:




Links
Red Hat Product Errata RHBA-2016:1240 (normal, SHIPPED_LIVE): Red Hat Gluster Storage 3.1 Update 3, last updated 2016-06-23 08:51:28 UTC

Description Prasanna Kumar Kalever 2016-05-16 08:16:44 UTC
Description of problem:
When a host is put into maintenance from the RHEVM engine with "stop glusterd service", the hook script stop-all-gluster-processes.sh is invoked and should stop bricks, gluster node services, geo-rep processes, and mount processes. However, it does not stop the glusterfs mount processes on the node.
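
For reference, terminating the mount processes amounts to something like the following sketch; the PID discovery actually used by stop-all-gluster-processes.sh may differ (it may use pidfiles or a different match pattern):

# send SIGTERM to glusterfs client (mount) processes;
# match the process name exactly so glusterfsd (bricks) is not caught
for pid in $(pgrep -x glusterfs); do
    kill -TERM "$pid"
done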

Version-Release number of selected component (if applicable):
3.1

How reproducible:
Always

Steps to Reproduce:
1. create a volume 
2. mount it somewhere
3. invoke extras/stop-all-gluster-processes.sh
4. ps aux | grep glusterfs (a consolidated command sequence is shown below)
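
The steps above, spelled out end to end (host name, brick path, volume name, and mount point are hypothetical; the script is installed under /usr/share/glusterfs/scripts/ on RHGS, while step 3 refers to its source-tree location):

# create and start a single-brick test volume, then FUSE-mount it
gluster volume create testvol host1:/bricks/testvol force
gluster volume start testvol
mkdir -p /mnt/testvol
mount -t glusterfs host1:/testvol /mnt/testvol

# run the stop script and check for surviving glusterfs (mount) processes
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
ps aux | grep glusterfs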

Actual results:
glusterfsd, gsync (geo-rep), and the other gluster node services are stopped, but the glusterfs mount processes are not.

Expected results:
All glusterfsd, gsync (geo-rep), gluster node services, and glusterfs mount processes should be stopped on the node.

Additional info:
Ideally, RHEVM would have called umount on that node, and unmounting the volume should stop the glusterfs processes. However, umount does not always return the actual exit status, which should also be addressed.
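
A minimal sketch of what addressing this could look like, i.e. waiting on the exact umount PID so its real exit status is captured (mount point is hypothetical; the actual hook/script logic may differ):

umount /mnt/testvol &
umount_pid=$!
# wait on the specific PID so the real exit status of umount is reported
if ! wait "$umount_pid"; then
    echo "umount of /mnt/testvol failed; glusterfs mount process may still be running" >&2
fi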

Comment 2 Atin Mukherjee 2016-05-18 09:52:43 UTC
Sahina,

Could you add the reason of the blocker proposal?

~Atin

Comment 4 Sahina Bose 2016-05-18 10:20:44 UTC
(In reply to Atin Mukherjee from comment #2)
> Sahina,
> 
> Could you add the reason of the blocker proposal?
> 
> ~Atin

During the upgrade process from within RHEV, the host is moved to maintenance, where umount of the gluster volumes is called and the stop-all-gluster-processes.sh script is called to stop the gluster processes. Due to this bug, gluster processes are not stopped and the upgrade fails. See Bug 1330975.

Comment 5 Atin Mukherjee 2016-05-19 06:48:27 UTC
Do we have a QE agreement here to consider it as blocker?

Comment 6 RamaKasturi 2016-05-20 07:09:29 UTC
Yes, QE considers this to be a blocker: when a host is moved to maintenance from the UI, umount of the gluster volumes is called and the stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does not kill the glusterfs processes. Since glusterfs is still running on the node, upgrading gluster fails.

Comment 8 Atin Mukherjee 2016-05-20 10:21:12 UTC
(In reply to RamaKasturi from comment #6)
> yes,QE  consider this to be a blocker because when a host is moved to
> maintenance from UI, umount of gluster volumes are called and
> stop-all-gluster-processes.sh script  stops glusterfsd , glusterd and does
> not kill glusterfs process. Since glusterfs is running on the node,
> upgrading gluster fails.

Please ack it.

Comment 11 Milind Changire 2016-05-23 08:31:47 UTC
BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA

Comment 12 Rahul Hinduja 2016-05-28 17:48:52 UTC
Verified with the build:
glusterfs-3.7.9-6.el7rhgs.x86_64

It kills the glusterfs and gsync processes but does not kill the glusterd process, as shown below:

[root@dhcp37-162 ~]# ps aux | grep glusterfs | wc -l
16
[root@dhcp37-162 ~]# 
[root@dhcp37-162 ~]# 
[root@dhcp37-162 ~]# /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh 
sending SIGTERM to mount process with pid: 1880
sending SIGTERM to pid: 2198
sending SIGTERM to pid: 2206
sending SIGTERM to pid: 2217
sending SIGTERM to pid: 13253
sending SIGTERM to pid: 1354
sending SIGTERM to pid: 1374
sending SIGTERM to pid: 1537
sending SIGTERM to pid: 1739
sending SIGTERM to geo-rep gsync process 13253
13298
13299
13304
13305
13311
13325
13334
13346

sending SIGKILL to pid: 13253
[root@dhcp37-162 ~]# ps aux | grep glusterfs
root     14655  0.0  0.0 112648   968 pts/0    S+   17:37   0:00 grep --color=auto glusterfs
[root@dhcp37-162 ~]# ps aux | grep gluster
root     13262  0.0  0.3 604200 30968 ?        Ssl  May27   0:09 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     14657  0.0  0.0 112648   972 pts/0    S+   17:37   0:00 grep --color=auto gluster
[root@dhcp37-162 ~]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-05-27 10:38:49 UTC; 1 day 6h ago
  Process: 13261 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 13262 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─ 2270 /sbin/rpc.statd
           └─13262 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[root@dhcp37-162 ~]#

Comment 14 Atin Mukherjee 2016-05-30 05:10:25 UTC
Rahul,

After talking to the RHSC team, here is the conclusion I arrived at:

The script is written such that, during the maintenance phase, GlusterD is brought down separately (as it is a systemd service), while all other Gluster processes are brought down by executing the script.
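
In other words, the intended per-node maintenance sequence is roughly the following (a sketch; the exact orchestration performed by RHSC/RHEV may differ):

# stop bricks, geo-rep, mount and other gluster processes via the hook script
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
# glusterd is a systemd service and is stopped separately
systemctl stop glusterd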

HTH,
Atin

Comment 15 Rahul Hinduja 2016-05-30 09:05:15 UTC
Based on comment 12, comment 13, and comment 14, moving this bug to the verified state. The original issue of glusterfs processes not being killed via the stop-all-gluster-processes.sh script is resolved.

Comment 21 errata-xmlrpc 2016-06-23 05:23:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

