Description of problem:
From an RHEVM engine, when a host is put into maintenance with "stop glusterd service", the hook script stop-all-gluster-processes.sh should stop bricks, gluster node services, geo-rep processes, and mount processes. However, it does not stop the glusterfs mount processes on the node.

Version-Release number of selected component (if applicable): 3.1

How reproducible: Always

Steps to Reproduce:
1. Create a volume.
2. Mount it somewhere.
3. Invoke extras/stop-all-gluster-processes.sh.
4. Run: ps aux | grep glusterfs

Actual results:
glusterfsd, gsync (geo-rep), and the other gluster node services are stopped, but the glusterfs mount processes are not.

Expected results:
All glusterfsd, gsync (geo-rep), gluster node service, and glusterfs processes should be stopped on the node.

Additional info:
Ideally, RHEVM would have called umount on that node, and unmounting the volume should stop the glusterfs processes. However, umount does not always return the actual exit status, which should also be taken care of.
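The missing step is terminating the glusterfs client (mount) processes after SIGTERM-ing the rest. A minimal sketch of that logic, not the actual script: it assumes standard procps tools (pgrep, kill) and that client mounts show up with the exact process name "glusterfs" (glusterfsd and glusterd have different names, so an exact-name match avoids them):

```shell
#!/bin/sh
# Hedged sketch: send SIGTERM to glusterfs mount processes, then escalate
# to SIGKILL for any that survive. "pgrep -x glusterfs" matches only
# processes whose name is exactly "glusterfs" (the FUSE mount client),
# not glusterfsd bricks or the glusterd management daemon.
for sig in TERM KILL; do
    pids=$(pgrep -x glusterfs || true)
    [ -z "$pids" ] && break           # nothing left to signal
    for pid in $pids; do
        echo "sending SIG$sig to mount process with pid: $pid"
        kill -"$sig" "$pid" 2>/dev/null || true
    done
    sleep 1                           # give processes a moment to exit
done

# Report the final state so callers (e.g. an RHEVM hook) can check it.
remaining=$(pgrep -x glusterfs || true)
if [ -z "$remaining" ]; then
    echo "no glusterfs mount processes running"
else
    echo "still running: $remaining"
fi
```

On a host with no gluster mounts the loop exits immediately; on a real node it would signal each mount process in turn. Checking and reporting the final state matters here because, as noted above, umount's exit status cannot always be trusted.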
Sahina, Could you add the reason of the blocker proposal? ~Atin
(In reply to Atin Mukherjee from comment #2)
> Sahina,
>
> Could you add the reason of the blocker proposal?
>
> ~Atin

During the upgrade process from within RHEV, the host is moved to maintenance, where umount of the gluster volumes is called and the stop-all-gluster-processes script is invoked to stop the gluster processes. Due to this bug, the gluster processes are not stopped and the upgrade fails. See Bug 1330975.
Do we have a QE agreement here to consider it as blocker?
Yes, QE considers this to be a blocker: when a host is moved to maintenance from the UI, umount of the gluster volumes is called, and the stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does not kill the glusterfs process. Since glusterfs is still running on the node, upgrading gluster fails.
(In reply to RamaKasturi from comment #6)
> yes,QE consider this to be a blocker because when a host is moved to
> maintenance from UI, umount of gluster volumes are called and
> stop-all-gluster-processes.sh script stops glusterfsd , glusterd and does
> not kill glusterfs process. Since glusterfs is running on the node,
> upgrading gluster fails.

Please ack it.
BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA
Verified with the build: glusterfs-3.7.9-6.el7rhgs.x86_64

The script kills the glusterfs and gsync processes but does not kill the glusterd process, as shown below:

[root@dhcp37-162 ~]# ps aux | grep glusterfs | wc -l
16
[root@dhcp37-162 ~]# /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
sending SIGTERM to mount process with pid: 1880
sending SIGTERM to pid: 2198
sending SIGTERM to pid: 2206
sending SIGTERM to pid: 2217
sending SIGTERM to pid: 13253
sending SIGTERM to pid: 1354
sending SIGTERM to pid: 1374
sending SIGTERM to pid: 1537
sending SIGTERM to pid: 1739
sending SIGTERM to geo-rep gsync process 13253 13298 13299 13304 13305 13311 13325 13334 13346
sending SIGKILL to pid: 13253
[root@dhcp37-162 ~]# ps aux | grep glusterfs
root     14655  0.0  0.0 112648   968 pts/0    S+   17:37   0:00 grep --color=auto glusterfs
[root@dhcp37-162 ~]# ps aux | grep gluster
root     13262  0.0  0.3 604200 30968 ?        Ssl  May27   0:09 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     14657  0.0  0.0 112648   972 pts/0    S+   17:37   0:00 grep --color=auto gluster
[root@dhcp37-162 ~]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-05-27 10:38:49 UTC; 1 day 6h ago
  Process: 13261 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 13262 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─ 2270 /sbin/rpc.statd
           └─13262 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[root@dhcp37-162 ~]#
Rahul,

After talking to the RHSC team, here is the conclusion I arrived at: the script is written so that, during the maintenance phase, GlusterD is brought down separately (as it is a systemd service), while all other Gluster processes are brought down by executing the script.

HTH,
Atin
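The conclusion above implies a two-step maintenance sequence: run the script for bricks, mounts, and geo-rep, then stop glusterd through its systemd unit. A hedged sketch of a post-maintenance sanity check (process names taken from the verification output in this bug; the commented commands are the assumed sequence, not executed here):

```shell
#!/bin/sh
# Assumed maintenance sequence per comment 14 (shown, not run):
#   /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
#   systemctl stop glusterd
#
# Check which gluster daemons are still alive. Exact-name matching (-x)
# avoids false hits from grep or this script's own command line.
status=""
for name in glusterd glusterfsd glusterfs; do
    if pgrep -x "$name" >/dev/null 2>&1; then
        state=running
    else
        state=stopped
    fi
    echo "$name: $state"
    status="$status $name=$state"
done
```

After the script alone has run, glusterd is expected to still report "running" (that is by design, per the comment above); only after the systemctl stop should all three read "stopped".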
Based on comment 12, comment 13, and comment 14, moving this bug to the verified state. The original issue of the glusterfs process not being killed via the stop-all-gluster-processes.sh script is resolved.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240