Bug 1336332
Summary: | glusterfs processes don't stop after invoking stop-all-gluster-processes.sh | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prasanna Kumar Kalever <prasanna.kalever> |
Component: | core | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.1 | CC: | amukherj, knarra, mchangir, prasanna.kalever, rcyriac, rgowdapp, rhinduja, rhs-bugs, sabose, storage-qa-internal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.1.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.7.9-6 | Doc Type: | Bug Fix |
Doc Text: |
Previously, the stop-all-gluster-processes.sh hook script did not terminate glusterfs mount processes. As a result, when a host was put into maintenance mode using the script, the glusterfs mount processes on that node kept running and the upgrade failed.
Additionally, a bug in the script's waitpid handling meant that the unmount exit status was not reported correctly, so volumes erroneously appeared to unmount successfully (a minimal illustrative sketch of a correct exit-status check follows the summary table below).
These issues have now been corrected, and the stop-all-gluster-processes.sh hook script works as expected.
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2016-06-23 05:23:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Bug Depends On: | 1331938, 1334750 | |
Bug Blocks: | 1258386, 1311817 | |
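The Doc Text above mentions a waitpid bug that hid unmount failures. The snippet below is a minimal, hypothetical shell sketch of the kind of exit-status check involved; it is not the actual patch, and the mount point `/mnt/glustervol` is an example, not taken from this bug.

```sh
#!/bin/bash
# Hypothetical sketch only -- not the actual stop-all-gluster-processes.sh fix.
# It shows checking the real exit status of a backgrounded umount via the
# shell's `wait <pid>` (the built-in counterpart of waitpid) instead of
# discarding it. /mnt/glustervol is an example path.

mount_point="/mnt/glustervol"

umount "$mount_point" &
umount_pid=$!

# `wait <pid>` returns that child's exit status; checking it is what makes
# unmount failures visible instead of being silently reported as success.
if wait "$umount_pid"; then
    echo "unmounted $mount_point"
else
    rc=$?
    echo "umount of $mount_point failed with status $rc" >&2
    exit "$rc"
fi
```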
Description
Prasanna Kumar Kalever
2016-05-16 08:16:44 UTC
Sahina,

Could you add the reason for the blocker proposal?

~Atin

(In reply to Atin Mukherjee from comment #2)
> Sahina,
>
> Could you add the reason for the blocker proposal?
>
> ~Atin

During the upgrade process from within RHEV, the host is moved to maintenance, where umount of the gluster volumes is called and the stop-all-gluster-processes.sh script is invoked to stop the gluster processes. Due to this bug, the gluster processes are not stopped and the upgrade fails. See Bug 1330975.

Do we have a QE agreement here to consider it as a blocker?

Yes, QE considers this to be a blocker because when a host is moved to maintenance from the UI, umount of the gluster volumes is called and the stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does not kill the glusterfs process. Since glusterfs is still running on the node, upgrading gluster fails.

(In reply to RamaKasturi from comment #6)
> Yes, QE considers this to be a blocker because when a host is moved to
> maintenance from the UI, umount of the gluster volumes is called and the
> stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does
> not kill the glusterfs process. Since glusterfs is still running on the
> node, upgrading gluster fails.

Please ack it.

BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA

Verified with the build glusterfs-3.7.9-6.el7rhgs.x86_64. It kills the glusterfs and gsync processes but does not kill the glusterd process, as shown below:

[root@dhcp37-162 ~]# ps aux | grep glusterfs | wc -l
16
[root@dhcp37-162 ~]# /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
sending SIGTERM to mount process with pid: 1880
sending SIGTERM to pid: 2198
sending SIGTERM to pid: 2206
sending SIGTERM to pid: 2217
sending SIGTERM to pid: 13253
sending SIGTERM to pid: 1354
sending SIGTERM to pid: 1374
sending SIGTERM to pid: 1537
sending SIGTERM to pid: 1739
sending SIGTERM to geo-rep gsync process 13253 13298 13299 13304 13305 13311 13325 13334 13346
sending SIGKILL to pid: 13253
[root@dhcp37-162 ~]# ps aux | grep glusterfs
root     14655  0.0  0.0 112648   968 pts/0    S+   17:37   0:00 grep --color=auto glusterfs
[root@dhcp37-162 ~]# ps aux | grep gluster
root     13262  0.0  0.3 604200 30968 ?        Ssl  May27   0:09 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     14657  0.0  0.0 112648   972 pts/0    S+   17:37   0:00 grep --color=auto gluster
[root@dhcp37-162 ~]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-05-27 10:38:49 UTC; 1 day 6h ago
  Process: 13261 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 13262 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─ 2270 /sbin/rpc.statd
           └─13262 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[root@dhcp37-162 ~]#

Rahul,

After talking to the RHSC team, here is the conclusion I arrived at: the script is written such that, during the maintenance phase, GlusterD is brought down separately (it is a systemd service), and all other Gluster processes are brought down by executing the script.

HTH,
Atin

Based on comment 12, comment 13 and comment 14, moving this bug to the verified state. The original issue of the glusterfs process not being killed by the stop-all-gluster-processes.sh script is resolved.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
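For context, here is a minimal, hypothetical sketch of the stop sequence described in the comments above (SIGTERM to the glusterfs mount/client and brick processes, SIGTERM to the geo-replication gsyncd workers, then SIGKILL for stragglers), deliberately leaving glusterd alone. It is not the shipped stop-all-gluster-processes.sh from glusterfs-3.7.9-6, which may differ in detail.

```sh
#!/bin/bash
# Hypothetical sketch of the stop sequence discussed above; the shipped
# stop-all-gluster-processes.sh may differ. glusterd is intentionally not
# touched here -- it is a systemd service and is stopped separately during
# host maintenance.

# SIGTERM the glusterfs client (mount) processes -- the part the fix added.
for pid in $(pgrep -x glusterfs); do
    echo "sending SIGTERM to mount process with pid: $pid"
    kill -TERM "$pid"
done

# SIGTERM the brick daemons and geo-replication workers.
for pid in $(pgrep -x glusterfsd; pgrep -f gsyncd); do
    echo "sending SIGTERM to pid: $pid"
    kill -TERM "$pid"
done

# Give everything a grace period, then SIGKILL whatever is still alive.
sleep 5
for pid in $(pgrep -x glusterfs; pgrep -x glusterfsd; pgrep -f gsyncd); do
    echo "sending SIGKILL to pid: $pid"
    kill -KILL "$pid"
done
```

In line with Atin's conclusion, glusterd itself would then be stopped through systemd as a separate maintenance step (for example, `systemctl stop glusterd`), which is why the verification output above still shows the glusterd process running after the script completes.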