Bug 1336332
| Summary: | glusterfs processes doesn't stop after invoking stop-all-gluster-processes.sh | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prasanna Kumar Kalever <prasanna.kalever> |
| Component: | core | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.1 | CC: | amukherj, knarra, mchangir, prasanna.kalever, rcyriac, rgowdapp, rhinduja, rhs-bugs, sabose, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.1.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.9-6 | Doc Type: | Bug Fix |
| Doc Text: | Previously, the stop-all-gluster-processes.sh hook script did not terminate mount processes. This meant that when hosts were put into maintenance mode using the stop-all-gluster-processes.sh hook script, glusterfs mount services on the node were not stopped. This resulted in upgrade failure. Additionally, a bug in waitpid meant that the unmount exit status was not being reported correctly, so volumes erroneously appeared to unmount successfully. These issues have now been corrected, and the stop-all-gluster-processes.sh hook script works as expected. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-06-23 05:23:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1331938, 1334750 | | |
| Bug Blocks: | 1258386, 1311817 | | |
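The Doc Text above describes two fixes: terminating glusterfs mount processes and correctly reporting the unmount exit status. The following is a minimal shell sketch of that behaviour, not the shipped stop-all-gluster-processes.sh; the mount point /mnt/glustervol and the pgrep pattern are illustrative assumptions.

```sh
#!/bin/sh
# Illustrative sketch only -- not the actual stop-all-gluster-processes.sh.

# Send SIGTERM to glusterfs client (mount) processes; the pgrep pattern
# is an assumption for illustration.
for pid in $(pgrep -f '/usr/sbin/glusterfs '); do
    echo "sending SIGTERM to mount process with pid: $pid"
    kill -TERM "$pid"
done

# Unmount in the background and wait on the pid so the real exit status
# is propagated (the reported waitpid bug caused this status to be lost).
umount /mnt/glustervol &      # /mnt/glustervol is an assumed mount point
umount_pid=$!
wait "$umount_pid"
status=$?
if [ "$status" -ne 0 ]; then
    echo "unmount failed with status $status"
    exit "$status"
fi
echo "unmount succeeded"
```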
Description
Prasanna Kumar Kalever
2016-05-16 08:16:44 UTC
Sahina, could you add the reason for the blocker proposal? ~Atin

(In reply to Atin Mukherjee from comment #2)
> Sahina,
>
> Could you add the reason of the blocker proposal?
>
> ~Atin

During the upgrade process from within RHEV, the host is moved to maintenance, where unmount of the gluster volumes is called and the stop-all-gluster-processes.sh script is invoked to stop the gluster processes. Due to this bug, the gluster processes are not stopped and the upgrade fails. See Bug 1330975.

Do we have a QE agreement here to consider it as a blocker?

Yes, QE considers this to be a blocker because when a host is moved to maintenance from the UI, unmount of the gluster volumes is called and the stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does not kill the glusterfs process. Since glusterfs is still running on the node, upgrading gluster fails.

(In reply to RamaKasturi from comment #6)
> Yes, QE considers this to be a blocker because when a host is moved to
> maintenance from the UI, unmount of the gluster volumes is called and the
> stop-all-gluster-processes.sh script stops glusterfsd and glusterd but does
> not kill the glusterfs process. Since glusterfs is still running on the node,
> upgrading gluster fails.

Please ack it.

BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA

Verified with the build:
glusterfs-3.7.9-6.el7rhgs.x86_64
It kills the glusterfs and gsync processes but does not kill the glusterd process, as shown below:
[root@dhcp37-162 ~]# ps aux | grep glusterfs | wc -l
16
[root@dhcp37-162 ~]#
[root@dhcp37-162 ~]#
[root@dhcp37-162 ~]# /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
sending SIGTERM to mount process with pid: 1880
sending SIGTERM to pid: 2198
sending SIGTERM to pid: 2206
sending SIGTERM to pid: 2217
sending SIGTERM to pid: 13253
sending SIGTERM to pid: 1354
sending SIGTERM to pid: 1374
sending SIGTERM to pid: 1537
sending SIGTERM to pid: 1739
sending SIGTERM to geo-rep gsync process 13253
13298
13299
13304
13305
13311
13325
13334
13346
sending SIGKILL to pid: 13253
[root@dhcp37-162 ~]# ps aux | grep glusterfs
root 14655 0.0 0.0 112648 968 pts/0 S+ 17:37 0:00 grep --color=auto glusterfs
[root@dhcp37-162 ~]# ps aux | grep gluster
root 13262 0.0 0.3 604200 30968 ? Ssl May27 0:09 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 14657 0.0 0.0 112648 972 pts/0 S+ 17:37 0:00 grep --color=auto gluster
[root@dhcp37-162 ~]# service glusterd status
Redirecting to /bin/systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2016-05-27 10:38:49 UTC; 1 day 6h ago
Process: 13261 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 13262 (glusterd)
CGroup: /system.slice/glusterd.service
├─ 2270 /sbin/rpc.statd
└─13262 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 07:18:10 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[2521]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:33 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13304]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
May 28 17:14:34 dhcp37-162.lab.eng.blr.redhat.com GlusterFS[13298]: [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[root@dhcp37-162 ~]#
Rahul,

After talking to the RHSC team, here is the conclusion I arrived at: the script is written in a way that, during the maintenance phase, GlusterD will be brought down separately (as it is a systemd service), and all other Gluster processes are to be brought down by executing the script.

HTH,
Atin

Based on comment 12, comment 13 and comment 14, moving this bug to verified state. The original issue of the glusterfs process not being killed via the stop-all-gluster-processes.sh script is resolved.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
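Based on Atin's conclusion above, a minimal sketch of the maintenance-time sequence on a node might look like the following; the client mount point and the exact ordering are assumptions for illustration, not prescribed by the shipped tooling.

```sh
# Hypothetical node-maintenance sequence (assumed mount point and ordering).
umount /rhgs/client-mount                                   # assumed gluster client mount point
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh  # stops bricks, mounts, gsync processes
systemctl stop glusterd                                     # glusterd is stopped separately as a systemd service
```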