1242749 – On killing the ganesha process, systemd restarts nfs-ganesha process by itself on rhel7.1

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Deadline:	2015-08-28
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	nfs-ganesha
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.1.1
Assignee:	Kaleb KEITHLEY
QA Contact:	Apeksha
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1202842 1216951 1251815 1256227

Reported:	2015-07-14 06:43 UTC by Apeksha
Modified:	2015-10-05 10:43 UTC
CC List:	11 users
Fixed In Version:	nfs-ganesha-2.2.0-6
Doc Type:	Bug Fix
Doc Text:	Previously, on RHEL 7.1, nfs-ganesha was restarted automatically if the process exited due to a signal that is not specified as a clean exit status. As a consequence, cluster services might not detect that the nfs-ganesha service had been restarted on a node and were unable to take the appropriate action (for example, putting the entire cluster in grace). With this fix, the nfs-ganesha.service file used by systemd has been corrected so that the service is not restarted after an unexpected failure.
Clone Of:
Environment:
Last Closed:	2015-10-05 10:43:27 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1846	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.1 update	2015-10-05 14:43:08 UTC

Description Apeksha 2015-07-14 06:43:17 UTC

Description of problem:
On killing the ganesha process, systemd restarts nfs-ganesha process by itself on rhel7.1

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-9.el7rhgs.x86_64
nfs-ganesha-2.2.0-5.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a 4-node ganesha cluster on RHEL 7.1.
2. Kill the ganesha process on one of the servers.
3. systemd restarts the nfs-ganesha process by itself; failover does not happen.

[root@nfs3 ~]# ps aux | grep ganesha
root     18967  0.2  1.3 2212224 109824 ?      Ssl  01:10   0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root     20826  0.0  0.0 112640   928 pts/0    S+   01:13   0:00 grep --color=auto ganesha
[root@nfs3 ~]# kill -9 18967
[root@nfs3 ~]# ps aux | grep ganesha
root     20957 17.0  1.2 2015616 102168 ?      Ssl  01:13   0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root     21004  0.0  0.0 112640   924 pts/0    S+   01:13   0:00 grep --color=auto ganesha

Actual results: systemd restarts the nfs-ganesha process by itself; failover does not happen.

Expected results: The nfs-ganesha process must not be restarted; failover must happen.


Additional info:

/var/log/messages:

Jul 14 01:13:30 nfs3 rpc.statd[11887]: Received SM_UNMON_ALL request from nfs3 while not monitoring any hosts
Jul 14 01:19:36 nfs3 systemd-logind: New session 227 of user root.
Jul 14 01:19:36 nfs3 systemd: Starting Session 227 of user root.
Jul 14 01:19:36 nfs3 systemd: Started Session 227 of user root.
Jul 14 01:19:37 nfs3 systemd-logind: Removed session 227.
Jul 14 01:19:52 nfs3 systemd: nfs-ganesha.service: main process exited, code=killed, status=9/KILL
Jul 14 01:19:52 nfs3 systemd: Unit nfs-ganesha.service entered failed state.
Jul 14 01:19:53 nfs3 systemd: nfs-ganesha.service holdoff time over, scheduling restart.
Jul 14 01:19:53 nfs3 systemd: Stopping NFS-Ganesha file server...
Jul 14 01:19:53 nfs3 systemd: Starting NFS status monitor for NFSv2/3 locking....
Jul 14 01:19:53 nfs3 rpc.statd: Statd service already running!
Jul 14 01:19:53 nfs3 systemd: nfs-ganesha-lock.service: control process exited, code=exited status=1
Jul 14 01:19:53 nfs3 systemd: Failed to start NFS status monitor for NFSv2/3 locking..
Jul 14 01:19:53 nfs3 systemd: Unit nfs-ganesha-lock.service entered failed state.
Jul 14 01:19:53 nfs3 systemd: Starting NFS-Ganesha file server...
Jul 14 01:19:53 nfs3 systemd: Started NFS-Ganesha file server.

Comment 2 Soumya Koduri 2015-07-14 07:25:58 UTC

From nfs-ganesha.service,

[Service]
Type=forking
Environment="NOFILE=1048576"
EnvironmentFile=/etc/sysconfig/ganesha
ExecStart=/usr/bin/ganesha.nfsd $OPTIONS
ExecStartPost=-/bin/bash -c "prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE"
ExecReload=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.reload
ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown
Restart=on-abort

From the man page,
Restart=

Configures whether the service shall be restarted when the service process exits, is killed, or a timeout is reached. The service process may be the main service process, but it may also be one of the processes specified with ExecStartPre=, ExecStartPost=, ExecStop=, ExecStopPost=, or ExecReload=. When the death of the process is a result of systemd operation (e.g. service stop or restart), the service will not be restarted. Timeouts include missing the watchdog "keep-alive ping" deadline and a service start, reload, and stop operation timeouts.

Takes one of no, on-success, on-failure, on-abnormal, on-watchdog, on-abort, or always. If set to no (the default), the service will not be restarted. If set to on-success, it will be restarted only when the service process exits cleanly. In this context, a clean exit means an exit code of 0, or one of the signals SIGHUP, SIGINT, SIGTERM or SIGPIPE, and additionally, exit statuses and signals specified in SuccessExitStatus=. If set to on-failure, the service will be restarted when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding the aforementioned four signals), when an operation (such as service reload) times out, and when the configured watchdog timeout is triggered. If set to on-abnormal, the service will be restarted when the process is terminated by a signal (including on core dump, excluding the aforementioned four signals), when an operation times out, or when the watchdog timeout is triggered. If set to on-abort, the service will be restarted only if the service process exits due to an uncaught signal not specified as a clean exit status. If set to on-watchdog, the service will be restarted only if the watchdog timeout for the service expires. If set to always, the service will be restarted regardless of whether it exited cleanly or not, got terminated abnormally by a signal, or hit a timeout.
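
As a rough illustration of the rules quoted above, the following shell sketch encodes them as a lookup. It is a toy model, not systemd code: the function name would_restart and the event labels are invented for this example, and SuccessExitStatus= is ignored.

```shell
#!/usr/bin/env bash
# Toy model of the Restart= rules from the man page text; NOT systemd code.
# The event labels below are hypothetical names made up for this sketch:
#   clean-exit      exit code 0
#   clean-signal    SIGHUP, SIGINT, SIGTERM or SIGPIPE
#   error-exit      non-zero exit code
#   unclean-signal  any other signal (e.g. SIGKILL) or a core dump
#   timeout         a start, reload or stop operation timed out
#   watchdog        the watchdog timeout expired
# SuccessExitStatus= is deliberately left out of the model.
would_restart() {
    local policy=$1 event=$2
    case "$policy:$event" in
        no:*)      echo no ;;
        always:*)  echo yes ;;
        on-success:clean-exit|on-success:clean-signal)
                   echo yes ;;
        on-failure:error-exit|on-failure:unclean-signal|on-failure:timeout|on-failure:watchdog)
                   echo yes ;;
        on-abnormal:unclean-signal|on-abnormal:timeout|on-abnormal:watchdog)
                   echo yes ;;
        on-abort:unclean-signal)
                   echo yes ;;
        on-watchdog:watchdog)
                   echo yes ;;
        *)         echo no ;;   # every other combination, including unknown names
    esac
}
```

In this model, `would_restart on-abort unclean-signal` prints yes, which corresponds to the kill -9 case in the description, while `would_restart no unclean-signal` prints no.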

So we may need to set Restart=no in the nfs-ganesha.service file (not sure whether this can be done via the CLI).
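
One possible way to try Restart=no on a test node without editing the packaged unit is a systemd drop-in. The sketch below is only an assumption about how that could be done; the function name write_restart_override and the file name override.conf are arbitrary, and the fix that eventually shipped corrected the nfs-ganesha.service file itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a drop-in that overrides Restart= for
# nfs-ganesha.service. The default directory follows systemd's usual
# <unit>.d layout; pass another directory as $1 to try it out safely.
write_restart_override() {
    local dir=${1:-/etc/systemd/system/nfs-ganesha.service.d}
    mkdir -p "$dir" || return 1
    printf '[Service]\nRestart=no\n' > "$dir/override.conf"
}

# After writing to the real location as root, systemd has to reload its
# configuration before the override takes effect:
#   systemctl daemon-reload
#   systemctl show -p Restart nfs-ganesha.service
```

A drop-in only overrides the one setting and leaves the rest of the packaged unit untouched, so it can be removed again once a fixed package is installed.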

Comment 3 Meghana 2015-07-14 09:44:17 UTC

On killing the ganesha process, systemd restarts nfs-ganesha process by itself on rhel7.1

The nfs-ganesha.service file is configured with Restart=on-abort, which triggers an automatic restart when the process is killed by a signal.
This must be prevented in our clustered environment: the whole cluster needs to go into grace before a new NFS-Ganesha instance is started.

Comment 5 monti lawrence 2015-07-23 16:03:45 UTC

Doc text is edited. Please sign off to be included in Known Issues.

Comment 6 Meghana 2015-07-27 09:12:05 UTC

Doc text looks good to me

Comment 10 Niels de Vos 2015-08-24 10:03:28 UTC

Upstream merged https://review.gerrithub.io/243810

Comment 11 Vivek Agarwal 2015-08-24 10:04:17 UTC

Based on comment 10, moving this to POST.

Comment 12 Apeksha 2015-08-28 10:42:10 UTC

Verified on nfs-ganesha-2.2.0-6.el7rhgs.x86_64

Comment 14 Divya 2015-09-28 10:56:34 UTC

Soumya,

Please review and sign off on the edited doc text.

Comment 15 Soumya Koduri 2015-09-30 08:07:45 UTC

Doc text looks good to me.

Comment 17 errata-xmlrpc 2015-10-05 10:43:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1846.html
