Description of problem:
On killing the ganesha process, systemd restarts the nfs-ganesha process by itself on RHEL 7.1.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-9.el7rhgs.x86_64
nfs-ganesha-2.2.0-5.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up the 4-node ganesha cluster on RHEL 7.1.
2. Kill the ganesha process on one of the servers.
3. systemd restarts the nfs-ganesha process by itself; failover doesn't happen.

[root@nfs3 ~]# ps aux | grep ganesha
root 18967 0.2 1.3 2212224 109824 ? Ssl 01:10 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 20826 0.0 0.0 112640 928 pts/0 S+ 01:13 0:00 grep --color=auto ganesha
[root@nfs3 ~]# kill -9 18967
[root@nfs3 ~]# ps aux | grep ganesha
root 20957 17.0 1.2 2015616 102168 ? Ssl 01:13 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root 21004 0.0 0.0 112640 924 pts/0 S+ 01:13 0:00 grep --color=auto ganesha

Actual results:
systemd restarts the nfs-ganesha process by itself; failover doesn't happen.

Expected results:
The nfs-ganesha process must not restart; failover must happen.

Additional info:
/var/log/messages:
Jul 14 01:13:30 nfs3 rpc.statd[11887]: Received SM_UNMON_ALL request from nfs3 while not monitoring any hosts
Jul 14 01:19:36 nfs3 systemd-logind: New session 227 of user root.
Jul 14 01:19:36 nfs3 systemd: Starting Session 227 of user root.
Jul 14 01:19:36 nfs3 systemd: Started Session 227 of user root.
Jul 14 01:19:37 nfs3 systemd-logind: Removed session 227.
Jul 14 01:19:52 nfs3 systemd: nfs-ganesha.service: main process exited, code=killed, status=9/KILL
Jul 14 01:19:52 nfs3 systemd: Unit nfs-ganesha.service entered failed state.
Jul 14 01:19:53 nfs3 systemd: nfs-ganesha.service holdoff time over, scheduling restart.
Jul 14 01:19:53 nfs3 systemd: Stopping NFS-Ganesha file server...
Jul 14 01:19:53 nfs3 systemd: Starting NFS status monitor for NFSv2/3 locking....
Jul 14 01:19:53 nfs3 rpc.statd: Statd service already running!
Jul 14 01:19:53 nfs3 systemd: nfs-ganesha-lock.service: control process exited, code=exited status=1
Jul 14 01:19:53 nfs3 systemd: Failed to start NFS status monitor for NFSv2/3 locking..
Jul 14 01:19:53 nfs3 systemd: Unit nfs-ganesha-lock.service entered failed state.
Jul 14 01:19:53 nfs3 systemd: Starting NFS-Ganesha file server...
Jul 14 01:19:53 nfs3 systemd: Started NFS-Ganesha file server.
From nfs-ganesha.service:

[Service]
Type=forking
Environment="NOFILE=1048576"
EnvironmentFile=/etc/sysconfig/ganesha
ExecStart=/usr/bin/ganesha.nfsd $OPTIONS
ExecStartPost=-/bin/bash -c "prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE"
ExecReload=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.reload
ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown
Restart=on-abort

From the man page:

Restart=
Configures whether the service shall be restarted when the service process exits, is killed, or a timeout is reached. The service process may be the main service process, but it may also be one of the processes specified with ExecStartPre=, ExecStartPost=, ExecStop=, ExecStopPost=, or ExecReload=. When the death of the process is a result of systemd operation (e.g. service stop or restart), the service will not be restarted. Timeouts include missing the watchdog "keep-alive ping" deadline and a service start, reload, and stop operation timeouts.

Takes one of no, on-success, on-failure, on-abnormal, on-watchdog, on-abort, or always. If set to no (the default), the service will not be restarted. If set to on-success, it will be restarted only when the service process exits cleanly. In this context, a clean exit means an exit code of 0, or one of the signals SIGHUP, SIGINT, SIGTERM or SIGPIPE, and additionally, exit statuses and signals specified in SuccessExitStatus=. If set to on-failure, the service will be restarted when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding the aforementioned four signals), when an operation (such as service reload) times out, and when the configured watchdog timeout is triggered. If set to on-abnormal, the service will be restarted when the process is terminated by a signal (including on core dump, excluding the aforementioned four signals), when an operation times out, or when the watchdog timeout is triggered. If set to on-abort, the service will be restarted only if the service process exits due to an uncaught signal not specified as a clean exit status. If set to on-watchdog, the service will be restarted only if the watchdog timeout for the service expires. If set to always, the service will be restarted regardless of whether it exited cleanly or not, got terminated abnormally by a signal, or hit a timeout.

So we may need to set Restart=no in the nfs-ganesha.service file (not sure if this can be done via the CLI).
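
To make the quoted Restart= rules concrete, here is a rough Python sketch of the decision table as the man page text describes it. It is only a simplified model for reasoning about this bug, not systemd's actual logic: the function name should_restart and its parameters are made up for illustration, it ignores SuccessExitStatus= and any other unit settings, and it assumes a Linux host where these signal names exist.

```python
import signal

# Signals the quoted man page text counts as a "clean" exit.
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def should_restart(policy, exit_code=None, signal_number=None,
                   timeout=False, watchdog=False):
    """Hypothetical helper: would the quoted Restart= text imply a restart?

    exit_code     -- exit status if the process exited on its own
    signal_number -- signal that terminated the process, if any
    timeout       -- a start/reload/stop operation timed out
    watchdog      -- the watchdog timeout was triggered
    """
    killed_uncleanly = (signal_number is not None
                        and signal_number not in CLEAN_SIGNALS)
    failed_exit = signal_number is None and exit_code not in (None, 0)
    clean_exit = (not timeout and not watchdog
                  and ((signal_number is None and exit_code == 0)
                       or signal_number in CLEAN_SIGNALS))

    if policy == "no":
        return False
    if policy == "always":
        return True
    if policy == "on-success":
        return clean_exit
    if policy == "on-failure":
        return failed_exit or killed_uncleanly or timeout or watchdog
    if policy == "on-abnormal":
        return killed_uncleanly or timeout or watchdog
    if policy == "on-abort":
        return killed_uncleanly
    if policy == "on-watchdog":
        return watchdog
    raise ValueError("unknown Restart= value: %r" % policy)


# The case from this report: kill -9 with Restart=on-abort.
print(should_restart("on-abort", signal_number=signal.SIGKILL))
# The proposed setting for the same kill.
print(should_restart("no", signal_number=signal.SIGKILL))
```

Under this model, SIGKILL is an uncaught signal outside the clean set, so on-abort restarts the service, which matches the "holdoff time over, scheduling restart" line in the log above.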
On killing the ganesha process, systemd restarts the nfs-ganesha process by itself on RHEL 7.1. The nfs-ganesha.service file is configured with Restart=on-abort, so systemd restarts the process automatically when it is terminated by an uncaught signal such as SIGKILL. This must be prevented in a clustered environment: the whole cluster needs to go into grace before a new NFS-Ganesha instance is started.
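
As a quick way to check which Restart= value a given unit file carries, something along these lines could be used. This is only a sketch: restart_policy is a hypothetical helper, UNIT_TEXT is a shortened copy of the excerpt posted earlier in this bug, and Python's configparser is a stand-in for systemd's own parser (it does not see drop-in overrides or any other unit-loading behaviour).

```python
import configparser

# Shortened copy of the [Service] section quoted earlier in this bug.
UNIT_TEXT = """\
[Service]
Type=forking
EnvironmentFile=/etc/sysconfig/ganesha
ExecStart=/usr/bin/ganesha.nfsd $OPTIONS
Restart=on-abort
"""


def restart_policy(unit_text):
    """Return the Restart= value from unit file text, or "no" if unset."""
    parser = configparser.ConfigParser(
        interpolation=None,   # leave $VARS and % specifiers untouched
        strict=False,         # unit files may repeat keys such as ExecStartPost=
        delimiters=("=",),
    )
    parser.optionxform = str  # keep key case (Restart, not restart)
    parser.read_string(unit_text)
    # "no" is the default named in the man page text quoted above.
    return parser.get("Service", "Restart", fallback="no")


policy = restart_policy(UNIT_TEXT)
if policy != "no":
    print("Restart=%s: systemd may restart ganesha without cluster grace" % policy)
```

On a real node the text would come from reading the installed nfs-ganesha.service file instead of the inline string.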
Doc text is edited. Please sign off to be included in Known Issues.
Doc text looks good to me
Upstream merged https://review.gerrithub.io/243810
Based on comment 10, moving this to POST.
Verified on nfs-ganesha-2.2.0-6.el7rhgs.x86_64
Soumya, please review and sign off on the edited doc text.
Doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1846.html