Description of problem:
NFS-Ganesha does not start after a node reboot.

Version-Release number of selected component (if applicable):
nfs-ganesha-gluster-2.4.1-9.el7rhgs.x86_64
nfs-ganesha-2.4.1-9.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-18.el7rhgs.x86_64
glusterfs-3.8.4-18.el7rhgs.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Create a 2-node HA cluster for NFS-Ganesha.
2. Enable the nfs-ganesha service so that it starts at boot.
3. Reboot either node.
4. After the reboot, check the nfs-ganesha service status: it is in the failed state.

Actual results:
The nfs-ganesha service is in the failed state.

Expected results:
The nfs-ganesha service should be up and running.

Additional info:
/var/log/messages reports the following:
>>>>
May 3 12:21:21 example.com nfs-ganesha[1881]: [main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.4.1/src, built at Mar 8 2017 01:58:38 on
May 3 12:21:21 example.com nfs-ganesha[1883]: [main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
May 3 12:21:21 example.com nfs-ganesha[1883]: [main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (No such file or directory), ignored
May 3 12:21:21 example.com nfs-ganesha[1883]: [main] main :NFS STARTUP :FATAL :Fatal errors. Server exiting...
May 3 12:21:21 example.com systemd: nfs-ganesha.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 3 12:21:21 example.com systemd: Unit nfs-ganesha.service entered failed state.
May 3 12:21:21 example.com systemd: nfs-ganesha.service failed.
May 3 12:21:24 example.com systemd: Mounting /var/run/gluster/shared_storage...
May 3 12:21:24 example.com systemd: Mounting FUSE Control File System...
>>>>
Additional info:
- With RHGS 3.2, the nfs-ganesha configuration files (ganesha.conf and ganesha-ha.conf) are stored in /var/run/gluster/shared_storage/nfs-ganesha.
- During boot, there is currently no guarantee that shared_storage is mounted before nfs-ganesha starts.
- Because the configuration files in /var/run/gluster/shared_storage/nfs-ganesha are unavailable at that point, nfs-ganesha fails to start.
Could we make nfs-ganesha start automatically after a node reboot, instead of having to start it manually?
To start nfs-ganesha automatically after a reboot, shared_storage must be available first. QE has already filed a bug about shared_storage issues after reboot: bug 1335090. We first need to fix that bug (most likely by adding a service script that reliably brings up the shared_storage mount point). After that, as part of this bug, we can make nfs-ganesha depend on the newly added shared_storage service.
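One way to picture the dependency described above is a pre-start readiness check, for example a hypothetical helper run from an ExecStartPre= line of nfs-ganesha.service. The Python sketch below polls until the config file can be opened, treating ENOENT (the file, or its target under shared_storage, does not exist yet) and ENOTCONN ("Transport endpoint is not connected", i.e. the FUSE mount exists but its client is not connected yet) as "not ready yet". The function name, timeout and polling interval are assumptions for illustration; this is not the attached patch.

```python
import errno
import time

# Errors that mean "shared_storage is not ready yet", so keep polling:
#   ENOENT   - the config file (or its target on shared_storage) is missing
#   ENOTCONN - the FUSE mount point exists but the client is not connected
RETRYABLE = {errno.ENOENT, errno.ENOTCONN}


def wait_for_config(path, timeout=60.0, interval=1.0):
    """Hypothetical pre-start check: return True once `path` can be opened,
    False if it is still not available after `timeout` seconds.
    Any other OSError (e.g. permission denied) is re-raised immediately."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "rb"):
                return True
        except OSError as exc:
            if exc.errno not in RETRYABLE:
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper script would map the result to an exit status (0 when ready, non-zero otherwise), so that systemd does not launch ganesha.nfsd before /etc/ganesha/ganesha.conf is readable.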
Created attachment 1290683: Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch
I have submitted a patch https://bugzilla.redhat.com/show_bug.cgi?id=1466007 to fix the dependency issue. The change is as follows:

--- a/src/scripts/systemd/nfs-ganesha.service
+++ b/src/scripts/systemd/nfs-ganesha.service
@@ -28,6 +28,8 @@ ExecStart=/bin/bash -c "${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS}
 ExecStartPost=-/bin/bash -c "prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE"
 ExecReload=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.reload
 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown
+Restart=on-failure
+RestartSec=3
+RestartPreventExitStatus=SIGABRT SIGKILL SIGSEGV

With this change, systemd restarts nfs-ganesha every 3 seconds whenever it enters a failed state, except when the process was terminated by SIGABRT, SIGKILL or SIGSEGV (abrt, segfault, OOM kill, or a manual kill -9). Is it okay to have this behavior? Please share your thoughts.
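The restart decision the patch relies on can be sketched as a small Python model. It assumes the documented Restart=on-failure semantics (restart on a non-zero exit code or on termination by any signal other than the "clean" SIGHUP, SIGINT, SIGTERM and SIGPIPE), combined with the RestartPreventExitStatus= list from the diff. It is a simplified illustration, not systemd's code: timeouts, the watchdog, SuccessExitStatus= and explicit `systemctl stop` are ignored.

```python
import signal

# Signals that Restart=on-failure treats as a clean termination (no restart).
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}

# From the proposed unit: RestartPreventExitStatus=SIGABRT SIGKILL SIGSEGV
PREVENT_RESTART_SIGNALS = {signal.SIGABRT, signal.SIGKILL, signal.SIGSEGV}


def should_restart(exit_code=None, sig=None):
    """Simplified model: would systemd restart a main process that either
    exited with `exit_code` or was killed by signal `sig`?"""
    if sig is not None:
        if sig in PREVENT_RESTART_SIGNALS:
            return False  # abrt / segfault / OOM kill / kill -9
        return sig not in CLEAN_SIGNALS
    return exit_code != 0  # any non-zero exit code is a failure
```

Under this model the config-parse failure from the original report (status=2/INVALIDARGUMENT) is retried, while a segfault or kill -9 is left in the failed state for inspection.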
"..will restart nfs-ganesha in every 3 sec when it entered in a failed state.." What happens after node reboot after this fix? Will ganesha services keep on restarting?
(In reply to Alok from comment #11)
> "..will restart nfs-ganesha in every 3 sec when it entered in a failed
> state.."
> What happens after node reboot after this fix? Will ganesha services keep on
> restarting?

Yes, it keeps restarting until it succeeds.
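Automatic restarts are also subject to systemd's unit start rate limit (StartLimitBurst= starts within StartLimitIntervalSec=, called StartLimitInterval= in older releases; the upstream defaults are 5 starts in 10 seconds). The simplified Python model below assumes a hypothetical time `ready_at` at which shared_storage and ganesha.conf become readable, and approximates the rate limit as a fixed window. It suggests why RestartSec=3 can keep retrying indefinitely under the default limit, whereas a 1-second delay would hit the limit and leave the unit failed.

```python
def simulate_restarts(ready_at, restart_sec=3, burst=5, interval=10, horizon=120):
    """Simplified model of Restart=on-failure plus start rate limiting.

    ready_at: hypothetical time (s) from which a start succeeds; every
    earlier start fails. Returns (started, list_of_attempt_times).
    """
    window_start = None
    starts_in_window = 0
    attempts = []
    t = 0
    while t <= horizon:
        # Open a new rate-limit window once the previous one has expired.
        if window_start is None or t - window_start > interval:
            window_start, starts_in_window = t, 0
        if starts_in_window >= burst:
            return False, attempts  # start limit hit: no further attempts
        starts_in_window += 1
        attempts.append(t)
        if t >= ready_at:
            return True, attempts
        t += restart_sec  # RestartSec= delay before the next attempt
    return False, attempts
```

For example, `simulate_restarts(7)` starts at 0, 3, 6 and 9 seconds and succeeds on the fourth attempt; the model ignores the time each failed ganesha.nfsd start itself takes.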
Jiffin, I had a 4-node ganesha cluster, created with gdeploy, up and running. I rebooted all the nodes at once to test https://bugzilla.redhat.com/show_bug.cgi?id=1335090. When all the nodes came back up after the reboot, shared_storage was mounted on all of them. However, on 1 of the 4 nodes the ganesha service did not start automatically after the reboot and was shown in the failed state. According to this fix, the ganesha service should come up on its own on all the nodes. Could you please confirm the expected behaviour?

ganesha.log
========
19/07/2017 23:51:01 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4187[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.4.1
19/07/2017 23:51:04 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4198[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
19/07/2017 23:51:04 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4198[main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (Transport endpoint is not connected), ignored
19/07/2017 23:51:04 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4198[main] main :NFS STARTUP :FATAL :Fatal errors. Server exiting...
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11026[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.4.1
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11037[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11037[main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (No such file or directory), ignored
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11037[main] main :NFS STARTUP :FATAL :Fatal errors. Server exiting...
========

[root@dhcp42-125 ~]# ll /var/lib/nfs
lrwxrwxrwx. 1 root root 81 Jul 19 23:51 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-125.lab.eng.blr.redhat.com/nfs

]# df -hT
Filesystem                        Type            Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp42--119-root xfs              17G  4.4G   13G  26% /
devtmpfs                          devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs                             tmpfs           3.9G   54M  3.8G   2% /dev/shm
tmpfs                             tmpfs           3.9G  8.5M  3.9G   1% /run
tmpfs                             tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                         xfs            1014M  233M  782M  23% /boot
/dev/mapper/vg2-lv2               xfs              20G   50M   20G   1% /gluster/brick2
/dev/mapper/vg3-lv3               xfs              20G   56M   20G   1% /gluster/brick3
/dev/mapper/vg1-lv1               xfs              20G   57M   20G   1% /gluster/brick1
tmpfs                             tmpfs           783M     0  783M   0% /run/user/0
localhost:/gluster_shared_storage fuse.glusterfs   17G   14G  3.1G  82% /run/gluster/shared_storage

# service nfs-ganesha status
Redirecting to /bin/systemctl status nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2017-07-19 23:51:07 IST; 15min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 11040 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 11039 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=1/FAILURE)
  Process: 11026 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 4198 (code=exited, status=2)

Jul 19 23:51:07 dhcp42-119.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jul 19 23:51:07 dhcp42-119.lab.eng.blr.redhat.com bash[11039]: prlimit: invalid PID argument: '--nofile=1048576:1048576'
Jul 19 23:51:07 dhcp42-119.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesha file server.

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-34.el7rhgs.x86_64
nfs-ganesha-2.4.4-16.el7rhgs.x86_64
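In the status output above, prlimit reports the string '--nofile=1048576:1048576' as its PID argument, which indicates that $MAINPID expanded to an empty string when ExecStartPost ran (presumably because the main ganesha.nfsd process had already exited). The Python sketch below shows a guarded equivalent using the standard-library `resource.prlimit` (Linux only, Python 3.4+). The function name and its skip-on-missing-PID behaviour are hypothetical and are not part of the nfs-ganesha unit file.

```python
import resource


def set_nofile_limit(mainpid_text, soft, hard):
    """Hypothetical guarded version of `prlimit --pid $MAINPID --nofile=S:H`.

    Returns False and does nothing when the PID text is empty or not
    numeric, instead of letting the next option be parsed as the PID.
    """
    pid_text = mainpid_text.strip()
    if not pid_text.isdigit():
        return False
    # resource.prlimit(pid, resource, limits) sets the new (soft, hard)
    # limits for that process; it raises OSError if the PID is gone.
    resource.prlimit(int(pid_text), resource.RLIMIT_NOFILE, (soft, hard))
    return True
```

Skipping (rather than failing) on an empty PID matches the existing "-" prefix on ExecStartPost=, which already tells systemd to ignore a failure of that command.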
IMO, the phrase "and NFS-Ganesha will start post reboot" can be removed; otherwise the doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2779
The needinfo requests on this closed bug have been removed because they remained unresolved for 500 days.