Bug 1451981 - [GSS] NFS-ganesha is not getting started properly after node reboot. [NEEDINFO]
Summary: [GSS] NFS-ganesha is not getting started properly after node reboot.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.2
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Jiffin
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1335090 1452527
Blocks: 1417151
Reported: 2017-05-18 05:04 UTC by Abhishek Kumar
Modified: 2021-03-11 15:13 UTC

Fixed In Version: nfs-ganesha-2.4.4-11
Doc Type: Bug Fix
Doc Text:
The NFS-Ganesha configuration file is stored in shared storage, so the NFS-Ganesha service cannot start if the shared storage is not mounted. With this fix, system init scripts have been defined and updated to make sure that shared storage is mounted before the NFS-Ganesha service starts, and NFS-Ganesha will start post reboot.
Clone Of:
Environment:
Last Closed: 2017-09-21 04:47:57 UTC
Target Upstream Version:
jthottan: needinfo? (bmohanra)


Attachments
Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch (1.12 KB, patch)
2017-06-22 12:22 UTC, Jiffin


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:2779 0 normal SHIPPED_LIVE nfs-ganesha bug fix and enhancement update 2017-09-21 08:17:17 UTC

Description Abhishek Kumar 2017-05-18 05:04:14 UTC
Description of problem:

NFS-Ganesha does not start properly after a node reboot.

Version-Release number of selected component (if applicable):

nfs-ganesha-gluster-2.4.1-9.el7rhgs.x86_64
nfs-ganesha-2.4.1-9.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-18.el7rhgs.x86_64
glusterfs-3.8.4-18.el7rhgs.x86_64

How reproducible:

Every time

Steps to Reproduce:
1. Create a 2-node HA cluster for NFS-Ganesha.
2. Enable the nfs-ganesha service so that it starts at boot.
3. Reboot any one of the nodes.
4. After the reboot, the nfs-ganesha service status shows a failed state.

Actual results:

The nfs-ganesha service status shows a failed state.

Expected results:

The nfs-ganesha service should be up and running.

Additional info:
/var/log/messages reports the following:

>>>>
May  3 12:21:21 example.com nfs-ganesha[1881]: [main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.4.1/src, built at Mar  8 2017 01:58:38 on
May  3 12:21:21 example.com nfs-ganesha[1883]: [main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
May  3 12:21:21 example.com nfs-ganesha[1883]: [main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (No such file or directory), ignored
May  3 12:21:21 example.com nfs-ganesha[1883]: [main] main :NFS STARTUP :FATAL :Fatal errors.  Server exiting...
May  3 12:21:21 example.com systemd: nfs-ganesha.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May  3 12:21:21 example.com systemd: Unit nfs-ganesha.service entered failed state.
May  3 12:21:21 example.com systemd: nfs-ganesha.service failed.
May  3 12:21:24 example.com systemd: Mounting /var/run/gluster/shared_storage...
May  3 12:21:24 example.com systemd: Mounting FUSE Control File System...
>>>>

Comment 2 Abhishek Kumar 2017-05-18 05:10:49 UTC
Additional info:

- With RHGS 3.2, /var/run/gluster/shared_storage/nfs-ganesha stores the nfs-ganesha configuration files, ganesha.conf and ganesha-ha.conf.

- During boot, there is still no guarantee that shared_storage is mounted before nfs-ganesha starts.

- Because the configuration files stored in /var/run/gluster/shared_storage/nfs-ganesha are unavailable at that point, nfs-ganesha fails to start.


So could we make nfs-ganesha start automatically after a node reboot, rather than having to start it manually?
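
The ordering problem described in this comment can be checked by hand on a rebooted node. The following is only a minimal sketch and not part of any fix: is_mounted is a hypothetical helper that looks for a mount target in a mounts table (/proc/self/mounts by default). The table lists canonical paths, so the sketch passes /run/gluster/shared_storage, assuming /var/run is a symlink to /run as the df output later in this bug suggests.

```shell
# is_mounted <mount_point> [mounts_file]
# Succeeds (exit 0) if <mount_point> appears as a mount target in the table.
# mounts_file defaults to /proc/self/mounts; the optional second argument
# exists only so the helper can be run against a saved copy of the table.
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# Example: report whether the shared storage is available yet.
# The path is an assumption (canonical form of /var/run/gluster/shared_storage).
if is_mounted /run/gluster/shared_storage; then
    echo "shared_storage is mounted"
else
    echo "shared_storage is not mounted yet"
fi
```

Running this just before nfs-ganesha is started would show whether ganesha.conf can be reached at that moment.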

Comment 4 Soumya Koduri 2017-05-19 06:45:27 UTC
To be able to start nfs-ganesha automatically after a reboot, we need shared_storage to be available. QE has already filed a bug about shared_storage issues after reboot: bug 1335090. We need to fix that BZ first (most likely by adding a service script file that reliably brings up the shared_storage mount point). After that, as part of this bug, we can make nfs-ganesha depend on the newly added shared_storage service.
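
One way to express this kind of dependency in systemd is a drop-in for nfs-ganesha.service. The sketch below is an assumption for illustration and is not the patch attached to this bug: it uses the generic RequiresMountsFor= unit option instead of the dedicated shared_storage service proposed in this comment, and the helper name write_ganesha_dropin and the file name shared-storage.conf are hypothetical.

```shell
# write_ganesha_dropin <dropin_dir>
# Writes a drop-in that makes the unit require, and be ordered after, the
# mount unit for the shared storage path. The file name is a hypothetical
# choice; any *.conf name in the drop-in directory would do.
write_ganesha_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/shared-storage.conf" <<'EOF'
[Unit]
RequiresMountsFor=/var/run/gluster/shared_storage
EOF
}

# Typical use as root (systemd reads drop-ins from <unit>.d directories):
#   write_ganesha_dropin /etc/systemd/system/nfs-ganesha.service.d
#   systemctl daemon-reload
```

This only orders nfs-ganesha after the mount unit. It does not by itself make the mount succeed, which is the subject of bug 1335090.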

Comment 8 Jiffin 2017-06-22 12:22:54 UTC
Created attachment 1290683 [details]
Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch

Comment 10 Jiffin 2017-07-13 05:57:01 UTC
I have submitted a patch (https://bugzilla.redhat.com/show_bug.cgi?id=1466007) to fix the dependency issue.
The change is as follows:
--- a/src/scripts/systemd/nfs-ganesha.service
+++ b/src/scripts/systemd/nfs-ganesha.service
@@ -28,6 +28,8 @@ ExecStart=/bin/bash -c "${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS}
 ExecStartPost=-/bin/bash -c "prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE"
 ExecReload=/bin/dbus-send --system   --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin  org.ganesha.nfsd.admin.reload
 ExecStop=/bin/dbus-send --system   --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown
+Restart=on-failure
+RestartSec=3
+RestartPreventExitStatus=SIGABRT SIGKILL SIGSEGV

With this change, systemd will restart nfs-ganesha every 3 seconds whenever it enters a failed state (this excludes termination by SIGABRT, SIGSEGV, or SIGKILL, i.e. abort, segfault, OOM kill, or a manual kill -9).

Is it okay to have this behaviour?
Please share your thoughts.
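
The restart policy in the unit-file change above can be pictured as a retry loop. This is an illustration only: systemd implements restarts itself, and restart_on_failure and its max_attempts argument are hypothetical (systemd has no such argument and applies its own start rate limiting instead). The sketch assumes the usual shell convention that a command killed by signal N reports exit status 128+N.

```shell
# restart_on_failure <delay_seconds> <max_attempts> <command> [args...]
# Re-runs the command after each failure, waiting <delay_seconds> in between
# (compare Restart=on-failure and RestartSec=). It gives up immediately if the
# command died from SIGABRT (6), SIGKILL (9) or SIGSEGV (11), which mimics
# RestartPreventExitStatus=. A plain exit code of 134, 137 or 139 is
# indistinguishable from a signal death here, which is a limit of the sketch.
restart_on_failure() {
    local delay=$1 max=$2 attempt=1 status
    shift 2
    while :; do
        if "$@"; then
            return 0                          # clean exit: nothing to restart
        else
            status=$?
        fi
        case "$status" in
            134|137|139) return "$status" ;;  # 128+6, 128+9, 128+11
        esac
        if [ "$attempt" -ge "$max" ]; then
            return "$status"                  # hypothetical safety bound
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `restart_on_failure 3 20 some_command` would retry a failing some_command every 3 seconds, up to 20 attempts, and stop as soon as it exits cleanly.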

Comment 11 Alok 2017-07-13 08:45:07 UTC
"..will restart nfs-ganesha in every 3 sec when it entered in a failed state.."
What happens after node reboot after this fix? Will ganesha services keep on restarting?

Comment 12 Jiffin 2017-07-13 11:48:20 UTC
(In reply to Alok from comment #11)
> "..will restart nfs-ganesha in every 3 sec when it entered in a failed
> state.."
> What happens after node reboot after this fix? Will ganesha services keep on
> restarting?
Yes, it keeps restarting until it succeeds.

Comment 13 Manisha Saini 2017-07-19 18:38:59 UTC
Jiffin,

I had a 4-node ganesha cluster up and running, created with gdeploy. I rebooted all the nodes at once to test https://bugzilla.redhat.com/show_bug.cgi?id=1335090.

When all the nodes came up after the reboot, shared_storage was mounted on all of them. On 1 of the 4 nodes, the ganesha service did not start automatically after the reboot and shows a failed state.

According to this fix, the ganesha service should come up on its own on all the nodes.
Could you please confirm the expected behaviour?




ganesha.log

========

19/07/2017 23:51:01 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4187[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.4.1
19/07/2017 23:51:04 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4198[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
19/07/2017 23:51:04 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4198[main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (Transport endpoint is not connected), ignored
19/07/2017 23:51:04 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-4198[main] main :NFS STARTUP :FATAL :Fatal errors.  Server exiting...
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11026[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.4.1
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11037[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11037[main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (No such file or directory), ignored
19/07/2017 23:51:07 : epoch 8a370000 : dhcp42-119.lab.eng.blr.redhat.com : ganesha.nfsd-11037[main] main :NFS STARTUP :FATAL :Fatal errors.  Server exiting...

========
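
The two start attempts in the log above fail for different reasons. As a small sketch for triaging longer logs, the hypothetical helper config_open_errors below counts the distinct "open error (...)" reasons found in ganesha.log lines read from standard input. The log path in the usage comment is an assumption.

```shell
# config_open_errors: read ganesha.log lines on stdin and print each distinct
# reason from "open error (<reason>)" messages, prefixed by its count.
config_open_errors() {
    grep -o 'open error ([^)]*)' |
        sed 's/^open error (\(.*\))$/\1/' |
        sort | uniq -c | sed 's/^ *//'
}

# Example use (the log location is an assumption):
#   config_open_errors < /var/log/ganesha.log
```

Applied to the excerpt above, it would list "No such file or directory" and "Transport endpoint is not connected" once each.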




[root@dhcp42-125 ~]# ll /var/lib/nfs
lrwxrwxrwx. 1 root root 81 Jul 19 23:51 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-125.lab.eng.blr.redhat.com/nfs

# df -hT
Filesystem                        Type            Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp42--119-root xfs              17G  4.4G   13G  26% /
devtmpfs                          devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs                             tmpfs           3.9G   54M  3.8G   2% /dev/shm
tmpfs                             tmpfs           3.9G  8.5M  3.9G   1% /run
tmpfs                             tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                         xfs            1014M  233M  782M  23% /boot
/dev/mapper/vg2-lv2               xfs              20G   50M   20G   1% /gluster/brick2
/dev/mapper/vg3-lv3               xfs              20G   56M   20G   1% /gluster/brick3
/dev/mapper/vg1-lv1               xfs              20G   57M   20G   1% /gluster/brick1
tmpfs                             tmpfs           783M     0  783M   0% /run/user/0
localhost:/gluster_shared_storage fuse.glusterfs   17G   14G  3.1G  82% /run/gluster/shared_storage



# service nfs-ganesha status
Redirecting to /bin/systemctl status nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2017-07-19 23:51:07 IST; 15min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 11040 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 11039 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=1/FAILURE)
  Process: 11026 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 4198 (code=exited, status=2)

Jul 19 23:51:07 dhcp42-119.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jul 19 23:51:07 dhcp42-119.lab.eng.blr.redhat.com bash[11039]: prlimit: invalid PID argument: '--nofile=1048576:1048576'
Jul 19 23:51:07 dhcp42-119.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesha file server.



# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-34.el7rhgs.x86_64
nfs-ganesha-2.4.4-16.el7rhgs.x86_64

Comment 21 Jiffin 2017-09-11 11:57:32 UTC
IMO "and NFS-Ganesha will start post reboot" can be removed; otherwise the doc text looks good to me.

Comment 23 errata-xmlrpc 2017-09-21 04:47:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2779

