Bug 1461098 - [Ganesha] Ganesha service failed to start on new node added in existing ganeshacluster
Summary: [Ganesha] Ganesha service failed to start on new node added in existing gane...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: common-ha
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
urgent
unspecified
Target Milestone: ---
: RHGS 3.3.0
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1435697 1463641 1466144 1466343 1466790 1469027 1471917
Blocks: 1417151 1461186 1465605
TreeView+ depends on / blocked
 
Reported: 2017-06-13 13:55 UTC by Manisha Saini
Modified: 2017-09-21 04:59 UTC (History)
9 users (show)

Fixed In Version: glusterfs-3.8.4-31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1461186 1463641 1465605 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:59:42 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Manisha Saini 2017-06-13 13:55:24 UTC
Description of problem:
Ganesha service failed on new node added in existing ganesha cluster with permission denied errors while parsing /etc/ganesha/ganesha.conf.
HA cluster fails to come up on new node.


Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.4.4-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.4-8.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-27.el7rhgs.x86_64
nfs-ganesha-2.4.4-8.el7rhgs.x86_64


selinux-policy-3.13.1-161.el7.noarch

How reproducible:
Consistently

Steps to Reproduce:
1.Create a 4 node ganesha cluster.
2.Add a new node to existing ganesha cluster.


Actual results:
Ganesha service failed to start on new node.New node gets failover to other node in cluster.

Expected results:
Pcs cluster should be up and running on new node.

Additional info:

Issue is not observed if I set selinux to permissive mode.


# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp42-119.lab.eng.blr.redhat.com (version 1.1.16-8.el7-94ff4df) - partition with quorum
Last updated: Tue Jun 13 19:18:50 2017
Last change: Tue Jun 13 19:09:27 2017 by root via cibadmin on dhcp42-125.lab.eng.blr.redhat.com

5 nodes configured
30 resources configured

Online: [ dhcp42-114.lab.eng.blr.redhat.com dhcp42-119.lab.eng.blr.redhat.com dhcp42-125.lab.eng.blr.redhat.com dhcp42-127.lab.eng.blr.redhat.com dhcp42-129.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp42-114.lab.eng.blr.redhat.com dhcp42-119.lab.eng.blr.redhat.com dhcp42-125.lab.eng.blr.redhat.com dhcp42-127.lab.eng.blr.redhat.com dhcp42-129.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp42-114.lab.eng.blr.redhat.com dhcp42-119.lab.eng.blr.redhat.com dhcp42-125.lab.eng.blr.redhat.com dhcp42-127.lab.eng.blr.redhat.com dhcp42-129.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp42-119.lab.eng.blr.redhat.com dhcp42-125.lab.eng.blr.redhat.com dhcp42-127.lab.eng.blr.redhat.com dhcp42-129.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp42-114.lab.eng.blr.redhat.com ]
 Resource Group: dhcp42-119.lab.eng.blr.redhat.com-group
     dhcp42-119.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp42-119.lab.eng.blr.redhat.com
     dhcp42-119.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp42-119.lab.eng.blr.redhat.com
     dhcp42-119.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp42-119.lab.eng.blr.redhat.com
 Resource Group: dhcp42-125.lab.eng.blr.redhat.com-group
     dhcp42-125.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp42-125.lab.eng.blr.redhat.com
     dhcp42-125.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp42-125.lab.eng.blr.redhat.com
     dhcp42-125.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp42-125.lab.eng.blr.redhat.com
 Resource Group: dhcp42-127.lab.eng.blr.redhat.com-group
     dhcp42-127.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp42-127.lab.eng.blr.redhat.com
     dhcp42-127.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp42-127.lab.eng.blr.redhat.com
     dhcp42-127.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp42-127.lab.eng.blr.redhat.com
 Resource Group: dhcp42-129.lab.eng.blr.redhat.com-group
     dhcp42-129.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp42-129.lab.eng.blr.redhat.com
     dhcp42-129.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp42-129.lab.eng.blr.redhat.com
     dhcp42-129.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp42-129.lab.eng.blr.redhat.com
 Resource Group: dhcp42-114.lab.eng.blr.redhat.com-group
     dhcp42-114.lab.eng.blr.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Started dhcp42-129.lab.eng.blr.redhat.com
     dhcp42-114.lab.eng.blr.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started dhcp42-129.lab.eng.blr.redhat.com
     dhcp42-114.lab.eng.blr.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Started dhcp42-129.lab.eng.blr.redhat.com

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


Ganesha.log on new node

13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :CRIT :Extra references (1) hanging around to FSAL PSEUDO
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :EVENT :Shutting down handles for FSAL GLUSTER
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :EVENT :Shutting down DS handles for FSAL GLUSTER
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :EVENT :Shutting down exports for FSAL GLUSTER
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :EVENT :Exports for FSAL GLUSTER shut down
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :EVENT :Unloading FSAL GLUSTER
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] destroy_fsals :FSAL :EVENT :FSAL GLUSTER unloaded
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[Admin] do_shutdown :MAIN :EVENT :FSAL system destroyed.
13/06/2017 19:05:30 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-5351[main] nfs_start :MAIN :EVENT :NFS EXIT: regular exit
13/06/2017 19:08:13 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-3496[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.4.1
13/06/2017 19:08:13 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-3497[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
13/06/2017 19:08:13 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-3497[main] config_errs_to_log :CONFIG :CRIT :Config File (<unknown file>:0): new file (/etc/ganesha/ganesha.conf) open error (Permission denied), ignored
13/06/2017 19:08:13 : epoch 4a170000 : dhcp42-114.lab.eng.blr.redhat.com : ganesha.nfsd-3497[main] main :NFS STARTUP :FATAL :Fatal errors.  Server exiting...
~                                                                               ==========================                                                                      
Fowwing AVC's are being observed in audit.log

type=PROCTITLE msg=audit(06/13/2017 19:08:13.721:3729) : proctitle=/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6431116925980835840 
type=SYSCALL msg=audit(06/13/2017 19:08:13.721:3729) : arch=x86_64 syscall=open success=no exit=EACCES(Permission denied) a0=0x560e3460b490 a1=O_RDONLY a2=0x1b6 a3=0x24 items=0 ppid=1 pid=3497 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=ganesha.nfsd exe=/usr/bin/ganesha.nfsd subj=system_u:system_r:ganesha_t:s0 key=(null) 
type=AVC msg=audit(06/13/2017 19:08:13.721:3729) : avc:  denied  { search } for  pid=3497 comm=ganesha.nfsd name=/ dev="fuse" ino=1 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=dir 

============================
# service nfs-ganesha status
Redirecting to /bin/systemctl status nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2017-06-13 19:08:13 IST; 16min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 3500 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 3498 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=1/FAILURE)
  Process: 3496 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 3497 (code=exited, status=2)

Jun 13 19:08:13 dhcp42-114.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jun 13 19:08:13 dhcp42-114.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 13 19:08:13 dhcp42-114.lab.eng.blr.redhat.com bash[3498]: prlimit: failed to set the NOFILE resource limit: No such process
Jun 13 19:08:13 dhcp42-114.lab.eng.blr.redhat.com systemd[1]: Failed to start NFS-Ganesha file server.
Jun 13 19:08:13 dhcp42-114.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.service entered failed state.
Jun 13 19:08:13 dhcp42-114.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service failed.

Comment 5 Soumya Koduri 2017-06-14 09:29:29 UTC
yes..From BZ#1461186 comments, I think the command provided by Lukas is mandatory but not a work-around. The reason it changed now when compared to previous release is that there are many enhancements done and a separate package now to manage SeLinux permissions needed by ganesha service. Request Lukas to correct.

But the only thing which I do not understand is why this AVC is seen only during add-node. I expect it to throw error while setting up nfs-ganesha cluster too as ganesha service on each node tries to access shared_storage mount which is via fuse. Lukas?

Comment 13 Kaleb KEITHLEY 2017-06-20 17:03:08 UTC
Looking at all the various selinux-policy packages in RHEL and Fedora, there are policies for glusterd and nfs-ganesha in selinux-policy for RHEL7.3.z, RHEL7.4, and Fedora 26 and 27.

Since they're not in Fedora 25 (and before I saw that they're in F26 and F27) I was tempted lift the policy bits from RHEL7 and add -selinux subpackages to gluster and nfs-ganesha. I'm not sure that's a good idea though at this point.

In the mean time it appears that all I need to do for downstream is add either
  semanage boolean -m ganesha_use_fusefs --on
or the
  %selinux_{set,unset}_booleans
to %post ganesha and %postun ganesha in the glusterfs.spec.

I would like to get confirmation from lvrabec that my assumptions are correct.

Comment 15 Atin Mukherjee 2017-06-21 15:40:45 UTC
upstream patch : https://review.gluster.org/17597

Comment 18 Atin Mukherjee 2017-06-23 04:10:11 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/109845/

Comment 19 Atin Mukherjee 2017-06-23 15:32:06 UTC
One more downstream only patch https://code.engineering.redhat.com/gerrit/#/c/109945/ was required to address puddle failure issue.

Comment 21 Manisha Saini 2017-07-14 06:17:09 UTC
Kaleb,

While selinux is in enforcing mode,ganesha cluster creation fails with the following AVC's which look the same as reported in this bug

AVC's are been obserevd in audit.log


[root@dhcp42-125 ganesha]#  ausearch -m avc -m user_avc -m selinux_err -i -ts recent | grep ganesha
type=USER_AVC msg=audit(07/22/2017 01:40:46.416:1226) : pid=920 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.313 spid=3131 tpid=5917 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?' 
type=PROCTITLE msg=audit(07/14/2017 11:35:17.355:4459) : proctitle=/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6442501652914110464 
type=SYSCALL msg=audit(07/14/2017 11:35:17.355:4459) : arch=x86_64 syscall=open success=no exit=EACCES(Permission denied) a0=0x5641bbdc9490 a1=O_RDONLY a2=0x1b6 a3=0x24 items=0 ppid=1 pid=28647 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=ganesha.nfsd exe=/usr/bin/ganesha.nfsd subj=system_u:system_r:ganesha_t:s0 key=(null) 
type=AVC msg=audit(07/14/2017 11:35:17.355:4459) : avc:  denied  { search } for  pid=28647 comm=ganesha.nfsd name=/ dev="fuse" ino=1 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=dir 
type=PROCTITLE msg=audit(07/14/2017 11:36:19.774:4473) : proctitle=/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6442503847642398720 
type=SYSCALL msg=audit(07/14/2017 11:36:19.774:4473) : arch=x86_64 syscall=open success=no exit=EACCES(Permission denied) a0=0x55b364ac6490 a1=O_RDONLY a2=0x1b6 a3=0x24 items=0 ppid=1 pid=28709 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=ganesha.nfsd exe=/usr/bin/ganesha.nfsd subj=system_u:system_r:ganesha_t:s0 key=(null) 
type=AVC msg=audit(07/14/2017 11:36:19.774:4473) : avc:  denied  { search } for  pid=28709 comm=ganesha.nfsd name=/ dev="fuse" ino=1 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=dir 

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.4.4-15.el7rhgs.x86_64
nfs-ganesha-2.4.4-15.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-33.el7rhgs.x86_64


# rpm -qa | grep selinux
libselinux-2.5-11.el7.x86_64
libselinux-python-2.5-11.el7.x86_64
libselinux-utils-2.5-11.el7.x86_64
selinux-policy-targeted-3.13.1-166.el7.noarch
libselinux-2.5-11.el7.i686
selinux-policy-3.13.1-166.el7.noarch



[root@dhcp42-125 ganesha]# semanage boolean -l | grep gluster
gluster_anon_write             (off  ,  off)  Allow gluster to anon write
gluster_export_all_ro          (off  ,  off)  Allow gluster to export all ro
gluster_use_execmem            (on   ,   on)  Allow gluster to use execmem
gluster_export_all_rw          (on   ,   on)  Allow gluster to export all rw
virt_use_glusterd              (off  ,  off)  Allow virt to use glusterd

[root@dhcp42-125 ganesha]# semanage boolean -l | grep ganesha
ganesha_use_fusefs             (off  ,  off)  Allow ganesha to use fusefs

Comment 22 Kaleb KEITHLEY 2017-07-14 12:08:10 UTC
It's not clear to me which process is trying to send the dbus msg and failing due to SELinux.

There are, I believe, three places that that dbus messages are sent. One by systemd/stop, two by the ganesha_grace RA, and the third is by ganesha.nfsd itself to ping the dbus daemon.

On top of which it is the target of dbus signals.

In the log output it looks like ganesha.nfsd itself has tried to send and been denied, but I'm not certain. Maybe Lukas can tell? And tell us what we need to set to prevent the denial? Thanks

Comment 24 Manisha Saini 2017-07-18 16:42:46 UTC
Kaleb,

For a new node to be added in existing ganesha cluster, ganesha_use_fusefs boolean will be off by default.And if we do add node ,then for the new node which is added in cluster,nfs -ganesha process will not come up because this boolean is not set.According to my understanding,We are setting this boolean as the part of gluster nfs-ganesha enable command (which will not run as a part of add node command).

So do we need to document that,if the new node is added then user needs to enable this boolean on new node manually before running add node command?

Comment 25 Kaleb KEITHLEY 2017-07-18 17:34:32 UTC
ganesha_use_fusefs is enabled when the glusterfs-ganesha package is installed or updated.

When a node is added to an existing ganesha cluster the ganesha_use_fusefs should be already set. (Delta the bug with updates to selinux due to the order that updates are applied originally reported on https://bugzilla.redhat.com/show_bug.cgi?id=1469561 for 3.2.z or 3.2 async. I can't find the BZ for 3.3.0.)

However this BZ is about ganesha using dbus and lvrabec says in comment 23 above that the dbus issue is fixed in selinux for RHEL7.4. I don't see how the order that updates are applied could affect the dbus fix.

Let me know if this doesn't answer your question.

Comment 26 Manisha Saini 2017-07-18 18:36:25 UTC
Thanks Kaleb for the clarification.Probably my setup was updated from intermittent ganesha builds and as a part of upgrade this boolean is not set to ON by default.

Verified this bug on 

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
nfs-ganesha-2.4.4-16.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-34.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.4-16.el7rhgs.x86_64

selinux-policy-3.13.1-166.el7.noarch


Ganesha setup creation and add node works fine with this build.Moving this bug to verified state.

Comment 28 errata-xmlrpc 2017-09-21 04:59:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


Note You need to log in before you can comment on or make changes to this bug.