Bug 1240614

Summary: Gluster NFS started running on one of the nodes of a ganesha cluster, even though ganesha was running on it

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: nfs-ganesha
Reporter: Apeksha <akhakhar>
Assignee: Kaleb KEITHLEY <kkeithle>
QA Contact: Apeksha <akhakhar>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Version: rhgs-3.1
Target Release: RHGS 3.1.1
Hardware: x86_64
OS: Linux
Keywords: ZStream
CC: akhakhar, asrivast, jthottan, kkeithle, ndevos, nlevinki, saujain, skoduri, vagarwal
Fixed In Version: glusterfs-3.7.1-12
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-10-05 07:18:37 UTC
Bug Depends On: 1226817, 1251857, 1254419
Bug Blocks: 1251815

Description Apeksha 2015-07-07 11:28:16 UTC
Description of problem:
Gluster NFS started running on one of the nodes of a ganesha cluster, even though ganesha was running on it.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-7.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run the automated root-squash testcases.
2. Create a volume testvol, perform a few root-squash testcases, then delete the volume.
3. Create two new volumes, nfsvol1 and nfsvol2, and enable ganesha on them.
On one of the servers the volumes got exported as gluster NFS exports instead of ganesha exports, and a gluster NFS process started on that node.
On all the other nodes they got exported as ganesha exports.

Actual results: On one of the servers the volumes got exported as gluster NFS exports instead of ganesha exports, and a gluster NFS process started on that node. On all the other nodes they got exported as ganesha exports.

Expected results: On all the servers the volumes must be exported as ganesha exports, and the gluster NFS process should not start on any server while the ganesha process is running.
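The expected state above can be checked per node with a small helper. This is an illustrative sketch, not part of the product: the `classify_nfs` function is hypothetical, and it only pattern-matches a process listing (a `ganesha.nfsd` process indicates NFS-Ganesha; a glusterfs process with volfile-id `gluster/nfs` indicates gluster NFS, as in the `ps` output below).

```shell
#!/bin/sh
# Hypothetical helper: given a process listing (e.g. from `ps aux`), report
# which NFS server is active on this node. On a healthy ganesha cluster node
# it should print "ganesha"; "both" is the state reported in this bug.
classify_nfs() {
    listing="$1"
    ganesha=no; gnfs=no
    case "$listing" in *ganesha.nfsd*) ganesha=yes ;; esac
    case "$listing" in *gluster/nfs*)  gnfs=yes ;; esac
    if [ "$ganesha" = yes ] && [ "$gnfs" = yes ]; then
        echo both       # bug: gluster NFS running alongside NFS-Ganesha
    elif [ "$ganesha" = yes ]; then
        echo ganesha    # expected on every node of a ganesha cluster
    elif [ "$gnfs" = yes ]; then
        echo gnfs
    else
        echo none
    fi
}
```

Usage on a node would be something like `classify_nfs "$(ps aux)"`.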


Additional info:
[root@vm1 distaf]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.64:/var/run/gluster/ss_brick 49152     0          Y       13357
Brick 10.70.46.63:/var/run/gluster/ss_brick 49152     0          Y       12916
Brick 10.70.46.59:/var/run/gluster/ss_brick 49152     0          Y       14208
Self-heal Daemon on localhost               N/A       N/A        Y       14266
Self-heal Daemon on 10.70.46.65             N/A       N/A        Y       15524
Self-heal Daemon on 10.70.46.69             N/A       N/A        Y       18675
Self-heal Daemon on 10.70.46.62             N/A       N/A        Y       27717
Self-heal Daemon on 10.70.46.60             N/A       N/A        Y       32501
Self-heal Daemon on 10.70.46.64             N/A       N/A        Y       18666
Self-heal Daemon on 10.70.46.63             N/A       N/A        Y       25384
Self-heal Daemon on 10.70.46.68             N/A       N/A        Y       18476
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: nfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.59:/rhs/brick1/brick1/nfsvol
1_brick0                                    49199     0          Y       13694
Brick 10.70.46.64:/rhs/brick1/brick1/nfsvol
1_brick1                                    49199     0          Y       18151
Brick 10.70.46.63:/rhs/brick1/brick1/nfsvol
1_brick2                                    49199     0          Y       24941
Brick 10.70.46.60:/rhs/brick1/brick0/nfsvol
1_brick3                                    49198     0          Y       32057
Brick 10.70.46.62:/rhs/brick1/brick0/nfsvol
1_brick4                                    49175     0          Y       27300
Brick 10.70.46.65:/rhs/brick1/brick0/nfsvol
1_brick5                                    49175     0          Y       15127
Brick 10.70.46.68:/rhs/brick1/brick0/nfsvol
1_brick6                                    49175     0          Y       18075
Brick 10.70.46.69:/rhs/brick1/brick0/nfsvol
1_brick7                                    49175     0          Y       18264
Brick 10.70.46.59:/rhs/brick1/brick2/nfsvol
1_brick8                                    49200     0          Y       13713
Brick 10.70.46.64:/rhs/brick1/brick2/nfsvol
1_brick9                                    49200     0          Y       18169
Brick 10.70.46.63:/rhs/brick1/brick2/nfsvol
1_brick10                                   49200     0          Y       24959
Brick 10.70.46.60:/rhs/brick1/brick1/nfsvol
1_brick11                                   49199     0          Y       32075
NFS Server on localhost                     2049      0          Y       14517
Self-heal Daemon on localhost               N/A       N/A        Y       14266
NFS Server on 10.70.46.64                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.64             N/A       N/A        Y       18666
NFS Server on 10.70.46.68                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.68             N/A       N/A        Y       18476
NFS Server on 10.70.46.65                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.65             N/A       N/A        Y       15524
NFS Server on 10.70.46.62                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.62             N/A       N/A        Y       27717
NFS Server on 10.70.46.63                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.63             N/A       N/A        Y       25384
NFS Server on 10.70.46.69                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.69             N/A       N/A        Y       18675
NFS Server on 10.70.46.60                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.60             N/A       N/A        Y       32501
 
Task Status of Volume nfsvol1
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: nfsvol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.59:/rhs/brick1/brick3/nfsvol
2_brick0                                    49201     0          Y       14177
Brick 10.70.46.64:/rhs/brick1/brick3/nfsvol
2_brick1                                    49201     0          Y       18546
Brick 10.70.46.63:/rhs/brick1/brick3/nfsvol
2_brick2                                    49201     0          Y       25323
Brick 10.70.46.60:/rhs/brick1/brick2/nfsvol
2_brick3                                    49200     0          Y       32444
Brick 10.70.46.62:/rhs/brick1/brick1/nfsvol
2_brick4                                    49176     0          Y       27672
Brick 10.70.46.65:/rhs/brick1/brick1/nfsvol
2_brick5                                    49176     0          Y       15485
Brick 10.70.46.68:/rhs/brick1/brick1/nfsvol
2_brick6                                    49176     0          Y       18437
Brick 10.70.46.69:/rhs/brick1/brick1/nfsvol
2_brick7                                    49176     0          Y       18635
Brick 10.70.46.59:/rhs/brick1/brick4/nfsvol
2_brick8                                    49202     0          Y       14195
Brick 10.70.46.64:/rhs/brick1/brick4/nfsvol
2_brick9                                    49202     0          Y       18564
Brick 10.70.46.63:/rhs/brick1/brick4/nfsvol
2_brick10                                   49202     0          Y       25341
Brick 10.70.46.60:/rhs/brick1/brick3/nfsvol
2_brick11                                   49201     0          Y       32470
NFS Server on localhost                     2049      0          Y       14517
Self-heal Daemon on localhost               N/A       N/A        Y       14266
NFS Server on 10.70.46.65                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.65             N/A       N/A        Y       15524
NFS Server on 10.70.46.69                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.69             N/A       N/A        Y       18675
NFS Server on 10.70.46.60                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.60             N/A       N/A        Y       32501
NFS Server on 10.70.46.63                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.63             N/A       N/A        Y       25384
NFS Server on 10.70.46.64                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.64             N/A       N/A        Y       18666
NFS Server on 10.70.46.62                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.62             N/A       N/A        Y       27717
NFS Server on 10.70.46.68                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.68             N/A       N/A        Y       18476
 
Task Status of Volume nfsvol2
------------------------------------------------------------------------------
There are no active volume tasks


[root@vm1 distaf]# for i in `seq 1 8`; do echo vm$i; ssh vm$i showmount -e localhost ; echo "-----------------"; done
vm1
Export list for localhost:
/nfsvol1 *
/nfsvol2 *
-----------------
vm2
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm3
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm4
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm5
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm6
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm7
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm8
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)

[root@vm1 distaf]# ps -aux | grep nfs
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      1433  0.0  0.0 105452   896 pts/1    S+   00:46   0:00 less /var/log/glusterfs/nfs.log
root      5606  0.3 19.9 5480848 1637560 ?     Ssl  Jul06   5:43 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      5889  0.0  0.0 103308   828 pts/0    S+   00:57   0:00 grep nfs
root     13694  0.0  0.4 1326056 38456 ?       Ssl  00:01   0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol1.10.70.46.59.rhs-brick1-brick1-nfsvol1_brick0 -p /var/lib/glusterd/vols/nfsvol1/run/10.70.46.59-rhs-brick1-brick1-nfsvol1_brick0.pid -S /var/run/gluster/61980065e9cb731bc330384b1a13bc6f.socket --brick-name /rhs/brick1/brick1/nfsvol1_brick0 -l /var/log/glusterfs/bricks/rhs-brick1-brick1-nfsvol1_brick0.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49199 --xlator-option nfsvol1-server.listen-port=49199
root     13713  0.0  0.5 1274832 45328 ?       Ssl  00:01   0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol1.10.70.46.59.rhs-brick1-brick2-nfsvol1_brick8 -p /var/lib/glusterd/vols/nfsvol1/run/10.70.46.59-rhs-brick1-brick2-nfsvol1_brick8.pid -S /var/run/gluster/b6804ae6272a2e3c205e46212130c4a4.socket --brick-name /rhs/brick1/brick2/nfsvol1_brick8 -l /var/log/glusterfs/bricks/rhs-brick1-brick2-nfsvol1_brick8.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49200 --xlator-option nfsvol1-server.listen-port=49200
root     14177  0.0  0.4 1070044 38060 ?       Ssl  00:02   0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol2.10.70.46.59.rhs-brick1-brick3-nfsvol2_brick0 -p /var/lib/glusterd/vols/nfsvol2/run/10.70.46.59-rhs-brick1-brick3-nfsvol2_brick0.pid -S /var/run/gluster/a3ce24cefc8c02f3ebc20397f5204a24.socket --brick-name /rhs/brick1/brick3/nfsvol2_brick0 -l /var/log/glusterfs/bricks/rhs-brick1-brick3-nfsvol2_brick0.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49201 --xlator-option nfsvol2-server.listen-port=49201
root     14195  0.0  0.4 993236 38704 ?        Ssl  00:02   0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol2.10.70.46.59.rhs-brick1-brick4-nfsvol2_brick8 -p /var/lib/glusterd/vols/nfsvol2/run/10.70.46.59-rhs-brick1-brick4-nfsvol2_brick8.pid -S /var/run/gluster/8d8cec41b11d92c673f70b8dc174b4da.socket --brick-name /rhs/brick1/brick4/nfsvol2_brick8 -l /var/log/glusterfs/bricks/rhs-brick1-brick4-nfsvol2_brick8.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49202 --xlator-option nfsvol2-server.listen-port=49202
root     14517  0.0  2.0 809320 171200 ?       Ssl  00:02   0:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/1f8cbbf6b4f5acf88e0a42bccc1ed867.socket

Comment 2 Meghana 2015-07-07 13:50:10 UTC
Hi Apeksha,

I just looked at your machines. 

You have an old instance of NFS-Ganesha running on the first machine. This process was started on the 6th of July and is no longer responsive. Ideally, you should have torn down the cluster, which would have stopped all NFS-Ganesha services.

This old instance still shows up in "service nfs-ganesha status".
When you run "gluster nfs-ganesha enable", we internally execute
"service nfs-ganesha start", and since the older instance shows up in the status output, the start reports success. In reality, a working NFS-Ganesha instance does not exist.

I have raised a bug for this upstream:
https://bugzilla.redhat.com/show_bug.cgi?id=1119601


In the glusterd logs you can clearly see that "ganesha.enable on" does stop the gluster NFS service:
[2015-07-07 18:32:08.206306] I [MSGID: 106540] [glusterd-utils.c:4153:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2015-07-07 18:32:08.206495] I [MSGID: 106540] [glusterd-utils.c:4162:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2015-07-07 18:32:08.206676] I [MSGID: 106540] [glusterd-utils.c:4171:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2015-07-07 18:32:08.206869] I [MSGID: 106540] [glusterd-utils.c:4180:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully


Since the NFS-Ganesha on that machine is a stale instance, the next time you start a volume, gluster NFS comes up on that machine. On all the other nodes, NFS-Ganesha is a fresh, working instance, so gluster NFS does not and cannot come up there.

I had worked on the bug listed above; it is too intermittent to reproduce.
If you can reproduce this bug consistently, you can propose it as a blocker.
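One way to catch the stale-instance condition described above before running "gluster nfs-ganesha enable" is to check that the pid in ganesha's pidfile still belongs to a live process. This is a hedged sketch: the `ganesha_alive` helper is not part of the product, and `kill -0` only proves the pid exists, not that the daemon is responsive (the instance in this bug was alive but unresponsive, which would need e.g. an RPC-level check).

```shell
#!/bin/sh
# Sketch: detect a stale ganesha.nfsd before "service nfs-ganesha start".
# A pidfile whose pid no longer maps to a live process is the situation
# that can make the init script report success spuriously.
ganesha_alive() {
    pidfile="${1:-/var/run/ganesha.nfsd.pid}"
    if [ ! -r "$pidfile" ]; then
        echo absent; return 1      # ganesha never started on this node
    fi
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo running               # pid exists (responsiveness not checked)
    else
        echo stale; return 1       # stale pidfile: clean up before enabling
    fi
}
```

A cluster teardown script could call this on each node and refuse to proceed until no node reports "stale".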

Comment 3 Apeksha 2015-07-08 06:05:49 UTC
Reproduced the issue again:

1. Create a 6x2 volume, say testvol, and perform some root-squash tests:
Export list for localhost:
/testvol (everyone)

2. Stop glusterd on server1 and start it again
 Stopping glusterd:[  OK  ]
 Starting glusterd:[  OK  ]

3. Now delete volume testvol

4. Create a new volume, say nfsvol1, and enable ganesha on it; it gets exported as a gluster NFS export:
 Export list for localhost:
/nfsvol1 *

Comment 4 Meghana 2015-07-08 10:34:19 UTC
I followed the same steps as recorded and the issue didn't reproduce, neither on my setup nor on the QE setup. I am not sure how to reproduce it. Please update the bug if you hit it again and attach all the logs.

Comment 5 Meghana 2015-07-08 10:40:11 UTC
You had also executed refresh-config before you executed these tests.

Comment 6 Meghana 2015-08-18 06:30:50 UTC
This bug is fixed as part of another fix,
https://bugzilla.redhat.com/show_bug.cgi?id=1242749

Comment 8 Apeksha 2015-08-27 11:15:01 UTC
Ran the automated root-squash testcases on glusterfs-3.7.1-12.el7rhgs.x86_64; did not hit this issue.

Comment 11 errata-xmlrpc 2015-10-05 07:18:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html