Bug 1213304
Summary: | nfs-ganesha: using features.enable command the nfs-ganesha process does come up on all four nodes | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Saurabh <saujain>
Component: | ganesha-nfs | Assignee: | Kaleb KEITHLEY <kkeithle>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 3.7.0 | CC: | annair, bugs, kkeithle, mzywusko, ndevos, sankarshan, skoduri, smohan
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-04-08 11:10:04 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1186580 | |
Attachments: | | |
Description
Saurabh
2015-04-20 10:01:17 UTC
Created attachment 1016314 [details]
sosreport of node1
Created attachment 1016315 [details]
sosreport of node2
Created attachment 1016316 [details]
sosreport of node3
Created attachment 1016320 [details]
sosreport of node4
Putting the response as per the conversation we had yesterday,

"1. In general, there is the issue with rpcbind which we have seen yesterday. For that there is a 7.1 BZ which unfortunately has not been opened for 6.6/6.x. I am not sure if it is too late for that but we need to see. I sent a quick note to Sayan to see what he has to say. Will need to wait for his inputs.

But in general there is the ugly workaround of removing the state associated with rpcbind (especially as it seems to be started with rpcbind -w):
- on RHEL 7.x we need to remove the rpcbind.socket file which contains the info
- on RHEL 6.6 after digging around on Saurabh's machine, I believe the file to delete is /var/cache/rpcbind/* (there are 2 .xdr files in there)

If nothing, this should serve as a possible workaround for us (ugly as it is).

2. ganesha repeatedly fails to start on nfs1 and I think now that the reason is that there is someone listening on port 2049 still. I think the issue happened after the reboots yesterday when someone was listening on port 2049 preventing ganesha from binding to tcp 2049.
- So first I removed the existing /var/run/ganesha pid.
- next clear rpcbind state by rm -rf /var/cache/rpcbind/*

What I see with the useful -tulpn option is:

[root@nfs1 ~]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:853       0.0.0.0:*         LISTEN   1518/glusterfs
tcp        0      0 0.0.0.0:22        0.0.0.0:*         LISTEN   1918/sshd
tcp        0      0 127.0.0.1:25      0.0.0.0:*         LISTEN   2027/master
tcp        0      0 0.0.0.0:59101     0.0.0.0:*         LISTEN   1596/rpc.statd
tcp        0      0 0.0.0.0:2049      0.0.0.0:*         LISTEN   1518/glusterfs
tcp        0      0 0.0.0.0:38465     0.0.0.0:*         LISTEN   1518/glusterfs
tcp        0      0 0.0.0.0:5666      0.0.0.0:*         LISTEN   1940/nrpe

3. I then did the "nfs.disable on" on both the volumes share-vol0 and vol0.

4. [root@nfs1 ~]# netstat -an | grep 2049
[root@nfs1 ~]#

5. ganesha then starts successfully:

[root@nfs1 ~]# service nfs-ganesha start
Starting ganesha.nfsd: [ OK ]
[root@nfs1 ~]#
[root@nfs1 ~]# ps auxw | grep ganesha
root 3478 0.5 0.1 1496316 8532 ? Ssl 00:05 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root 3521 0.0 0.0 103252 820 pts/1 S+ 00:05 0:00 grep ganesha

Anand"

so I did the things as per comment 5 and executed the cli "gluster features.ganesha enable" post cleanup. The nfs-ganesha process came up on all nodes but the pcs cluster still didn't come up.

logs,
[root@nfs1 ~]# gluster features.ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to bedisabled across the trusted pool. Do you still want to continue? (y/n) y
ganesha enable : success
[root@nfs1 ~]# pcs status
Error: cluster is not currently running on this node

Same is the status on all nodes

Soumya and I logged into the machines and found a few issues.

There has been a change in the pre-requisites by the introduction of common meta-volume. The user has to create a volume called "gluster_shared_storage" and mount it on /var/run/gluster/shared_storage. It has to be mounted before running the command/script.

Also, pcsd hasn't started on all the machines.
pcsd has to be started on all the nodes.

Also, the HA_CONFIG was wrongly populated. HA_VOL_SERVER="IP of the server"
The name of the shared volume was given instead.

We have also found a bug in the ganesha-ha.sh script. Minor fixes, will fix them now.

(In reply to Meghana from comment #7)
> Soumya and I logged into the machines and found a few issues.
>
> There has been a change in the pre-requisites by the introduction of common
> meta-volume. The user has to create a volume called "gluster_shared_storage"
> and mount it on /var/run/gluster/shared_storage. It has to be mounted before
> running the command/script.
>
> Also, pcsd hasn't started on all the machines.
> pcsd has to be started on all the nodes.
>
> Also, the HA_CONFIG was wrongly populated. HA_VOL_SERVER="IP of the server"
> The name of the shared volume was given instead.

Alright, I was not updated about this change.

REVIEW: http://review.gluster.org/10336 (NFS-Ganesha: Shared volume need not be mounted via script) posted (#1) for review on master by Meghana M (mmadhusu)

works in 3.7
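
For anyone hitting the same symptoms, the manual checks from the comments above (who holds tcp 2049, whether the shared volume is mounted, whether HA_VOL_SERVER holds an address rather than the volume name) could be sketched as a few shell helpers. This is only a rough sketch, not part of ganesha-ha.sh or any Gluster tooling: the function names port_owner, is_mounted and looks_like_ipv4 are made up for illustration, and the IPv4 check assumes HA_VOL_SERVER is meant to be an IPv4 address, as the comment states.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these names exist in Gluster or
# nfs-ganesha. They only automate the manual checks described in this bug.

# port_owner PORT
# Reads `netstat -tulpn`-style text on stdin and prints the
# "PID/Program name" column of every TCP listener bound to PORT.
port_owner() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, addr, ":")      # port is the last ":" field
            if (addr[n] == port) print $7
        }'
}

# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears as a mount target in MOUNTS_FILE
# (default /proc/mounts). The path is compared literally, so a symlinked
# /var/run would need the resolved path instead.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit found ? 0 : 1 }' \
        "${2:-/proc/mounts}"
}

# looks_like_ipv4 VALUE
# Succeeds if VALUE has the shape of a dotted-quad address. Assumption:
# HA_VOL_SERVER should be an IPv4 address; this catches the mistake of
# putting the shared volume name there, nothing more.
looks_like_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}
```

Possible usage on a node, before running the enable command:

netstat -tulpn | port_owner 2049
is_mounted /var/run/gluster/shared_storage || echo "shared storage not mounted"
looks_like_ipv4 "$HA_VOL_SERVER" || echo "HA_VOL_SERVER does not look like an IP"

If port_owner prints anything (for example a glusterfs PID, as in the netstat output above), ganesha will not be able to bind to tcp 2049 until that listener is stopped.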