Description of problem:

With the new implementation of nfs-ganesha + multi-head, we are working on providing the HA capabilities. When I try to do the HA setup as documented, it fails. It fails both through the command `gluster features.ganesha enable` and while executing the script ganesha-ha.sh directly.

Version-Release number of selected component (if applicable):
glusterfs-3.7dev-0.852.git3feaf16.el6.x86_64
nfs-ganesha-2.2-0.rc5.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Try to do the setup using the CLI:
     gluster features.ganesha enable
   Result: it reported "ganesha enable : success", but no NFS server was actually running; it eventually turned out that the pcs cluster setup itself had failed.
2. Then try to execute the script manually:
     time bash /usr/libexec/ganesha/ganesha-ha.sh --setup /etc/ganesha/
   Result: it still failed.

Actual results:

pcs status after script execution:

Cluster name: ganesha-ha-360
Last updated: Tue Mar 31 16:14:55 2015
Last change: Tue Mar 31 15:27:43 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
8 Resources configured

Online: [ nfs1 nfs2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 ]
 nfs1-cluster_ip-1      (ocf::heartbeat:IPaddr):        FAILED (unmanaged) [ nfs2 nfs1 ]
 nfs1-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs1
 nfs2-cluster_ip-1      (ocf::heartbeat:IPaddr):        FAILED (unmanaged) [ nfs2 nfs1 ]
 nfs2-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs2

Failed actions:
    nfs1-cluster_ip-1_stop_0 on nfs2 'not configured' (6): call=39, status=complete, last-rc-change='Tue Mar 31 15:27:43 2015', queued=0ms, exec=31ms
    nfs2-cluster_ip-1_stop_0 on nfs2 'not configured' (6): call=40, status=complete, last-rc-change='Tue Mar 31 15:27:43 2015', queued=0ms, exec=27ms
    nfs1-cluster_ip-1_stop_0 on nfs1 'not configured' (6): call=39, status=complete, last-rc-change='Tue Mar 31 15:27:43 2015', queued=0ms, exec=26ms
    nfs2-cluster_ip-1_stop_0 on nfs1 'not configured' (6): call=41, status=complete, last-rc-change='Tue Mar 31 15:27:43 2015', queued=0ms, exec=36ms

Expected results:

The pcs cluster setup should work properly, as the success of `gluster features.ganesha enable` depends on this step.
Additional info:

[root@nfs1 ~]# gluster volume info

Volume Name: share-vol0
Type: Distributed-Replicate
Volume ID: 6e498f7d-f2b6-476a-b1ca-6ca384752245
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.157:/rhs/brick1/d1r11-share
Brick2: 10.70.37.89:/rhs/brick1/d1r21-share
Brick3: 10.70.37.157:/rhs/brick1/d3r11-share
Brick4: 10.70.37.89:/rhs/brick1/d3r21-share
Options Reconfigured:
nfs.disable: on
features.ganesha: volume0.status

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: b971fa50-a713-45cb-a14e-8833b6091521
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.157:/rhs/brick1/d1r11
Brick2: 10.70.37.89:/rhs/brick1/d1r21
Brick3: 10.70.37.157:/rhs/brick1/d3r11
Brick4: 10.70.37.89:/rhs/brick1/d3r21
Options Reconfigured:
nfs.disable: on
features.ganesha: volume0.status

[root@nfs1 ~]# gluster volume status
Status of volume: share-vol0
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.157:/rhs/brick1/d1r11-share   49154     0          Y       19656
Brick 10.70.37.89:/rhs/brick1/d1r21-share    49154     0          Y       1580
Brick 10.70.37.157:/rhs/brick1/d3r11-share   49155     0          Y       19673
Brick 10.70.37.89:/rhs/brick1/d3r21-share    49155     0          Y       1581
Self-heal Daemon on localhost                N/A       N/A        Y       19699
Self-heal Daemon on 10.70.37.89              N/A       N/A        Y       1564

Task Status of Volume share-vol0
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol0
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.157:/rhs/brick1/d1r11         49152     0          Y       19560
Brick 10.70.37.89:/rhs/brick1/d1r21          49152     0          Y       1588
Brick 10.70.37.157:/rhs/brick1/d3r11         49153     0          Y       19577
Brick 10.70.37.89:/rhs/brick1/d3r21          49153     0          Y       1587
Self-heal Daemon on localhost                N/A       N/A        Y       19699
Self-heal Daemon on 10.70.37.89              N/A       N/A        Y       1564

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks

[root@nfs1 ~]#
Hi Saurabh, can you please attach the last 100 lines or so of /var/log/messages? Thanks.
Created attachment 1009467 [details]
node1 /var/log/messages

The messages file is attached along with this mail.
It appears that the initial cluster auth was omitted, or something else went wrong; I'm not sure.

I manually did a `pcs cluster auth nfs1 nfs2`. When prompted for username/password I entered hacluster/redhat. Then `pcs cluster setup --name ganesha-ha-360 nfs1 nfs2` was successful.

The `pcs cluster auth $host ...` step is in the script; you should have been prompted for the username and password. Let me know if this is not the case.

Thanks,
And by the way, `pcs cluster auth ...` is idempotent; in other words, it won't impact subsequent runs.
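For reference, this is roughly the manual sequence that worked, in case it helps to compare (hostnames and credentials are just the ones from this setup; adapt as needed):

  # on each node in the HA cluster, give the hacluster user the same password
  passwd hacluster

  # from one node, authenticate pcsd against all cluster nodes (prompts for hacluster credentials)
  pcs cluster auth nfs1 nfs2

  # then run the HA setup again, either via the CLI or the script directly
  bash /usr/libexec/ganesha/ganesha-ha.sh --setup /etc/ganesha/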
I was prompted for the username and password and I used the same credentials.
Fixed the VIP_$hostname settings in /etc/ganesha/ganesha-ha.conf.
This might not be a bug in ganesha-ha.sh. I'll check the CLI framework and update the bug soon.
I think this was fixed by correctly specifying the VIP_$server= settings in /etc/ganesha/ganesha-ha.conf.
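For illustration, the relevant part of /etc/ganesha/ganesha-ha.conf looks roughly like this (a sketch only; the VIP values below are placeholders, not addresses taken from this setup):

  # /etc/ganesha/ganesha-ha.conf (sketch)
  HA_NAME="ganesha-ha-360"
  # HA_VOL_SERVER must be a node that is part of the trusted pool
  HA_VOL_SERVER="nfs1"
  HA_CLUSTER_NODES="nfs1,nfs2"
  # one VIP_<hostname> entry per cluster node
  VIP_nfs1="<virtual-ip-for-nfs1>"
  VIP_nfs2="<virtual-ip-for-nfs2>"

Without valid VIP_<hostname> entries the IPaddr resources get created without an ip= parameter, which would be consistent with the 'not configured' (rc 6) failures in the pcs status above.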
1. Kaleb has tested the script on the machines listed in the bug description. It works.
2. The logs attached to the bug report come from two machines, nfs3 and nfs4, but the machines listed in the bug description are nfs1 and nfs2.
3. I logged into the nfs3 and nfs4 machines. They don't have the required RPMs installed; the ganesha-ha.sh script doesn't exist on them. That's why the option fails.

Please install the glusterfs-ganesha-3.7dev-0.852.git3feaf16.el6.x86_64 RPM and test it again.
What do you mean by "HA didn't work"? Do you mean the cluster wasn't set up, or that you tried fail-over and that didn't work for you? The option succeeds only when the cluster is set up (ideally).

You have to export a volume for showmount to show a volume. Please follow all the steps, in the same order, while testing:

  gluster vol set <volname> ganesha.enable on

This should export the volume via NFS-Ganesha.
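For example, with the vol0 volume from the bug description (a quick sanity check, run on one of the cluster nodes):

  # export the volume via NFS-Ganesha
  gluster vol set vol0 ganesha.enable on

  # verify that the export is now visible
  showmount -e localhost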
Shell history on nfs3:

  586  pcs cluster auth nfs3
  587  passwd hacluster
  588  pcs cluster auth nfs3
  589  sh ganesha-ha.sh --setup /etc/ganesha
  590  ssh nfs4
  591  sh ganesha-ha.sh --setup /etc/ganesha
  592  pcs status
  593  gluster features.ganesha disable
  594  pcs status
  595  sh ganesha-ha.sh --teardown
  596  sh ganesha-ha.sh --teardown /etc/ganesha
  597  pcs status

[root@nfs3 ganesha]# pcs status
Cluster name: ganesha-ha-1
Last updated: Fri Apr 10 10:47:45 2015
Last change: Fri Apr 10 00:00:21 2015
Stack: cman
Current DC: nfs4 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
8 Resources configured

Online: [ nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs3 nfs4 ]
 nfs3-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs3
 nfs3-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs4
 nfs4-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs4

[root@nfs3 ganesha]#

The HA script is working. The `pcs cluster auth` command was either missed or different passwords were set on the machines of the cluster. Once we fixed that, HA script setup and teardown worked as expected. There seems to be an issue with the CLI + HA integration, which we are looking into at the moment.
The HA_VOL_SERVER in ganesha-ha.conf should be a part of the trusted pool. In the QE setup, the HA_VOL_SERVER was a node that wasn't part of the trusted pool. I changed that and executed the command, and it worked as expected.

[root@nfs3 ganesha]# pcs status
Error: cluster is not currently running on this node
[root@nfs3 ganesha]#
[root@nfs3 ganesha]# gluster features.ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
ganesha enable : success
[root@nfs3 ganesha]# pcs status
Cluster name: ganesha-ha-1
Last updated: Fri Apr 10 11:16:17 2015
Last change: Fri Apr 10 11:15:42 2015
Stack: cman
Current DC: nfs3 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
10 Resources configured

Online: [ nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs_start-clone [nfs_start]
     Stopped: [ nfs3 nfs4 ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs3 nfs4 ]
 nfs3-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs3
 nfs3-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs4
 nfs4-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs4

[root@nfs3 ganesha]# ps aux | grep ganesha
root      2848  0.0  0.1 1561892 10468 ?      Ssl  11:14   0:00 /usr/bin/ganesha.nfsd -d -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      4405  0.0  0.0 103256   820 pts/3   S+   11:16   0:00 grep ganesha
[root@nfs3 ganesha]#
[root@nfs3 ganesha]# showmount -e localhost
Export list for localhost:
/ (everyone)
[root@nfs3 ganesha]#

On another node:

[root@nfs4 ~]# ps aux | grep ganesha
root     21430  0.0  0.1 1496356 12440 ?      Ssl  00:28   0:00 /usr/bin/ganesha.nfsd -d -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     22350  0.0  0.0 103252   820 pts/1   S+   00:29   0:00 grep ganesha
[root@nfs4 ~]# showmount -e localhost
Export list for localhost:
/ (everyone)
[root@nfs4 ~]#
[root@nfs4 ~]# pcs status
Cluster name: ganesha-ha-1
Last updated: Fri Apr 10 00:30:16 2015
Last change: Fri Apr 10 11:15:42 2015
Stack: cman
Current DC: nfs3 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
10 Resources configured

Online: [ nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs_start-clone [nfs_start]
     Stopped: [ nfs3 nfs4 ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs3 nfs4 ]
 nfs3-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs3
 nfs3-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs4
 nfs4-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs4

[root@nfs4 ~]#
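One quick way to sanity-check the trusted-pool requirement mentioned above is to confirm that the node named in HA_VOL_SERVER appears in the peer list, for example (a sketch, run from any node in the pool):

  # list the nodes in the trusted storage pool; HA_VOL_SERVER must be one of them
  gluster pool list
  gluster peer status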
There is, however, a bug in `gluster features.ganesha disable`. I will send a patch shortly.
So after updating the trusted pool configuration, I am able to see that the cluster is up:

[root@nfs3 ~]# gluster features.ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
ganesha enable : success
[root@nfs3 ~]# pcs status
Cluster name: ganesha-ha-1
Last updated: Fri Apr 10 12:37:14 2015
Last change: Fri Apr 10 12:37:09 2015
Stack: cman
Current DC: nfs3 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
8 Resources configured

Online: [ nfs3 nfs4 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs3 nfs4 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs3 nfs4 ]
 nfs3-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs3
 nfs3-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs3
 nfs4-cluster_ip-1      (ocf::heartbeat:IPaddr):        Started nfs3
 nfs4-trigger_ip-1      (ocf::heartbeat:Dummy): Started nfs3

[root@nfs3 ~]# pcs config
Cluster Name: ganesha-ha-1
Corosync Nodes:
 nfs3 nfs4
Pacemaker Nodes:
 nfs3 nfs4

Resources:
 Clone: nfs-mon-clone
  Resource: nfs-mon (class=ocf provider=heartbeat type=ganesha_mon)
   Operations: start interval=0s timeout=40s (nfs-mon-start-timeout-40s)
               stop interval=0s timeout=40s (nfs-mon-stop-timeout-40s)
               monitor interval=10s timeout=10s (nfs-mon-monitor-interval-10s)
 Clone: nfs-grace-clone
  Resource: nfs-grace (class=ocf provider=heartbeat type=ganesha_grace)
   Operations: start interval=0s timeout=40s (nfs-grace-start-timeout-40s)
               stop interval=0s timeout=40s (nfs-grace-stop-timeout-40s)
               monitor interval=5s timeout=20s (nfs-grace-monitor-interval-5s)
 Resource: nfs3-cluster_ip-1 (class=ocf provider=heartbeat type=IPaddr)
  Attributes: ip=10.70.36.217 cidr_netmask=32
  Operations: start interval=0s timeout=20s (nfs3-cluster_ip-1-start-timeout-20s)
              stop interval=0s timeout=20s (nfs3-cluster_ip-1-stop-timeout-20s)
              monitor interval=15s (nfs3-cluster_ip-1-monitor-interval-15s)
 Resource: nfs3-trigger_ip-1 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (nfs3-trigger_ip-1-start-timeout-20)
              stop interval=0s timeout=20 (nfs3-trigger_ip-1-stop-timeout-20)
              monitor interval=10 timeout=20 (nfs3-trigger_ip-1-monitor-interval-10)
 Resource: nfs4-cluster_ip-1 (class=ocf provider=heartbeat type=IPaddr)
  Attributes: ip=10.70.36.218 cidr_netmask=32
  Operations: start interval=0s timeout=20s (nfs4-cluster_ip-1-start-timeout-20s)
              stop interval=0s timeout=20s (nfs4-cluster_ip-1-stop-timeout-20s)
              monitor interval=15s (nfs4-cluster_ip-1-monitor-interval-15s)
 Resource: nfs4-trigger_ip-1 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (nfs4-trigger_ip-1-start-timeout-20)
              stop interval=0s timeout=20 (nfs4-trigger_ip-1-stop-timeout-20)
              monitor interval=10 timeout=20 (nfs4-trigger_ip-1-monitor-interval-10)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: nfs3-cluster_ip-1
    Enabled on: nfs4 (score:1000) (id:location-nfs3-cluster_ip-1-nfs4-1000)
    Enabled on: nfs3 (score:2000) (id:location-nfs3-cluster_ip-1-nfs3-2000)
    Constraint: location-nfs3-cluster_ip-1
      Rule: score=-INFINITY (id:location-nfs3-cluster_ip-1-rule)
        Expression: ganesha-active ne 1 (id:location-nfs3-cluster_ip-1-rule-expr-1)
  Resource: nfs4-cluster_ip-1
    Enabled on: nfs3 (score:1000) (id:location-nfs4-cluster_ip-1-nfs3-1000)
    Enabled on: nfs4 (score:2000) (id:location-nfs4-cluster_ip-1-nfs4-2000)
    Constraint: location-nfs4-cluster_ip-1
      Rule: score=-INFINITY (id:location-nfs4-cluster_ip-1-rule)
        Expression: ganesha-active ne 1 (id:location-nfs4-cluster_ip-1-rule-expr-1)
Ordering Constraints:
  start nfs3-trigger_ip-1 then start nfs-grace-clone (kind:Mandatory) (id:order-nfs3-trigger_ip-1-nfs-grace-clone-mandatory)
  start nfs-grace-clone then start nfs3-cluster_ip-1 (kind:Mandatory) (id:order-nfs-grace-clone-nfs3-cluster_ip-1-mandatory)
  start nfs4-trigger_ip-1 then start nfs-grace-clone (kind:Mandatory) (id:order-nfs4-trigger_ip-1-nfs-grace-clone-mandatory)
  start nfs-grace-clone then start nfs4-cluster_ip-1 (kind:Mandatory) (id:order-nfs-grace-clone-nfs4-cluster_ip-1-mandatory)
Colocation Constraints:
  nfs3-cluster_ip-1 with nfs3-trigger_ip-1 (score:INFINITY) (id:colocation-nfs3-cluster_ip-1-nfs3-trigger_ip-1-INFINITY)
  nfs4-cluster_ip-1 with nfs4-trigger_ip-1 (score:INFINITY) (id:colocation-nfs4-cluster_ip-1-nfs4-trigger_ip-1-INFINITY)

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 no-quorum-policy: ignore
 stonith-enabled: false
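For context, the score=-INFINITY rules above are what keep each floating cluster_ip address off any node whose ganesha_mon agent has not set the ganesha-active attribute. A rule of that shape can be expressed with pcs roughly like this (a sketch; not necessarily the exact invocation ganesha-ha.sh uses):

  # ban nfs3-cluster_ip-1 from any node where ganesha-active != 1
  pcs constraint location nfs3-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1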
REVIEW: http://review.gluster.org/10199 (NFS-Ganesha : Fixing HA script invocation and others) posted (#1) for review on master by Meghana M (mmadhusu)
REVIEW: http://review.gluster.org/10199 (NFS-Ganesha : Fixing HA script invocation and others) posted (#2) for review on master by Meghana M (mmadhusu)
COMMIT: http://review.gluster.org/10199 committed in master by Kaleb KEITHLEY (kkeithle)
------
commit dbd9bd7b2d806163f9bb069ec04e24d9269f769c
Author: Meghana Madhusudhan <mmadhusu>
Date:   Fri Apr 10 19:14:42 2015 +0530

    NFS-Ganesha : Fixing HA script invocation and others

    gluster features.ganesha disable failed invariably.
    And also, there were problems in unexporting volumes
    dynamically. Fixed the above problems.

    Change-Id: I29aa289dc8dc7b39fe0fd9d3098a02097ca8ca0c
    BUG: 1207629
    Signed-off-by: Meghana Madhusudhan <mmadhusu>
    Reviewed-on: http://review.gluster.org/10199
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Tested-by: NetBSD Build System
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user