Description of problem:
Adding a node to the nfs-ganesha cluster does not work on RHEL 7.1.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-13.el7rhgs.x86_64
nfs-ganesha-2.2.0-6.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a 2-node cluster.
2. Peer probe a new node and complete all the prerequisites.
3. Run the add-node script; it fails.

Actual results:
The add-node script fails.

Expected results:
The add-node script should succeed.

Additional info:
After changing the permissions of the secret.pem file, running the add-node script produced the following output:

[root@dhcp37-137 ~]# /usr/libexec/ganesha/ganesha-ha.sh --add /etc/ganesha dhcp37-100.lab.eng.blr.redhat.com 10.70.36.219
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
tmp.8gfaQYPwGm 100% 0 0.0KB/s 00:00
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
/tmp/tmp.rvg8s6iYLy/exports: No such file or directory
Unknown operation 'nfs-ganesha'.
dhcp37-137.lab.eng.blr.redhat.com: Corosync updated
dhcp37-56.lab.eng.blr.redhat.com: Corosync updated
dhcp37-100.lab.eng.blr.redhat.com: Succeeded
Adding nfs_start-dhcp37-100.lab.eng.blr.redhat.com nfs-mon-clone (kind: Mandatory) (Options: first-action=start then-action=start)
CIB updated
dhcp37-100.lab.eng.blr.redhat.com: Starting Cluster...
Removing Constraint - location-nfs_start-dhcp37-100.lab.eng.blr.redhat.com-dhcp37-100.lab.eng.blr.redhat.com-INFINITY
Removing Constraint - order-nfs_start-dhcp37-100.lab.eng.blr.redhat.com-nfs-mon-clone-mandatory
Deleting Resource - nfs_start-dhcp37-100.lab.eng.blr.redhat.com
Removing Constraint - colocation-dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1-dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1-INFINITY
Removing Constraint - location-dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - location-dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1-dhcp37-56.lab.eng.blr.redhat.com-1000
Removing Constraint - location-dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1-dhcp37-137.lab.eng.blr.redhat.com-2000
Removing Constraint - order-nfs-grace-clone-dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1-mandatory
Deleting Resource - dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - order-dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1-nfs-grace-clone-mandatory
Deleting Resource - dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1
Removing Constraint - colocation-dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1-dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1-INFINITY
Removing Constraint - location-dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - location-dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1-dhcp37-137.lab.eng.blr.redhat.com-1000
Removing Constraint - location-dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1-dhcp37-56.lab.eng.blr.redhat.com-2000
Removing Constraint - order-nfs-grace-clone-dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1-mandatory
Deleting Resource - dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - order-dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1-nfs-grace-clone-mandatory
Deleting Resource - dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1
Adding dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding dhcp37-100.lab.eng.blr.redhat.com-trigger_ip-1 nfs-grace-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp37-100.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
CIB updated
ganesha-ha.conf 100% 1112 1.1KB/s 00:00
ganesha-ha.conf 100% 1112 1.1KB/s 00:00
ganesha-ha.conf 100% 1112 1.1KB/s 00:00

The nfs-ganesha process did not start on the newly added node.

pcs status output:

[root@dhcp37-137 ~]# pcs status
Cluster name: G1441129758.55
Last updated: Wed Sep 2 04:02:41 2015
Last change: Wed Sep 2 04:02:34 2015
Stack: corosync
Current DC: dhcp37-137.lab.eng.blr.redhat.com (1) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
13 Resources configured

Online: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-100.lab.eng.blr.redhat.com dhcp37-137.lab.eng.blr.redhat.com dhcp37-56.lab.eng.blr.redhat.com ]
 dhcp37-137.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
 dhcp37-137.lab.eng.blr.redhat.com-trigger_ip-1 (ocf::heartbeat:Dummy): Started dhcp37-137.lab.eng.blr.redhat.com
 dhcp37-56.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-56.lab.eng.blr.redhat.com-trigger_ip-1 (ocf::heartbeat:Dummy): Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-100.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-100.lab.eng.blr.redhat.com-trigger_ip-1 (ocf::heartbeat:Dummy): Started dhcp37-56.lab.eng.blr.redhat.com
 dhcp37-100-dead_ip-1 (ocf::heartbeat:Dummy): Started dhcp37-100.lab.eng.blr.redhat.com

Failed actions:
    nfs-grace_monitor_5000 on dhcp37-56.lab.eng.blr.redhat.com 'unknown error' (1): call=109, status=Timed Out, exit-reason='none', last-rc-change='Tue Sep 1 04:49:30 2015', queued=0ms, exec=0ms
    nfs-grace_monitor_5000 on dhcp37-137.lab.eng.blr.redhat.com 'unknown error' (1): call=109, status=Timed Out, exit-reason='none', last-rc-change='Tue Sep 1 04:49:25 2015', queued=0ms, exec=20003ms

PCSD Status:
  dhcp37-137.lab.eng.blr.redhat.com: Online
  dhcp37-56.lab.eng.blr.redhat.com: Online
  dhcp37-100.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
There is another issue here. If 'add-node' is run from the node that is the HA_VOL_SERVER, the export config state is not copied properly. This is because passwordless scp of the config files does not work when the source and destination nodes are the same. This also needs to be fixed in the HA script.
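One way to catch this condition before running the add-node script is to test non-interactive ssh from the current node to every cluster node, the current node included. Below is a minimal sketch under stated assumptions: the key path /var/lib/glusterd/nfs/secret.pem and logging in as root are assumptions, and check_passwordless is a hypothetical helper, not part of ganesha-ha.sh.

```shell
# Hypothetical pre-check (not part of ganesha-ha.sh): confirm that the HA key
# allows non-interactive ssh to every listed node, including the local one.
# ASSUMPTION: key path and root login; override KEY if your setup differs.
KEY="${KEY:-/var/lib/glusterd/nfs/secret.pem}"

check_passwordless() {
    local node failed=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail immediately instead of prompting
        # for a password, which is what a scripted scp would hit.
        if ssh -o BatchMode=yes -i "$KEY" "root@${node}" true >/dev/null 2>&1; then
            echo "OK   ${node}"
        else
            echo "FAIL ${node}"
            failed=1
        fi
    done
    # Non-zero if any node (including this one) refused key-based login.
    return "$failed"
}

# Example: always include the node you are running on, e.g.
# check_passwordless "$(hostname -f)" dhcp37-56.lab.eng.blr.redhat.com
```

A FAIL line for the local hostname corresponds to the self-copy case described above.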
The fixes have been merged upstream (bug 1259225). Note that the 'scp' issue mentioned above does not occur if the secret.pem file is copied to all the nodes, including the one where it was generated (as already documented in the RHGS 3.1 admin guide). So we will backport only the fix for the 'systemctl' command path.
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1259221#c5, we do not need to backport the other fix (upstream 12091), because the RHGS 3.1 admin guide documents that ssh-copy-id of the secret.pem file must be done on all the nodes, including the one where it is generated.
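The documented workaround amounts to running ssh-copy-id once per cluster node without skipping the node that generated the key. A minimal sketch, assuming the key pair lives at /var/lib/glusterd/nfs/secret.pem(.pub) and that root is the login user; distribute_key and the DRY_RUN switch are hypothetical helpers for illustration, so check the admin guide for the exact path in your release.

```shell
# Hypothetical helper (not part of ganesha-ha.sh): copy the HA public key to
# every node listed on the command line, INCLUDING the local node, so that
# passwordless ssh/scp from this node to itself also succeeds.
# ASSUMPTION: key location; override KEY if secret.pem lives elsewhere.
KEY="${KEY:-/var/lib/glusterd/nfs/secret.pem}"

distribute_key() {
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of contacting the host.
            echo "ssh-copy-id -i ${KEY}.pub root@${node}"
        else
            ssh-copy-id -i "${KEY}.pub" "root@${node}"
        fi
    done
}

# Example (node names from this report), run on the node holding secret.pem:
# distribute_key dhcp37-137.lab.eng.blr.redhat.com \
#                dhcp37-56.lab.eng.blr.redhat.com \
#                dhcp37-100.lab.eng.blr.redhat.com
```

Running it first with DRY_RUN=1 prints the commands without touching any host, which makes it easy to confirm that the local node is in the list.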
Adding a node to the nfs-ganesha cluster works fine on RHEL 7.1 with glusterfs-3.7.1-15.el7rhgs.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1846.html