Description of problem:
When a new node is added to an existing 4-node cluster, the pacemaker service fails to start on the new node. As a result, the cluster does not run on the new node. In pcs status on the existing nodes, the new node is shown in failover state. After manually starting the pacemaker service on the new node, the cluster comes up on the 5th node.

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-22.el7rhgs.x86_64
nfs-ganesha-2.4.4-2.el7.x86_64
nfs-ganesha-gluster-2.4.4-2.el7.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Add a node to the existing 4-node ganesha cluster.

Actual results:
The cluster fails to start on the new node. The pacemaker service is not running on the additional node.

Expected results:
The cluster should run on the new node without the pacemaker service having to be started manually.

Additional info:
Status of the pacemaker and corosync services on the new node after adding it to the existing cluster:

[root@dhcp47-132 ganesha]# pcs status
Error: cluster is not currently running on this node

[root@dhcp47-132 ganesha]# service pacemaker status
Redirecting to /bin/systemctl status pacemaker.service
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:pacemakerd
           http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html

Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com pacemakerd[15883]: notice: Stopping attrd
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com pacemakerd[15883]: notice: Stopping lrmd
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com pacemakerd[15883]: notice: Stopping stonith-ng
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com stonith-ng[15885]: notice: Caught 'Terminated' signal
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com pacemakerd[15883]: notice: Stopping cib
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com cib[15884]: notice: Caught 'Terminated' signal
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com cib[15884]: notice: Disconnected from Corosync
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com cib[15884]: notice: Disconnected from Corosync
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com pacemakerd[15883]: notice: Shutdown complete
Apr 12 22:51:30 dhcp47-132.lab.eng.blr.redhat.com systemd[1]: Stopped Pacemaker High Availability Cluster Manager.

[root@dhcp47-132 ganesha]# service corosync status
Redirecting to /bin/systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-04-12 22:53:31 IST; 15min ago
  Process: 23265 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 23275 (corosync)
   CGroup: /system.slice/corosync.service
           └─23275 corosync

Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [TOTEM ] adding new UDPU member {10.70.47.144}
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [TOTEM ] adding new UDPU member {10.70.47.139}
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [TOTEM ] adding new UDPU member {10.70.47.132}
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [TOTEM ] A new membership (10.70.47.132:192) was formed. Members joined: 5
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [TOTEM ] A new membership (10.70.47.132:204) was formed. Members joined: 4 2 3 1
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [QUORUM] This node is within the primary component and will provide service.
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [QUORUM] Members[5]: 5 4 2 3 1
Apr 12 22:53:30 dhcp47-132.lab.eng.blr.redhat.com corosync[23275]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 12 22:53:31 dhcp47-132.lab.eng.blr.redhat.com systemd[1]: Started Corosync Cluster Engine.
Apr 12 22:53:31 dhcp47-132.lab.eng.blr.redhat.com corosync[23265]: Starting Corosync Cluster Engine (corosync): [ OK ]
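
The manual workaround from the description (starting pacemaker on the new node while corosync is already running there) could be sketched roughly as below. This is only an illustration: the function names needs_pacemaker_start and start_pacemaker_if_needed are hypothetical and are not part of ganesha-ha.sh or pcs, and the sketch assumes a systemd host where the units are named corosync and pacemaker, as in the status output above.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers,
# not part of ganesha-ha.sh or pcs.

# Succeeds (exit status 0) when corosync is reported "active" but pacemaker
# is not, which is the state seen on the new node in this report.
needs_pacemaker_start() {
    local corosync_state="$1" pacemaker_state="$2"
    [ "$corosync_state" = "active" ] && [ "$pacemaker_state" != "active" ]
}

# Ask systemd for the state of both units and start pacemaker only if needed.
start_pacemaker_if_needed() {
    local corosync_state pacemaker_state
    corosync_state="$(systemctl is-active corosync 2>/dev/null)"
    pacemaker_state="$(systemctl is-active pacemaker 2>/dev/null)"
    if needs_pacemaker_start "$corosync_state" "$pacemaker_state"; then
        systemctl start pacemaker
    fi
}
```

Running start_pacemaker_if_needed as root on the new node would do the same thing as the manual start described above. It only works around the symptom and does not address why the add operation leaves pacemaker stopped.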
Verified the same with the latest 3.2 bits; the issue is not observed on 3.2.
[root@dhcp47-147 shared_storage]# /usr/libexec/ganesha/ganesha-ha.sh --add /var/run/gluster/shared_storage/nfs-ganesha/ dhcp47-132.lab.eng.blr.redhat.com 10.70.44.157
Disabling SBD service...
dhcp47-132.lab.eng.blr.redhat.com: sbd disabled
dhcp47-147.lab.eng.blr.redhat.com: Corosync updated
dhcp47-141.lab.eng.blr.redhat.com: Corosync updated
dhcp47-144.lab.eng.blr.redhat.com: Corosync updated
dhcp47-139.lab.eng.blr.redhat.com: Corosync updated
Setting up corosync...
dhcp47-132.lab.eng.blr.redhat.com: Succeeded
Synchronizing pcsd certificates on nodes dhcp47-132.lab.eng.blr.redhat.com...
dhcp47-132.lab.eng.blr.redhat.com: Success
Restarting pcsd on the nodes in order to reload the certificates...
dhcp47-132.lab.eng.blr.redhat.com: Success
dhcp47-132.lab.eng.blr.redhat.com: Stopping Cluster (pacemaker)...
dhcp47-144.lab.eng.blr.redhat.com: Stopping Cluster (pacemaker)...
dhcp47-147.lab.eng.blr.redhat.com: Stopping Cluster (pacemaker)...
dhcp47-141.lab.eng.blr.redhat.com: Stopping Cluster (pacemaker)...
dhcp47-139.lab.eng.blr.redhat.com: Stopping Cluster (pacemaker)...
dhcp47-132.lab.eng.blr.redhat.com: Stopping Cluster (corosync)...
dhcp47-141.lab.eng.blr.redhat.com: Stopping Cluster (corosync)...
dhcp47-144.lab.eng.blr.redhat.com: Stopping Cluster (corosync)...
dhcp47-139.lab.eng.blr.redhat.com: Stopping Cluster (corosync)...
dhcp47-147.lab.eng.blr.redhat.com: Stopping Cluster (corosync)...
dhcp47-132.lab.eng.blr.redhat.com: Error connecting to dhcp47-132.lab.eng.blr.redhat.com - (HTTP error: 400)
dhcp47-139.lab.eng.blr.redhat.com: Starting Cluster...
dhcp47-141.lab.eng.blr.redhat.com: Starting Cluster...
dhcp47-147.lab.eng.blr.redhat.com: Starting Cluster...
dhcp47-144.lab.eng.blr.redhat.com: Starting Cluster...
Error: unable to start all nodes
dhcp47-132.lab.eng.blr.redhat.com: Error connecting to dhcp47-132.lab.eng.blr.redhat.com - (HTTP error: 400)
Removing group: dhcp47-139.lab.eng.blr.redhat.com-group (and all resources within group)
Stopping all resources in group: dhcp47-139.lab.eng.blr.redhat.com-group...
Deleting Resource - dhcp47-139.lab.eng.blr.redhat.com-nfs_block
Removing Constraint - order-nfs-grace-clone-dhcp47-139.lab.eng.blr.redhat.com-cluster_ip-1-mandatory
Deleting Resource - dhcp47-139.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - location-dhcp47-139.lab.eng.blr.redhat.com-group
Removing Constraint - location-dhcp47-139.lab.eng.blr.redhat.com-group-dhcp47-147.lab.eng.blr.redhat.com-1000
Removing Constraint - location-dhcp47-139.lab.eng.blr.redhat.com-group-dhcp47-141.lab.eng.blr.redhat.com-2000
Removing Constraint - location-dhcp47-139.lab.eng.blr.redhat.com-group-dhcp47-144.lab.eng.blr.redhat.com-3000
Removing Constraint - location-dhcp47-139.lab.eng.blr.redhat.com-group-dhcp47-139.lab.eng.blr.redhat.com-4000
Deleting Resource (and group) - dhcp47-139.lab.eng.blr.redhat.com-nfs_unblock
Removing group: dhcp47-141.lab.eng.blr.redhat.com-group (and all resources within group)
Stopping all resources in group: dhcp47-141.lab.eng.blr.redhat.com-group...
Deleting Resource - dhcp47-141.lab.eng.blr.redhat.com-nfs_block
Removing Constraint - order-nfs-grace-clone-dhcp47-141.lab.eng.blr.redhat.com-cluster_ip-1-mandatory
Deleting Resource - dhcp47-141.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - location-dhcp47-141.lab.eng.blr.redhat.com-group
Removing Constraint - location-dhcp47-141.lab.eng.blr.redhat.com-group-dhcp47-144.lab.eng.blr.redhat.com-1000
Removing Constraint - location-dhcp47-141.lab.eng.blr.redhat.com-group-dhcp47-139.lab.eng.blr.redhat.com-2000
Removing Constraint - location-dhcp47-141.lab.eng.blr.redhat.com-group-dhcp47-147.lab.eng.blr.redhat.com-3000
Removing Constraint - location-dhcp47-141.lab.eng.blr.redhat.com-group-dhcp47-141.lab.eng.blr.redhat.com-4000
Deleting Resource (and group) - dhcp47-141.lab.eng.blr.redhat.com-nfs_unblock
Removing group: dhcp47-144.lab.eng.blr.redhat.com-group (and all resources within group)
Stopping all resources in group: dhcp47-144.lab.eng.blr.redhat.com-group...
Deleting Resource - dhcp47-144.lab.eng.blr.redhat.com-nfs_block
Removing Constraint - order-nfs-grace-clone-dhcp47-144.lab.eng.blr.redhat.com-cluster_ip-1-mandatory
Deleting Resource - dhcp47-144.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - location-dhcp47-144.lab.eng.blr.redhat.com-group
Removing Constraint - location-dhcp47-144.lab.eng.blr.redhat.com-group-dhcp47-139.lab.eng.blr.redhat.com-1000
Removing Constraint - location-dhcp47-144.lab.eng.blr.redhat.com-group-dhcp47-147.lab.eng.blr.redhat.com-2000
Removing Constraint - location-dhcp47-144.lab.eng.blr.redhat.com-group-dhcp47-141.lab.eng.blr.redhat.com-3000
Removing Constraint - location-dhcp47-144.lab.eng.blr.redhat.com-group-dhcp47-144.lab.eng.blr.redhat.com-4000
Deleting Resource (and group) - dhcp47-144.lab.eng.blr.redhat.com-nfs_unblock
Removing group: dhcp47-147.lab.eng.blr.redhat.com-group (and all resources within group)
Stopping all resources in group: dhcp47-147.lab.eng.blr.redhat.com-group...
Deleting Resource - dhcp47-147.lab.eng.blr.redhat.com-nfs_block
Removing Constraint - order-nfs-grace-clone-dhcp47-147.lab.eng.blr.redhat.com-cluster_ip-1-mandatory
Deleting Resource - dhcp47-147.lab.eng.blr.redhat.com-cluster_ip-1
Removing Constraint - location-dhcp47-147.lab.eng.blr.redhat.com-group
Removing Constraint - location-dhcp47-147.lab.eng.blr.redhat.com-group-dhcp47-141.lab.eng.blr.redhat.com-1000
Removing Constraint - location-dhcp47-147.lab.eng.blr.redhat.com-group-dhcp47-144.lab.eng.blr.redhat.com-2000
Removing Constraint - location-dhcp47-147.lab.eng.blr.redhat.com-group-dhcp47-139.lab.eng.blr.redhat.com-3000
Removing Constraint - location-dhcp47-147.lab.eng.blr.redhat.com-group-dhcp47-147.lab.eng.blr.redhat.com-4000
Deleting Resource (and group) - dhcp47-147.lab.eng.blr.redhat.com-nfs_unblock
Adding nfs-grace-clone dhcp47-139.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp47-141.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp47-144.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp47-147.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
Adding nfs-grace-clone dhcp47-132.lab.eng.blr.redhat.com-cluster_ip-1 (kind: Mandatory) (Options: first-action=start then-action=start)
CIB updated
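
In the add-node output, the only node that pcs reports as unreachable is the new one ("Error connecting to ... (HTTP error: 400)"). A minimal sketch for pulling such node names out of a saved copy of this output follows. It assumes the output has been captured with one message per line, and the function name unreachable_nodes is hypothetical, not an existing pcs or ganesha-ha.sh command.

```shell
#!/usr/bin/env bash
# Sketch only: unreachable_nodes is a hypothetical helper name.

# Reads captured output on stdin (one message per line) and prints each node
# whose line has the form "<node>: Error connecting to ...", once per node.
unreachable_nodes() {
    sed -n 's/^\([^:[:space:]]*\): Error connecting to .*$/\1/p' | sort -u
}
```

For example, if the output were saved to a file such as add-node.log (an assumed name), `unreachable_nodes < add-node.log` would list the nodes on which the cluster start request did not get through.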
Upstream patch: https://review.gluster.org/17255
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/105948/
Verified this bug on:

[root@dhcp42-125 ~]# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64

The pacemaker service is now started on the new node by the add-node command. As the issue is no longer observed, moving this bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774