Description of problem:
------------------------

4 nodes running RHEL 6.9.

I am unable to set up a Ganesha HA cluster on newer versions of pacemaker, corosync etc. over RHEL 6.9.

*******************************
pcs status post Ganesha enable
********************************

[root@gqas010 ganesha]# pcs status
Cluster name: G1474623123.03
WARNING: no stonith devices and stonith-enabled is not false
Stack: cman
Current DC: gqas015.sbu.lab.eng.bos.redhat.com (version 1.1.15-4.el6-e174ec8) - partition with quorum
Last updated: Mon Feb 20 01:53:05 2017 Last change: Mon Feb 20 01:52:51 2017 by root via crmd on gqas015.sbu.lab.eng.bos.redhat.com

4 nodes and 0 resources configured

Online: [ gqas009.sbu.lab.eng.bos.redhat.com gqas010.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com ]

No resources

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@gqas010 ganesha]#

*****************
ERROR on the CLI
*****************

Restarting pcsd on the nodes in order to reload the certificates...
gqas015.sbu.lab.eng.bos.redhat.com: Success
gqas010.sbu.lab.eng.bos.redhat.com: Success
gqas009.sbu.lab.eng.bos.redhat.com: Success
gqas014.sbu.lab.eng.bos.redhat.com: Success
+ '[' 0 -ne 0 ']'
+ pcs property set stonith-enabled=false
Error: unable to get cib
Error: unable to get cib
+ sleep 4
+ pcs cluster start --all
gqas014.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas014.sbu.lab.eng.bos.redhat.com (Connection error)
gqas015.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas015.sbu.lab.eng.bos.redhat.com (Connection error)
gqas009.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas009.sbu.lab.eng.bos.redhat.com (Connection error)
gqas010.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas010.sbu.lab.eng.bos.redhat.com (Connection error)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas015.sbu.lab.eng.bos.redhat.com (Connection error)
gqas010.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas010.sbu.lab.eng.bos.redhat.com (Connection error)
gqas009.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas009.sbu.lab.eng.bos.redhat.com (Connection error)
gqas014.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas014.sbu.lab.eng.bos.redhat.com (Connection error)
+ '[' 1 -ne 0 ']'
+ logger 'pcs cluster start failed'
+ exit 1
[root@gqas009 ganesha]#

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

[root@gqas009 ganesha]# rpm -qa|grep pacemaker
pacemaker-cluster-libs-1.1.15-4.el6.x86_64
pacemaker-1.1.15-4.el6.x86_64
pacemaker-cli-1.1.15-4.el6.x86_64
pacemaker-libs-1.1.15-4.el6.x86_64
[root@gqas009 ganesha]#
[root@gqas009 ganesha]# rpm -qa|grep corosync
corosynclib-1.4.7-5.el6.x86_64
corosync-1.4.7-5.el6.x86_64
[root@gqas009 ganesha]#
[root@gqas009 ganesha]# rpm -qa|grep cman
cman-3.0.12.1-84.el6.x86_64
[root@gqas009 ganesha]#
[root@gqas009 ganesha]# rpm -qa|grep pcsd
[root@gqas009 ganesha]#
[root@gqas009 ganesha]# rpm -qa|grep pcs
pcs-0.9.155-2.el6.x86_64
[root@gqas009 ganesha]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.4.1-7.el6rhs.x86_64
nfs-ganesha-2.4.1-7.el6rhs.x86_64
glusterfs-ganesha-3.8.4-14.el6rhs.x86_64
nfs-ganesha-debuginfo-2.4.1-7.el6rhs.x86_64

How reproducible:
-------------------

2/2

Steps to Reproduce:
-------------------

Try setting up Ganesha.

Actual results:
---------------

pcs cluster shows no resources. pcs cluster start fails the first time and passes on the second.

Expected results:
----------------

cluster start should pass.
While debugging this with Soumya, she added pcs cluster start --all a couple of times in ganesha-ha.sh. For some reason it works on the second and third attempts, and fails every time on the first try:

    fi
    pcs property set stonith-enabled=false
    sleep 4
    pcs cluster start --all
    pcs cluster start --all    # added here
    pcs cluster start --all    # added here
    pcs cluster start --all    # added here
    if [ $? -ne 0 ]; then
        logger "pcs cluster start failed"

***********
CLI Output
***********

+ pcs property set stonith-enabled=false
Error: unable to get cib
Error: unable to get cib
+ sleep 4
+ pcs cluster start --all
gqas014.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas014.sbu.lab.eng.bos.redhat.com (Connection error)
gqas015.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas015.sbu.lab.eng.bos.redhat.com (Connection error)
gqas009.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas009.sbu.lab.eng.bos.redhat.com (Connection error)
gqas010.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas010.sbu.lab.eng.bos.redhat.com (Connection error)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas015.sbu.lab.eng.bos.redhat.com (Connection error)
gqas010.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas010.sbu.lab.eng.bos.redhat.com (Connection error)
gqas009.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas009.sbu.lab.eng.bos.redhat.com (Connection error)
gqas014.sbu.lab.eng.bos.redhat.com: Unable to connect to gqas014.sbu.lab.eng.bos.redhat.com (Connection error)
+ pcs cluster start --all
gqas010.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas009.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas015.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas014.sbu.lab.eng.bos.redhat.com: Starting Cluster...
+ pcs cluster start --all
gqas010.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas009.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas015.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas014.sbu.lab.eng.bos.redhat.com: Starting Cluster...
+ pcs cluster start --all
gqas010.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas009.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas015.sbu.lab.eng.bos.redhat.com: Starting Cluster...
gqas014.sbu.lab.eng.bos.redhat.com: Starting Cluster...
pcs status is OK as well:

Cluster name: G1474623123.03
Stack: cman
Current DC: gqas009.sbu.lab.eng.bos.redhat.com (version 1.1.15-4.el6-e174ec8) - partition with quorum
Last updated: Mon Feb 20 03:28:21 2017 Last change: Mon Feb 20 02:02:20 2017 by root via cibadmin on gqas009.sbu.lab.eng.bos.redhat.com

4 nodes and 24 resources configured

Online: [ gqas009.sbu.lab.eng.bos.redhat.com gqas010.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ gqas009.sbu.lab.eng.bos.redhat.com gqas010.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ gqas009.sbu.lab.eng.bos.redhat.com gqas010.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ gqas009.sbu.lab.eng.bos.redhat.com gqas010.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com ]
 Resource Group: gqas009.sbu.lab.eng.bos.redhat.com-group
     gqas009.sbu.lab.eng.bos.redhat.com-nfs_block (ocf::heartbeat:portblock): Started gqas009.sbu.lab.eng.bos.redhat.com
     gqas009.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started gqas009.sbu.lab.eng.bos.redhat.com
     gqas009.sbu.lab.eng.bos.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started gqas009.sbu.lab.eng.bos.redhat.com
 Resource Group: gqas010.sbu.lab.eng.bos.redhat.com-group
     gqas010.sbu.lab.eng.bos.redhat.com-nfs_block (ocf::heartbeat:portblock): Started gqas010.sbu.lab.eng.bos.redhat.com
     gqas010.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started gqas010.sbu.lab.eng.bos.redhat.com
     gqas010.sbu.lab.eng.bos.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started gqas010.sbu.lab.eng.bos.redhat.com
 Resource Group: gqas014.sbu.lab.eng.bos.redhat.com-group
     gqas014.sbu.lab.eng.bos.redhat.com-nfs_block (ocf::heartbeat:portblock): Started gqas014.sbu.lab.eng.bos.redhat.com
     gqas014.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started gqas014.sbu.lab.eng.bos.redhat.com
     gqas014.sbu.lab.eng.bos.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started gqas014.sbu.lab.eng.bos.redhat.com
 Resource Group: gqas015.sbu.lab.eng.bos.redhat.com-group
     gqas015.sbu.lab.eng.bos.redhat.com-nfs_block (ocf::heartbeat:portblock): Started gqas015.sbu.lab.eng.bos.redhat.com
     gqas015.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started gqas015.sbu.lab.eng.bos.redhat.com
     gqas015.sbu.lab.eng.bos.redhat.com-nfs_unblock (ocf::heartbeat:portblock): Started gqas015.sbu.lab.eng.bos.redhat.com

Daemon Status:
  cman: active/disabled
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@gqas009 ~]#
See https://bugzilla.redhat.com/show_bug.cgi?id=1284404

The cluster developers say this is the result of the new async behavior of the `pcs cluster setup ...` command: SSL auth certs have to be deployed before the cluster will accept connections. They suggest a delay of approximately 12 seconds between `pcs cluster setup ...` and `pcs cluster start --all`.
We support Ganesha on RHEL 6.X, so this fix needs to be part of RHGS 3.2 before it ships. The fix is minor as well: adding a 12 sec delay between pcs cluster setup and start. Proposing this as a blocker for RHGS 3.2 so that it gets pulled in.
Instead of adding a 12 sec delay, IMO the cleaner approach would be to add --wait to the pcs cluster start command, as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1284404#c5
Tomas Jelinek writes in email:

we were discussing this on irc yesterday and I considered the discussion to be resolved and closed.

Summary for the record:
* this is a known pcs bug
  Bug 1284404 - make restarting pcsd a synchronous operation
  https://bugzilla.redhat.com/show_bug.cgi?id=1284404
* it is already fixed in the recent RHEL7.4 build
* as a workaround put sleep(12) between "pcs cluster setup" and "pcs cluster start"
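For illustration only, the sleep workaround from the email could be wrapped in a small helper so the delay is easy to tune per environment. This is a rough sketch, not taken from any patch: `run_after_delay` is a hypothetical name that does not exist in ganesha-ha.sh, and 12 seconds is simply the figure quoted above.

```shell
#!/bin/sh
# Sketch only. run_after_delay is a hypothetical helper, not part of
# ganesha-ha.sh. It sleeps for the given number of seconds and then runs
# the remaining arguments as a command, returning that command's status.
run_after_delay() {
    local delay=$1
    shift

    sleep "${delay}"
    "$@"
}

# Intended use right after a successful "pcs cluster setup ..."
# (shown as a comment so that nothing is executed here):
#
#   if ! run_after_delay 12 pcs cluster start --all; then
#       logger "pcs cluster start failed"
#       exit 1
#   fi
```

A fixed delay is still a guess about how long pcsd needs to restart, so the value may need adjusting if the nodes are slow.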
upstream 3.10 - included in https://review.gluster.org/#/c/16692/
upstream 3.9 - https://review.gluster.org/#/c/16690/
upstream 3.8 - https://review.gluster.org/#/c/16691/

This change will not be made in mainline, as the ganesha-ha.sh script has been removed from mainline and moved to the storhaug project.

downstream patch: https://code.engineering.redhat.com/gerrit/#/c/98383/
Kaleb et al,

With the delay of 12s between pcs setup and start, I am still not able to set up a cluster on RHEL 6.9 nodes, even after multiple tries.

My ganesha-ha.sh is exactly like the one in the patch: https://review.gluster.org/#/c/16690/

<snip>
    if [ $? -ne 0 ]; then
        logger "pcs cluster setup ${RHEL6_PCS_CNAME_OPTION} ${name} ${servers} failed"
        exit 1;
    fi

    sleep 12
    pcs cluster start --all
    while [ $? -ne 0 ]; do
        sleep 2
        pcs cluster start --all
    done
</snip>

I still see connection failures:

Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)

/var/log/messages also shows logged failures:

Feb 23 04:18:29 gqas009 kernel: DLM (built Feb 14 2017 02:28:20) installed
Feb 23 04:18:33 gqas009 root: pcs cluster start failed

I generally do the setup by hand or with perf automation, but I tried gdeploy as well. It fails for the same reason each time.

Is there something else that may need to be done?
Based on comment 14, I am moving this bug back to POST. Please work on this and see if there are any additional modifications required in the script.
The patch introduced an infinite loop in ganesha-ha.sh:

while [ $? -ne 0 ]; do ...

It prints the following forever and never exits when I try setup via --setup (and NOT the gluster CLI, as that almost never seemed to work for me on 6.9):

gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
Error: unable to start all nodes
gqas015.sbu.lab.eng.bos.redhat.com: Error connecting to gqas015.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas010.sbu.lab.eng.bos.redhat.com: Error connecting to gqas010.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas009.sbu.lab.eng.bos.redhat.com: Error connecting to gqas009.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
gqas014.sbu.lab.eng.bos.redhat.com: Error connecting to gqas014.sbu.lab.eng.bos.redhat.com - (HTTP error: 400)
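One way the unbounded `while [ $? -ne 0 ]` retry described above could be capped is sketched below. This is only an illustration under stated assumptions: `retry_bounded`, `max_tries` and `delay` are hypothetical names that do not exist in ganesha-ha.sh, and the limit of 10 attempts in the usage comment is an arbitrary example value, not a recommendation from the pcs developers.

```shell
#!/bin/sh
# Sketch only. retry_bounded is a hypothetical helper, not part of
# ganesha-ha.sh. It runs the given command up to max_tries times,
# sleeping delay seconds between failed attempts, and returns 0 on the
# first success or 1 once every attempt has failed.
retry_bounded() {
    local max_tries=$1
    local delay=$2
    shift 2

    local try=1
    while [ "${try}" -le "${max_tries}" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "${try}" -lt "${max_tries}" ]; then
            sleep "${delay}"
        fi
        try=$((try + 1))
    done
    return 1
}

# Intended use in place of the unbounded loop
# (shown as a comment so that nothing is executed here):
#
#   if ! retry_bounded 10 2 pcs cluster start --all; then
#       logger "pcs cluster start failed"
#       exit 1
#   fi
```

With a cap in place, a persistent failure such as the HTTP error 400 above would end the script with a logged error instead of looping indefinitely.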
Brought up the cluster a couple of times on glusterfs-3.8.4-16 without any problem. Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0484.html