Bug 1425112

Summary: [Ganesha] : Unable to bring up a Ganesha HA cluster on RHEL 6.9.
Product: [Community] GlusterFS Reporter: Kaleb KEITHLEY <kkeithle>
Component: common-ha Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.8 CC: amukherj, asoman, bturner, bugs, dang, ffilz, jthottan, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.10 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1425110 Environment:
Last Closed: 2017-03-14 11:10:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1424944, 1425110    
Bug Blocks:    

Comment 1 Kaleb KEITHLEY 2017-02-20 15:16:38 UTC
**********
CLI Output
**********

+ pcs property set stonith-enabled=false
Error: unable to get cib
Error: unable to get cib
+ sleep 4
+ pcs cluster start --all
gqas014: Unable to connect to gqas014.sbu.lab.eng.bos.redhat.com (Connection error)
gqas015: Unable to connect to gqas015.sbu.lab.eng.bos.redhat.com (Connection error)
gqas009: Unable to connect to gqas009.sbu.lab.eng.bos.redhat.com (Connection error)
gqas010: Unable to connect to gqas010.sbu.lab.eng.bos.redhat.com (Connection error)
Error: unable to start all nodes


The cluster developers say this is the result of the new asynchronous behavior of the `pcs cluster setup ...` command.

SSL auth certs have to be deployed to all nodes before the cluster will accept connections.

They suggest a delay of approximately 12 seconds between `pcs cluster setup ...` and `pcs cluster start --all`.
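
For illustration only, here is a minimal sketch of how a setup script might apply the suggested delay and also tolerate a slow certificate rollout. It is not the committed change. The helper name `retry_with_delay`, the attempt count, and the retry interval are assumptions made for this example; only the 12-second delay and the two `pcs` commands come from this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of pcs or glusterfs): run a command until it
# succeeds, sleeping between attempts, and give up after max_attempts tries.
# Usage: retry_with_delay <max_attempts> <delay_seconds> <command> [args...]
retry_with_delay() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            return 1    # out of attempts, report failure to the caller
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
    return 0
}

# Intended use on a cluster node (left commented out so the sketch can be
# sourced on a machine without pcs; the setup arguments stay elided as in
# the report):
#
#   pcs cluster setup ...
#   sleep 12                                       # delay suggested above
#   retry_with_delay 5 4 pcs cluster start --all   # 5 tries, 4 s apart (assumed values)
```

The fixed sleep covers the common case. The bounded retry is an extra guard in case the certificates take longer than expected to propagate, and it still fails visibly instead of looping forever.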

Comment 2 Worker Ant 2017-02-20 16:19:44 UTC
REVIEW: https://review.gluster.org/16691 (common-ha: unable to start HA, Connection Error) posted (#1) for review on release-3.8 by Kaleb KEITHLEY (kkeithle)

Comment 3 Worker Ant 2017-02-26 19:14:59 UTC
COMMIT: https://review.gluster.org/16691 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 5d499cc221850fb1f83b625df5a113e0b83d0a99
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Mon Feb 20 11:14:53 2017 -0500

    common-ha: unable to start HA, Connection Error
    
    See BZ 1284404. pcsd behavior has changed and pcsd will not accept
    connections until SSL certificates have fully propagated throughout
    all the nodes
    
    HA devels suggest a 12 second delay between the `pcs cluster setup ...`
    and the `pcs cluster start --all`
    
    release-3.9 BZ: 1425110
    release-3.9 change: https://review.gluster.org/16690
    
    Change-Id: If94b6991a62f346dbead023c7e7f8282a995728c
    BUG: 1425112
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: https://review.gluster.org/16691
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>

Comment 4 Niels de Vos 2017-03-18 10:52:28 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.10, please open a new bug report.

glusterfs-3.8.10 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-March/000068.html
[2] https://www.gluster.org/pipermail/gluster-users/