Bug 1451414

Summary: First galera cluster bootstrap may fail if cluster has no data
Product: Red Hat Enterprise Linux 7 Reporter: Tom Lavigne <tlavigne>
Component: resource-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.4 CC: agk, cfeist, chjones, cluster-maint, dciabrin, fdinitto, mbayer, mkrcmari, oalbrigt, royoung, rscarazz, tlavigne, ushkalim
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: resource-agents-3.9.5-82.el7_3.11 Doc Type: Bug Fix
Doc Text:
If no data is available in the Galera cluster, the current node is chosen as the "fallback" node to bootstrap the cluster. Because of the way the "fallback" node was chosen, every node made a different decision, which could lead to single-node Galera clusters. To fix this, the algorithm has been changed to yield coherent results across all nodes.
Story Points: ---
Clone Of: 1451097 Environment:
Last Closed: 2017-05-25 15:53:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1451097    
Bug Blocks:    

Description Tom Lavigne 2017-05-16 15:16:24 UTC
This bug has been copied from bug #1451097 and has been proposed
to be backported to 7.3 z-stream (EUS).

Comment 3 Udi Shkalim 2017-05-17 15:14:41 UTC
Verified:

Instructions for testing:

Additional comment: if the test is run on an OpenStack HA overcloud, run the following additional step on all nodes:

2b. let the resource agent use the default user for polling state 

  rm /etc/sysconfig/clustercheck

--

1. create a 3-node pacemaker cluster

  pcs cluster setup --name foo centos1 centos2 centos3 --force
  pcs cluster start --all

2. on all nodes, start from a clean mysql database in /var/lib/mysql

  rm -rf /var/lib/mysql
  mkdir /var/lib/mysql
  chown mysql. /var/lib/mysql
  restorecon /var/lib/mysql

3. create a galera resource, don't start it yet

  pcs resource create galera galera enable_creation=true wsrep_cluster_address='gcomm://centos1,centos2,centos3' meta master-max=3 --master --disable

4. monitor the cluster after the resource is enabled

  crm_mon -RrA
  pcs resource enable galera

The last-commit attribute from all nodes will be set to -1 because no WSREP commit has been integrated yet.

With the fix, under this start condition, the 3 nodes will always choose centos3 as the bootstrap node, as expected.
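
For illustration only, here is a minimal sketch of a selection rule that gives the same answer on every node. It is not the resource agent's actual code: the function name pick_bootstrap_node and the node=last_commit argument format are hypothetical, and the tie-breaking rule (the node listed last wins) is an assumption chosen to match the centos3 outcome described above.

```shell
#!/bin/sh
# Hypothetical helper, NOT the galera resource agent's real code.
# Arguments: node=last_commit pairs, in wsrep_cluster_address order.
# Prints the node with the highest last-commit value. On a tie, the
# node listed last wins (assumed rule), so the result depends only on
# the shared node list and not on which node runs the function.
pick_bootstrap_node() {
    best_node=""
    best_commit=""
    for pair in "$@"; do
        node=${pair%%=*}
        commit=${pair#*=}
        # -ge rather than -gt: a later node replaces an earlier one on a tie
        if [ -z "$best_commit" ] || [ "$commit" -ge "$best_commit" ]; then
            best_node=$node
            best_commit=$commit
        fi
    done
    printf '%s\n' "$best_node"
}

# Empty cluster: every node reports last-commit -1
pick_bootstrap_node centos1=-1 centos2=-1 centos3=-1   # prints centos3
```

Because the function never uses the name of the local node as a default, running it on centos1, centos2 or centos3 with the same attribute values gives the same choice.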

resource-agents-3.9.5-82.el7_3.11

Comment 5 errata-xmlrpc 2017-05-25 15:53:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1315