Bug 1386808

Summary: OSP10 - galera can fail to bootstrap
Product: Red Hat OpenStack
Reporter: Michele Baldessari <michele>
Component: puppet-tripleo
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: Marian Krcmarik <mkrcmari>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: dciabrin, jjoyce, jschluet, mburns, oblaut, rhos-flags, scohen, slinaber, tvignaud
Target Milestone: rc
Keywords: Triaged
Target Release: 10.0 (Newton)
Flags: scohen: needinfo+
Hardware: All
OS: All
Whiteboard:
Fixed In Version: puppet-tripleo-5.3.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-14 16:22:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Michele Baldessari 2016-10-19 16:23:19 UTC
Description of problem:
In puppet-tripleo-5.2.0-2.el7ost.noarch we are missing the following patch:
commit 39d88a49bf83a7a3437edc82f42986596356d331
Author: Juan Antonio Osorio Robles <jaosorior>
Date:   Wed Oct 5 10:48:32 2016 +0300

    Enable usage of "short names" for Galera cluster
    
    We're not able to use FQDNs yet, so to work around this, we give
    precedence to a "short name" list we'll get from t-h-t.
    
    Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8
    Related-Bug: #1628521

diff --git a/manifests/profile/pacemaker/database/mysql.pp b/manifests/profile/pacemaker/database/mysql.pp
index 0169e1600a3b..7464854ee608 100644
--- a/manifests/profile/pacemaker/database/mysql.pp
+++ b/manifests/profile/pacemaker/database/mysql.pp
@@ -45,7 +45,12 @@ class tripleo::profile::pacemaker::database::mysql (
 
   # use only mysql_node_names when we land a patch in t-h-t that
   # switches to autogenerating these values from composable services
-  $galera_node_names_lookup = hiera('mysql_node_names', hiera('galera_node_names', $::hostname))
+  # The galera node names need to match the pacemaker node names... so if we
+  # want to use FQDNs for this, the cluster will not finish bootstrapping,
+  # since all the nodes will be marked as slaves. For now, we'll stick to the
+  # short name which is already registered in pacemaker until we get around
+  # this issue.
+  $galera_node_names_lookup = hiera('mysql_short_node_names', hiera('mysql_node_names', $::hostname))
   if is_array($galera_node_names_lookup) {
     $galera_nodes = downcase(join($galera_node_names_lookup, ','))
   } else {
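
To make the lookup precedence in the patched line easier to follow, here is a small Python model of it. This is only a sketch, not the Puppet code itself: the `hiera_data` dictionary and the function name `lookup_galera_nodes` are hypothetical, `dict.get` stands in for `hiera(key, default)`, and the handling of a non-array value is an assumption, since the `else` branch is cut off in the excerpt above.

```python
def lookup_galera_nodes(hiera_data, hostname):
    """Model of hiera('mysql_short_node_names', hiera('mysql_node_names', $::hostname)).

    hiera_data is a hypothetical dict standing in for the hiera backend.
    """
    # Inner lookup: mysql_node_names, falling back to the node's own hostname.
    fallback = hiera_data.get("mysql_node_names", hostname)
    # Outer lookup: the short names win whenever t-h-t provides them.
    names = hiera_data.get("mysql_short_node_names", fallback)
    if isinstance(names, list):
        # Mirrors downcase(join($galera_node_names_lookup, ',')).
        return ",".join(names).lower()
    # Assumption: a single scalar value is simply lowercased.
    return str(names).lower()
```

With both keys present, the short names are used. With only `mysql_node_names` present, the result is the FQDN list, which is the situation the unpatched package ends up in.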

The symptom is that the galera db does not come up correctly, and resources that depend on Exec[galera-ready] report a failed dependency:
Notice: /Stage[main]/Glance::Deps/Anchor[glance::db::end]: Dependency Exec[galera-ready] has failures: true
Notice: /Stage[main]/Glance::Deps/Anchor[glance::dbsync::begin]: Dependency Exec[galera-ready] has failures: true
Notice: /Stage[main]/Glance::Deps/Anchor[glance::dbsync::end]: Dependency Exec[galera-ready] has failures: true
Notice: /Stage[main]/Glance::Deps/Anchor[glance::service::begin]: Dependency Exec[galera-ready] has failures: true
Notice: /Firewall[998 log all]: Dependency Exec[galera-ready] has failures: true
Notice: /Firewall[999 drop all]: Dependency Exec[galera-ready] has failures: true


The reason for this is that the names passed to the galera resource agent are network-specific FQDNs, not the short hostnames that are registered in pacemaker:
Oct 18 03:11:21 localhost galera(galera)[20794]: ERROR: MySQL is not running
Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node <overcloud-controller01.internalapi.localdomain> to report database status before Master instances can start.
Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node <overcloud-controller02.internalapi.localdomain> to report database status before Master instances can start.
Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node <overcloud-controller03.internalapi.localdomain> to report database status before Master instances can start.


So the messages above are logged repeatedly until clustercheck eventually fails.

In other words, we cannot run galera traffic on a dedicated network until https://bugzilla.redhat.com/show_bug.cgi?id=1381836 is fixed.
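
As a rough diagnostic aid, the mismatch described above can be spotted by comparing the names in the resource agent's "Waiting on node" messages with the node names pacemaker knows about. The Python sketch below assumes log lines in the format quoted in this report; the function name `unmatched_galera_nodes` and the `pacemaker_nodes` argument are hypothetical, and the caller has to supply the pacemaker node list from their own cluster.

```python
import re

# Matches the resource agent message quoted in this report.
WAITING_RE = re.compile(r"Waiting on node <([^>]+)> to report database status")


def unmatched_galera_nodes(log_lines, pacemaker_nodes):
    """Return galera node names from the log that pacemaker does not know.

    pacemaker_nodes is supplied by the caller (hypothetical input).
    """
    known = {name.lower() for name in pacemaker_nodes}
    unmatched = []
    for line in log_lines:
        match = WAITING_RE.search(line)
        if not match:
            continue
        name = match.group(1)
        # Report each unknown name once, in order of first appearance.
        if name.lower() not in known and name not in unmatched:
            unmatched.append(name)
    return unmatched
```

A non-empty result means the agent is waiting on names that are not registered as pacemaker nodes, which matches the failure mode in this bug.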

Comment 5 errata-xmlrpc 2016-12-14 16:22:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html