Bug 1386808 - OSP10 - galera can fail to bootstrap
Summary: OSP10 - galera can fail to bootstrap
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 10.0 (Newton)
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: RHOS Maint
QA Contact: Marian Krcmarik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-19 16:23 UTC by Michele Baldessari
Modified: 2016-12-14 16:22 UTC (History)

Fixed In Version: puppet-tripleo-5.3.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:22:54 UTC
scohen: needinfo+




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:2948 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC
OpenStack gerrit 382883 None None None 2016-10-19 16:23:49 UTC

Description Michele Baldessari 2016-10-19 16:23:19 UTC
Description of problem:
In puppet-tripleo-5.2.0-2.el7ost.noarch we are missing the following patch:
commit 39d88a49bf83a7a3437edc82f42986596356d331
Author: Juan Antonio Osorio Robles <jaosorior@redhat.com>
Date:   Wed Oct 5 10:48:32 2016 +0300

    Enable usage of "short names" for Galera cluster
    
    We're not able to use FQDNs yet, so to work around this, we give
    precedence to a "short name" list we'll get from t-h-t.
    
    Change-Id: I4ef7786474c229d5212a0deb2ca02ee992b030d8
    Related-Bug: #1628521

diff --git a/manifests/profile/pacemaker/database/mysql.pp b/manifests/profile/pacemaker/database/mysql.pp
index 0169e1600a3b..7464854ee608 100644
--- a/manifests/profile/pacemaker/database/mysql.pp
+++ b/manifests/profile/pacemaker/database/mysql.pp
@@ -45,7 +45,12 @@ class tripleo::profile::pacemaker::database::mysql (
 
   # use only mysql_node_names when we land a patch in t-h-t that
   # switches to autogenerating these values from composable services
-  $galera_node_names_lookup = hiera('mysql_node_names', hiera('galera_node_names', $::hostname))
+  # The galera node names need to match the pacemaker node names... so if we
+  # want to use FQDNs for this, the cluster will not finish bootstrapping,
+  # since all the nodes will be marked as slaves. For now, we'll stick to the
+  # short name which is already registered in pacemaker until we get around
+  # this issue.
+  $galera_node_names_lookup = hiera('mysql_short_node_names', hiera('mysql_node_names', $::hostname))
   if is_array($galera_node_names_lookup) {
     $galera_nodes = downcase(join($galera_node_names_lookup, ','))
   } else {
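
To make the lookup order in the patched line easier to follow, here is a minimal Python sketch of the nested hiera() fallback. It is an illustrative model, not the Puppet implementation: the helper names (lookup, galera_nodes), the sample data and the short host names are hypothetical, and because the non-array branch is cut off in the diff above, the sketch assumes it simply lowercases the single value.

```python
# Illustrative model of the nested hiera() fallback from the patch.
# Not Puppet code; helper names and sample data are hypothetical.

def lookup(hiera_data, keys, default):
    """Return the value of the first key present in hiera_data, else default."""
    for key in keys:
        if key in hiera_data:
            return hiera_data[key]
    return default


def galera_nodes(hiera_data, hostname):
    """Build the comma-separated, lowercased node list handed to galera."""
    # Patched precedence: short names first, then mysql_node_names,
    # then the local hostname as the last resort.
    names = lookup(
        hiera_data,
        ["mysql_short_node_names", "mysql_node_names"],
        hostname,
    )
    if isinstance(names, list):
        return ",".join(names).lower()
    # Assumption: the truncated else branch lowercases a single value.
    return str(names).lower()


if __name__ == "__main__":
    sample = {
        "mysql_short_node_names": ["overcloud-controller01", "overcloud-controller02"],
        "mysql_node_names": [
            "overcloud-controller01.internalapi.localdomain",
            "overcloud-controller02.internalapi.localdomain",
        ],
    }
    # With the patch, the short names win over the FQDN list.
    print(galera_nodes(sample, "overcloud-controller01"))
```

When only mysql_node_names is present, the sketch falls back to the FQDN list, which is the situation that produces the failure described in this report.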

The symptom is that the galera database does not come up correctly:
Notice: /Stage[main]/Glance::Deps/Anchor[glance::db::end]: Dependency Exec[galera-ready] has failures: true
Notice: /Stage[main]/Glance::Deps/Anchor[glance::dbsync::begin]: Dependency Exec[galera-ready] has failures: true
Notice: /Stage[main]/Glance::Deps/Anchor[glance::dbsync::end]: Dependency Exec[galera-ready] has failures: true
Notice: /Stage[main]/Glance::Deps/Anchor[glance::service::begin]: Dependency Exec[galera-ready] has failures: true
Notice: /Firewall[998 log all]: Dependency Exec[galera-ready] has failures: true
Notice: /Firewall[999 drop all]: Dependency Exec[galera-ready] has failures: true


The reason for this is that the names passed to the galera resource agent are FQDNs on the internalapi network, not the short hostnames registered in pacemaker:
Oct 18 03:11:21 localhost galera(galera)[20794]: ERROR: MySQL is not running
Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node <overcloud-controller01.internalapi.localdomain> to report database status before Master instances can start.
Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node <overcloud-controller02.internalapi.localdomain> to report database status before Master instances can start.
Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node <overcloud-controller03.internalapi.localdomain> to report database status before Master instances can start.


So the messages above are logged over and over until clustercheck eventually fails.
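
A quick way to confirm this mismatch on an affected system is to compare the node names in the "Waiting on node" messages with the names pacemaker knows about. The following Python sketch is only an illustration: the function names are hypothetical, the regular expression is modelled on the log lines quoted above, and the list of pacemaker node names has to be supplied by the caller.

```python
import re

# Pattern modelled on the galera resource agent messages quoted in this report.
WAITING_RE = re.compile(r"Waiting on node <([^>]+)> to report database status")


def waiting_nodes(log_text):
    """Return the sorted, de-duplicated node names galera is waiting on."""
    return sorted(set(WAITING_RE.findall(log_text)))


def unknown_to_pacemaker(log_text, pacemaker_nodes):
    """Return waited-on names that do not match any pacemaker node name."""
    known = {name.lower() for name in pacemaker_nodes}
    return [name for name in waiting_nodes(log_text) if name.lower() not in known]


if __name__ == "__main__":
    log = (
        "Oct 18 03:11:21 localhost galera(galera)[20794]: INFO: Waiting on node "
        "<overcloud-controller01.internalapi.localdomain> to report database "
        "status before Master instances can start.\n"
    )
    # Hypothetical pacemaker node list using short names.
    for name in unknown_to_pacemaker(log, ["overcloud-controller01"]):
        print("not a pacemaker node name:", name)
```

Any name reported by the sketch is one that galera waits on but that pacemaker does not recognise, which is the condition that keeps the cluster from finishing its bootstrap.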

In other words, galera traffic cannot be placed on any dedicated network until https://bugzilla.redhat.com/show_bug.cgi?id=1381836 is fixed.

Comment 5 errata-xmlrpc 2016-12-14 16:22:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

