Bug 1381836

Summary: galera resource agent cannot bootstrap cluster when galera and pacemaker node names differ
Product: Red Hat Enterprise Linux 7 Reporter: Damien Ciabrini <dciabrin>
Component: resource-agents Assignee: Damien Ciabrini <dciabrin>
Status: CLOSED CURRENTRELEASE QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.3 CC: agk, cluster-maint, fdinitto, josorior, mbayer, mcornea, michele, oalbrigt
Target Milestone: rc Keywords: Triaged
Target Release: 7.4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-07-20 08:31:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Damien Ciabrini 2016-10-05 07:44:37 UTC
Description of problem:

In order to bootstrap a Galera cluster, the galera resource agent needs to retrieve the last sequence number of each galera node to find out which node is the most up-to-date.
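
As an illustration of that selection step, here is a minimal sketch in Python (not the agent's actual shell code). The function name and the dictionary of sequence numbers are hypothetical; the report does not say how ties are broken, so the alphabetical tie-break below is only an assumption made to keep the result deterministic.

```python
def pick_bootstrap_node(last_seqno):
    """Return the node reporting the highest last sequence number.

    last_seqno is a hypothetical dict: node name -> last sequence number.
    Ties are broken alphabetically (an assumption, for determinism only).
    """
    if not last_seqno:
        raise ValueError("no sequence numbers reported yet")
    # sorted() fixes the iteration order; max() keeps the first maximum it sees.
    return max(sorted(last_seqno), key=lambda node: last_seqno[node])


if __name__ == "__main__":
    seqnos = {"node-a": 10, "node-b": 42, "node-c": 42}
    print(pick_bootstrap_node(seqnos))
```

With the sample values above, node-b is selected: it shares the highest sequence number with node-c and sorts first.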

For each galera node, the last sequence number is stored as a pacemaker attribute attached to the corresponding pacemaker node. The resource agent currently assumes, and enforces, a one-to-one mapping between galera node names and pacemaker node names.

If the galera server has to be started on a different network than the one pacemaker uses, the galera and pacemaker node names are likely to differ (e.g. gcomm://overcloud-controller-0.internalapi.localdomain vs overcloud-controller-0.localdomain). When that happens, the resource agent bails out and the bootstrap never finishes.
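
The mismatch can be checked ahead of time. The following Python sketch is an illustration only, not code from the resource agent; both function names are hypothetical. It assumes a comma-separated gcomm:// host list with optional :port suffixes and optional ?options, and it does not handle IPv6 literals.

```python
def galera_hosts(cluster_address):
    """Split a gcomm:// address into host names, dropping any :port suffix."""
    prefix = "gcomm://"
    if not cluster_address.startswith(prefix):
        raise ValueError("expected a gcomm:// address")
    # Drop trailing "?option=value" settings, then split the host list.
    hosts = cluster_address[len(prefix):].split("?")[0].split(",")
    # Assumption: no IPv6 literals, so the first ':' starts the port.
    return [h.split(":")[0] for h in hosts if h]


def unmatched_hosts(cluster_address, pacemaker_nodes):
    """Return the galera hosts that are not pacemaker node names."""
    known = set(pacemaker_nodes)
    return [h for h in galera_hosts(cluster_address) if h not in known]


if __name__ == "__main__":
    address = "gcomm://overcloud-controller-0.internalapi.localdomain"
    nodes = ["overcloud-controller-0.localdomain"]
    print(unmatched_hosts(address, nodes))
```

For the names quoted in this report, the galera host is returned as unmatched, which is the situation in which the bootstrap gets stuck.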

Version-Release number of selected component (if applicable):
RHEL 7

How reproducible:
Always

Steps to Reproduce:
1. Make the galera node names differ from the pacemaker node names
2. Set up a galera resource with the appropriate wsrep_cluster_address
3. Enable the galera resource

Actual results:
The galera resource is stuck in state "Slave" and never goes "Master" in pacemaker

Expected results:
The galera resource goes "Master", i.e. the galera cluster is bootstrapped

Comment 2 Juan Antonio Osorio 2017-06-28 14:16:36 UTC
This was already addressed in the resource agent by Damien's introduction of the cluster_host_map parameter.
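
To show what such a map expresses, here is a minimal Python sketch that parses a host map into a lookup table. It assumes the value is a semicolon-separated list of pacemaker_name:galera_name pairs; this comment does not spell out the format, so check the agent's metadata before relying on it. The function name is hypothetical and this is not the agent's own implementation.

```python
def parse_host_map(host_map):
    """Parse 'pcmk1:galera1;pcmk2:galera2' into {pacemaker name: galera name}.

    The pair format is an assumption; verify it against the agent's metadata.
    """
    mapping = {}
    for pair in host_map.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';' or empty entries
        pcmk, sep, galera = pair.partition(":")
        if not sep or not pcmk or not galera:
            raise ValueError("malformed host map entry: %r" % pair)
        mapping[pcmk] = galera
    return mapping


if __name__ == "__main__":
    host_map = ("overcloud-controller-0.localdomain:"
                "overcloud-controller-0.internalapi.localdomain")
    print(parse_host_map(host_map))
```

With a table like this, a galera name found in wsrep_cluster_address can be translated to the pacemaker node that holds its last sequence number attribute.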