Bug 1380451

Summary: Galera fails promotion after a stop-start sequence
Product: Red Hat Enterprise Linux 7
Reporter: Raoul Scarazzini <rscarazz>
Component: resource-agents
Assignee: Damien Ciabrini <dciabrin>
Status: CLOSED DUPLICATE
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: medium
Version: 7.3
CC: agk, cluster-maint, dciabrin, dh3, fdinitto, oalbrigt, rscarazz
Target Milestone: pre-dev-freeze
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2018-07-20 08:34:38 UTC
Type: Bug

Description Raoul Scarazzini 2016-09-29 16:00:24 UTC
Description of problem:

While testing HA resource behavior in OpenStack Newton, we run the following sequence of operations:

1 - Stop Galera;
2 - Poll every minute for the status of the other resources;
3 - Start Galera;
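The sequence above can be scripted. What follows is a minimal Python sketch, not the harness actually used for this report. It assumes Galera is stopped and started with `pcs resource disable galera-master` and `pcs resource enable galera-master` (the report does not say which commands were used), and the helper name `run_stop_start_cycle` and the fixed number of polls are hypothetical.

```python
import subprocess
import time
from typing import Callable, List

# A runner takes a command line and returns its stdout.
Runner = Callable[[List[str]], str]


def pcs_runner(args: List[str]) -> str:
    """Run a command and return its stdout (raises on non-zero exit)."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def run_stop_start_cycle(
    run: Runner = pcs_runner,
    resource: str = "galera-master",
    polls: int = 5,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Hypothetical helper: stop a resource, poll cluster status, start it again.

    Returns the status snapshots collected while the resource was stopped.
    """
    # 1 - Stop Galera (assumption: done by disabling the master/slave resource).
    run(["pcs", "resource", "disable", resource])
    # 2 - Poll the status of the other resources once per interval.
    snapshots = []
    for _ in range(polls):
        sleep(interval_s)
        snapshots.append(run(["pcs", "status"]))
    # 3 - Start Galera again.
    run(["pcs", "resource", "enable", resource])
    return snapshots
```

Passing `run` and `sleep` in as parameters lets the sequence be exercised without a live cluster or real one-minute waits.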

The problem is that Galera failed to start again:

 ip-172.18.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-172.20.0.19 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-172.19.0.18 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 ip-172.17.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     galera     (ocf::heartbeat:galera):        FAILED Master overcloud-controller-2 (unmanaged)
     Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
 ip-172.17.0.19 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
 ip-192.0.2.15  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started overcloud-controller-0

Failed Actions:
* galera_promote_0 on overcloud-controller-2 'unknown error' (1): call=179, status=complete, exitreason='MySQL server failed to start (pid=65378) (rc=0), please check your installation',
    last-rc-change='Wed Sep 28 15:20:24 2016', queued=0ms, exec=12635ms
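While polling, a harness could flag failures like the one above by scanning the status text. A minimal Python sketch, assuming the `pcs status` line layout shown above (resource name, agent in parentheses, then the state); the function name `failed_resources` is hypothetical and the regular expression only covers this layout.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as:
#   galera (ocf::heartbeat:galera): FAILED Master overcloud-controller-2 (unmanaged)
FAILED_RE = re.compile(
    r"^\s*(?P<rsc>\S+)\s+\((?P<agent>[^)]+)\):\s+FAILED\s+"
    r"(?:(?P<role>Master|Slave)\s+)?(?P<node>\S+)"
)


class FailedResource(NamedTuple):
    rsc: str
    agent: str
    role: Optional[str]  # None when no Master/Slave role is printed
    node: str


def failed_resources(status_text: str) -> List[FailedResource]:
    """Return every resource instance that the status output reports as FAILED."""
    found = []
    for line in status_text.splitlines():
        m = FAILED_RE.match(line)
        if m:
            found.append(FailedResource(m["rsc"], m["agent"], m["role"], m["node"]))
    return found
```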

So the promotion on overcloud-controller-2 failed. Since this does not happen every time, it may be a race condition; the logs should help explain what happened in this run.
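The failed-action entry packs several fields into one line. Pacemaker operation keys have the form `<resource>_<action>_<interval>`, so `galera_promote_0` is the `promote` action of `galera` with interval 0, and `(1)` is the OCF return code (1 is OCF_ERR_GENERIC, shown as 'unknown error'). A minimal Python sketch that splits such an entry, assuming the single-line layout shown in the Failed Actions output above; `parse_failed_action` is a hypothetical helper.

```python
import re
from typing import Dict

# Assumed layout: * <key> on <node> '<text>' (<rc>): ... exitreason='<reason>',
ACTION_RE = re.compile(
    r"^\*\s+(?P<key>\S+)\s+on\s+(?P<node>\S+)\s+'(?P<text>[^']*)'\s+\((?P<rc>\d+)\):"
    r".*?exitreason='(?P<reason>[^']*)'"
)


def parse_failed_action(line: str) -> Dict[str, object]:
    """Split one failed-action entry into its fields."""
    m = ACTION_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised failed-action line")
    # Split from the right because resource names may contain underscores.
    rsc, action, interval = m["key"].rsplit("_", 2)
    return {
        "resource": rsc,
        "action": action,
        "interval_ms": int(interval),  # operation interval in milliseconds
        "node": m["node"],
        "rc": int(m["rc"]),            # OCF return code reported by the agent
        "rc_text": m["text"],
        "exit_reason": m["reason"],
    }
```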

Comment 1 Raoul Scarazzini 2016-09-29 16:02:38 UTC
sosreports, logs and status are here: http://file.rdu.redhat.com/~rscarazz/BZ1380451/

Comment 3 Damien Ciabrini 2018-07-20 08:34:38 UTC

*** This bug has been marked as a duplicate of bug 1360768 ***