Bug 1380451 - Galera fails promotion after a stop-start sequence
Keywords:
Status: CLOSED DUPLICATE of bug 1360768
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: pre-dev-freeze
Assignee: Damien Ciabrini
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-29 16:00 UTC by Raoul Scarazzini
Modified: 2018-07-20 08:34 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-20 08:34:38 UTC
Target Upstream Version:



Description Raoul Scarazzini 2016-09-29 16:00:24 UTC
Description of problem:

While testing HA resource behavior in Newton, we run the following sequence of operations:

1 - Stop Galera;
2 - Poll every minute for the status of the other resources;
3 - Start Galera;
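
For illustration only, the sequence above could be scripted roughly as follows. This is a sketch under assumptions: the report does not say which commands were used, so the `pcs resource disable`/`enable` calls, the resource name `galera`, and the number of polls are all hypothetical choices, as are the names `stop_poll_start`, `runner`, and `sleep`.

```python
import subprocess
import time

# Assumed commands: the report does not state how Galera was stopped,
# polled, or started. Adjust these to the actual test procedure.
STOP_CMD = ["pcs", "resource", "disable", "galera"]
STATUS_CMD = ["pcs", "status"]
START_CMD = ["pcs", "resource", "enable", "galera"]


def run(cmd):
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def stop_poll_start(polls=3, interval=60, runner=run, sleep=time.sleep):
    """Stop Galera, poll the cluster status once per interval, start Galera.

    `polls` is a hypothetical value; the report only says "every minute".
    Returns the status snapshots collected while Galera was stopped.
    """
    runner(STOP_CMD)
    snapshots = []
    for _ in range(polls):
        sleep(interval)
        snapshots.append(runner(STATUS_CMD))
    runner(START_CMD)
    return snapshots
```

The `runner` and `sleep` parameters exist only so the sequence can be exercised without a live cluster.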

The problem is that Galera failed to start again:

 ip-172.18.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-172.20.0.19 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-172.19.0.18 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 ip-172.17.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     galera     (ocf::heartbeat:galera):        FAILED Master overcloud-controller-2 (unmanaged)
     Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
 ip-172.17.0.19 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
 ip-192.0.2.15  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started overcloud-controller-0

Failed Actions:
* galera_promote_0 on overcloud-controller-2 'unknown error' (1): call=179, status=complete, exitreason='MySQL server failed to start (pid=65378) (rc=0), please check your installation',
    last-rc-change='Wed Sep 28 15:20:24 2016', queued=0ms, exec=12635ms
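
When the failure is intermittent, it can help to pull the operation, node, and exit reason out of each "Failed Actions" entry automatically. The following is a minimal sketch that assumes entries look like the one above; the function name `parse_failed_action` and the regular expression are illustrative and not part of any Pacemaker tooling.

```python
import re

# Pattern written against the single entry shown in this report;
# other Pacemaker versions may format failed actions differently.
FAILED_ACTION_RE = re.compile(
    r"\*\s+(?P<operation>\S+)\s+on\s+(?P<node>\S+)\s+"
    r"'(?P<rc_text>[^']*)'\s+\((?P<rc>\d+)\):\s+"
    r"call=(?P<call>\d+),\s+status=(?P<status>[^,]+),\s+"
    r"exitreason='(?P<exitreason>[^']*)'"
)


def parse_failed_action(line):
    """Return the fields of a failed-action line as a dict, or None."""
    match = FAILED_ACTION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared as integers.
    fields["rc"] = int(fields["rc"])
    fields["call"] = int(fields["call"])
    return fields


sample = (
    "* galera_promote_0 on overcloud-controller-2 'unknown error' (1): "
    "call=179, status=complete, exitreason='MySQL server failed to start "
    "(pid=65378) (rc=0), please check your installation',"
)
parsed = parse_failed_action(sample)
```

Collecting these fields across repeated runs would show whether the failure always hits the promote operation and whether it is tied to one node.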

So the promotion on overcloud-controller-2 failed. This may be a race condition, since it does not happen every time, but the logs may help explain what happened in this case.

Comment 1 Raoul Scarazzini 2016-09-29 16:02:38 UTC
sosreports, logs and status are here: http://file.rdu.redhat.com/~rscarazz/BZ1380451/

Comment 3 Damien Ciabrini 2018-07-20 08:34:38 UTC

*** This bug has been marked as a duplicate of bug 1360768 ***

