Bug 1881114

Summary: galera resource agent fails promotion during a rolling restart
Product: Red Hat Enterprise Linux 8
Reporter: Damien Ciabrini <dciabrin>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.2
CC: agk, cfeist, cluster-maint, fdinitto, lmiccini, michele, phagara, pkomarov
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: resource-agents-4.1.1-70.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-18 15:11:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Damien Ciabrini 2020-09-21 14:43:46 UTC
Description of problem:
Because galera is a master/slave (M/S) resource, the resource agent
decides when and how to promote a resource replica (i.e. start a galera
process locally) based on the current state of the entire galera cluster:

  . If there's no galera cluster, the replica is promoted as the
    bootstrap node: it will start a galera process that will bootstrap
    a new galera cluster.

  . If there's a running galera cluster, the replica is promoted as a
    joiner node: it will join the running galera cluster, and will
    never try to bootstrap a new cluster.
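
The two cases above can be sketched as a small shell function. This is
only an illustration of the decision, not the agent's real code: the
function name choose_promotion_role and its single argument (the number
of galera nodes currently running as Master) are assumptions made for
this example.

```shell
# Hypothetical helper, not taken from the galera resource agent.
# $1: number of galera nodes currently running as Master.
# Prints the role the local replica would be promoted with.
choose_promotion_role() {
    running_masters=$1
    if [ "$running_masters" -eq 0 ]; then
        # No galera cluster exists: bootstrap a new one.
        echo "bootstrap"
    else
        # A galera cluster is running: join it, never bootstrap.
        echo "joiner"
    fi
}
```

Calling "choose_promotion_role 0" prints bootstrap; any non-zero count
prints joiner.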

When one changes a property of a pacemaker resource, pacemaker must restart
the resource on all the nodes. For instance for galera:

[root@controller-0 ~]# pcs resource show galera
 Resource: galera (class=ocf provider=heartbeat type=galera)
  Attributes: additional_parameters=--open-files-limit=16384 cluster_host_map=controller-0:controller-0.internalapi.redhat.local;controller-1:controller-1.internalapi.redhat.local;controller-2:controller-2.internalapi.redhat.local enable_creation=true log=/var/log/mysql/mysqld.log wsrep_cluster_address=gcomm://controller-0.internalapi.redhat.local,controller-1.internalapi.redhat.local,controller-2.internalapi.redhat.local
  Meta Attrs: container-attribute-target=host master-max=3 ordered=true
  Operations: demote interval=0s timeout=120s (galera-demote-interval-0s)
              monitor interval=20s timeout=30s (galera-monitor-interval-20s)
              monitor interval=10s role=Master timeout=30s (galera-monitor-interval-10s)
              monitor interval=30s role=Slave timeout=30s (galera-monitor-interval-30s)
              promote interval=0s on-fail=block timeout=300s (galera-promote-interval-0s)
              start interval=0s timeout=120s (galera-start-interval-0s)
              stop interval=0s timeout=120s (galera-stop-interval-0s)

[root@controller-0 ~]# pcs resource update galera additional_parameters=--open-files-limit=20000

For all galera replicas, pacemaker will trigger a demote, then the resource
agent will eventually request a promote operation and configure the local
node as a bootstrap node or a joiner node, depending on the state of the
galera cluster when the agent was called.

During such a rolling restart, one galera node can request a promotion
as a joiner node because some replicas were still running as Master
when the agent was called.

However, there can be some time between the moment when a node is promoted
and when the promote operation effectively takes place. So if a node is
promoted to join a cluster and all the running galera nodes are stopped
before the promote operation starts, the joining node won't be able to
join the cluster, and it can't bootstrap a new one either because it
doesn't have the most recent copy of the DB.

This promotion window makes the resource agent fail its promotion, and
blocks the replica on this node until a manual "pcs resource cleanup galera"
is executed.
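
One way to tolerate this window is to re-check the cluster state when the
promote operation actually runs, and to step back instead of failing when
a planned joiner finds no cluster left. The sketch below only illustrates
that idea. The function name promote_action, its arguments and the action
strings it prints are assumptions for this example and are not taken from
the resource agent.

```shell
# Hypothetical sketch, not the agent's actual code.
# $1: role chosen when the promotion was requested (bootstrap or joiner).
# $2: number of galera nodes still running as Master right now.
# Prints the action the promote operation would take.
promote_action() {
    planned_role=$1
    running_masters_now=$2
    if [ "$planned_role" = "joiner" ] && [ "$running_masters_now" -eq 0 ]; then
        # The cluster disappeared during the promotion window: demote
        # and let the role be decided again rather than report a failure.
        echo "demote-and-retry"
    else
        echo "start-as-$planned_role"
    fi
}
```

With these assumptions, "promote_action joiner 0" yields demote-and-retry,
while "promote_action joiner 2" yields start-as-joiner.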


Version-Release number of selected component (if applicable):
resource-agents-4.1.1-61.el7.x86_64

How reproducible:
Timing-dependent, but happens almost always in an OpenStack HA control plane

Steps to Reproduce:
1. Deploy an HA overcloud
2. On a controller node, change a resource parameter as shown above

Actual results:
One replica failed to restart because all Masters were stopped before
the promotion effectively took place:

* galera_promote_0 on galera-bundle-2 'unknown error' (1): call=998, status=complete, exitreason='Failure, Attempted to promote Master instance of galera before bootstrap node has been detected.',
    last-rc-change='Mon Sep 21 14:30:34 2020', queued=0ms, exec=1535ms


Expected results:
Promotion should no longer be attempted once the Masters are gone, and this situation shouldn't be fatal.

Comment 8 Damien Ciabrini 2020-12-04 08:57:39 UTC
Steps to verify the fix:

with the old resource agent:

. deploy an HA overcloud

. on a controller, update a random parameter in the galera resource

pcs resource update galera additional_parameters=--open-files-limit=20000

. observe that pacemaker restarts the resource on all nodes due to the config change.

. one of the nodes should fail to restart. this is a racy failure, but odds are high that the failure will happen.


with the new resource agent:

. deploy an HA overcloud

. on a controller, update a random parameter in the galera resource

pcs resource update galera additional_parameters=--open-files-limit=20000

. observe that pacemaker restarts the resource on all nodes due to the config change.

. the resource will be promoted back to Master on all nodes. Some nodes may have retried the promotion, in which case some logs will be present in the journal:

"There is no running cluster to join, demoting ourself"
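
To check how many promotions were retried, the journal can be searched
for that message. The helper below is a minimal sketch: the function name
count_promotion_retries is hypothetical, and the time range and the place
where the journal is read (host or galera container) are assumptions that
depend on the deployment.

```shell
# Hypothetical helper: count log lines reporting a retried promotion.
# Reads log text on standard input and prints the number of matches.
count_promotion_retries() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable in scripts running with "set -e".
    grep -cF "There is no running cluster to join, demoting ourself" || true
}

# Example use (assumed time range):
#   journalctl --since "1 hour ago" | count_promotion_retries
```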

Comment 9 pkomarov 2020-12-05 00:09:02 UTC
Verified,
[stack@undercloud-0 ~]$ ansible database -b -mshell -a'podman exec `podman ps -f name=galera-bundle -q`  sh -c "rpm -q resource-agents";rpm -q resource-agents'
[WARNING]: Found both group and host with same name: undercloud
database-0 | CHANGED | rc=0 >>
resource-agents-4.1.1-79.el8.x86_64
resource-agents-4.1.1-79.el8.x86_64
database-2 | CHANGED | rc=0 >>
resource-agents-4.1.1-79.el8.x86_64
resource-agents-4.1.1-79.el8.x86_64
database-1 | CHANGED | rc=0 >>
resource-agents-4.1.1-79.el8.x86_64
resource-agents-4.1.1-79.el8.x86_64

#pcs resource update galera additional_parameters=--open-files-limit=20000
    * galera-bundle-2   (ocf::heartbeat:galera):         Demoting database-0

    * galera-bundle-0   (ocf::heartbeat:galera):         Master database-2
    * galera-bundle-1   (ocf::heartbeat:galera):         Demoting database-1
    * galera-bundle-2   (ocf::heartbeat:galera):         Slave database-0


    * galera-bundle-0   (ocf::heartbeat:galera):         Promoting database-2
    * galera-bundle-1   (ocf::heartbeat:galera):         Slave database-1
    * galera-bundle-2   (ocf::heartbeat:galera):         Master database-0


    * galera-bundle-0   (ocf::heartbeat:galera):         Master database-2
    * galera-bundle-1   (ocf::heartbeat:galera):         Master database-1
    * galera-bundle-2   (ocf::heartbeat:galera):         Master database-0
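
The final state above can also be checked by a script. The helper below is
a sketch that assumes status lines in the format shown in this comment
(role in the next-to-last field); the function name all_replicas_master is
hypothetical and other pcs output formats may need a different match.

```shell
# Hypothetical helper: succeed only if every galera replica line read
# from standard input reports the Master role.
all_replicas_master() {
    awk '
        BEGIN { seen = 0; bad = 0 }
        /ocf::heartbeat:galera/ {
            seen++
            # The role is the next-to-last field, e.g. "Master database-2".
            if ($(NF-1) != "Master") bad++
        }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}

# Example use:
#   pcs status | all_replicas_master && echo "all replicas are Master"
```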

Comment 11 errata-xmlrpc 2021-05-18 15:11:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (resource-agents bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1736