Bug 1123312 - mariadb will fail to start because puppet is not adding "op start timeout=120s" to the configuration.
Summary: mariadb will fail to start because puppet is not adding "op start timeout=120s" to the configuration.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
Sub Component: Installer
Assignee: Crag Wolfe
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-25 09:48 UTC by Leonid Natapov
Modified: 2016-04-26 19:06 UTC (History)
9 users

Fixed In Version: openstack-foreman-installer-2.0.19-1.el6ost
Doc Type: Bug Fix
Doc Text:
Previously, an appropriately long start timeout had not been assigned to the Galera (MariaDB) service under Pacemaker control, which led to a false error condition when Galera did not start up within the allocated window. With this update, the start timeout has been increased to 300s; as a result, Pacemaker is able to start up Galera under systemd.
Clone Of:
Environment:
Last Closed: 2014-08-21 18:06:23 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:1090 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory 2014-08-22 15:28:08 UTC

Description Leonid Natapov 2014-07-25 09:48:03 UTC
Rubygem-staypuft: HA: mariadb will fail to start because puppet is not adding "op start timeout=120s" to the configuration.

Try "pcs cluster stop && sleep 5 && pcs cluster start" on any node and mariadb will fail to start because puppet is not adding "op start timeout=120s" to the configuration.

galera/mariadb can take a decent amount of time to sync. This problem is solved by switching to the galera resource agent.

must switch galera to use the new how-to and resource-agent

See:
http://rhel-ha.etherpad.corp.redhat.com/RHOS-RHEL-HA-how-to-mrgcloud-rhos5-on-rhel7-db
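
For context, the how-to's approach of moving from systemd:mariadb to the dedicated galera resource agent would look roughly like the sketch below. The resource name, node list, and timeout value here are illustrative placeholders, not values taken from this deployment:

```shell
# Hypothetical sketch: create a galera resource managed by the dedicated
# resource agent (ocf:heartbeat:galera) instead of systemd:mariadb.
# node1,node2,node3 and the 300s timeout are placeholder values.
pcs resource create galera galera \
    enable_creation=true \
    wsrep_cluster_address="gcomm://node1,node2,node3" \
    op promote timeout=300s \
    --master meta master-max=3
```

With the dedicated agent, the long-running sync happens under Pacemaker's own operation timeouts rather than systemd's start window.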

Comment 4 Ryan O'Hara 2014-07-31 17:12:49 UTC
(In reply to Leonid Natapov from comment #0)
> Rubygem-staypuft: HA: mariadb will fail to start because puppet is not
> adding "op start timeout=120s" to the configuration.

Provide logs to support this. Specifically, show that pacemaker tried to recover because a node took longer than 60s to join the cluster.

> Try to pcs cluster stop && sleep 5 && pcs cluster start on any node and
> mariadb will fail to start because puppet is not adding "op start
> timeout=120s" to the configuration.

Running "pcs cluster stop && sleep 5 && pcs cluster start" worked fine for me.

> galera/mariadb can take a decent amount of time to sync. This problem is
> solved by switching to the galera resource agent.

Did you try this?

> must switch galera to use the new how-to and resource-agent
> 
> See:
> http://rhel-ha.etherpad.corp.redhat.com/RHOS-RHEL-HA-how-to-mrgcloud-rhos5-
> on-rhel7-db

Comment 5 David Vossel 2014-07-31 18:55:56 UTC
(In reply to Ryan O'Hara from comment #4)
> (In reply to Leonid Natapov from comment #0)
> > Rubygem-staypuft: HA: mariadb will fail to start because puppet is not
> > adding "op start timeout=120s" to the configuration.
> 
> Provide logs to support this. Specifically, show that pacemaker tried to
> recover because a node took longer than 60s to join the cluster.
> 
> > Try to pcs cluster stop && sleep 5 && pcs cluster start on any node and
> > mariadb will fail to start because puppet is not adding "op start
> > timeout=120s" to the configuration.
> 
> Running "pcs cluster stop && sleep 5 && pcs cluster start" worked fine for
> me.

There are too many variables involved with this to say a simple stop/sleep/start will trigger it. It all depends on whether syncing (SST) is occurring during the start and how large the transfer is. It also depends on whether galera instances on other nodes are attempting to sync with a donor node at the same time. I believe only one node can sync from a donor at a time, which might mean there's a period of time a galera instance is blocked waiting to sync during the start operation. This would increase the chances of timing out during the start operation as well.

To be safe we should definitely set the timeout to at least 2 minutes. For larger databases the sync might even take longer.  Even a 5 minute timeout just to be safe wouldn't be unreasonable. 

It is possible that managing galera with systemd forces us into a 60-second start window. I've seen instances where systemd enforces its own timeout value, which could conflict with Pacemaker's timeout if Pacemaker's timeout is longer.
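
If systemd's own start timeout turns out to be the limiting factor, one possible workaround is a drop-in unit override. A minimal sketch, assuming the unit is named mariadb.service; the drop-in path and the 300s value are illustrative, not taken from this deployment:

```ini
# /etc/systemd/system/mariadb.service.d/timeout.conf  (hypothetical drop-in)
# Raise systemd's own start window so it does not fire before
# Pacemaker's longer "op start" timeout.
[Service]
TimeoutStartSec=300
```

followed by `systemctl daemon-reload` so systemd picks up the override.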

If we are stuck supporting systemd management for now, increase the timeout to >= 120 seconds and make sure to also set the ordered=true metadata option.

Example:
pcs resource create db systemd:mariadb op start timeout=300s meta ordered=true --clone

The 'ordered=true' option will guarantee pacemaker starts the galera instances in series instead of in parallel. This will prevent the condition where two nodes are attempting to SST from the same donor at the same time.

-- Vossel

Comment 6 Ryan O'Hara 2014-07-31 19:46:50 UTC
(In reply to David Vossel from comment #5)
> (In reply to Ryan O'Hara from comment #4)
> > (In reply to Leonid Natapov from comment #0)
> > > Rubygem-staypuft: HA: mariadb will fail to start because puppet is not
> > > adding "op start timeout=120s" to the configuration.
> > 
> > Provide logs to support this. Specifically, show that pacemaker tried to
> > recover because a node took longer than 60s to join the cluster.
> > 
> > > Try to pcs cluster stop && sleep 5 && pcs cluster start on any node and
> > > mariadb will fail to start because puppet is not adding "op start
> > > timeout=120s" to the configuration.
> > 
> > Running "pcs cluster stop && sleep 5 && pcs cluster start" worked fine for
> > me.
> 
> There are too many variables involved with this to say a simple
> stop/sleep/start will trigger it. It all depends on whether syncing (SST)
> is occurring during the start and how large the transfer is. It also
> depends on whether galera instances on other nodes are attempting to sync
> with a donor node at the same time. I believe only one node can sync from a
> donor at a time, which might mean there's a period of time a galera instance
> is blocked waiting to sync during the start operation. This would increase
> the chances of timing out during the start operation as well.

Right. But if we assume that the nodes are sync'd when the cluster is stopped, there will be no SST when the nodes rejoin on cluster start. So there isn't enough information in this bug to say that mariadb failed due to start delay being too short, etc.

You're right that a node can only be a donor for one joiner at a time, and yes this could delay things if your bootstrap node is syncing node #2 (while node #3 waits). I am not convinced this is the case here.

> To be safe we should definitely set the timeout to at least 2 minutes. For
> larger databases the sync might even take longer.  Even a 5 minute timeout
> just to be safe wouldn't be unreasonable. 

I don't disagree, but I was also under the impression that excessively long start delay is bad.

> It is possible that managing galera with systemd forces us into a
> 60-second start window. I've seen instances where systemd enforces its own
> timeout value, which could conflict with Pacemaker's timeout if Pacemaker's
> timeout is longer.
> 
> If we are stuck supporting systemd management for now, increase the timeout
> to >= 120 seconds and make sure to also set the ordered=true metadata
> option.
> 
> Example:
> pcs resource create db systemd:mariadb op start timeout=300s meta
> ordered=true --clone
> 
> The 'ordered=true' option will guarantee pacemaker starts the galera
> instances in series instead of in parallel. This will prevent the condition
> where two nodes are attempting to SST from the same donor at the same time.

OK that might be useful.

Comment 7 David Vossel 2014-07-31 19:58:22 UTC
(In reply to Ryan O'Hara from comment #6)
> (In reply to David Vossel from comment #5)
> > (In reply to Ryan O'Hara from comment #4)
> > > (In reply to Leonid Natapov from comment #0)
> > > > Rubygem-staypuft: HA: mariadb will fail to start because puppet is not
> > > > adding "op start timeout=120s" to the configuration.
> > > 
> > > Provide logs to support this. Specifically, show that pacemaker tried to
> > > recover because a node took longer than 60s to join the cluster.
> > > 
> > > > Try to pcs cluster stop && sleep 5 && pcs cluster start on any node and
> > > > mariadb will fail to start because puppet is not adding "op start
> > > > timeout=120s" to the configuration.
> > > 
> > > Running "pcs cluster stop && sleep 5 && pcs cluster start" worked fine for
> > > me.
> > 
> > There are too many variables involved with this to say a simple
> > stop/sleep/start will trigger it. It all depends on whether syncing (SST)
> > is occurring during the start and how large the transfer is. It also
> > depends on whether galera instances on other nodes are attempting to sync
> > with a donor node at the same time. I believe only one node can sync from
> > a donor at a time, which might mean there's a period of time a galera
> > instance is blocked waiting to sync during the start operation. This would
> > increase the chances of timing out during the start operation as well.
> 
> Right. But if we assume that the nodes are sync'd when the cluster is
> stopped, there will be no SST when the nodes rejoin on cluster start. So
> there isn't enough information in this bug to say that mariadb failed due to
> start delay being too short, etc.

yep, we're running on theoretical assumptions here.

> 
> You're right that a node can only be a donor for one joiner at a time, and
> yes this could delay things if your bootstrap node is sync-ing node #2
> (while node #3 waits). I am not convinced this is the case here.
> 
> > To be safe we should definitely set the timeout to at least 2 minutes. For
> > larger databases the sync might even take longer.  Even a 5 minute timeout
> > just to be safe wouldn't be unreasonable. 
> 
> I don't disagree, but I was also under the impression that excessively long
> start delay is bad.

just to be clear, we're not talking about start-delay here. Everyone collectively erase start-delay from your memory. It was only a poor workaround for an issue with Pacemaker's management of systemd... that's all behind us now :D


Long start timeouts in this case should be fine. Start will return when mariadb finishes the sync, so ideally the start timeout should never be observed except in failure conditions.


> > It is possible that managing galera with systemd forces us into a
> > 60-second start window. I've seen instances where systemd enforces its
> > own timeout value, which could conflict with Pacemaker's timeout if
> > Pacemaker's timeout is longer.
> > 
> > If we are stuck supporting systemd management for now, increase the
> > timeout to >= 120 seconds and make sure to also set the ordered=true
> > metadata option.
> > 
> > Example:
> > pcs resource create db systemd:mariadb op start timeout=300s meta
> > ordered=true --clone
> > 
> > The 'ordered=true' option will guarantee pacemaker starts the galera
> > instances in series instead of in parallel. This will prevent the
> > condition where two nodes are attempting to SST from the same donor at
> > the same time.
> 
> OK that might be useful.

Comment 9 Crag Wolfe 2014-08-12 02:30:59 UTC
Patch posted: https://github.com/redhat-openstack/astapor/pull/347

Comment 10 Jason Guiditta 2014-08-12 15:40:46 UTC
Merged

Comment 13 Leonid Natapov 2014-08-18 10:08:24 UTC
openstack-foreman-installer-2.0.19-1.el6ost

[root@mac047d7b627d5a ~]# pcs resource  show mysqld-clone
 Clone: mysqld-clone
  Resource: mysqld (class=systemd type=mysqld)
   Attributes: timeout=300s 
   Meta Attrs: ordered=true 
   Operations: monitor interval=30s (mysqld-monitor-interval-30s)
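
For an already-created resource like the one shown above, the same settings could presumably also be applied in place rather than by recreating the resource. A sketch using the resource names from the `pcs resource show` output above:

```shell
# Sketch: update an existing resource in place. 'mysqld' and
# 'mysqld-clone' match the names in the output above.
pcs resource update mysqld op start timeout=300s
pcs resource meta mysqld-clone ordered=true
```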

Comment 14 errata-xmlrpc 2014-08-21 18:06:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html

