1262425 – memcached needs the interleave=true pacemaker attribute

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	7.0 (Kilo)
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	y2
Target Release:	7.0 (Kilo)
Assignee:	Giulio Fidente
QA Contact:	Asaf Hirshberg
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1262263

Reported:	2015-09-11 15:34 UTC by Michele Baldessari
Modified:	2015-12-21 16:49 UTC
CC List:	12 users
Fixed In Version:	openstack-tripleo-heat-templates-0.8.6-72.el7ost
Doc Type:	Bug Fix
Doc Text:	Previously, the interleave property was not enabled for the Pacemaker memcached clone set. Due to this, the Pacemaker resources depending on the memcached had to wait for all copies of memcached to be in the running state before they could be started. With this update, the memcached clone set is configured enabling the interleave property. As a result, the Pacemaker resources depending on memcached can be started as soon as one of the copies from the clone set becomes available.
Clone Of:
Environment:
Last Closed:	2015-12-21 16:49:01 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	236990	0	None	None	None	Never
Red Hat Product Errata	RHSA-2015:2650	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update	2015-12-21 21:44:54 UTC

Description Michele Baldessari 2015-09-11 15:34:16 UTC

Description of problem:
osp-d creates the memcached resource as follows (shown as a CIB dump, because pcs status does not show meta attributes):
<clone id="memcached-clone">
  <primitive class="systemd" id="memcached" type="memcached">
    <instance_attributes id="memcached-instance_attributes"/>
    <operations>
      <op id="memcached-start-timeout-60s" interval="0s" name="start" timeout="60s"/>
      <op id="memcached-monitor-interval-60s" interval="60s" name="monitor"/>
    </operations>
  </primitive>
  <meta_attributes id="memcached-clone-meta"/>
</clone>   


Whereas the osp-ha reference architecture sets interleave=true:
<clone id="memcached-clone">
  <primitive class="systemd" id="memcached" type="memcached">
    <instance_attributes id="memcached-instance_attributes"/>
    <operations>
      <op id="memcached-monitor-interval-60s" interval="60s" name="monitor"/>
    </operations>
  </primitive>
  <meta_attributes id="memcached-clone-meta">
    <nvpair id="memcached-interleave" name="interleave" value="true"/>
  </meta_attributes>
</clone>
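
The difference between the two dumps can be checked programmatically. The following is only a minimal sketch and not part of the original report: the helper name `clone_interleaved` is hypothetical, the primitives are abbreviated, and it assumes the clone fragment is already available as a string (for example, cut from a CIB dump).

```python
import xml.etree.ElementTree as ET

# Clone as created by osp-d: the clone's meta_attributes element is empty.
OSPD_CLONE = """
<clone id="memcached-clone">
  <primitive class="systemd" id="memcached" type="memcached"/>
  <meta_attributes id="memcached-clone-meta"/>
</clone>
"""

# Clone as defined by the osp-ha reference architecture.
OSP_HA_CLONE = """
<clone id="memcached-clone">
  <primitive class="systemd" id="memcached" type="memcached"/>
  <meta_attributes id="memcached-clone-meta">
    <nvpair id="memcached-interleave" name="interleave" value="true"/>
  </meta_attributes>
</clone>
"""


def clone_interleaved(clone_xml: str) -> bool:
    """Return True if the clone's own meta attributes set interleave=true."""
    clone = ET.fromstring(clone_xml.strip())
    # Only inspect meta_attributes that are direct children of the clone,
    # not any that belong to the primitive inside it.
    for nvpair in clone.findall("./meta_attributes/nvpair"):
        if nvpair.get("name") == "interleave":
            return nvpair.get("value", "").lower() == "true"
    # No nvpair found: Pacemaker's default for interleave is false.
    return False


print(clone_interleaved(OSPD_CLONE))    # False
print(clone_interleaved(OSP_HA_CLONE))  # True
```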

Comment 3 chris alfonso 2015-09-14 16:10:50 UTC

What is the overall impact of this bug? Is it just an info display issue, or does it cause other issues?

Comment 4 Michele Baldessari 2015-09-14 16:24:24 UTC

Mainly the speed of starting all the services on a controller. Without interleave=true, the cascade of services depending on memcached (keystone and the services depending on keystone) all need to wait for memcached to be started on *all* nodes before starting themselves.

E.g. with memcached interleave=true, keystone on node A can start as soon as memcached on node A is started and does not need to wait for memcached to be started on nodes B and C.
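
The start ordering described above can be sketched as a toy model. This is only an illustration of the stated behavior, not Pacemaker's scheduler: the function names are hypothetical, and it assumes keystone is ordered after memcached as described in this comment.

```python
def memcached_nodes_required(node, all_nodes, interleave):
    """Nodes whose memcached copy must be running before keystone on `node` may start."""
    if interleave:
        # Only the copy on the same node matters.
        return {node}
    # Without interleave, the clone is treated as a whole.
    return set(all_nodes)


def keystone_can_start(node, all_nodes, memcached_running, interleave):
    """True if every required memcached copy is already running."""
    required = memcached_nodes_required(node, all_nodes, interleave)
    return required <= set(memcached_running)


nodes = ["A", "B", "C"]
running = ["A"]  # memcached is up on node A only

print(keystone_can_start("A", nodes, running, interleave=True))   # True
print(keystone_can_start("A", nodes, running, interleave=False))  # False
```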

Fabio, anything I missed above?

Comment 5 Fabio Massimo Di Nitto 2015-09-14 16:53:28 UTC

(In reply to Michele Baldessari from comment #4)
> Mainly speed of starting all the services on a controller. Without
> interleave=true
> the cascade of services depending on memcached (keystone and services
> depending on keystone) will all need to wait for memcached to be started on
> *all* nodes before starting themselves.
> 
> E.g. with memcached interleave=true, keystone on node A can start as soon as
> memcached on node A is started and does not need to wait for memcached to be
> started on node B and C.
> 
> Fabio, anything I missed above?

That is correct. Using interleave=true decreases the recovery time of services after some faults. It is already used for many OpenStack services, but for some reason OSPd-based deployments didn't have it.

Comment 6 Michele Baldessari 2015-09-16 20:24:38 UTC

I need to partially backpedal on my comment #4. The issue here is *not* simply slow service startup (although that still holds true). The real problem is that whenever a controller joins the cluster (say, after a reboot), pacemaker treats memcached as a single unit across all nodes, so it restarts keystone on every node. So this one is more important than initially thought.
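
As a rough illustration of the rejoin case (a toy model only, with a hypothetical function name, not output from or code used by Pacemaker), the set of nodes whose keystone copy is affected when memcached starts on a rejoining controller could be expressed as:

```python
def keystone_nodes_affected_on_rejoin(rejoining_node, all_nodes, interleave):
    """Nodes whose keystone copy is (re)started when memcached starts on rejoining_node."""
    if interleave:
        # Only the dependent copy on the rejoining node is touched.
        return [rejoining_node]
    # The memcached clone is handled as one unit, so every node is touched.
    return list(all_nodes)


controllers = ["A", "B", "C"]
print(keystone_nodes_affected_on_rejoin("C", controllers, interleave=False))  # ['A', 'B', 'C']
print(keystone_nodes_affected_on_rejoin("C", controllers, interleave=True))   # ['C']
```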

Putting Andrew in CC, as he provided this feedback in today's call.

Comment 10 Asaf Hirshberg 2015-12-01 10:14:41 UTC

Verified on RHEL-OSP director 7.2 puddle - 2015-11-25.2

using cibadmin --query --local:
      <clone id="memcached-clone">
        <primitive class="systemd" id="memcached" type="memcached">
          <instance_attributes id="memcached-instance_attributes"/>
          <operations>
            <op id="memcached-start-interval-0s" interval="0s" name="start" timeout="100s"/>
            <op id="memcached-stop-interval-0s" interval="0s" name="stop" timeout="100s"/>
            <op id="memcached-monitor-interval-60s" interval="60s" name="monitor"/>
          </operations>
        </primitive>
        <meta_attributes id="memcached-clone-meta_attributes">
          <nvpair id="memcached-interleave" name="interleave" value="true"/>
        </meta_attributes>
      </clone>

Info:
rpm: openstack-tripleo-heat-templates-0.8.6-85.el7ost.noarch
HA environment: 3 controllers, 3 computes

Comment 13 errata-xmlrpc 2015-12-21 16:49:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650
