Bug 1262425

Summary: memcached needs the interleave=true pacemaker attribute
Product: Red Hat OpenStack Reporter: Michele Baldessari <michele>
Component: openstack-tripleo-heat-templates    Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA QA Contact: Asaf Hirshberg <ahirshbe>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo)    CC: abeekhof, calfonso, dnavale, fdinitto, gfidente, jcoufal, mburns, michele, ohochman, rhel-osp-director-maint, rscarazz, ushkalim
Target Milestone: y2    Keywords: Triaged
Target Release: 7.0 (Kilo)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-72.el7ost Doc Type: Bug Fix
Doc Text:
Previously, the interleave property was not enabled for the Pacemaker memcached clone set. Due to this, Pacemaker resources depending on memcached had to wait for all copies of memcached to be in the running state before they could be started. With this update, the memcached clone set is configured with the interleave property enabled. As a result, Pacemaker resources depending on memcached can be started as soon as the corresponding copy from the memcached clone set becomes available.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-21 16:49:01 UTC Type: Bug
Bug Blocks: 1262263    

Description Michele Baldessari 2015-09-11 15:34:16 UTC
Description of problem:
osp-d creates the memcached resource as follows (CIB dump, since pcs status does not show meta attributes):
<clone id="memcached-clone">
  <primitive class="systemd" id="memcached" type="memcached">
    <instance_attributes id="memcached-instance_attributes"/>
    <operations>
      <op id="memcached-start-timeout-60s" interval="0s" name="start" timeout="60s"/>
      <op id="memcached-monitor-interval-60s" interval="60s" name="monitor"/>
    </operations>
  </primitive>
  <meta_attributes id="memcached-clone-meta"/>
</clone>   


Whereas the osp-ha reference architecture sets interleave=true:
<clone id="memcached-clone">
  <primitive class="systemd" id="memcached" type="memcached">
    <instance_attributes id="memcached-instance_attributes"/>
    <operations>
      <op id="memcached-monitor-interval-60s" interval="60s" name="monitor"/>
    </operations>
  </primitive>
  <meta_attributes id="memcached-clone-meta">
    <nvpair id="memcached-interleave" name="interleave" value="true"/>
  </meta_attributes>
</clone>
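
For a cluster that is already deployed, the missing attribute can also be added by hand with pcs; a minimal sketch, assuming the memcached-clone resource id shown in the dumps above (the proper fix still belongs in the templates):

# Set the meta attribute on the existing clone; the CIB is cluster-wide,
# so this only needs to be run on one controller.
pcs resource meta memcached-clone interleave=true

# Confirm the nvpair landed in the CIB.
cibadmin --query --local | grep -A 2 memcached-clone-meta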

Comment 3 chris alfonso 2015-09-14 16:10:50 UTC
What is the overall impact of this bug, is it just an info display issue or does it cause other issues?

Comment 4 Michele Baldessari 2015-09-14 16:24:24 UTC
Mainly speed of starting all the services on a controller. Without interleave=true
the cascade of services depending on memcached (keystone and services depending on keystone) will all need to wait for memcached to be started on *all* nodes before starting themselves.

E.g. with memcached interleave=true, keystone on node A can start as soon as memcached on node A is started and does not need to wait for memcached to be started on node B and C.
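
For illustration, the dependency in question is an ordering constraint between the two clone sets, roughly as below (a sketch; the keystone clone id is assumed, not taken from this report). With interleave=false on memcached-clone, the "then" side waits for every copy of memcached in the cluster; with interleave=true it only waits for the copy on its own node.

pcs constraint order start memcached-clone then openstack-keystone-clone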

Fabio, anything I missed above?

Comment 5 Fabio Massimo Di Nitto 2015-09-14 16:53:28 UTC
(In reply to Michele Baldessari from comment #4)
> Mainly speed of starting all the services on a controller. Without
> interleave=true
> the cascade of services depending on memcached (keystone and services
> depending on keystone) will all need to wait for memcached to be started on
> *all* nodes before starting themselves.
> 
> E.g. with memcached interleave=true, keystone on node A can start as soon as
> memcached on node A is started and does not need to wait for memcached to be
> started on node B and C.
> 
> Fabio, anything I missed above?

That is correct, use of interleave=true decreases recovery time of services in case of some faults. It is already used for many OpenStack services, but for some reason OSPd-based deployments didn't have it.

Comment 6 Michele Baldessari 2015-09-16 20:24:38 UTC
I need to partially backpedal on my comment #4. The issue here is *not* simply a startup-speed problem (although that still holds true). The real problem is that whenever a controller joins the cluster (say after a reboot), pacemaker will consider memcached as a single unit across all nodes, so it will restart keystone on every node. So this one is more important than initially thought.

Putting Andrew in CC: as he provided this feedback in today's call

Comment 10 Asaf Hirshberg 2015-12-01 10:14:41 UTC
verified on RHEL-OSP director 7.2 puddle - 2015-11-25.2

using cibadmin --query --local:
      <clone id="memcached-clone">
        <primitive class="systemd" id="memcached" type="memcached">
          <instance_attributes id="memcached-instance_attributes"/>
          <operations>
            <op id="memcached-start-interval-0s" interval="0s" name="start" timeout="100s"/>
            <op id="memcached-stop-interval-0s" interval="0s" name="stop" timeout="100s"/>
            <op id="memcached-monitor-interval-60s" interval="60s" name="monitor"/>
          </operations>
        </primitive>
        <meta_attributes id="memcached-clone-meta_attributes">
          <nvpair id="memcached-interleave" name="interleave" value="true"/>

Info:
rpm: openstack-tripleo-heat-templates-0.8.6-85.el7ost.noarch
HA environment: 3 controllers, 3 computes
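
The interleave setting can also be checked without a raw CIB dump; a sketch, assuming the pcs shipped with this puddle (the Meta Attrs line of the clone should list interleave=true):

pcs resource show memcached-clone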

Comment 13 errata-xmlrpc 2015-12-21 16:49:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650