Bug 1319384 - OSP9 - Pacemaker-related race condition observed during controller deployment
Summary: OSP9 - Pacemaker-related race condition observed during controller deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 9.0 (Mitaka)
Assignee: Michele Baldessari
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks: 1339488 1395147 1418617 1418619
Reported: 2016-03-19 10:05 UTC by Pablo Caruana
Modified: 2020-05-14 15:11 UTC
CC List: 20 users

Fixed In Version: openstack-tripleo-heat-templates-2.0.0-44.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1395147 1418617 1418619
Environment:
Last Closed: 2017-03-08 20:05:58 UTC
Target Upstream Version:
Embargoed:



Links
OpenStack gerrit 397622 (last updated 2016-11-15 09:54:09 UTC)
Red Hat Product Errata RHBA-2017:0470 (priority: normal, status: SHIPPED_LIVE): Red Hat OpenStack Platform 9 director Bug Fix Advisory (last updated 2017-03-09 01:05:45 UTC)

Description Pablo Caruana 2016-03-19 10:05:29 UTC
Description of problem:
The customer reported a race condition related to Pacemaker during deployment of the Overcloud controllers. Unfortunately, logs were not retained at that time, and they cannot afford to leave the Overcloud sitting idle for investigation because the deployment is already behind its deadlines.

What was observed is that the Overcloud deployment failed because, on one of the three controllers, Puppet was unable to create the pacemaker::resource::filesystem resource used by Glance (NFS driver). As part of the deployment process, the following Puppet fragment is run:

  if $glance_backend == 'file' and hiera('glance_file_pcmk_manage', false) {
    pacemaker::resource::filesystem { "glance-fs":
      device       => hiera('glance_file_pcmk_device'),
      directory    => hiera('glance_file_pcmk_directory'),
      fstype       => hiera('glance_file_pcmk_fstype'),
      fsoptions    => hiera('glance_file_pcmk_options', ''),
      clone_params => '',
    }
  }

This Puppet fragment failed on just one of the three controllers. Looking at the logs of all controllers:


On controller #0, the os-collect-config logs show that a 'pcs resource create' command for the glance-fs resource was attempted and succeeded. On controller #1, there is no trace of 'pcs resource create'. On controller #2, 'pcs resource create' was called and failed with an error claiming that the glance-fs resource already exists.

The customer's impression is that there is a race condition in the pacemaker::resource::filesystem code. Here is how I believe it all happened, starting with a fragment of the pcmk_resource provider that pacemaker::resource::filesystem relies on:

Puppet::Type.type(:pcmk_resource).provide(:default) do
  desc 'A base resource definition for a pacemaker resource'

  ### overloaded methods
  def create
    ...
    # Build the 'pcs resource create' command.  Check out the pcs man page :-)
    cmd = 'resource create ' + @resource[:name]+' ' +@resource[:resource_type]
    if not_empty_string(resource_params)
      cmd += ' ' + resource_params
    end
    ...
    # do pcs create
    pcs('create', cmd)
  end
...

  def exists?
    cmd = 'resource show ' + @resource[:name] + ' > /dev/null 2>&1'
    pcs('show', cmd)
  end

From their perspective, the underlying resource creation logic calls the "exists?" method and, if it returns false, then calls the "create" method. However, there is a race condition here: after "exists?" returns false but before "pcs resource create" runs, the local corosync daemon may replicate the resource clone (which was created almost at the same time on another controller). By the time "pcs resource create" is called, the glance-fs resource already exists, and the call fails because the resource is a duplicate.
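To make the interleaving concrete, here is a minimal Ruby sketch. FakeCib, apply and simulate_race are hypothetical stand-ins for the cluster CIB and the provider's exists?/create pair, not the pcs or puppet-pacemaker API; the unlucky ordering is replayed step by step so the outcome is deterministic.

```ruby
# Hypothetical in-memory stand-in for the cluster-wide CIB. It is not the
# pcs or puppet-pacemaker API; it only mimics the two calls the provider uses.
class FakeCib
  class AlreadyExists < StandardError; end

  def initialize
    @resources = {}
  end

  # Plays the role of `pcs resource show <name>`.
  def exists?(name)
    @resources.key?(name)
  end

  # Plays the role of `pcs resource create`, which rejects duplicates.
  def create(name)
    raise AlreadyExists, "'#{name}' already exists on this system" if exists?(name)
    @resources[name] = true
  end
end

# The provider's check-then-act step, acting on a possibly stale check result.
def apply(cib, name, seen_existing)
  return :skipped if seen_existing
  cib.create(name)
  :created
rescue FakeCib::AlreadyExists => e
  "failed: #{e.message}"
end

# Replays one unlucky interleaving of two controllers, one step at a time.
def simulate_race(cib, name)
  ctrl0_check = cib.exists?(name)   # controller 0 sees: absent
  ctrl2_check = cib.exists?(name)   # controller 2 also sees: absent
  {
    ctrl0: apply(cib, name, ctrl0_check),  # creates the resource
    ctrl2: apply(cib, name, ctrl2_check)   # acts on its stale check and fails
  }
end

p simulate_race(FakeCib.new, 'glance-fs')
```

Both checks pass before either create runs, which matches the controller #0 / controller #2 pattern in the logs: one create succeeds, the other fails with a duplicate error.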

A proper fix could be to parse the output of "pcs resource create" to handle this race. At the moment, the customer believes that this output is never parsed.
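A minimal Ruby sketch of what "parsing the output" could mean, assuming the duplicate error text quoted later in this bug ("already exists on this system") is stable. create_resource and the injected runner are hypothetical names, not part of puppet-pacemaker; in a real provider the runner might wrap Open3.capture2e.

```ruby
# Message taken from the pcs error quoted in this bug; matching on it is an
# assumption and would break if pcs rewords or localizes the error.
ALREADY_EXISTS = /already exists on this system/

# Hypothetical helper. `runner` is any callable that takes an argv array and
# returns [combined_output, success_boolean], e.g. a lambda around
# Open3.capture2e(*argv) returning [out, status.success?].
def create_resource(name, runner)
  output, ok = runner.call(['pcs', 'resource', 'create', name])
  return :created if ok
  # Another controller created it between our exists? check and now: the
  # desired state (resource present) already holds, so do not fail the run.
  return :already_present if output.match?(ALREADY_EXISTS)
  raise "pcs create failed: #{output}"
end

# Fake runner that reproduces the duplicate error instead of calling pcs.
duplicate = lambda do |argv|
  ["Error: unable to create resource/fence device '#{argv.last}', " \
   "'#{argv.last}' already exists on this system", false]
end

p create_resource('fs-varlibglanceimages', duplicate)
```

Any other failure still raises, so only the specific duplicate case is downgraded to success.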

Another approach would be to run the Puppet code that sets up Glance on only one controller node.

In any case, this looks like a very annoying failure mode, so this bug is about reducing this kind of race condition, as similar ones probably exist.


How reproducible:
Rarely. Race conditions of this kind are hard to reproduce.

The only currently effective workaround is to redeploy the Overcloud and hope the timing works out.

The customer does not have enough time and resources to invest in full testing, so they are expecting Red Hat QE to handle it.

Comment 3 Emilien Macchi 2016-03-24 14:45:12 UTC
Looking at how TripleO works [1], Pacemaker::Resource::Filesystem['glance-fs'] is created on all controller nodes.
This can indeed lead to race conditions during the deployment if Puppet runs at the same time on several nodes, because each node tries to create the same filesystem resource.

I see two options that would help avoid this issue (there may be more):

* set verify_on_create to True on Pacemaker::Resource::Filesystem['glance-fs'] resource (in tripleo-heat-templates).
* manage Pacemaker::Resource::Filesystem['glance-fs'] only in the "if $pacemaker_master" block.

Either or both solutions could work, but they need testing; I have not been able to reproduce the bug yet.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/manifests/overcloud_controller_pacemaker.pp#L637-L646
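The second option can be sketched in Ruby as a small model. converge, the node names and the master keyword are hypothetical: master stands in for the manifest's $pacemaker_master flag, assumed here to be true on exactly one designated node. The model assumes the worst case, where every participating node runs its check before any node creates.

```ruby
require 'set'

# Hypothetical model of N controllers converging the same manifest. When
# `master` is nil, every node manages the resource (the current behaviour).
def converge(nodes, resource, master: nil)
  cib = Set.new
  errors = []
  # Phase 1 (worst case): every participating node runs its exists? check
  # before any node has created anything.
  creators = nodes.select do |node|
    (master.nil? || node == master) && !cib.include?(resource)
  end
  # Phase 2: each node that saw the resource as absent tries to create it.
  creators.each do |node|
    if cib.include?(resource)
      errors << "#{node}: '#{resource}' already exists"
    else
      cib.add(resource)
    end
  end
  { created: cib.include?(resource), errors: errors }
end

nodes = %w[controller-0 controller-1 controller-2]
p converge(nodes, 'glance-fs')                         # all nodes manage it
p converge(nodes, 'glance-fs', master: 'controller-0') # only the master does
```

Restricting creation to one node removes the competing creators entirely, while verify_on_create (the first option) addresses a different failure: a create that reports success but does not stick.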

Comment 4 Crag Wolfe 2016-03-24 16:56:52 UTC
I think doing both is the right way to go. For some context, when the verify_on_create code was written, it was under the assumption that only one node would attempt to create a given pcs resource (other nodes could try to create other pcs resources or properties at the same time, which would be OK). One would expect that if only one node was trying to create the resource, and the call to pcs create succeeded, there would be no need to verify and maybe retry. But in our experience, this turned out not to be true in rare cases. I forget the exact mechanism, but it was clearly visible in the logs that the cluster agreed the latest cib.xml should not include the resource that pcs said it had created. This is probably more likely to occur when other nodes are also editing the cluster definition through pcs calls, e.g. updating pcs properties. But I think it can occur anyway.

So, verify_on_create is a good idea when creating a pcs resource. However, creating the same resource on two nodes, with or without verify_on_create, could lead to a Puppet error on one of the nodes, i.e. what Paul wrote is correct:

"From their perspective, the underlying resource creation logic involves calling the "exists?" method and if it claims False, the "create" method is called. However, there is a race condition here. It could happen that by the time "exists?" returns False, the local "corosync" daemon replicates the resource clone (which was created almost at the same time in another controller), and by the time "pcs resource create" is called, the glance-fs resource already exists and then it fails because it's duplicated.

A potential proper fix is to parse the "pcs resource create" output to deal with this potential race condition. At the moment, the customer thinks the output from "pcs resource create" is never parsed to deal with this race condition."

Specifically, in the way verify_on_create is currently written:
https://github.com/openstack/puppet-pacemaker/blob/master/lib/puppet/provider/pcmk_resource/default.rb#L152
if the pcs command fails because the resource already exists (or for any other reason), it does not even attempt to verify with "pcs show".
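A minimal Ruby sketch of a verify step that runs even when the create call fails, with a bounded retry. create_with_verify is a hypothetical name, not the puppet-pacemaker implementation; the create and show callables stand in for "pcs resource create" and "pcs resource show".

```ruby
# Hypothetical sketch, not the puppet-pacemaker implementation. `create` and
# `show` are injected callables standing in for `pcs resource create` and
# `pcs resource show`; each returns true on success.
def create_with_verify(create:, show:, tries: 3, delay: 0)
  tries.times do |attempt|
    # The exit status of create is deliberately ignored: a failure may just
    # mean another controller created the resource first.
    create.call
    # Verification always runs, so the CIB contents decide the outcome.
    return attempt + 1 if show.call
    sleep(delay) if delay.positive?
  end
  raise "resource still absent after #{tries} attempts"
end

# Another controller won the race: create fails, but the resource is there.
p create_with_verify(create: -> { false }, show: -> { true })

# Create reports success but the cluster dropped the change, so the first
# show misses it and the loop creates again.
shows = [false, true]
p create_with_verify(create: -> { true }, show: -> { shows.shift })
```

Because the loop still raises if the resource never appears, genuine create errors are not silently swallowed, although pcs's own error message is lost unless it is logged separately.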

Comment 5 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 6 Felipe Alfaro Solana 2016-06-17 15:28:22 UTC
May I ask you to reconsider fixing this in OSP7 or OSP8? I personally don't think it's hard enough to fix to justify deferring it for more than 4 months.

Comment 7 Mike Burns 2016-06-17 16:15:17 UTC
moving needinfo to HA PM

Comment 8 Edu Alcaniz 2016-09-13 05:48:17 UTC
Morning, could you give us an update about this RFE? Thanks very much.

Comment 9 Jaromir Coufal 2016-10-13 18:39:00 UTC
This seems like a pacemaker config issue. Moving to HA team.

Comment 17 Udi Shkalim 2017-03-01 17:53:34 UTC
I've started testing this one but noticed that we need an NFS backend, so I moved it to the storage team.

Comment 18 Marian Krcmarik 2017-03-02 07:51:20 UTC
I am going to verify on openstack-tripleo-heat-templates-2.0.0-44.el7ost.

I was able to reproduce the problem in which fs-varlibglanceimages resource creation was triggered multiple times; it happened in 3 of 10 overcloud deploys in my environment. I was not able to reproduce it with the fixed package in 10 overcloud deploy attempts.
I did not observe a situation in which the resource could be created but was not yet present.

The error I got when reproducing on the older package:
Error: pcs create failed: Error: unable to create resource/fence device 'fs-varlibglanceimages', 'fs-varlibglanceimages' already exists on this system
Error: /Stage[main]/Main/Pacemaker::Resource::Filesystem[glance-fs]/Pcmk_resource[fs-varlibglanceimages]/ensure: change from absent to present failed: pcs create failed: Error: unable to create resource/fence device 'fs-varlibglanceimages', 'fs-varlibglanceimages' already exists on this system
'deploy_status_code': 6

Comment 20 errata-xmlrpc 2017-03-08 20:05:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0470.html

