Bug 1654432 - [OSP13] puppet-pacemaker instanceha does not work correctly with more than 10 nodes due to regex issues
Summary: [OSP13] puppet-pacemaker instanceha does not work correctly with more th...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-pacemaker
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks: 1655217
Reported: 2018-11-28 19:02 UTC by Andreas Karis
Modified: 2022-03-13 16:33 UTC (History)

Fixed In Version: puppet-pacemaker-0.7.2-0.20180423212255.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1655217
Environment:
Last Closed: 2019-01-16 17:55:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1805786 0 None None None 2018-11-29 08:16:29 UTC
OpenStack gerrit 620892 0 'None' MERGED Match node properties more strictly 2020-06-21 16:03:12 UTC
Red Hat Issue Tracker OSP-13775 0 None None None 2022-03-13 16:33:15 UTC
Red Hat Product Errata RHBA-2019:0068 0 None None None 2019-01-16 17:55:38 UTC

Description Andreas Karis 2018-11-28 19:02:30 UTC
Description of problem:
puppet-pacemaker instanceha does not work correctly with more than 10 nodes due to regex issues

------------------------------------------------------------------------

When deploying Instance HA with 12 compute nodes, compute-1 always has
issues. With 4 compute nodes, it is fine. We also checked with 10 compute
nodes, and that works, too.

~~~
pcs status
(...)
 Clone Set: compute-unfence-trigger-clone [compute-unfence-trigger]
     Started: [ compute-0 compute-10 compute-11 compute-2 compute-3
compute-4 compute-5 compute-6 compute-7 compute-8 compute-9 ]
     Stopped: [ compute-1 controller-0 controller-1 controller-2 ]
(...)
~~~

~~~
[root@compute-1 ~]# hiera -c /etc/puppet/hiera.yaml tripleo::instanceha
true
[root@compute-1 ~]#
[root@compute-1 ~]# journalctl | grep instanceha-role
[root@compute-1 ~]#
~~~

Vs.
~~~
[root@compute-8 ~]# journalctl | grep instanceha-role | head -1
Nov 27 15:10:50 compute-8 puppet-user[33673]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-compute-8-compute-instanceha-role]/ensure) created
[root@compute-8 ~]#
~~~

~~~
[root@compute-1 ~]# cat /etc/puppet/modules/tripleo/manifests/profile/pacemaker/compute_instanceha.pp
# == Class: tripleo::profile::pacemaker::compute_instanceha
#
# Configures Compute nodes for Instance HA
#
# === Parameters:
#
# [*step*]
#   (Optional) The current step in deployment. See tripleo-heat-templates
#   for more details.
#   Defaults to hiera('step')
#
# [*pcs_tries*]
#   (Optional) The number of times pcs commands should be retried.
#   Defaults to hiera('pcs_tries', 20)
#
# [*enable_instanceha*]
#  (Optional) Boolean driving the Instance HA controlplane configuration
#  Defaults to false
#
class tripleo::profile::pacemaker::compute_instanceha (
  $step              = Integer(hiera('step')),
  $pcs_tries         = hiera('pcs_tries', 20),
  $enable_instanceha = hiera('tripleo::instanceha', false),
) {
  if $step >= 2 and $enable_instanceha {
    pacemaker::property { 'compute-instanceha-role-node-property':
      property => 'compute-instanceha-role',
      value    => true,
      tries    => $pcs_tries,
      node     => $::hostname,
    }
  }
}
~~~

~~~
Nov 27 15:10:50 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-8-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:10:50 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-3-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:10:56 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-6-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:10:57 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-4-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:03 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-2-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:08 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-10-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:08 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-7-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:09 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-5-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:10 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-9-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:17 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-11-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:11:20 [39123] controller-0        cib:     info:
cib_perform_op:      ++                                <nvpair
id="nodes-compute-0-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
Nov 27 15:23:55 [39123] controller-0        cib:     info:
cib_perform_op:      ++
<expression attribute="compute-instanceha-role"
id="location-compute-unfence-trigger-clone-rule-expr" operation="ne"
value="true"/>
Nov 27 15:24:01 [39123] controller-0        cib:     info:
cib_perform_op:      ++
<expression attribute="compute-instanceha-role"
id="location-nova-evacuate-rule-expr" operation="eq" value="true"/>
(overcloud-Queens) [root@controller-0 ~]# cibadmin -Q | grep compute-instanceha-role
          <nvpair id="nodes-compute-3-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-8-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-6-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-4-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-2-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-5-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-7-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-10-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-9-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-11-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <nvpair id="nodes-compute-0-compute-instanceha-role"
name="compute-instanceha-role" value="true"/>
          <expression attribute="compute-instanceha-role"
id="location-compute-unfence-trigger-clone-rule-expr" operation="ne"
value="true"/>
          <expression attribute="compute-instanceha-role"
id="location-nova-evacuate-rule-expr" operation="eq" value="true"/>
(overcloud-Queens) [root@controller-0 ~]#
~~~
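
The missing entry for compute-1 is easy to overlook in the output above. The following is a minimal sketch for spotting it, assuming the `nodes-<hostname>-compute-instanceha-role` ID format seen in this report. `cib_ids` and `missing_nodes` are hypothetical helper names, and the sample lines are a shortened excerpt of the `cibadmin -Q` output, not a live query.

```shell
#!/bin/bash
# Hypothetical stand-in for `cibadmin -Q`: a shortened excerpt of the
# nvpair lines shown in this report (compute-1 has no entry).
cib_ids() {
  cat <<'EOF'
<nvpair id="nodes-compute-0-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-10-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-11-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
EOF
}

# Print every node name given as an argument that has no matching nvpair ID.
missing_nodes() {
  local node
  for node in "$@"; do
    # -F matches a fixed string. Including the opening quote and the text
    # after the hostname keeps compute-1 from matching compute-10.
    if ! cib_ids | grep -qF "id=\"nodes-${node}-compute-instanceha-role\""; then
      echo "$node"
    fi
  done
}

missing_nodes compute-0 compute-1 compute-10 compute-11
```

With the sample excerpt this prints `compute-1`. On a live controller, the body of `cib_ids` could be replaced by the `cibadmin -Q` call used above.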

And I can "fix" this manually by running:
~~~

[root@compute-1 ~]# pcs property set --node compute-1 compute-instanceha-role=true
[root@compute-1 ~]# pcs property show
(...)
 compute-0: compute-instanceha-role=true
 compute-1: compute-instanceha-role=true
 compute-10: compute-instanceha-role=true
 compute-11: compute-instanceha-role=true
 compute-2: compute-instanceha-role=true
 compute-3: compute-instanceha-role=true
 compute-4: compute-instanceha-role=true
 compute-5: compute-instanceha-role=true
 compute-6: compute-instanceha-role=true
 compute-7: compute-instanceha-role=true
 compute-8: compute-instanceha-role=true
 compute-9: compute-instanceha-role=true
(...)
~~~

~~~
pcs status
(...)
Clone Set: compute-unfence-trigger-clone [compute-unfence-trigger]
     Started: [ compute-0 compute-1 compute-10 compute-11 compute-2
compute-3 compute-4 compute-5 compute-6 compute-7 compute-8 compute-9
]
     Stopped: [ controller-0 controller-1 controller-2 ]
(...)
~~~


Additional info:

Dimitri Savineau thinks it's due to the following:
~~~
I have never tried instance-ha, but looking at the puppet-pacemaker module, this could come from how the property is checked in pcs based on the hostname.
Because "compute-1" is a substring of "compute-10" and "compute-11", and the code uses a "| grep hostname" check [1], the property is not set on compute-1 if compute-10 or compute-11 is configured before it.

[1] https://github.com/openstack/puppet-pacemaker/blob/master/lib/puppet/provider/pcmk_property/default.rb#L50
~~~
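
The substring effect described above can be reproduced without a cluster. The following is a minimal sketch, not the provider's actual code: `sample_props`, `loose_has_property`, and `strict_has_property` are hypothetical helpers, and the sample lines only imitate the `pcs property show` format seen earlier in this report. The strict pattern is one possible way to anchor the match and is not necessarily the expression used in the merged change.

```shell
#!/bin/bash
# Hypothetical sample of per-node properties, imitating `pcs property show`.
# compute-1 is deliberately absent, as in the failing deployment.
sample_props() {
  cat <<'EOF'
 compute-0: compute-instanceha-role=true
 compute-10: compute-instanceha-role=true
 compute-11: compute-instanceha-role=true
EOF
}

# Loose check: any line that merely contains the hostname counts as a match.
loose_has_property() {
  sample_props | grep "$1" | grep -q "compute-instanceha-role=true"
}

# Strict check: the hostname must start the line (after optional whitespace)
# and be followed directly by a colon.
strict_has_property() {
  sample_props | grep -Eq "^[[:space:]]*$1:[[:space:]]+compute-instanceha-role=true"
}

if loose_has_property compute-1; then
  echo "loose: compute-1 looks set"
else
  echo "loose: compute-1 not set"
fi

if strict_has_property compute-1; then
  echo "strict: compute-1 looks set"
else
  echo "strict: compute-1 not set"
fi
```

The loose check reports compute-1 as already set because the compute-10 and compute-11 lines contain its name, so a provider relying on it would skip creating the property. The strict check reports it as not set. Note that the hostname is interpolated into a regular expression here, so names containing dots would need escaping in a real implementation.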

Comment 14 pkomarov 2018-12-11 22:52:25 UTC
Verified.

On an OSP13 IHA deployment with 11 compute nodes, with compute-1 and compute-10 (the hostname-subset case addressed by the fix) as the test subjects for a successful deployment.
(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version 
2018-12-07.1
(undercloud) [stack@undercloud-0 ~]$

#pcs status:

     Started: [ overcloud-novacomputeiha-0 overcloud-novacomputeiha-1 overcloud-novacomputeiha-10 overcloud-novacomputeiha-2 overcloud-novacomputeiha-3 overcloud-novacomputeiha-4 overcloud-novacomputeiha-5 overcloud-novacomputeiha-6 overcloud-novacomputeiha-7 overcloud-novacomputeiha-8 overcloud-novacomputeiha-9 ]
     Stopped: [ controller-0 ]


(undercloud) [stack@undercloud-0 ~]$  ansible compute -b -mshell -a'journalctl | grep instanceha-role 2>/dev/null|head -1'
 [WARNING]: Found both group and host with same name: undercloud

compute-1 | SUCCESS | rc=0 >>
Dec 11 22:09:03 overcloud-novacomputeiha-1 puppet-user[24188]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-1-compute-instanceha-role]/ensure) created

compute-3 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-3 puppet-user[24202]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-3-compute-instanceha-role]/ensure) created

compute-4 | SUCCESS | rc=0 >>
Dec 11 22:09:02 overcloud-novacomputeiha-4 puppet-user[24128]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-4-compute-instanceha-role]/ensure) created

compute-2 | SUCCESS | rc=0 >>
Dec 11 22:08:52 overcloud-novacomputeiha-2 puppet-user[23984]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-2-compute-instanceha-role]/ensure) created

compute-0 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-0 puppet-user[24000]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-0-compute-instanceha-role]/ensure) created

compute-6 | SUCCESS | rc=0 >>
Dec 11 22:09:07 overcloud-novacomputeiha-6 puppet-user[24242]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-6-compute-instanceha-role]/ensure) created

compute-5 | SUCCESS | rc=0 >>
Dec 11 22:08:58 overcloud-novacomputeiha-5 puppet-user[24166]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-5-compute-instanceha-role]/ensure) created

compute-7 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-7 puppet-user[41842]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-7-compute-instanceha-role]/ensure) created

compute-9 | SUCCESS | rc=0 >>
Dec 11 22:09:05 overcloud-novacomputeiha-9 puppet-user[24135]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-9-compute-instanceha-role]/ensure) created

compute-8 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-8 puppet-user[41965]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-8-compute-instanceha-role]/ensure) created

compute-10 | SUCCESS | rc=0 >>
Dec 11 22:08:37 overcloud-novacomputeiha-10 puppet-user[23967]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-10-compute-instanceha-role]/ensure) created

Comment 18 errata-xmlrpc 2019-01-16 17:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0068

