Bug 1654432
| Summary: | [OSP13] puppet-pacemaker instanceha does not work correctly with more than 10 nodes due to regex issues | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris> | |
| Component: | puppet-pacemaker | Assignee: | Michele Baldessari <michele> | |
| Status: | CLOSED ERRATA | QA Contact: | pkomarov | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 13.0 (Queens) | CC: | chjones, dabarzil, jjoyce, jschluet, mariel, michele, pkomarov, slinaber, tvignaud | |
| Target Milestone: | z4 | Keywords: | Triaged, ZStream | |
| Target Release: | 13.0 (Queens) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | puppet-pacemaker-0.7.2-0.20180423212255.el7ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1655217 | Environment: | |
| Last Closed: | 2019-01-16 17:55:29 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1655217 | |||
Verified on an OSP13 deployment with 11 Instance HA compute nodes, using compute-1 and compute-10 (compute-1 is a substring of compute-10, which is the collision this fix addresses) as the test subjects for a successful deployment.
(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2018-12-07.1
(undercloud) [stack@undercloud-0 ~]$
#pcs status:
     Started: [ overcloud-novacomputeiha-0 overcloud-novacomputeiha-1 overcloud-novacomputeiha-10 overcloud-novacomputeiha-2 overcloud-novacomputeiha-3 overcloud-novacomputeiha-4 overcloud-novacomputeiha-5 overcloud-novacomputeiha-6 overcloud-novacomputeiha-7 overcloud-novacomputeiha-8 overcloud-novacomputeiha-9 ]
     Stopped: [ controller-0 ]
(undercloud) [stack@undercloud-0 ~]$  ansible compute -b -mshell -a'journalctl | grep instanceha-role 2>/dev/null|head -1'
 [WARNING]: Found both group and host with same name: undercloud
compute-1 | SUCCESS | rc=0 >>
Dec 11 22:09:03 overcloud-novacomputeiha-1 puppet-user[24188]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-1-compute-instanceha-role]/ensure) created
compute-3 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-3 puppet-user[24202]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-3-compute-instanceha-role]/ensure) created
compute-4 | SUCCESS | rc=0 >>
Dec 11 22:09:02 overcloud-novacomputeiha-4 puppet-user[24128]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-4-compute-instanceha-role]/ensure) created
compute-2 | SUCCESS | rc=0 >>
Dec 11 22:08:52 overcloud-novacomputeiha-2 puppet-user[23984]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-2-compute-instanceha-role]/ensure) created
compute-0 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-0 puppet-user[24000]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-0-compute-instanceha-role]/ensure) created
compute-6 | SUCCESS | rc=0 >>
Dec 11 22:09:07 overcloud-novacomputeiha-6 puppet-user[24242]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-6-compute-instanceha-role]/ensure) created
compute-5 | SUCCESS | rc=0 >>
Dec 11 22:08:58 overcloud-novacomputeiha-5 puppet-user[24166]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-5-compute-instanceha-role]/ensure) created
compute-7 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-7 puppet-user[41842]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-7-compute-instanceha-role]/ensure) created
compute-9 | SUCCESS | rc=0 >>
Dec 11 22:09:05 overcloud-novacomputeiha-9 puppet-user[24135]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-9-compute-instanceha-role]/ensure) created
compute-8 | SUCCESS | rc=0 >>
Dec 11 22:09:12 overcloud-novacomputeiha-8 puppet-user[41965]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-8-compute-instanceha-role]/ensure) created
compute-10 | SUCCESS | rc=0 >>
Dec 11 22:08:37 overcloud-novacomputeiha-10 puppet-user[23967]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-10-compute-instanceha-role]/ensure) created
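To confirm that no node was skipped, one option is to extract the node name from each `Pcmk_property[...]` log line above and compare the result against the expected host list. The following is only a sketch: the helper names `nodes_with_property` and `missing_nodes` and the shortened sample lines are hypothetical and not part of the deployment tooling.

```python
import re

# Matches the resource title seen in the puppet log lines, e.g.
# Pcmk_property[property-overcloud-novacomputeiha-10-compute-instanceha-role]
PROPERTY_RE = re.compile(
    r"Pcmk_property\[property-(?P<node>.+)-compute-instanceha-role\]"
)


def nodes_with_property(log_lines):
    """Return the set of node names whose instanceha property was logged."""
    found = set()
    for line in log_lines:
        match = PROPERTY_RE.search(line)
        if match:
            found.add(match.group("node"))
    return found


def missing_nodes(expected, log_lines):
    """Return the expected nodes that have no matching log line, sorted."""
    return sorted(set(expected) - nodes_with_property(log_lines))


if __name__ == "__main__":
    expected = [f"overcloud-novacomputeiha-{i}" for i in range(11)]
    # Shortened sample lines (assumption); node 1 is deliberately left out
    # to show what a skipped node would look like.
    sample = [
        f"(.../Pcmk_property[property-{name}-compute-instanceha-role]/ensure) created"
        for name in expected
        if name != "overcloud-novacomputeiha-1"
    ]
    print(missing_nodes(expected, sample))
```

Fed the real `journalctl` lines from all 11 computes, an empty list would indicate that every node, including compute-1 and compute-10, had the property created.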
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0068
Description of problem:
puppet-pacemaker instanceha does not work correctly with more than 10 nodes due to regex issues
------------------------------------------------------------------------

Deploying instance-ha with 12 compute nodes. compute-1 always has issues in a deployment with 12 nodes. With 4 computes, it's OK. We checked with 10 compute nodes as well, and 10 compute nodes are fine, too.

~~~
pcs status
(...)
 Clone Set: compute-unfence-trigger-clone [compute-unfence-trigger]
     Started: [ compute-0 compute-10 compute-11 compute-2 compute-3 compute-4 compute-5 compute-6 compute-7 compute-8 compute-9 ]
     Stopped: [ compute-1 controller-0 controller-1 controller-2 ]
(...)
~~~

~~~
[root@compute-1 ~]# hiera -c /etc/puppet/hiera.yaml tripleo::instanceha
true
[root@compute-1 ~]#
[root@compute-1 ~]# journalctl | grep instanceha-role
[root@compute-1 ~]#
~~~

Vs.

~~~
[root@compute-8 ~]# journalctl | grep instanceha-role | head -1
Nov 27 15:10:50 compute-8 puppet-user[33673]: (/Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-compute-8-compute-instanceha-role]/ensure) created
[root@compute-8 ~]#
~~~

~~~
[root@compute-1 ~]# cat /etc/puppet/modules/tripleo/manifests/profile/pacemaker/compute_instanceha.pp
# == Class: tripleo::profile::pacemaker::compute_instanceha
#
# Configures Compute nodes for Instance HA
#
# === Parameters:
#
# [*step*]
#   (Optional) The current step in deployment. See tripleo-heat-templates
#   for more details.
#   Defaults to hiera('step')
#
# [*pcs_tries*]
#   (Optional) The number of times pcs commands should be retried.
#   Defaults to hiera('pcs_tries', 20)
#
# [*enable_instanceha*]
#   (Optional) Boolean driving the Instance HA controlplane configuration
#   Defaults to false
#
class tripleo::profile::pacemaker::compute_instanceha (
  $step              = Integer(hiera('step')),
  $pcs_tries         = hiera('pcs_tries', 20),
  $enable_instanceha = hiera('tripleo::instanceha', false),
) {
  if $step >= 2 and $enable_instanceha {
    pacemaker::property { 'compute-instanceha-role-node-property':
      property => 'compute-instanceha-role',
      value    => true,
      tries    => $pcs_tries,
      node     => $::hostname,
    }
  }
}
~~~

~~~
Nov 27 15:10:50 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-8-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:10:50 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-3-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:10:56 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-6-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:10:57 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-4-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:03 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-2-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:08 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-10-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:08 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-7-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:09 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-5-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:10 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-9-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:17 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-11-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:11:20 [39123] controller-0 cib: info: cib_perform_op: ++ <nvpair id="nodes-compute-0-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
Nov 27 15:23:55 [39123] controller-0 cib: info: cib_perform_op: ++ <expression attribute="compute-instanceha-role" id="location-compute-unfence-trigger-clone-rule-expr" operation="ne" value="true"/>
Nov 27 15:24:01 [39123] controller-0 cib: info: cib_perform_op: ++ <expression attribute="compute-instanceha-role" id="location-nova-evacuate-rule-expr" operation="eq" value="true"/>
(overcloud-Queens) [root@controller-0 ~]# cibadmin -Q | grep compute-instanceha-role
<nvpair id="nodes-compute-3-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-8-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-6-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-4-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-2-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-5-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-7-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-10-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-9-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-11-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<nvpair id="nodes-compute-0-compute-instanceha-role" name="compute-instanceha-role" value="true"/>
<expression attribute="compute-instanceha-role" id="location-compute-unfence-trigger-clone-rule-expr" operation="ne" value="true"/>
<expression attribute="compute-instanceha-role" id="location-nova-evacuate-rule-expr" operation="eq" value="true"/>
(overcloud-Queens) [root@controller-0 ~]#
~~~

And I can "fix" this manually by running:

~~~
[root@compute-1 ~]# pcs property set --node compute-1 compute-instanceha-role=true
[root@compute-1 ~]# pcs property show
(...)
 compute-0: compute-instanceha-role=true
 compute-1: compute-instanceha-role=true
 compute-10: compute-instanceha-role=true
 compute-11: compute-instanceha-role=true
 compute-2: compute-instanceha-role=true
 compute-3: compute-instanceha-role=true
 compute-4: compute-instanceha-role=true
 compute-5: compute-instanceha-role=true
 compute-6: compute-instanceha-role=true
 compute-7: compute-instanceha-role=true
 compute-8: compute-instanceha-role=true
 compute-9: compute-instanceha-role=true
(...)
~~~

~~~
pcs status
(...)
 Clone Set: compute-unfence-trigger-clone [compute-unfence-trigger]
     Started: [ compute-0 compute-1 compute-10 compute-11 compute-2 compute-3 compute-4 compute-5 compute-6 compute-7 compute-8 compute-9 ]
     Stopped: [ controller-0 controller-1 controller-2 ]
(...)
~~~

Additional info:

Dimitri Savineau thinks it's due to the following:

~~~
I never tried instance-ha but looking at the puppet-pacemaker module this could come from an issue on how the property is checked in pcs based on the hostname.

Because compute-1 is a subset of compute-10 and compute-11 and the code uses a "| grep hostname" [1] if compute-10 or compute-11 is configured before compute-1 then the property is not set.

[1] https://github.com/openstack/puppet-pacemaker/blob/master/lib/puppet/provider/pcmk_property/default.rb#L50
~~~
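The substring collision suggested above can be modeled in a few lines. This is an illustrative Python sketch, not the provider's actual Ruby code and not necessarily how the fix was implemented. The function names are hypothetical, and the sample text is an assumption modeled on the `pcs property show` output in this report.

```python
import re


def is_set_substring(show_output, node):
    """Mimic a plain `| grep <node>` check: any line containing the name matches."""
    return any(node in line for line in show_output.splitlines())


def is_set_exact(show_output, node, prop):
    """Require the full node name, immediately followed by ':', then the property."""
    pattern = re.compile(rf"^\s*{re.escape(node)}:\s.*{re.escape(prop)}=")
    return any(pattern.search(line) for line in show_output.splitlines())


if __name__ == "__main__":
    prop = "compute-instanceha-role"
    # State after compute-10 and compute-11 were configured, before compute-1.
    sample = (
        " compute-10: compute-instanceha-role=true\n"
        " compute-11: compute-instanceha-role=true\n"
    )
    # The substring check reports compute-1 as already configured, because
    # "compute-1" is contained in "compute-10" and "compute-11".
    print(is_set_substring(sample, "compute-1"))
    # Anchoring on the full name plus ':' avoids the false positive.
    print(is_set_exact(sample, "compute-1", prop))
    print(is_set_exact(sample, "compute-10", prop))
```

With the substring check, a run on compute-1 that happens after compute-10 or compute-11 would conclude the property already exists and skip setting it. That would be consistent with the empty `journalctl | grep instanceha-role` output on compute-1.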