Bug 1458153

Summary: pcs 0.9.158 breaks resource creation with meta-attribute
Product: Red Hat Enterprise Linux 7 Reporter: Damien Ciabrini <dciabrin>
Component: pcsAssignee: Ivan Devat <idevat>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.4CC: abeekhof, cfeist, chjones, cluster-maint, dciabrin, idevat, jeckersb, jpokorny, michele, mlisik, omular, tojeline
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pcs-0.9.158-4.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 18:26:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
example of a good cib
none
example of a bad cib
none
proposed fix none

Description Damien Ciabrini 2017-06-02 07:27:13 UTC
Created attachment 1284335 [details]
example of a good cib

Description of problem:
Starting pcs 0.9.158-1, I'm not able anymore to create a 3-node galera resource when I call:

pcs resource create galera galera enable_creation=true wsrep_cluster_address='gcomm://ra1,ra2,ra3' --master meta master-max=3

Only one out of the 3 resources is promoted to master state. Looking at the cib generated after the pcs call, this is because the meta attributes section is now being generated under the <primitive> tag, while it used to be generated right under the <master> tag.


Version-Release number of selected component (if applicable):
pcs-0.9.158-3.el7

How reproducible:
always

Steps to Reproduce:
1. create a master/slave resource with some meta_attributes
pcs resource create galera galera enable_creation=true wsrep_cluster_address='gcomm://ra1,ra2,ra3' --master meta master-max=3

Actual results:
meta attribute being generated at the wrong place in the CIB 


Expected results:
meta attribute should be generated like they were in pcs 0.9.152 for pacemaker to pick them up

Additional info:
cib files attached to show differences between pcs-0.9.152-10.xml and pcs-0.9.158-1

Comment 2 Damien Ciabrini 2017-06-02 07:27:52 UTC
Created attachment 1284336 [details]
example of a bad cib

Comment 3 Tomas Jelinek 2017-06-02 08:30:51 UTC
This is actually result of a fixed bug: bz1378107

Putting "meta" after "--master" is not and was not a way to set master's meta attributes. The correct way is:
pcs resource create galera galera enable_creation=true wsrep_cluster_address='gcomm://ra1,ra2,ra3' --master master-max=3
or
pcs resource create galera galera enable_creation=true wsrep_cluster_address='gcomm://ra1,ra2,ra3' master master-max=3
i.e. no "meta" after "--master"
This is how it was documented before 0.9.158. Where did you get the "meta" from?

It was "working" previously because when --master was specified, ALL meta attributes were put to master. Meaning it was broken:
pcs resource create dummy ocf:pacemaker:dummy meta a=b --master
put a=b to the master instead to the primitive.

Comment 4 Damien Ciabrini 2017-06-02 08:41:34 UTC
(In reply to Tomas Jelinek from comment #3)
> This is actually result of a fixed bug: bz1378107
> 
> Putting "meta" after "--master" is not and was not a way to set master's
> meta attributes. The correct way is:
> pcs resource create galera galera enable_creation=true
> wsrep_cluster_address='gcomm://ra1,ra2,ra3' --master master-max=3
> or
> pcs resource create galera galera enable_creation=true
> wsrep_cluster_address='gcomm://ra1,ra2,ra3' master master-max=3
> i.e. no "meta" after "--master"
> This is how it was documented before 0.9.158. Where did you get the "meta"
> from?

This is basically how we pass meta attributes when we create resources in all version of RHOSP. 

> 
> It was "working" previously because when --master was specified, ALL meta
> attributes were put to master. Meaning it was broken:
> pcs resource create dummy ocf:pacemaker:dummy meta a=b --master
> put a=b to the master instead to the primitive.

Comment 5 Damien Ciabrini 2017-06-02 11:21:44 UTC
 it was documented before 0.9.158. Where did you get the "meta"
> > from?
> 
> This is basically how we pass meta attributes when we create resources in
> all version of RHOSP. 
> 

More specifically the current pcs command for e.g. galera looks like:

/usr/sbin/pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20170602-20225-1sjl7vp resource create galera ocf:heartbeat:galera additional_parameters='--open-files-limit=16384' enable_creation=true wsrep_cluster_address='gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2' meta master-max=3 ordered=true op promote timeout=300s on-fail=block --master --disabled

Comment 6 Tomas Jelinek 2017-06-02 11:31:33 UTC
We cannot revert the fix as it is a part of the "pcs resource create" command overhaul. We will instead provide an additional patch which will restore the old behavior of the --master flag.

Comment 7 Jan Pokorný [poki] 2017-06-02 16:56:44 UTC
Looking at how current clufter reverse-engineers the CIB:

$ ./run-dev cib2pcscmd -s tests/bz1458153 2>/dev/null
> #!/bin/sh
> # sequence generated on 2017-06-02 18:54:22 with: clufter 0.75.1a
> # invoked as: ['./run-dev', 'cib2pcscmd', '-s', 'tests/bz1458153']
> # targeting system: ('linux', 'fedora', '22', 'Twenty Two')
> # using interpreter: CPython 2.7.10
> pcs cluster cib tmp-cib.xml
> pcs -f tmp-cib.xml property set stonith-enabled=false
> pcs -f tmp-cib.xml resource create galera ocf:heartbeat:galera \
>   enable_creation=true wsrep_cluster_address=gcomm://ra1,ra2,ra3 \
>   op start interval=0s timeout=120 stop interval=0s timeout=120 \
>   monitor interval=20 timeout=30 monitor interval=10 role=Master \
>   timeout=30 monitor interval=30 role=Slave timeout=30 promote \
>   interval=0s timeout=300 demote interval=0s timeout=120
> pcs -f tmp-cib.xml resource master galera-master galera master-max=3
> pcs cluster cib-push tmp-cib.xml --config

two-phased approach seems the safest ... why it is not good enough?

Comment 8 Damien Ciabrini 2017-06-03 08:29:34 UTC
(In reply to Jan Pokorný from comment #7)

> >   interval=0s timeout=300 demote interval=0s timeout=120
> > pcs -f tmp-cib.xml resource master galera-master galera master-max=3
> > pcs cluster cib-push tmp-cib.xml --config
> 
> two-phased approach seems the safest ... why it is not good enough?

The problem that we have is that the client needs to accommodate to the new syntax for the current behavior (master-max=3 on galera-master) to keep working. This is some kind of unexpected breakage.

Comment 10 Ivan Devat 2017-06-06 10:05:29 UTC
Created attachment 1285327 [details]
proposed fix

Comment 11 Ivan Devat 2017-06-06 11:05:43 UTC
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.158-4.el7.x86_64

The flag `--master` now behave as before fix in bz1378107. The keyword `master` keeps the behaviour as in bz1378107. So behaviour of `--master` differs from  `master` now. It allows to addapt dependent scripts to right way gradually.

> 1) options after the keyword meta are interpreted as master meta atributes when flag --master is there

[vm-rhel72-1 ~] $ pcs resource create R ocf:heartbeat:Dummy meta a=b --master c=d
Warning: flag '--master' is deprecated, use keyword 'master' instead (see the usage)
[vm-rhel72-1 ~] $ pcs cluster cib|grep 'id="R-master"' -A12
      <master id="R-master">
        <primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
          <operations>
            <op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20"/>
            <op id="R-start-interval-0s" interval="0s" name="start" timeout="20"/>
            <op id="R-stop-interval-0s" interval="0s" name="stop" timeout="20"/>
          </operations>
        </primitive>
        <meta_attributes id="R-master-meta_attributes">
          <nvpair id="R-master-meta_attributes-a" name="a" value="b"/>
          <nvpair id="R-master-meta_attributes-c" name="c" value="d"/>
        </meta_attributes>
      </master>

[vm-rhel72-1 ~] $ pcs resource delete R

> 2) options after --master are interpreted as instance attributes

[vm-rhel72-1 ~] $ pcs resource create R ocf:heartbeat:Dummy --master fake=abc
Warning: flag '--master' is deprecated, use keyword 'master' instead (see the usage)
[vm-rhel72-1 ~] $ pcs cluster cib|grep 'id="R-master"' -A11
      <master id="R-master">
        <primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
          <instance_attributes id="R-instance_attributes">
            <nvpair id="R-instance_attributes-fake" name="fake" value="abc"/>
          </instance_attributes>
          <operations>
            <op id="R-monitor-interval-10" interval="10" name="monitor" timeout="20"/>
            <op id="R-start-interval-0s" interval="0s" name="start" timeout="20"/>
            <op id="R-stop-interval-0s" interval="0s" name="stop" timeout="20"/>
          </operations>
        </primitive>
      </master>

[vm-rhel72-1 ~] $ pcs resource delete R

> 3) options after flag --master in command "pcs resource create" interpreted as operation options

[vm-rhel72-1 ~] $ pcs resource create R ocf:heartbeat:Dummy op monitor interval=30s --master timeout=5s
Warning: flag '--master' is deprecated, use keyword 'master' instead (see the usage)
[vm-rhel72-1 ~] $ pcs cluster cib|grep 'id="R-master"' -A8
      <master id="R-master">
        <primitive class="ocf" id="R" provider="heartbeat" type="Dummy">
          <operations>
            <op id="R-monitor-interval-30s" interval="30s" name="monitor" timeout="5s"/>
            <op id="R-start-interval-0s" interval="0s" name="start" timeout="20"/>
            <op id="R-stop-interval-0s" interval="0s" name="stop" timeout="20"/>
          </operations>
        </primitive>
      </master>

Comment 14 errata-xmlrpc 2017-08-01 18:26:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958