Bug 1404233 - Make pcs avoid a full CIB replacement when possible
Summary: Make pcs avoid a full CIB replacement when possible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1412309
Blocks: 1413046
 
Reported: 2016-12-13 12:56 UTC by Michele Baldessari
Modified: 2018-12-10 15:22 UTC (History)

Fixed In Version: pcs-0.9.156-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1412309 1413046
Environment:
Last Closed: 2017-08-01 18:24:40 UTC


Attachments
proposed fix (7.84 KB, patch)
2017-01-11 14:56 UTC, Tomas Jelinek
additional fix (1017 bytes, patch)
2017-02-14 08:24 UTC, Tomas Jelinek


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1958 normal SHIPPED_LIVE pcs bug fix and enhancement update 2017-08-01 18:09:47 UTC
Red Hat Bugzilla 1359057 None NEW [cli] pcs should provide enhanced, stackable, integrity-protecting "pcs cluster cib/cib-push" alternative 2019-06-19 14:31:42 UTC
Red Hat Bugzilla 1441673 None CLOSED Make pcs avoid a full CIB replacement 2019-06-19 14:31:42 UTC

Internal Links: 1359057 1441673

Description Michele Baldessari 2016-12-13 12:56:57 UTC
Description of problem:
(Bug opened after an email discussion with Ken, Tomas and Andrew)

Currently pcs does a full CIB replacement when adding a resource or a constraint.
So if I start with a cluster that has one resource:
[root@centos tests]# pcs status
...
Online: [ centos ]

Full list of resources:

 Clone Set: delay1-clone [delay1]
     Started: [ centos ]
...

I then make two copies of the CIB and add a different resource to each copy:
[root@centos tests]# pcs cluster cib 1.xml
[root@centos tests]# pcs cluster cib 2.xml
[root@centos tests]# /usr/sbin/pcs -f 1.xml resource create delayfoo ocf:heartbeat:Delay op start timeout=200s stop timeout=200s monitor timeout=200s --clone interleave=true
[root@centos tests]# /usr/sbin/pcs -f 2.xml resource create delaybar ocf:heartbeat:Delay op start timeout=200s stop timeout=200s monitor timeout=200s --clone interleave=true

[root@centos tests]# pcs cluster cib-push --config 1.xml 
CIB updated
[root@centos tests]# pcs status
...
Online: [ centos ]

Full list of resources:

 Clone Set: delay1-clone [delay1]
     Started: [ centos ]
 Clone Set: delayfoo-clone [delayfoo]
     Stopped: [ centos ]
...

Now when I push the second CIB, which contains a different resource, it
replaces the whole CIB and delayfoo is lost:
[root@centos tests]# pcs cluster cib-push --config 2.xml 
CIB updated
[root@centos tests]# pcs status
...
Online: [ centos ]

Full list of resources:

 Clone Set: delay1-clone [delay1]
     Started: [ centos ]
 Clone Set: delaybar-clone [delaybar]
     Stopped: [ centos ]
...

The expected result would be one of the following:
A) An error when trying to push 2.xml
B) Both delayfoo and delaybar created


Version-Release number of selected component (if applicable):
pcs-0.9.152-10.el7.centos.x86_64
pacemaker-1.1.15-11.el7_3.2.x86_64
corosync-2.4.0-4.el7.x86_64

Additional Info:
The reason for needing either an error or a "differential CIB update" is so that we can implement composable HA race-free in TripleO.

Comment 1 Chris Feist 2016-12-13 22:37:43 UTC
I talked with Andrew and I think we want to do two things here (at least to start).

Add a command to push only a diff between 2 cib versions, so the workflow would look like this:

1.  pcs cib file1
2.  cp file1 file1.orig
3.  pcs -f file1 <do stuff here>
4.  pcs cib-push file1 --diff-only=file1.orig

So we would only be sending the difference between the original and the new files instead of updating the entire CIB.


The other thing we can do: when we're not working with files, pcs should push only diffs instead of the full CIB.  That should reduce races when two programs update different parts of the CIB.

Pacemaker has a suitable tool for producing the diff between two CIBs: crm_diff.
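
For illustration, the diff-only push described in this comment could be approximated by hand with Pacemaker's own tools. This is a minimal sketch, not the pcs implementation: push_cib_diff is a hypothetical helper name, and the handling of crm_diff's exit status is an assumption that should be checked against the installed Pacemaker version.

```shell
# Sketch only: push the difference between two CIB files instead of a full CIB.
# Assumes Pacemaker's crm_diff and cibadmin are installed.
# push_cib_diff is a hypothetical helper name, not a pcs or Pacemaker command.
# Usage: push_cib_diff ORIGINAL.xml UPDATED.xml
push_cib_diff() (
    original=$1
    updated=$2
    patch=$(mktemp) || exit 1

    # Write an XML patch describing how to get from ORIGINAL to UPDATED.
    crm_diff --original "$original" --new "$updated" > "$patch"
    rc=$?

    # Assumption: status 0 means "no difference", 1 means "differences found",
    # and anything higher is a real error.
    if [ "$rc" -gt 1 ]; then
        echo "crm_diff failed with status $rc" >&2
        rm -f "$patch"
        exit "$rc"
    fi

    # An empty patch file means there is nothing to send to the cluster.
    if [ ! -s "$patch" ]; then
        echo "No changes to push"
        rm -f "$patch"
        exit 0
    fi

    # Apply only the patch to the live CIB.
    cibadmin --patch --xml-file "$patch"
    rc=$?
    rm -f "$patch"
    exit "$rc"
)
```

Because only the patch is sent, a change made by another program to an unrelated part of the CIB between the pull and the push is not overwritten.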

Comment 4 Tomas Jelinek 2017-01-05 09:01:04 UTC
Just to make things clear: real-world usage does not involve running pcs with -f. It was used in the reproducer only to show the issue. We aim to fix the case where more than one change is made to the live CIB in parallel.

Comment 12 Tomas Jelinek 2017-01-11 14:56:38 UTC
Created attachment 1239461
proposed fix

Comment 13 Tomas Jelinek 2017-01-11 15:02:53 UTC
Test:

pcs cluster cib > original.xml
cp original.xml updated.xml
pcs -f updated.xml resource create dummy dummy
pcs cluster cib-push updated.xml diff-against=original.xml

Verify that the resource was created in the cluster. When several such updates run in parallel, all of the changes are applied and none is lost by being overwritten by another.
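
The test steps above can be wrapped in a small helper so that each caller works on its own private copy of the CIB. This is a sketch under assumptions: edit_cib_with_diff is a hypothetical name, and it relies only on the pcs invocations shown in this comment.

```shell
# Sketch only: run one pcs command against a private copy of the CIB and push
# just the resulting difference. edit_cib_with_diff is a hypothetical helper.
# Usage: edit_cib_with_diff PCS-ARGS...
edit_cib_with_diff() (
    workdir=$(mktemp -d) || exit 1
    original=$workdir/original.xml
    updated=$workdir/updated.xml

    # Pull the live CIB, copy it, modify the copy offline, then push the diff.
    pcs cluster cib > "$original" &&
    cp "$original" "$updated" &&
    pcs -f "$updated" "$@" &&
    pcs cluster cib-push "$updated" diff-against="$original"
    rc=$?

    # Remove the temporary files whether or not the push succeeded.
    rm -rf "$workdir"
    exit "$rc"
)
```

For example, `edit_cib_with_diff resource create dummy ocf:heartbeat:Dummy` would perform the same four steps as the test. The temporary directory keeps parallel callers from sharing original.xml and updated.xml.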

Comment 14 Ken Gaillot 2017-01-11 17:30:39 UTC
Per discussion with Andrew Beekhof, there is a bug in pacemaker's crm_diff that will need to be fixed for this to work properly without needing to loop, so I will clone this for pacemaker.

Comment 16 Tomas Jelinek 2017-02-14 08:24:05 UTC
Created attachment 1250141
additional fix

Comment 17 Ivan Devat 2017-02-20 07:35:00 UTC
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.156-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs resource
NO resources configured
[vm-rhel72-1 ~] $ pcs cluster cib > original.xml
[vm-rhel72-1 ~] $ cp original.xml updated.xml
[vm-rhel72-1 ~] $ cp original.xml updated2.xml

[vm-rhel72-1 ~] $ pcs -f updated.xml resource create A ocf:heartbeat:Dummy
[vm-rhel72-1 ~] $ pcs -f updated2.xml resource create B ocf:heartbeat:Dummy
[vm-rhel72-1 ~] $ pcs -f updated2.xml resource create C ocf:heartbeat:Dummy

[vm-rhel72-1 ~] $ pcs cluster cib-push updated.xml diff-against=original.xml
CIB updated
[vm-rhel72-1 ~] $ pcs cluster cib-push updated2.xml diff-against=original.xml
CIB updated

[vm-rhel72-1 ~] $ pcs resource
 B      (ocf::heartbeat:Dummy): Started vm-rhel72-2
 C      (ocf::heartbeat:Dummy): Started vm-rhel72-3
 A      (ocf::heartbeat:Dummy): Started vm-rhel72-2
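
The final check in this verification (that A, B and C all survived both pushes) could be scripted along these lines. This is a sketch: resources_present is a hypothetical helper, and it assumes the one-resource-per-line listing shown above, so clones and groups, which pcs prints differently, are not covered.

```shell
# Sketch only: succeed if every named resource appears in the first column of
# `pcs resource` output. resources_present is a hypothetical helper name.
# Usage: resources_present NAME...
resources_present() {
    listing=$(pcs resource) || return 1
    for name in "$@"; do
        # Match the resource name exactly against the first field of each line.
        printf '%s\n' "$listing" |
            awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' || {
            echo "missing resource: $name" >&2
            return 1
        }
    done
}
```

After the two pushes above, `resources_present A B C` should return success if no update was lost.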

Comment 21 Jan Pokorný [poki] 2017-05-02 16:04:45 UTC
See also how this new option is reflected in the "pull-modify-push
chaining workflow" proposal in [bug 1359057 comment 4].

Comment 22 errata-xmlrpc 2017-08-01 18:24:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958

