Bug 1603613 - Cluster nodes should send newer schemas to older Pacemaker Remote nodes
Summary: Cluster nodes should send newer schemas to older Pacemaker Remote nodes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.6
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 7.9
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1595356
 
Reported: 2018-07-19 18:02 UTC by Michele Baldessari
Modified: 2020-06-03 20:56 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-03 20:56:37 UTC
Target Upstream Version:
Embargoed:



Description Michele Baldessari 2018-07-19 18:02:30 UTC
Description of problem:
I think we saw much the same issue with the 1.1.16 -> 1.1.18 rebase, but this time I was playing with pcmk 2.0 (on RHEL 7) and started deploying a cluster with bundles that contained an older version of the cluster stack (1.1.19).

In theory, upgrading the cluster first and then the bundles is the suggested approach,
but if the schema differs between the two versions we hit the following issue:

    So pcmk2.0 on cluster with pcmk-1.1.19 in the bundles seems broken:
    Jul 19 17:33:39 controller-0 sudo[40425]: heat-admin : TTY=pts/0 ; PWD=/home/heat-admin ; USER=root ; COMMAND=/bin/bash
    Jul 19 17:33:49 controller-0 dockerd-current[19102]: Debug: backup_cib: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20180719-8-phqecy returned
    Jul 19 17:33:49 controller-0 dockerd-current[19102]: Debug: try 10/20: /usr/sbin/pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20180719-8-phqecy property set stonith-enabled=false
    Jul 19 17:33:51 controller-0 dockerd-current[19102]: Debug: Error: Error: unable to get cib
    Jul 19 17:33:51 controller-0 dockerd-current[19102]: Error: unable to get cib
    Running: /usr/sbin/cibadmin -l -Q
    Return Value: 203
    --Debug Output Start--
    cibadmin: Connection to local file 'cib.xml' failed: Update does not conform to the configured schema
    Signon to CIB failed: Update does not conform to the configured schema
    Init failed, could not perform requested operations

The problem is the same even without calling pcs inside a (very privileged) container. A simple rabbitmq RA call to cibadmin -Q from inside a 1.1.19 bundle will fail like above.

Comment 2 Andrew Beekhof 2018-07-25 02:38:08 UTC
This needs to work otherwise bundles are basically useless.

Comment 3 Ken Gaillot 2018-07-25 14:32:15 UTC
(In reply to Andrew Beekhof from comment #2)
> This needs to work otherwise bundles are basically useless.

Not entirely: the lrmd (remote daemon) doesn't care about the CIB, so it's only command-line tools (most problematically those used by resource agents) that can be affected.

An older remote won't have the XSLTs/RNGs needed to validate a newer CIB, not to mention any library or command-line logic that might be needed to deal with any new features.

The same problem could actually occur with cluster nodes in a mixed-version cluster, but they're guaranteed to understand the live CIB, so only on-disk CIBs upgraded on a newer cluster node and copied to an older cluster node could run into the issue (highly unlikely and not particularly important). Remotes are problematic because they might not understand the live CIB or disk copies of it.

I'm surprised this hasn't come up before, and I don't see a good way to deal with it.

Perhaps when the cluster connects to the remote daemon, the remote daemon could send its highest known schema version, and the cluster could reply with any newer XSLTs/RNGs. The remote would save them to a temporary schema storage area, and all the relevant library code would need to check that area when validating. This might mostly work, but there would be problems e.g. when using CIB_file when not connected to the cluster, and when the command-line tools need some new logic about the relevant schema features.

A harsher possibility would be to have the cluster send the live CIB schema version when connecting, and have the remote daemon reject the connection if it doesn't understand the schema. That would prevent any possibility of problems, at the expense of severely restricting mixed-version clusters. (There's also the special case of the cluster nodes themselves being mixed-version, and upgrading the live CIB after the connection is made.)

Another convoluted approach might be to reimplement every command line feature that deals with the CIB as a daemon request, and have the CLI tools prefer the daemon request if connected (even for CIB_file), and use the local schemas only as a fallback.

Comment 4 Andrew Beekhof 2018-07-30 02:05:04 UTC
(In reply to Ken Gaillot from comment #3)
> (In reply to Andrew Beekhof from comment #2)
> > This needs to work otherwise bundles are basically useless.
> 
> Not entirely: the lrmd (remote daemon) doesn't care about the CIB, so it's
> only command-line tools (most problematically those used by resource agents)
> that can be affected.

I.e. any master/slave resources, of which galera and rabbit are critical examples.

> 
> An older remote won't have the XSLTs/RNGs needed to validate a newer CIB,
> not to mention any library or command-line logic that might be needed to
> deal with any new features.
> 
> The same problem could actually occur with cluster nodes in a mixed-version
> cluster, but they're guaranteed to understand the live CIB, 

I was thinking that this was part of the problem.

Pacemaker is obviously sane enough to prevent the cib config from being upgraded beyond what the DC (oldest node) understands.  I thought that we might need to add "version of any containers" to that logic.

However I also remembered that we implemented cibadmin --upgrade so that the cluster didn't need/try to update the config on its own, which is consistent with me not being able to find anywhere in Pacemaker that auto-upgrades the configuration once all nodes have been upgraded.

So I think the correct approach is: if you have old containers, don't update the config.

And the bug is: find out who is updating the config and make them stop
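
For reference, a hedged sketch of the commands involved (the query pipeline is only illustrative; cibadmin --upgrade is the explicit upgrade referred to above):

    # Check which schema the live CIB is currently validated against
    cibadmin --query | head -1 | grep -o 'validate-with="[^"]*"'

    # Explicitly bump the configuration to the newest schema -- per the comment
    # above, this should only be done once no old containers/remotes still need
    # to read the CIB
    cibadmin --upgrade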

Comment 5 Andrew Beekhof 2018-07-30 03:06:51 UTC
More generally though, pacemaker was designed under the assumption that daemons and their client libraries were always of the same version. That is no longer true and as such all the APIs should implement a handshake (that the lrm proxy can pass on) to establish minimum and maximum versions of the API (_not_ CRM_FEATURE_SET) supported by each side.

Comment 6 Ken Gaillot 2018-07-30 17:19:48 UTC
(In reply to Andrew Beekhof from comment #4)
> However I also remembered that we implemented cibadmin --upgrade so that the
> cluster didnt need/try to update the config on its own.  Which is consistent
> with me not being able to find anywhere in Pacemaker that auto-upgrades the
> configuration once all nodes have been upgraded.
> 
> So I think the correct approach is: if you have old containers, don't update
> the config.
> 
> And the bug is: find out who is updating the config and make them stop

The DC always upgrades the CIB to the latest schema version it understands, just in memory (not on disk). The upgraded in-memory CIB is guaranteed to be understandable by all cluster nodes because their version will be greater than or equal to the DC's.

The DC's logic is based on its latest schema, and that logic is definitely incompatible across schema major version bumps, so we can't tell the DC not to upgrade across a schema major version change (e.g. a remote node that only understands 2.10 can't force a newer DC not to upgrade to 3.1).

Offhand, I'm not sure if that also applies to schema minor version upgrades. I suspect so, but if not, it would be theoretically possible to allow remotes to have a lower schema minor version as long as they support the DC's schema major version, and tell the DC not to upgrade beyond that.

However even that would be impractical. The cluster can't know what schema a remote understands until it connects, meaning it might be required to downgrade the CIB minor version after connecting to a remote node -- something that's currently impossible and if implemented (as a downgrade transform for every minor version change) could mean that a working cluster feature is suddenly disabled. Also the downgrade would have to be done as part of the remote handshake.

(In reply to Andrew Beekhof from comment #5)
> More generally though, pacemaker was designed under the assumption that
> daemons and their client libraries were always of the same version. That is
> no longer true and as such all the APIs should implement a handshake (that
> the lrm proxy can pass on) to establish minimum and maximum versions of the
> API (_not_ CRM_FEATURE_SET) supported by each side.

That's what I had in mind with the cluster sending the CIB schema version when connecting to the remote, and having the remote reject the connection if it doesn't understand that schema. Sending API versions might be a good idea, too, though I'd think a single number should be sufficient, maybe the minimum pacemaker version supported as an API client (or perhaps use the LRMD protocol version as a proxy for that).

But that could only help for upgrades between future versions that support the new handshake, not anything already released.

Also, it eliminates one of the proposed benefits of remote nodes, which is to run a legacy application in a container or VM with its required older OS (short of compiling an unsupported newer pacemaker for the older OS, which may not even be possible in some cases).

We'd basically be saying that rolling upgrades are supported only for cluster nodes, not any remote nodes: disable all remote/guest/bundle resources, do a rolling upgrade of the cluster nodes, then re-enable the remotes.

That seems harsh for something that won't be a problem in all cases. I.e. if users avoid using command-line tools on remotes during a rolling upgrade, and if the resource agents used on the remotes don't use any API commands that changed, they'll be fine.

We should probably still do a new remote handshake that includes more version information passed back and forth, but maybe we just log a warning for a mismatch, and update the documentation about rolling upgrades.

We can also look at specific command-line features that currently require CIB validation, and see if we can offload some of that to the daemons, to minimize the problem space.

Comment 7 Andrew Beekhof 2018-07-31 10:37:29 UTC
(In reply to Ken Gaillot from comment #6)
> (In reply to Andrew Beekhof from comment #4)
> > However I also remembered that we implemented cibadmin --upgrade so that the
> > cluster didnt need/try to update the config on its own.  Which is consistent
> > with me not being able to find anywhere in Pacemaker that auto-upgrades the
> > configuration once all nodes have been upgraded.
> > 
> > So I think the correct approach is: if you have old containers, don't update
> > the config.
> > 
> > And the bug is: find out who is updating the config and make them stop
> 
> The DC always upgrades the CIB to the latest schema version it understands,
> just in memory (not on disk). 

s/DC/PE/

Unless you've been making changes lately, only the PE upgrades it in memory (cli_config_update) but everyone talking over the CIB API (including pcs and remote clients) is getting the old version.
So I believe comment #4 stands.

> The upgraded in-memory CIB is guaranteed to be
> understandable by all cluster nodes because their version will be greater
> than or equal to the DC's.
> 
> The DC's logic is based on its latest schema, and that logic is definitely
> incompatible across schema major version bumps, so we can't tell the DC not
> to upgrade across a schema major version change (e.g. a remote node that
> only understands 2.10 can't force a newer DC not to upgrade to 3.1).
> 
> Offhand, I'm not sure if that also applies to schema minor version upgrades.
> I suspect so, but if not, it would be theoretically possible to allow
> remotes to have a lower schema minor version as long as they support the
> DC's schema major version, and tell the DC not to upgrade beyond that.
> 
> However even that would be impractical. The cluster can't know what schema a
> remote understands until it connects, meaning it might be required to
> downgrade the CIB minor version after connecting to a remote node --
> something that's currently impossible and if implemented (as a downgrade
> transform for every minor version change) could mean that a working cluster
> feature is suddenly disabled. Also the downgrade would have to be done as
> part of the remote handshake.
> 
> (In reply to Andrew Beekhof from comment #5)
> > More generally though, pacemaker was designed under the assumption that
> > daemons and their client libraries were always of the same version. That is
> > no longer true and as such all the APIs should implement a handshake (that
> > the lrm proxy can pass on) to establish minimum and maximum versions of the
> > API (_not_ CRM_FEATURE_SET) supported by each side.
> 
> That's what I had in mind with the cluster sending the CIB schema version
> when connecting to the remote, and having the remote reject the connection
> if it doesn't understand that schema. Sending API versions might be a good
> idea, too, though I'd think a single number should be sufficient, maybe the
> minimum pacemaker version supported as an API client (or perhaps use the
> LRMD protocol version as a proxy for that).
> 
> But that could only help for upgrades between future versions that support
> the new handshake, not anything already released.

Obviously, but that's not a reason not to do it.

> 
> Also, it eliminates one of the proposed benefits of remote nodes, which is
> to run a legacy application in a container or VM with its required older OS
> (short of compiling an unsupported newer pacemaker for the older OS, which
> may not even be possible in some cases).

Depends on how you implement it.

> 
> We'd basically be saying that rolling upgrades are supported only for
> cluster nodes, not any remote nodes: disable all remote/guest/bundle
> resources, do a rolling upgrade of the cluster nodes, then re-enable the
> remotes.
> 
> That seems harsh for something that won't be a problem in all cases. I.e. if
> users avoid using command-line tools on remotes during a rolling upgrade,
> and if the resource agents used on the remotes don't use any API commands
> that changed, they'll be fine.
> 
> We should probably still do a new remote handshake that includes more
> version information passed back and forth, but maybe we just log a warning
> for a mismatch, and update the documentation about rolling upgrades.
> 
> We can also look at specific command-line features that currently require
> CIB validation, and see if we can offload some of that to the daemons, to
> minimize the problem space.

Comment 8 Ken Gaillot 2018-07-31 16:47:40 UTC
(In reply to Andrew Beekhof from comment #7)
> (In reply to Ken Gaillot from comment #6)
> > > And the bug is: find out who is updating the config and make them stop
> > 
> > The DC always upgrades the CIB to the latest schema version it understands,
> > just in memory (not on disk). 
> 
> s/DC/PE/
> 
> Unless you've been making changes lately, only the PE upgrades it in memory
> (cli_config_update) but everyone talking over the CIB API (including pcs and
> remote clients) is getting the old version.
> So I believe comment #4 stands.

Oh, right. We do recommend an on-disk CIB upgrade before and after upgrading from pacemaker 1.x to pacemaker 2.x, since some of the pre-1.0 syntax is rejected by 2, and the 2 transform is heavyweight.

Michele, does your process include a pcs cluster cib-upgrade at any point? If so, that should be done once, after all nodes (cluster, remote, and bundle) are upgraded.

pcs can also trigger an on-disk upgrade itself, but only if newer features are used, so that's unlikely here.
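
(A hedged illustration of the above -- run from any full cluster node, exactly once:)

    # Only after ALL nodes (cluster, remote, and bundle) have been upgraded:
    pcs cluster cib-upgrade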

> > That's what I had in mind with the cluster sending the CIB schema version
> > when connecting to the remote, and having the remote reject the connection
> > if it doesn't understand that schema. Sending API versions might be a good
> > idea, too, though I'd think a single number should be sufficient, maybe the
> > minimum pacemaker version supported as an API client (or perhaps use the
> > LRMD protocol version as a proxy for that).
> > 
> > But that could only help for upgrades between future versions that support
> > the new handshake, not anything already released.
> 
> Obviously, but thats not a reason not to do it.

Right, we can use this bz for that, but I wanted to be clear it won't affect the reported issue, just similar ones in the future. Even then, it won't make this behavior work, it will just log a better message. (I prefer that to rejecting the connection, since the combination will work with certain restrictions, which we can document.)

FYI Michele, we will not support rolling upgrades from RHEL 7 to RHEL 8 for cluster nodes (pacemaker upstream will support it, but corosync will change from 2 to 3+knet which does not). We are considering supporting RHEL 7 guests/remotes/bundles with RHEL 8 cluster nodes, but given this issue here, the restrictions will likely be: pcs should not be used on the older remotes (technically non-CIB commands might work, but no guarantees), and only resource agents that do not use CIB commands will be supported in the older remotes.

> > (In reply to Andrew Beekhof from comment #2)
> > > This needs to work otherwise bundles are basically useless.
> > 
> > Not entirely: the lrmd (remote daemon) doesn't care about the CIB, so it's
> > only command-line tools (most problematically those used by resource agents)
> > that can be affected.
> Ie. any master/slave resources, of which galera and rabbit are critical
> examples.

Getting back to this, I would think only setting permanent node attributes via crm_attribute/crm_master would be affected (RAs typically use transient attributes). Is that what these agents are doing, or are they using something else that needs the CIB? We could potentially add permanent attribute support to attrd, and have crm_attribute prefer that for live CIB changes.
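
(To illustrate the distinction with a hedged sketch -- the node and attribute names are only examples:)

    # Transient node attribute: lives in the status section and is handled by
    # attrd, so it does not need the newer schema on an older remote node
    crm_attribute --node controller-0 --name my-attr --update 1 --lifetime reboot

    # Permanent node attribute: written to the nodes section of the CIB, which
    # is the case suggested above as potentially affected
    crm_attribute --node controller-0 --name my-attr --update 1 --lifetime forever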

Comment 9 Andrew Beekhof 2018-08-10 04:24:45 UTC
(In reply to Ken Gaillot from comment #8)
> (In reply to Andrew Beekhof from comment #7)
> > (In reply to Ken Gaillot from comment #6)
> > > > And the bug is: find out who is updating the config and make them stop
> > > 
> > > The DC always upgrades the CIB to the latest schema version it understands,
> > > just in memory (not on disk). 
> > 
> > s/DC/PE/
> > 
> > Unless you've been making changes lately, only the PE upgrades it in memory
> > (cli_config_update) but everyone talking over the CIB API (including pcs and
> > remote clients) is getting the old version.
> > So I believe comment #4 stands.
> 
> Oh, right. We do recommend an on-disk CIB upgrade before and after upgrading
> from pacemaker 1.x to pacemaker 2.x, since some of the pre-1.0 syntax is
> rejected by 2, and the 2 transform is heavyweight.

pre-1.0 isn't involved here thankfully

> 
> Michele, does your process include a pcs cluster cib-upgrade at any point?
> If so, that should be done once, after all nodes (cluster, remote, and
> bundle) are upgraded.
> 
> pcs can also trigger an on-disk upgrade itself, but only if newer features
> are used, so that's unlikely here.
> 
> > > That's what I had in mind with the cluster sending the CIB schema version
> > > when connecting to the remote, and having the remote reject the connection
> > > if it doesn't understand that schema. Sending API versions might be a good
> > > idea, too, though I'd think a single number should be sufficient, maybe the
> > > minimum pacemaker version supported as an API client (or perhaps use the
> > > LRMD protocol version as a proxy for that).
> > > 
> > > But that could only help for upgrades between future versions that support
> > > the new handshake, not anything already released.
> > 
> > Obviously, but thats not a reason not to do it.
> 
> Right, we can use this bz for that, but I wanted to be clear it won't affect
> the reported issue, just similar ones in the future. Even then, it won't
> make this behavior work, it will just log a better message. 

After a while, yes, you'll just get the better message because you'll drop the compatibility code. But it also means we can make changes less incompatibly (oh, you're talking x.y, that means I need to fiddle with it like so...).

Dare one say the word "microversions"?

> (I prefer that
> to rejecting the connection, since the combination will work with certain
> restrictions, which we can document.)
> 
> FYI Michele, we will not support rolling upgrades from RHEL 7 to RHEL 8 for
> cluster nodes (pacemaker upstream will support it, but corosync will change
> from 2 to 3+knet which does not).

yep

> We are considering supporting RHEL 7
> guests/remotes/bundles with RHEL 8 cluster nodes, but given this issue here,
> the restrictions will likely be: pcs should not be used on the older remotes
> (technically non-CIB commands might work, but no guarantees), and only
> resource agents that do not use CIB commands will be supported in the older
> remotes.

For OSP's specific 7->8 needs, I don't think that will be a problem, but we want to be very careful about 7/8.x -> 7/8.y.

> 
> > > (In reply to Andrew Beekhof from comment #2)
> > > > This needs to work otherwise bundles are basically useless.
> > > 
> > > Not entirely: the lrmd (remote daemon) doesn't care about the CIB, so it's
> > > only command-line tools (most problematically those used by resource agents)
> > > that can be affected.
> > Ie. any master/slave resources, of which galera and rabbit are critical
> > examples.
> 
> Getting back to this, I would think only setting permanent node attributes
> via crm_attribute/crm_master would be affected (RAs typically use transient
> attributes). Is that what these agents are doing, or are they using
> something else that needs the CIB? We could potentially add permanent
> attribute support to attrd, and have crm_attribute prefer that for live CIB
> changes.

In this specific case, I believe permanent node attributes are all that's involved.
But I'd prefer to see a more comprehensive solution :)

Comment 10 Michele Baldessari 2018-09-07 07:54:17 UTC
(In reply to Ken Gaillot from comment #8)
> (In reply to Andrew Beekhof from comment #7)
> > (In reply to Ken Gaillot from comment #6)
> > > > And the bug is: find out who is updating the config and make them stop
> > > 
> > > The DC always upgrades the CIB to the latest schema version it understands,
> > > just in memory (not on disk). 
> > 
> > s/DC/PE/
> > 
> > Unless you've been making changes lately, only the PE upgrades it in memory
> > (cli_config_update) but everyone talking over the CIB API (including pcs and
> > remote clients) is getting the old version.
> > So I believe comment #4 stands.
> 
> Oh, right. We do recommend an on-disk CIB upgrade before and after upgrading
> from pacemaker 1.x to pacemaker 2.x, since some of the pre-1.0 syntax is
> rejected by 2, and the 2 transform is heavyweight.
> 
> Michele, does your process include a pcs cluster cib-upgrade at any point?
> If so, that should be done once, after all nodes (cluster, remote, and
> bundle) are upgraded.

ATM no, we do not have that in any of our upgrade steps.
 
> pcs can also trigger an on-disk upgrade itself, but only if newer features
> are used, so that's unlikely here.
> 
> > > That's what I had in mind with the cluster sending the CIB schema version
> > > when connecting to the remote, and having the remote reject the connection
> > > if it doesn't understand that schema. Sending API versions might be a good
> > > idea, too, though I'd think a single number should be sufficient, maybe the
> > > minimum pacemaker version supported as an API client (or perhaps use the
> > > LRMD protocol version as a proxy for that).
> > > 
> > > But that could only help for upgrades between future versions that support
> > > the new handshake, not anything already released.
> > 
> > Obviously, but thats not a reason not to do it.
> 
> Right, we can use this bz for that, but I wanted to be clear it won't affect
> the reported issue, just similar ones in the future. Even then, it won't
> make this behavior work, it will just log a better message. (I prefer that
> to rejecting the connection, since the combination will work with certain
> restrictions, which we can document.)
> 
> FYI Michele, we will not support rolling upgrades from RHEL 7 to RHEL 8 for
> cluster nodes (pacemaker upstream will support it, but corosync will change
> from 2 to 3+knet which does not). We are considering supporting RHEL 7
> guests/remotes/bundles with RHEL 8 cluster nodes, but given this issue here,
> the restrictions will likely be: pcs should not be used on the older remotes
> (technically non-CIB commands might work, but no guarantees), and only
> resource agents that do not use CIB commands will be supported in the older
> remotes.
> 
> > > (In reply to Andrew Beekhof from comment #2)
> > > > This needs to work otherwise bundles are basically useless.
> > > 
> > > Not entirely: the lrmd (remote daemon) doesn't care about the CIB, so it's
> > > only command-line tools (most problematically those used by resource agents)
> > > that can be affected.
> > Ie. any master/slave resources, of which galera and rabbit are critical
> > examples.
> 
> Getting back to this, I would think only setting permanent node attributes
> via crm_attribute/crm_master would be affected (RAs typically use transient
> attributes). Is that what these agents are doing, or are they using
> something else that needs the CIB? We could potentially add permanent
> attribute support to attrd, and have crm_attribute prefer that for live CIB
> changes.

Yeah, so our RAs use both volatile attributes and non-volatile ones.

We're still up in the air as to how exactly our upgrade will look for the next release, so we're not sure yet how we are affected by this incompatibility.

Comment 11 Ken Gaillot 2018-12-04 00:21:05 UTC
I don't think there is a general solution to this problem, but I think these two changes would cover most of the actual use cases, so this BZ will focus on them:

* cibadmin --query could take an option to skip the schema verification (maybe --force)

* pacemaker-attrd could be modified to handle permanent node attributes as well as transient attributes, and crm_attribute would go through pacemaker-attrd whenever possible. This would cover the case of resource agents using crm_attribute or crm_master to set permanent node attributes.

Comment 12 Patrik Hagara 2018-12-14 11:45:33 UTC
qa_ack+

In scenarios with pacemaker_remote nodes (running a version containing the fix for this BZ) where the full cluster nodes are running a newer pacemaker version with a newer minor version of the CIB XML schema:
  * it MUST be possible to successfully query the live CIB from remote nodes (via `cibadmin --query --no-verify-schema` or similar)
  * it MUST be possible to successfully set/modify/query transient or permanent node attributes from remote nodes using `crm_attribute` or `crm_master` (ie. from resource agents)

Comment 13 Ken Gaillot 2019-01-15 17:19:15 UTC
Bumping to 7.8 due to capacity constraints

Comment 14 Ken Gaillot 2019-01-15 21:24:59 UTC
Revisiting this, I missed something the first time around:

>    Jul 19 17:33:49 controller-0 dockerd-current[19102]: Debug: backup_cib: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20180719-8-phqecy returned

The above command queried the live CIB and saved it to a file, and it succeeded. When querying the live CIB, the CIB daemon does the validation, so there is no problem. That also means resource agents calling crm_attribute shouldn't have any problems.

>    Jul 19 17:33:49 controller-0 dockerd-current[19102]: Debug: try 10/20: /usr/sbin/pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20180719-8-phqecy property set stonith-enabled=false
...
>    cibadmin: Connection to local file 'cib.xml' failed: Update does not conform to the configured schema

This command then attempted to modify the *saved* CIB file. That doesn't go through the daemon, so the validation happens on the remote node, which fails since it doesn't have the necessary schema files.
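
(Condensing the two cases into a hedged sketch -- the file path is illustrative:)

    # Works from an older remote/bundle node: the live CIB is fetched through
    # the CIB daemon, which performs the validation on the cluster side
    pcs cluster cib /tmp/cib-copy.xml

    # Fails on the older node: the saved file is validated locally, against
    # schema files the older node does not have
    pcs -f /tmp/cib-copy.xml property set stonith-enabled=false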

So, the changes in Comment 11 are not relevant. Instead:

1. When pacemaker_remoted receives a new connection from the cluster, it does a handshake where it sends the highest schema version it's aware of, and the cluster sends back its highest schema version number along with any schemas and transforms that the remote doesn't have, which pacemaker_remoted would save to /var/lib/pacemaker/remote-schemas. If the cluster supports a lower schema version, which could happen if the cluster is downgraded or if different nodes in a mixed-version cluster connect to the remote node, the remote node should probably remove any newer schema files from /var/lib.

2. The libcrmcommon library would be modified to check the /var/lib location (in addition to the usual /usr/share location) when loading schemas.

That should allow tools such as cibadmin to work with CIB_file on older remote nodes, as long as the remote connection has ever been active. Limitations:

* There could still theoretically be problems if the newer tools or libraries have logic specific to the newer schemas. Most of the logic should be in the daemons, so this may not be a big problem.

* This would only help when both the older remote nodes and newer cluster nodes have this fix, i.e. it won't help with any already released versions, only going forward, but that would still be good enough to let 7.8 remote nodes / bundles work with 8.1 cluster nodes (or whenever this gets implemented).

* Unrelated to this bz, this schema problem would also apply to an administration host using the CIB_server method to talk to a newer cluster node, and this fix wouldn't help that case.

Comment 15 Ken Gaillot 2019-02-19 18:20:04 UTC
Clearing dev ack until 7.8 planning can be finalized

Comment 17 Michele Baldessari 2020-01-29 11:19:31 UTC
Yo Ken,

so re-reading comment #11, I think that approach would be more than fine with us. We'd probably even be fine with an env variable 'DO_NOT_VALIDATE_SCHEMAS' that would be set to true inside the containers and would allow us to dump and push the CIB?

Let's chat about that next week ;)

cheers,
Michele

Comment 18 Michele Baldessari 2020-02-10 12:34:51 UTC
Hi Ken,

so as discussed in Brno, we see this issue also when there is a small discrepancy between host and container. For example, pacemaker-2.0.3-4.el8.1.x86_64 on the host and pacemaker-2.0.2-3.el8_1.2.x86_64 inside a container will also break, in the sense that commands just stop working for us. The issue here is the dumping of the CIB.

- Live scenario inside container -> working
()[root@controller-0 /]$ pcs node attribute controller-0 foo=bar
()[root@controller-0 /]$

- When we dump the CIB -> broken
()[root@controller-0 /]$ pcs cluster cib cib.xml
()[root@controller-0 /]$ pcs -f cib.xml node attribute controller-0 foo=bar
Error: unable to set attribute foo
Error performing operation: Protocol not supported
Error setting foo=bar (section=nodes, set=nodes-1): Protocol not supported

And the example above uses exactly the RPM versions specified.

Here is more info with --debug:
()[root@controller-0 /]$ pcs  --debug -f cib.xml node attribute controller-0 foo=bar
Running: /usr/sbin/crm_attribute -t nodes --node controller-0 --name foo --update bar
Return Value: 76
--Debug Output Start--
Error performing operation: Protocol not supported
Error setting foo=bar (section=nodes, set=nodes-1): Protocol not supported
--Debug Output End--

Error: unable to set attribute foo
Error performing operation: Protocol not supported
Error setting foo=bar (section=nodes, set=nodes-1): Protocol not supported



Thanks for your help here,
Michele

Comment 20 Ken Gaillot 2020-03-28 01:25:51 UTC
(In reply to Michele Baldessari from comment #18)
> Hi Ken,
> 
> so as discussed in Brno we see this issue also when there is a small
> discrepancy between host and container. So for example:
> pacemaker-2.0.3-4.el8.1.x86_64 on the host and
> pacemaker-2.0.2-3.el8_1.2.x86_64 inside a container will also break. in the
> sense that commands just will stop working for us. The issue here is the
> dumping of the CIB.

This makes me think it might be related to the CRM_FEATURE_SET, which changed between those two versions even though the CIB schema version did not. I wouldn't expect that to affect the commands you ran but I'll have to investigate the code to see if maybe it does.

There may be a workaround until a fix is available: the CIB has a property "validate-with" that is usually set to a pacemaker schema version that can validate the CIB XML. However it is possible to disable schema validation by setting validate-with to "none". An upgrade could conceivably go like this:

1. SCHEMA=$(cibadmin --query | head -1 | sed -e 's/.* validate-with="\([^"]*\)" .*/\1/')

2. cibadmin --modify --xml-text '<cib validate-with="none"/>'

3. Do whatever you need to do on whatever nodes.

4. cibadmin --modify --xml-text "<cib validate-with=\"$SCHEMA\"/>"

I don't believe pcs has a way to query or set <cib> tag properties currently. We could file an RFE for that.

Note that with schema validation disabled, pcs would not be able to detect invalid configuration syntax, so this is not without risk.
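
(For convenience, steps 1-4 could be wrapped in a small script; this is only an untested sketch of the workaround described above:)

    # Remember the current schema, then disable validation
    SCHEMA=$(cibadmin --query | head -1 | sed -e 's/.* validate-with="\([^"]*\)" .*/\1/')
    cibadmin --modify --xml-text '<cib validate-with="none"/>'
    # Restore the original schema even if a later step fails
    trap 'cibadmin --modify --xml-text "<cib validate-with=\"$SCHEMA\"/>"' EXIT

    # ... do whatever you need to do on whatever nodes ...

    # Re-enable validation (the trap above also covers early exits)
    cibadmin --modify --xml-text "<cib validate-with=\"$SCHEMA\"/>"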

Comment 22 Ken Gaillot 2020-06-03 20:56:37 UTC
This won't be ready in the RHEL 7.9 time frame, so leaving it as RHEL 8 only.

