Bug 1441332

Summary: pcs should take "referential integrity" seriously (as in ON DELETE CASCADE) when it makes sense to prevent unnecessary dead-locks
Product: Red Hat Enterprise Linux 7 Reporter: Jan Pokorný [poki] <jpokorny>
Component: pcsAssignee: Tomas Jelinek <tojeline>
Status: CLOSED WONTFIX QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3CC: cfeist, cluster-maint, idevat, kgaillot, omular, tojeline
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-15 07:33:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Pokorný [poki] 2017-04-11 16:58:03 UTC
Straightforward show case:

# cat test3.cib 
> <cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-2.5">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-opts">
>         <nvpair id="cib-opts-1" name="stonith-enabled" value="1"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes/>
>     <resources/>
>     <constraints/>
>     <acls>
>       <acl_role id="acl-1">
>         <acl_permission id="acl-1-deny" kind="deny" reference="cib-opts-1"/>
>       </acl_role>
>       <acl_target id="observer">
>         <role id="acl-1"/>
>       </acl_target>
>     </acls>
>   </configuration>
>   <status/>
> </cib>

# xmllint --noout --relaxng /usr/share/pacemaker/pacemaker-2.5.rng \
  test3.cib
> test3.cib validates

# pcs -f test3.cib property unset stonith-enabled
> Error: Unable to update cib
> Call failed: Update does not conform to the configured schema

Well, this is completely unuseful diagnostics!  As a beginner, I would
be dead-locked, feeling dire at that point.

Fortunately, I am not, and indeed:
# pcs --debug -f test3.cib property unset stonith-enabled 2>/dev/null \
  | sed -n '/--Debug Input Start/{:s;n;/--Debug Input End/d;p;bs}' \
  | tee > test4.cib
> <?xml version="1.0" ?><cib admin_epoch="0" epoch="3" num_updates="0" validate-with="pacemaker-2.5">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-opts">
>         
>       </cluster_property_set>
>     </crm_config>
>     <nodes/>
>     <resources/>
>     <constraints/>
>     <acls>
>       <acl_role id="acl-1">
>         <acl_permission id="acl-1-deny" kind="deny" reference="cib-opts-1"/>
>       </acl_role>
>       <acl_target id="observer">
>         <role id="acl-1"/>
>       </acl_target>
>     </acls>
>   </configuration>
>   <status/>
> </cib>

# xmllint --noout --relaxng /usr/share/pacemaker/pacemaker-2.5.rng \
  test4.cib
> test4.cib:13: element acl_permission: validity error :
>   IDREF attribute reference references an unknown ID "cib-opts-1"
> test4.cib fails to validate

This is "trivially" solvable by dropping ACL enforcement bound to
the ID of element being removed along with it (akin to ON DELETE CASCADE
in the DBMS world) -- it's the only reasonable outcome of handling
this "property unset" request.

Note that "referential integrity" constraints like this (all based on
ID-IDREF XML schema datatypes as used in RelaxNG schemas) are quite
convoluted and likely not every association is so straightforward
to handle, but ACL ones pretty much are.  Hence, untwist these,
and investigate feasibility at the remaining ones, please.

Comment 1 Ken Gaillot 2017-04-11 17:32:13 UTC
This is comparable to what pcs already does for constraints on resources that are being removed.

However, automatically deleting ACLs doesn't sound like a good idea. In this example, if someone unset stonith-disabled, then later set it again, they might not realize that they lost their ACL.

An alternative would be to give a better error, in this case something like:

Error: cannot unset stonith-disabled because it is referenced in an ACL
Please delete all references first

Besides constraints and ACLs, other references that may occur:
* <primitive> may refer to a template
* <operations>, <rule>, <nvset> and <nvpair> may refer to definition elsewhere
* <tags> refers to one or more IDs (I have no idea what these are)

Comment 2 Jan Pokorný [poki] 2017-04-11 17:57:03 UTC
Hm, another alternative then would be to turn per-reference ACL
to equivalent non-intrusive per-XPath (desugaring done internally by
pacemaker, anyway), with a downside that this will no longer be
covered by proper "referential integrity" check imposed by RelaxNG
validations.

Comment 3 Jan Pokorný [poki] 2017-04-11 19:21:21 UTC
re [comment 1]:
> However, automatically deleting ACLs doesn't sound like a good idea.
> In this example, if someone unset stonith-disabled, then later set it
> again, they might not realize that they lost their ACL.

Actually there's a strong justification behind dropping such ACL
item right away in the elaborated scenario -- nvpairs do not have
user-configurable IDs (at pcs level), hence this is not something
users have firmly under their control and the IDs generated under
the hood are not something to rely on statically.  Consider
heterogenous management tools, even pcs itself provides no explicit
guarantee of stable IDs; that they are context-dependent is rather
a convenient coincidence + see also [bug 1387358 comment 4].

In that case, it's nice that ACL item would be preserved for
now-nonexistent entity in some way (see [comment 1]), however
odds are that it's ID on recreation will be different.

So I think warning about ACL item being dropped + doing so in the
process would be the way to go in this case.

Comment 4 Jan Pokorný [poki] 2017-04-11 21:26:08 UTC
+ there is no apriori protection-per-id anyway so adding something
for the first time is really not that different as re-adding it.

What might be interesting, though, is making per-id ACL items
a standard counterpart of entity-adding commands:

  pcs property set --read observer_role --deny guest_role \
    stonith-enabled=false

Automatic "garbage collecting" would then be even more natural.

Comment 5 Jan Pokorný [poki] 2017-04-13 16:36:53 UTC
re [comment 1]:
> Besides constraints and ACLs, other references that may occur:
> * <primitive> may refer to a template
> * <operations>, <rule>, <nvset> and <nvpair> may refer to definition
>   elsewhere
> * <tags> refers to one or more IDs (I have no idea what these are)

Situation is a bit more complicated.

I tried to summarize the relationships based on "referential integrity",
together with analysis as to which IDs can actually be deliberately
selected by the user of pcs.

* * *

referencable-element
- referencing-element-1@referencing-attr-1
- referencing-element-2@referencing-attr-2

template
- primitivee@template

(primitive, group, clone, master)
- rsc_colocation@rsc
- rsc_colocation@with-rsc
- rsc_location@rsc
- rsc_order@first
- rsc_order@then
- rsc_ticket@rsc

acl_role
- role@id

// reflexive-in-class-references via @id-ref
X
- X@id-ref
for X in (cluster_property_set, instance_attributes, meta_attributes,
          nvpair [since 1.3], operations, resource_set, rule,
          utilization)

// anything referencable (see next section) can be referenced by
(ID-HAVING)
- acl_permission@reference
- obj_ref@id

* * *

Let's enumerate ID-HAVING entities (see last part of previous section),
making a distinction as to whether those ID can be actively selected
using pcs commands, or if they are not supported with pcs at all.

Entities not supported by pcs at all:
- tag
- template

Entities where IDs _cannot_ be deliberately chosen in pcs commands:
- acl_permission
- cluster_property_set (fixed ID is used, though)
- date_expression
- date_spec
- duration
- expression
- fencing-level
- instance_attributes
- meta_attributes
- nvpair
- op
- operations
- resource_set
- utilization

Entities where IDs _can_ be deliberately chosen in pcs commands:
- acl_role
- rsc_colocation
- rsc_location
- rsc_order
- rsc_ticket

(
+ coming:
- bundle
- port-mapping
- storage-mapping
)

* * *

In re <tags>: it's potentially very useful though non-interoperable way
how the management tools can group selected objects (identifiable with
with IDs) in a distributed manner under some other identifier, with
no semantic interpretation of this tag identifier other than how it's
understood by the management tool.  So, for instance, I can imagine
it could be (ab)used to list some especially important resources
prominently in the list, no matter which pcs UI daemon is accessed.

Note that respective <obj_ref> (and in turn <tags> when suddenly
made empty in due course) SHOULD be treated the very same way as
ACLs -- in ON DELETE CASCADE manner, even if tags are not supported
by pcs per se, yet.

Comment 6 Jan Pokorný [poki] 2017-04-13 16:39:27 UTC
FWIW, this is the on-merge discussion for <tags>:
https://github.com/ClusterLabs/pacemaker/pull/463

Comment 7 Jan Pokorný [poki] 2017-04-21 13:45:28 UTC
I should also mention a special category where, contrary to the bug
report, pcs is keen to enforce referential integrity where it is
conceptually desired in the final product but is not mandated (typically
cannot be as constrainted by limited RelaxNG expressivness and perhaps
suboptimal way of encoding one-or-more relationships with the dependent
entities) in the schema:

primitive (or possibly whatever makes sense to encode stonith resources)
- fencing-level/@devices (possibly carrying "id1,...,idN" array
  of pointers to respective configured instances of fence agents)

Comment 9 RHEL Program Management 2021-01-15 07:33:52 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.