Bug 2095597

Summary: A duplicate ACL user causes a DC election loop (RHEL 7)
Product: Red Hat Enterprise Linux 7 Reporter: Reid Wahl <nwahl>
Component: pacemakerAssignee: Ken Gaillot <kgaillot>
Status: CLOSED WONTFIX QA Contact: cluster-qe <cluster-qe>
Severity: high Docs Contact:
Priority: low    
Version: 7.9CC: cluster-maint, sbradley
Target Milestone: rcKeywords: Triaged
Target Release: ---Flags: pm-rhel: mirror+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2095662 (view as bug list) Environment:
Last Closed: 2022-10-06 20:37:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2095662    

Description Reid Wahl 2022-06-10 00:48:26 UTC
Description of problem:

If a duplicate ACL user (acl_target) exists in the CIB, DC elections will loop quickly and infinitely. There is no good reason to have a duplicate ACL user, but pacemaker should not fail in this way if a user makes this mistake.

    <acls>
      <acl_role id="read-access">
        <acl_permission id="read-access-read" kind="read" xpath="/"/>
      </acl_role>
      <acl_target id="testuser">
        <role id="read-access"/>
      </acl_target>
      <acl_target id="testuser">
        <role id="read-access"/>
      </acl_target>
    </acls>

[root@fastvm-rhel-7-6-21 pcs]# tail -f /var/log/messages
Jun  9 17:31:25 fastvm-rhel-7-6-21 crmd[17198]:  notice: State transition S_ELECTION -> S_INTEGRATION
Jun  9 17:31:25 fastvm-rhel-7-6-21 stonith-ng[17194]:  notice: Versions did not change in patch 0.40.3
Jun  9 17:31:25 fastvm-rhel-7-6-21 attrd[17196]:  notice: Updating all attributes after cib_refresh_notify event
Jun  9 17:31:25 fastvm-rhel-7-6-21 crmd[17198]:  notice: State transition S_ELECTION -> S_INTEGRATION
Jun  9 17:31:25 fastvm-rhel-7-6-21 stonith-ng[17194]:  notice: Versions did not change in patch 0.40.7
Jun  9 17:31:25 fastvm-rhel-7-6-21 attrd[17196]:  notice: Updating all attributes after cib_refresh_notify event

Detailed logs are in "Additional info."

Note: I have not tested on RHEL 8 yet; not sure if the issue is present there.

Also note: pcs does not allow this duplicate creation.

[root@fastvm-rhel-7-6-21 pcs]# pcs acl user create testuser read-access
Error: 'testuser' already exists

But if the duplicate already exists, pcs seems unable to remove it. It has to be removed manually and then pushed. IMO this isn't worth fixing in pcs, but others' opinions may differ.

-----

Version-Release number of selected component (if applicable):

pacemaker-1.1.23-1.el7_9.1

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Enable ACLs (`pcs acl enable`).
2. Save the CIB to a file (`pcs cluster cib > /tmp/cib.xml`), and add the following to /tmp/cib.xml, just before the closing </configuration> tag.

    <acls>
      <acl_role id="read-access">
        <acl_permission id="read-access-read" kind="read" xpath="/"/>
      </acl_role>
      <acl_target id="testuser">
        <role id="read-access"/>
      </acl_target>
      <acl_target id="testuser">
        <role id="read-access"/>
      </acl_target>
    </acls>

3. Push the updated CIB (`pcs cluster cib-push --config /tmp/cib.xml`).
4. It may be necessary to perform a full cluster stop-and-start in order to produce the issue.

-----

Actual results:

An infinite and rapid DC election loop as shown in the description.

-----

Expected results:

No DC election loop; perhaps a warning or failure due to the duplicate ACL user, or perhaps ignore the duplicate.

-----

Additional info:

Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (     utils.c:975   )    info: update_dc:        Set DC to node1 (3.0.14)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section crm_config: OK (rc=0, origin=node1/crmd/574, version=0.40.125)
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (       fsa.c:548   )    info: do_state_transition:      State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (   cib_ops.c:230   )    info: cib_process_replace:      Digest matched on replace from node1: a0c66419385d0b14385cfa03f3e8d523
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (   cib_ops.c:266   )    info: cib_process_replace:      Replaced 0.40.125 with 0.40.125 from node1
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:3531  )    info: __xml_diff_object:        acl_target.testuser moved from 1 to 2
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:891   )    info: cib_perform_op:   cib_perform_op: Local-only Change: 0.40.125
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:920   )    info: cib_perform_op:   +~ /cib/configuration/acls/acl_target[@id='testuser'] moved to offset 2
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (       cib.c:316   )    info: controld_delete_node_state:       Deleting resource history for node node1 (via CIB call 580) | xpath=//node_state[@uname='node1']/lrm
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (       cib.c:316   )    info: controld_delete_node_state:       Deleting resource history for node node2 (via CIB call 582) | xpath=//node_state[@uname='node2']/lrm
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (    notify.c:366   )    info: cib_replace_notify:       Local-only Replace: 0.40.125 from node1
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_replace operation for section 'all': OK (rc=0, origin=node1/crmd/577, version=0.40.125)
Jun 09 17:31:25 [17196] fastvm-rhel-7-6-21      attrd: (      main.c:97    )  notice: attrd_cib_replaced_cb:    Updating all attributes after cib_refresh_notify event
Jun 09 17:31:25 [17196] fastvm-rhel-7-6-21      attrd: (  commands.c:1079  )   debug: write_attributes: Writing out all attributes
Jun 09 17:31:25 [17196] fastvm-rhel-7-6-21      attrd: (  commands.c:1290  )    info: write_attribute:  Processed 2 private changes for #attrd-protocol, id=n/a, set=n/a
Jun 09 17:31:25 [17194] fastvm-rhel-7-6-21 stonith-ng: (       xml.c:1320  )  notice: xml_patch_version_check:  Versions did not change in patch 0.40.125
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (       fsa.c:548   )    info: do_state_transition:      State transition S_FINALIZE_JOIN -> S_ELECTION | input=I_ELECTION cause=C_FSA_INTERNAL origin=do_cib_replaced
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (     utils.c:979   )    info: update_dc:        Unset DC. Was node1
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section nodes to all (origin=local/crmd/578)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section nodes to all (origin=local/crmd/579)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_delete operation for section //node_state[@uname='node1']/lrm to all (origin=local/crmd/580)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section status to all (origin=local/crmd/581)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_delete operation for section //node_state[@uname='node2']/lrm to all (origin=local/crmd/582)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (  cib_file.c:293   )    info: cib_file_backup:  Archived previous version as /var/lib/pacemaker/cib/cib-82.raw
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (  election.c:381   )    info: election_check:   election-DC won by local node
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (      misc.c:46    )    info: do_log:   Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section nodes: OK (rc=0, origin=node1/crmd/578, version=0.40.125)
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (       fsa.c:548   )  notice: do_state_transition:      State transition S_ELECTION -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=election_win_cb
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section nodes: OK (rc=0, origin=node1/crmd/579, version=0.40.125)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:884   )    info: cib_perform_op:   Diff: --- 0.40.125 2
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:886   )    info: cib_perform_op:   Diff: +++ 0.40.126 (null)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:966   )    info: cib_perform_op:   -- /cib/status/node_state[@id='1']/lrm[@id='1']
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib:  @num_updates=126
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_delete operation for section //node_state[@uname='node1']/lrm: OK (rc=0, origin=node1/crmd/580, version=0.40.126)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:884   )    info: cib_perform_op:   Diff: --- 0.40.126 2
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:886   )    info: cib_perform_op:   Diff: +++ 0.40.127 (null)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib:  @num_updates=127
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib/status/node_state[@id='1']:  @crm-debug-origin=do_lrm_query_internal
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:908   )    info: cib_perform_op:   ++ /cib/status/node_state[@id='1']:  <lrm id="1"/>
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:915   )    info: cib_perform_op:   ++                                     <lrm_resources/>
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:915   )    info: cib_perform_op:   ++                                   </lrm>
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=node1/crmd/581, version=0.40.127)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:884   )    info: cib_perform_op:   Diff: --- 0.40.127 2
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:886   )    info: cib_perform_op:   Diff: +++ 0.40.128 (null)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:966   )    info: cib_perform_op:   -- /cib/status/node_state[@id='2']/lrm[@id='2']
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib:  @num_updates=128
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_delete operation for section //node_state[@uname='node2']/lrm: OK (rc=0, origin=node1/crmd/582, version=0.40.128)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section status to all (origin=local/crmd/583)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section nodes to all (origin=local/crmd/586)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section status to all (origin=local/crmd/587)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:884   )    info: cib_perform_op:   Diff: --- 0.40.128 2
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:886   )    info: cib_perform_op:   Diff: +++ 0.40.129 (null)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib:  @num_updates=129
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib/status/node_state[@id='2']:  @crm-debug-origin=do_lrm_query_internal
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:908   )    info: cib_perform_op:   ++ /cib/status/node_state[@id='2']:  <lrm id="2"/>
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:915   )    info: cib_perform_op:   ++                                     <lrm_resources/>
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:915   )    info: cib_perform_op:   ++                                   </lrm>
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=node1/crmd/583, version=0.40.129)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section nodes: OK (rc=0, origin=node1/crmd/586, version=0.40.129)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:884   )    info: cib_perform_op:   Diff: --- 0.40.129 2
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:886   )    info: cib_perform_op:   Diff: +++ 0.40.130 (null)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib:  @num_updates=130
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib/status/node_state[@id='1']:  @crm-debug-origin=do_cib_replaced
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (       xml.c:952   )    info: cib_perform_op:   +  /cib/status/node_state[@id='2']:  @crm-debug-origin=do_cib_replaced
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=node1/crmd/587, version=0.40.130)
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (  election.c:225   )    info: do_dc_takeover:   Taking over DC status for this partition
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_master operation for section 'all': OK (rc=0, origin=local/crmd/588, version=0.40.130)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section cib to all (origin=local/crmd/589)
Jun 09 17:31:25 [17194] fastvm-rhel-7-6-21 stonith-ng: (       xml.c:1325  )   debug: xml_patch_version_check:  Can apply patch 0.40.126 to 0.40.125
Jun 09 17:31:25 [17194] fastvm-rhel-7-6-21 stonith-ng: (       xml.c:1325  )   debug: xml_patch_version_check:  Can apply patch 0.40.127 to 0.40.126
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section cib: OK (rc=0, origin=node1/crmd/589, version=0.40.130)
Jun 09 17:31:25 [17194] fastvm-rhel-7-6-21 stonith-ng: (       xml.c:1325  )   debug: xml_patch_version_check:  Can apply patch 0.40.128 to 0.40.127
Jun 09 17:31:25 [17194] fastvm-rhel-7-6-21 stonith-ng: (       xml.c:1325  )   debug: xml_patch_version_check:  Can apply patch 0.40.129 to 0.40.128
Jun 09 17:31:25 [17194] fastvm-rhel-7-6-21 stonith-ng: (       xml.c:1325  )   debug: xml_patch_version_check:  Can apply patch 0.40.130 to 0.40.129
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: (  cib_file.c:423   )    info: cib_file_write_with_digest:       Wrote version 0.40.0 of the CIB to disk (digest: 6ebef8311746943742a847b50bbb11ef)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section crm_config to all (origin=local/crmd/591)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section crm_config: OK (rc=0, origin=node1/crmd/591, version=0.40.130)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section crm_config to all (origin=local/crmd/593)
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1113  )    info: cib_process_request:      Completed cib_modify operation for section crm_config: OK (rc=0, origin=node1/crmd/593, version=0.40.130)
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (   join_dc.c:179   )    info: join_make_offer:  Sending join-29 offer to node1
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (   join_dc.c:179   )    info: join_make_offer:  Sending join-29 offer to node2
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (   join_dc.c:212   )    info: do_dc_join_offer_all:     Waiting on join-29 requests from 2 outstanding nodes
Jun 09 17:31:25 [17193] fastvm-rhel-7-6-21        cib: ( callbacks.c:1049  )    info: cib_process_request:      Forwarding cib_modify operation for section crm_config to all (origin=local/crmd/595)
Jun 09 17:31:25 [17198] fastvm-rhel-7-6-21       crmd: (     utils.c:975   )    info: update_dc:        Set DC to node1 (3.0.14)

Comment 4 Ken Gaillot 2022-10-06 20:37:42 UTC
Given the difficulty of backporting the changes related to the fix, and the limited scope of the problem, we will not fix this in RHEL 7