Bug 1881537
Summary: | Pacemaker Remote nodes cannot run resources that have CIB secrets configured | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Markéta Smazová <msmazova> |
Component: | pacemaker | Assignee: | Oyvind Albrigtsen <oalbrigt> |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 8.3 | CC: | cluster-maint, kgaillot, phagara |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 8.4 | Flags: | pm-rhel: mirror+ |
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | pacemaker-2.0.5-6.el8 | Doc Type: | No Doc Update |
Doc Text: |
The Pacemaker capability being fixed was added in 8.3, but the pcs interface is not yet available (or documented), so I think we are OK not mentioning this in documentation.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-05-18 15:26:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Markéta Smazová
2020-09-22 16:11:23 UTC
The problem is that currently, Pacemaker's executor daemon (pacemaker-execd on cluster nodes and pacemaker-remoted on remote nodes) is the one that replaces the "lrm://" placeholders with the secret values. This means that the secrets must be available locally on the node running the executor daemon, but for remote nodes, that is not the case.

I believe the fix will be to make the controller daemon (pacemaker-controld) provide the secret values to the executor when requesting execution. It would be simpler to make the controller substitute the secret values before requesting execution, but that would change the parameter hash used to detect configuration changes, causing affected resources to restart after a rolling upgrade. So instead, I think the controller can provide the parameters as currently (with "lrm://"), and provide the secret values in a separate (new) part of the request. The executor can then perform the substitution based on the provided values rather than on locally stored values, while still computing the same parameter hash. This approach will mean that both the cluster nodes and the remote nodes must be running a Pacemaker version with the fix in order for it to work. If either has an older version, it will simply behave as it does currently.

The proposed fix in Comment 1 wouldn't work as described. The controller runs as hacluster, not root, so it doesn't have access to the secrets. My next idea is to modify the cibsecret tool to sync secrets to Pacemaker Remote nodes. Pacemaker Remote nodes are more likely to be down when the secret is set and to not have ssh access from the cluster nodes, and the host's secrets will need to be exported into containers in order to work with bundles, but we can document those limitations.

The fix has been merged upstream as of commit 240b9ec0. We went with the approach of syncing secrets to remote and guest nodes. It does not work with bundles, though that capability could be added if demand arises.
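To make the failure mode concrete, here is a minimal shell sketch of the local lookup described above: a parameter whose value is exactly "lrm://" is replaced by the contents of a per-resource secret file on the node running the executor. This is not Pacemaker's implementation. `resolve_param` is a hypothetical helper, and the layout `<secrets dir>/<resource>/<parameter>` is assumed from the `/var/lib/pacemaker/lrm/secrets/dummy/delay` path that cibsecret reports in this bug. Returning 6 mirrors the 'not configured' (6) result recorded for the failed start on the remote node.

```shell
# Hypothetical sketch, not Pacemaker code: resolve one resource parameter
# from files on the local node, as the executor is described as doing.
# Assumed layout: <secrets_dir>/<resource>/<parameter>, as in
# /var/lib/pacemaker/lrm/secrets/dummy/delay.
resolve_param() {
    rsc_dir="$1/$2"   # per-resource secrets directory
    name=$3           # parameter name, e.g. delay
    value=$4          # value as stored in the CIB

    # Ordinary parameters pass through unchanged.
    if [ "$value" != "lrm://" ]; then
        printf '%s\n' "$value"
        return 0
    fi

    # Placeholder: the secret file must exist on *this* node.
    if [ ! -r "$rsc_dir/$name" ]; then
        echo "no local secret for $2/$name" >&2
        return 6      # mirrors the 'not configured' (6) result in this report
    fi
    cat "$rsc_dir/$name"
}

# Demo against a throwaway directory standing in for the secrets directory.
demo=$(mktemp -d)
mkdir -p "$demo/dummy"
printf '10\n' > "$demo/dummy/delay"
resolve_param "$demo" dummy delay 'lrm://'     # secret present: prints 10
resolve_param "$demo" dummy state '/run/x'     # not a placeholder: prints /run/x
resolve_param "$demo" dummy passwd 'lrm://' || echo "exit status $?"   # prints exit status 6
rm -rf "$demo"
```

On a remote node where the secret file was never synced, the lookup takes the error branch. As described in this report, the adopted fix leaves the local lookup in place and instead has cibsecret copy the file to remote and guest nodes.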
The remote or guest node must be available via ssh to its node name (specifically, if the node name is different from the local host name, the node name must be added to /etc/hosts or similar).

env: cluster consisting of 3 full nodes and 1 remote node, dummy resource with a location constraint on the remote node

before (pacemaker-2.0.4-6.el8_3.1)
==================================

> [root@virt-047 ~]# pcs status
> Cluster name: STSRHTS11440
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-047 (version 2.0.4-6.el8_3.1-2deceaa3ae) - partition with quorum
> * Last updated: Mon Feb 15 10:50:26 2021
> * Last change: Mon Feb 15 10:50:09 2021 by root via cibadmin on virt-047
> * 4 nodes configured
> * 6 resource instances configured
>
> Node List:
> * Online: [ virt-047 virt-048 virt-049 ]
> * RemoteOnline: [ virt-050 ]
>
> Full List of Resources:
> * fence-virt-047 (stonith:fence_xvm): Started virt-048
> * fence-virt-048 (stonith:fence_xvm): Started virt-049
> * fence-virt-049 (stonith:fence_xvm): Started virt-049
> * fence-virt-050 (stonith:fence_xvm): Started virt-047
> * virt-050 (ocf::pacemaker:remote): Started virt-047
> * dummy (ocf::pacemaker:Dummy): Started virt-050
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
> [root@virt-047 ~]# pcs constraint
> Location Constraints:
> Resource: dummy
> Enabled on:
> Node: virt-050 (score:INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:

change the dummy resource's delay attribute to be a secret:

> [root@virt-047 ~]# cibsecret set dummy delay 10
> INFO: syncing /var/lib/pacemaker/lrm/secrets/dummy/delay to virt-048 virt-049 ...
> Set 'dummy' option: id=dummy-instance_attributes-delay set=dummy-instance_attributes name=delay value=lrm://

notice the secret was synced only to full nodes and not the remote node.
verify the secret attribute:

> [root@virt-047 ~]# cibsecret get dummy delay
> 10
> [root@virt-047 ~]# pcs resource config dummy
> Resource: dummy (class=ocf provider=pacemaker type=Dummy)
> Attributes: delay=lrm://
> Operations: migrate_from interval=0s timeout=20s (dummy-migrate_from-interval-0s)
> migrate_to interval=0s timeout=20s (dummy-migrate_to-interval-0s)
> monitor interval=10s timeout=20s (dummy-monitor-interval-10s)
> reload interval=0s timeout=20s (dummy-reload-interval-0s)
> start interval=0s timeout=20s (dummy-start-interval-0s)
> stop interval=0s timeout=20s (dummy-stop-interval-0s)

dummy resource fails on the remote node due to not having access to the secret:

> [root@virt-047 ~]# pcs status
> Cluster name: STSRHTS11440
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-047 (version 2.0.4-6.el8_3.1-2deceaa3ae) - partition with quorum
> * Last updated: Mon Feb 15 10:55:42 2021
> * Last change: Mon Feb 15 10:55:06 2021 by root via crm_resource on virt-047
> * 4 nodes configured
> * 6 resource instances configured
>
> Node List:
> * Online: [ virt-047 virt-048 virt-049 ]
> * RemoteOnline: [ virt-050 ]
>
> Full List of Resources:
> * fence-virt-047 (stonith:fence_xvm): Started virt-048
> * fence-virt-048 (stonith:fence_xvm): Started virt-049
> * fence-virt-049 (stonith:fence_xvm): Started virt-049
> * fence-virt-050 (stonith:fence_xvm): Started virt-047
> * virt-050 (ocf::pacemaker:remote): Started virt-047
> * dummy (ocf::pacemaker:Dummy): Stopped
>
> Failed Resource Actions:
> * dummy_start_0 on virt-050 'not configured' (6): call=18, status='complete', exitreason='', last-rc-change='2021-02-15 10:55:06 +01:00', queued=0ms, exec=6ms
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled

result: remote nodes are unable to host resources with secret attributes

after (pacemaker-2.0.5-6.el8)
=============================

> [root@virt-042 ~]# pcs status
> Cluster name: STSRHTS26313
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-043 (version 2.0.5-6.el8-ba59be7122) - partition with quorum
> * Last updated: Mon Feb 15 11:03:13 2021
> * Last change: Mon Feb 15 11:02:17 2021 by root via cibadmin on virt-042
> * 4 nodes configured
> * 6 resource instances configured
>
> Node List:
> * Online: [ virt-042 virt-043 virt-044 ]
> * RemoteOnline: [ virt-045 ]
>
> Full List of Resources:
> * fence-virt-042 (stonith:fence_xvm): Started virt-043
> * fence-virt-043 (stonith:fence_xvm): Started virt-044
> * fence-virt-044 (stonith:fence_xvm): Started virt-044
> * fence-virt-045 (stonith:fence_xvm): Started virt-042
> * virt-045 (ocf::pacemaker:remote): Started virt-042
> * dummy (ocf::pacemaker:Dummy): Started virt-045
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
> [root@virt-042 ~]# pcs constraint
> Location Constraints:
> Resource: dummy
> Enabled on:
> Node: virt-045 (score:INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:

change the dummy resource's delay attribute to be a secret:

> [root@virt-042 ~]# cibsecret set dummy delay 10
> INFO: syncing /var/lib/pacemaker/lrm/secrets/dummy/delay to virt-043 virt-044 virt-045 ...
> Set 'dummy' option: id=dummy-instance_attributes-delay set=dummy-instance_attributes name=delay value=lrm://

notice the secret was synced to all nodes, including the remote one.
verify the secret attribute:

> [root@virt-042 ~]# cibsecret get dummy delay
> 10
> [root@virt-042 ~]# pcs resource config dummy
> Resource: dummy (class=ocf provider=pacemaker type=Dummy)
> Attributes: delay=lrm://
> Operations: migrate_from interval=0s timeout=20s (dummy-migrate_from-interval-0s)
> migrate_to interval=0s timeout=20s (dummy-migrate_to-interval-0s)
> monitor interval=10s timeout=20s (dummy-monitor-interval-10s)
> reload interval=0s timeout=20s (dummy-reload-interval-0s)
> start interval=0s timeout=20s (dummy-start-interval-0s)
> stop interval=0s timeout=20s (dummy-stop-interval-0s)

verify the dummy resource is happily running:

> [root@virt-042 ~]# pcs status
> Cluster name: STSRHTS26313
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-043 (version 2.0.5-6.el8-ba59be7122) - partition with quorum
> * Last updated: Mon Feb 15 11:04:12 2021
> * Last change: Mon Feb 15 11:03:56 2021 by root via crm_resource on virt-042
> * 4 nodes configured
> * 6 resource instances configured
>
> Node List:
> * Online: [ virt-042 virt-043 virt-044 ]
> * RemoteOnline: [ virt-045 ]
>
> Full List of Resources:
> * fence-virt-042 (stonith:fence_xvm): Started virt-043
> * fence-virt-043 (stonith:fence_xvm): Started virt-044
> * fence-virt-044 (stonith:fence_xvm): Started virt-044
> * fence-virt-045 (stonith:fence_xvm): Started virt-042
> * virt-045 (ocf::pacemaker:remote): Started virt-042
> * dummy (ocf::pacemaker:Dummy): Started virt-045
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled

the logs show that the resource configuration changed and was reloaded.
log excerpt from a full node:

> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Forwarding cib_modify operation for section resources to all (origin=local/crm_resource/6)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: --- 0.15.9 2
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: +++ 0.16.0 cce549ac945c5a82c0b01d029f486781
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib: @epoch=16, @num_updates=0
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: ++ /cib/configuration/resources/primitive[@id='dummy']: <instance_attributes id="dummy-instance_attributes"/>
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: ++ <nvpair id="dummy-instance_attributes-delay" name="delay" value="lrm://"/>
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: ++ </instance_attributes>
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Completed cib_modify operation for section resources: OK (rc=0, origin=virt-042/crm_resource/6, version=0.16.0)
> Feb 15 11:03:56 virt-042 pacemaker-fenced [49708] (update_cib_stonith_devices_v2) info: Updating device list from the cib: create primitive[@id='dummy']
> Feb 15 11:03:56 virt-042 pacemaker-fenced [49708] (cib_devices_update) info: Updating devices to version 0.16.0
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_file_backup) info: Archived previous version as /var/lib/pacemaker/cib/cib-17.raw
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_file_write_with_digest) info: Wrote version 0.16.0 of the CIB to disk (digest: 9098cee6303e9bd84023dd3c701730fc)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_file_write_with_digest) info: Reading cluster configuration file /var/lib/pacemaker/cib/cib.uQ1cr1 (digest: /var/lib/pacemaker/cib/cib.Q467K0)
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (lrmd_tls_recv_reply) info: queueing notify
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (lrmd_tls_recv_reply) info: notify trigger set.
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (do_lrm_rsc_op) notice: Requesting local execution of reload operation for dummy on virt-045 | transition_key=7:8:0:41ebb136-fb99-4de9-a120-949760e3b442 op_key=dummy_reload_0
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/55)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: --- 0.16.0 2
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: +++ 0.16.1 (null)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib: @num_updates=1
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib/status/node_state[@id='virt-045']: @crm-debug-origin=do_update_resource
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib/status/node_state[@id='virt-045']/lrm[@id='virt-045']/lrm_resources/lrm_resource[@id='dummy']/lrm_rsc_op[@id='dummy_last_0']: @operation_key=dummy_monitor_0, @operation=monitor, @transition-key=7:8:0:41ebb136-fb99-4de9-a120-949760e3b442, @transition-magic=-1:193;7:8:0:41ebb136-fb99-4de9-a120-949760e3b442, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1613383436, @last-run=1613
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-042/crmd/55, version=0.16.1)
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (process_lrm_event) info: Result of monitor operation for dummy on virt-045: Cancelled | call=8 key=dummy_monitor_10000 confirmed=true
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (process_lrm_event) notice: Result of reload operation for dummy on virt-045: ok | rc=0 call=12 key=dummy_reload_0 confirmed=true cib-update=56
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/56)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: --- 0.16.1 2
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: +++ 0.16.2 (null)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib: @num_updates=2
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib/status/node_state[@id='virt-045']/lrm[@id='virt-045']/lrm_resources/lrm_resource[@id='dummy']/lrm_rsc_op[@id='dummy_last_0']: @operation_key=dummy_start_0, @operation=start, @transition-magic=0:0;7:8:0:41ebb136-fb99-4de9-a120-949760e3b442, @call-id=12, @rc-code=0, @op-status=0, @exec-time=43
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-042/crmd/56, version=0.16.2)
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (do_lrm_rsc_op) notice: Requesting local execution of monitor operation for dummy on virt-045 | transition_key=6:8:0:41ebb136-fb99-4de9-a120-949760e3b442 op_key=dummy_monitor_10000
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/57)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: --- 0.16.2 2
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: +++ 0.16.3 (null)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib: @num_updates=3
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib/status/node_state[@id='virt-045']/lrm[@id='virt-045']/lrm_resources/lrm_resource[@id='dummy']/lrm_rsc_op[@id='dummy_monitor_10000']: @transition-key=6:8:0:41ebb136-fb99-4de9-a120-949760e3b442, @transition-magic=-1:193;6:8:0:41ebb136-fb99-4de9-a120-949760e3b442, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1613383436, @exec-time=0, @op-digest=9e2e2def26ff3a0cea4121713c3108b4
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-042/crmd/57, version=0.16.3)
> Feb 15 11:03:56 virt-042 pacemaker-controld [49712] (process_lrm_event) notice: Result of monitor operation for dummy on virt-045: ok | rc=0 call=13 key=dummy_monitor_10000 confirmed=false cib-update=58
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/58)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: --- 0.16.3 2
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: Diff: +++ 0.16.4 (null)
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib: @num_updates=4
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_perform_op) info: + /cib/status/node_state[@id='virt-045']/lrm[@id='virt-045']/lrm_resources/lrm_resource[@id='dummy']/lrm_rsc_op[@id='dummy_monitor_10000']: @transition-magic=0:0;6:8:0:41ebb136-fb99-4de9-a120-949760e3b442, @call-id=13, @rc-code=0, @op-status=0, @exec-time=26
> Feb 15 11:03:56 virt-042 pacemaker-based [49707] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-042/crmd/58, version=0.16.4)

and on the remote node:

> Feb 15 11:03:56 virt-045 pacemaker-remoted [55200] (cancel_recurring_action) info: Cancelling ocf operation dummy_monitor_10000
> Feb 15 11:03:56 virt-045 pacemaker-remoted [55200] (log_execute) info: executing - rsc:dummy action:reload call_id:12
> Feb 15 11:03:56 Dummy(dummy)[55414]: ERROR: Reloading...
> Feb 15 11:03:56 virt-045 pacemaker-remoted [55200] (log_finished) info: dummy reload (call 12, PID 55414) exited with status 0 (execution time 43ms, queue time 0ms)

result: remote nodes can successfully host resources with secret attributes configured.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1782