Bug 1502795 - RFE: Allow starting remote nodes in an instant-standby state
Summary: RFE: Allow starting remote nodes in an instant-standby state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.0
Hardware: All
OS: Linux
Priority: high
Severity: low
Target Milestone: pre-dev-freeze
Target Release: 8.9
Assignee: Chris Lumens
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On: 1376556
Blocks: 1427246
 
Reported: 2017-10-16 17:23 UTC by Ken Gaillot
Modified: 2023-11-14 16:49 UTC
CC List: 6 users

Fixed In Version: pacemaker-2.1.6-2.el8
Doc Type: Enhancement
Doc Text:
Feature: Users may specify PCMK_node_start_state in /etc/sysconfig/pacemaker to force a Pacemaker Remote node to start in standby or online mode. Reason: The PCMK_node_start_state sysconfig option was supported for cluster nodes, but not Pacemaker Remote nodes. Result: Users can choose to force a Pacemaker Remote node to start in standby mode when joining the cluster.
Clone Of: 1376556
Environment:
Last Closed: 2023-11-14 15:32:34 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Cluster Labs 5169 0 None None None 2017-10-16 17:23:15 UTC
Red Hat Issue Tracker CLUSTERQE-6689 0 None None None 2023-05-16 13:11:30 UTC
Red Hat Knowledge Base (Solution) 3194452 0 None None None 2017-10-16 17:23:15 UTC
Red Hat Product Errata RHEA-2023:6970 0 None None None 2023-11-14 15:33:21 UTC

Description Ken Gaillot 2017-10-16 17:23:16 UTC
+++ This bug was initially created as a clone of Bug #1376556 +++

(This is supported for cluster nodes as of 7.5; this bz is to request the same support for Pacemaker Remote nodes.)

Description of problem: There are many instances where you know you don't want pacemaker to manage any resources when you start the cluster, and would instead prefer to have the node you're starting come up in standby mode immediately.

I encounter this often when I discover I've badly broken the configuration for some resource: rather than trying to untangle what is messed up, I may prefer to reboot and start fresh. But once the nodes finish booting, I still have to deal with the problem that as soon as I start that node, it's just going to start trying to manage resources again.

I've also encountered this with customers who are in a maintenance window and bringing nodes back online, but want to control when they start bringing applications up.  Having the cluster start either in maintenance-mode or with some/all nodes in standby would be ideal, but there's no straightforward way to do that other than hoping a standby request gets processed quickly enough to avoid any other work happening.

So, it'd be nice if pcs could offer some mechanism to start a node directly in standby mode, and/or possibly to start the cluster in maintenance-mode.
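
For context, the closest existing workaround is to race a manual standby request against the node start, roughly like this (the node name node1 is illustrative):

  pcs cluster start node1    # the node joins and may start managing resources immediately
  pcs node standby node1     # hope this is processed before any resources are started

The point of this request is a supported way to have the node come up already in standby, with no race.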


Version-Release number of selected component (if applicable): All releases of pacemaker


How reproducible:


Steps to Reproduce:
1. Want to bring up the cluster on one or all nodes without managing any resources

Actual results: Can't do it


Expected results: The ability to run a single command so that a node joins the cluster in standby mode.


Additional info:

--- Additional comment from Ken Gaillot on 2017-02-20 12:06:47 EST ---

FYI, partial support has been merged upstream:

   https://github.com/ClusterLabs/pacemaker/pull/1141

Currently, only cluster nodes are supported. We should be able to get that into 7.4, though we probably shouldn't advertise or support it until remote node support is added (which is planned, but no time frame is available yet).

--- Additional comment from Ken Gaillot on 2017-10-16 13:18:00 EDT ---

This will be supported in 7.5 for cluster nodes (only). I will clone this bz to request remote node support in a future release.

QA: To test, create a cluster, then stop one node, or prepare a machine to be added as a new node. Add PCMK_node_start_state to the node's /etc/sysconfig/pacemaker with one of these values (a minimal example follows the list):

* "default" (unsurprisingly, the default) will use the current value of the node's "standby" node attribute (the only behavior supported by previous releases).

* "online" will force the node to join the cluster in online mode, even if it previously was put into standby mode before being stopped.

* "standby" will force the node to join the cluster in standby node.

Comment 6 RHEL Program Management 2021-02-24 07:30:28 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 7 Ken Gaillot 2021-02-24 15:15:19 UTC
This is still a goal, but it will be tracked via the upstream bz.

Comment 8 Ken Gaillot 2023-05-04 15:28:59 UTC
Support for remote nodes was added to the upstream main branch as of commit 76bd508cc.

Comment 14 jrehova 2023-07-17 22:33:02 UTC
Version of pacemaker:

> [root@virt-248:~]# rpm -q pacemaker
> pacemaker-2.1.6-3.el8.x86_64

Setup of a 5-node cluster -> 2 cluster nodes and 3 remote nodes:

> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 14:35:21 2023 on virt-248
>   * Last change:  Mon Jul 17 14:35:04 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOnline: [ virt-256 virt-257 virt-261 ]
> 
> Full List of Resources:
>   * fence-virt-248	(stonith:fence_xvm):	 Started virt-249
>   * fence-virt-249	(stonith:fence_xvm):	 Started virt-248
>   * fence-virt-256	(stonith:fence_xvm):	 Started virt-249
>   * fence-virt-257	(stonith:fence_xvm):	 Started virt-249
>   * fence-virt-261	(stonith:fence_xvm):	 Started virt-248
>   * virt-256	(ocf::pacemaker:remote):	 Started virt-248
>   * virt-257	(ocf::pacemaker:remote):	 Started virt-249
>   * virt-261	(ocf::pacemaker:remote):	 Started virt-248
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
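
For reference, a test cluster in this shape could be assembled roughly as follows (a sketch only; fencing configuration and host authentication details are omitted, and the host names are taken from the status output above):

  pcs host auth virt-248 virt-249 virt-256 virt-257 virt-261
  pcs cluster setup STSRHTS8954 virt-248 virt-249 --start
  # Add each machine running pacemaker-remote as a remote node:
  pcs cluster node add-remote virt-256
  pcs cluster node add-remote virt-257
  pcs cluster node add-remote virt-261
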
___________________________________________________________________________________________________________

OPTION 1:
remote node to standby              change PCMK_node_start_state          expected status after
before disable?                     on remote node?                       enable remote node
=====================               ============================          =======================
yes                                 no                                    connect as standby

Setting standby state with pcs node standby:

> [root@virt-248:~]# pcs node standby virt-256 virt-257 virt-261
> [root@virt-248:~]# pcs status 
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 22:06:12 2023 on virt-248
>   * Last change:  Mon Jul 17 22:06:07 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * RemoteNode virt-256: standby
>   * RemoteNode virt-257: standby
>   * RemoteNode virt-261: standby
>   * Online: [ virt-248 virt-249 ]

Disabling remote nodes:

> [root@virt-248:~]# pcs resource disable virt-256
> [root@virt-248:~]# pcs resource disable virt-257
> [root@virt-248:~]# pcs resource disable virt-261
> [root@virt-248:~]# pcs status
> ...
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOFFLINE: [ virt-256 virt-257 virt-261 ]
> ...

Enabling remote nodes:

> [root@virt-248:~]# pcs resource enable virt-257
> [root@virt-248:~]# pcs resource enable virt-256
> [root@virt-248:~]# pcs resource enable virt-261
> [root@virt-248:~]# pcs status 
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 22:10:55 2023 on virt-248
>   * Last change:  Mon Jul 17 22:10:50 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * RemoteNode virt-256: standby
>   * RemoteNode virt-257: standby
>   * RemoteNode virt-261: standby
>   * Online: [ virt-248 virt-249 ]

RESULT: Works as expected for this option -> all remote nodes are in standby state.
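
If the remote nodes should return to normal operation before the next scenario, the standby attribute can be cleared again in the usual way (a sketch):

  pcs node unstandby virt-256 virt-257 virt-261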
___________________________________________________________________________________________________________

OPTION 2:
remote node to standby              change PCMK_node_start_state          expected status after
before disable?                     on remote node?                       enable remote node
=====================               ============================          =======================
yes                                 yes, to "online"                      connect as online
yes                                 yes, to "standby"                     connect as standby
yes                                 yes, to "default"                     connect as standby   

Setting standby state with pcs node standby:

> [root@virt-248:~]# pcs node standby virt-256 virt-257 virt-261
> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 23:29:17 2023 on virt-248
>   * Last change:  Mon Jul 17 23:29:13 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * RemoteNode virt-256: standby
>   * RemoteNode virt-257: standby
>   * RemoteNode virt-261: standby
>   * Online: [ virt-248 virt-249 ]

Disabling remote nodes:

> [root@virt-248:~]# pcs resource disable virt-261
> [root@virt-248:~]# pcs resource disable virt-257
> [root@virt-248:~]# pcs resource disable virt-256
> [root@virt-248:~]# pcs status
> ...
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOFFLINE: [ virt-256 virt-257 virt-261 ]
> ...

Changing PCMK_node_start_state in /etc/sysconfig/pacemaker on each remote node to a different value - "online", "default", and "standby":

> [root@virt-261 ~]# vim /etc/sysconfig/pacemaker
> VALGRIND_OPTS="--leak-check=full --trace-children=no --vgdb=no --num-callers=25"
> VALGRIND_OPTS="$VALGRIND_OPTS --log-file=/var/lib/pacemaker/valgrind-%p"
> VALGRIND_OPTS="$VALGRIND_OPTS --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions"
> VALGRIND_OPTS="$VALGRIND_OPTS --gen-suppressions=all"
> PCMK_node_start_state="default"
> [root@virt-261 ~]# systemctl restart pacemaker-remote

> [root@virt-257 ~]# vim /etc/sysconfig/pacemaker
> VALGRIND_OPTS="--leak-check=full --trace-children=no --vgdb=no --num-callers=25"
> VALGRIND_OPTS="$VALGRIND_OPTS --log-file=/var/lib/pacemaker/valgrind-%p"
> VALGRIND_OPTS="$VALGRIND_OPTS --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions"
> VALGRIND_OPTS="$VALGRIND_OPTS --gen-suppressions=all"
> PCMK_node_start_state="online"
> [root@virt-257 ~]# systemctl restart pacemaker-remote

> [root@virt-256 ~]# vim /etc/sysconfig/pacemaker
> VALGRIND_OPTS="--leak-check=full --trace-children=no --vgdb=no --num-callers=25"
> VALGRIND_OPTS="$VALGRIND_OPTS --log-file=/var/lib/pacemaker/valgrind-%p"
> VALGRIND_OPTS="$VALGRIND_OPTS --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions"
> VALGRIND_OPTS="$VALGRIND_OPTS --gen-suppressions=all"
> PCMK_node_start_state="standby"
> [root@virt-256 ~]# systemctl restart pacemaker-remote
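
Before re-enabling the connection resources, it is worth confirming that each remote node carries the intended value, for example:

  grep '^PCMK_node_start_state' /etc/sysconfig/pacemaker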

Enabling remote nodes:

> [root@virt-248:~]# pcs resource enable virt-261
> [root@virt-248:~]# pcs resource enable virt-256
> [root@virt-248:~]# pcs resource enable virt-257
> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 23:33:05 2023 on virt-248
>   * Last change:  Mon Jul 17 23:33:01 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * RemoteNode virt-256: standby
>   * RemoteNode virt-261: standby
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOnline: [ virt-257 ]

RESULT: Works as expected for this option -> 2 remote nodes are in standby state (the RNs with "default" and "standby", since "default" keeps the standby attribute that was set before the disable) and 1 remote node is online (the RN with "online").
___________________________________________________________________________________________________________

OPTION 3: 
remote node to standby              change PCMK_node_start_state          expected status after
before disable?                     on remote node?                       enable remote node
=====================               ============================          =======================
no                                  no                                    connect as online

Cluster setup -> all remote nodes online:

> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 22:37:24 2023 on virt-248
>   * Last change:  Mon Jul 17 22:35:11 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOnline: [ virt-256 virt-257 virt-261 ]

Disabling remote nodes:

> [root@virt-248:~]# pcs resource disable virt-261
> [root@virt-248:~]# pcs resource disable virt-257
> [root@virt-248:~]# pcs resource disable virt-256
> [root@virt-248:~]# pcs status
> ...
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOFFLINE: [ virt-256 virt-257 virt-261 ]
> ...

Enabling remote nodes:

> [root@virt-248:~]# pcs resource enable virt-256
> [root@virt-248:~]# pcs resource enable virt-257
> [root@virt-248:~]# pcs resource enable virt-261
> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 22:39:03 2023 on virt-248
>   * Last change:  Mon Jul 17 22:39:00 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOnline: [ virt-256 virt-257 virt-261 ]

RESULT: Works as expected for this option -> all remote nodes are online.
___________________________________________________________________________________________________________

OPTION 4:
remote node to standby              change PCMK_node_start_state          expected status after
before disable?                     on remote node?                       enable remote node
=====================               ============================          =======================
no                                  yes, to "online"                      connect as online
no                                  yes, to "standby"                     connect as standby
no                                  yes, to "default"                     connect as online

Cluster setup -> all remote nodes online:

> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Mon Jul 17 22:39:03 2023 on virt-248
>   * Last change:  Mon Jul 17 22:39:00 2023 by root via cibadmin on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOnline: [ virt-256 virt-257 virt-261 ]

Disabling remote nodes:

> [root@virt-248:~]# pcs resource disable virt-256
> [root@virt-248:~]# pcs resource disable virt-257
> [root@virt-248:~]# pcs resource disable virt-261
> [root@virt-248:~]# pcs status
> ...
> Node List:
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOFFLINE: [ virt-256 virt-257 virt-261 ]
> ...

Changing PCMK_node_start_state in /etc/sysconfig/pacemaker on each remote node to a different value - "online", "default", and "standby":

> [root@virt-261 ~]# vim /etc/sysconfig/pacemaker
> VALGRIND_OPTS="--leak-check=full --trace-children=no --vgdb=no --num-callers=25"
> VALGRIND_OPTS="$VALGRIND_OPTS --log-file=/var/lib/pacemaker/valgrind-%p"
> VALGRIND_OPTS="$VALGRIND_OPTS --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions"
> VALGRIND_OPTS="$VALGRIND_OPTS --gen-suppressions=all"
> PCMK_node_start_state="default"
> [root@virt-261 ~]# systemctl restart pacemaker-remote

> [root@virt-257 ~]# vim /etc/sysconfig/pacemaker
> VALGRIND_OPTS="--leak-check=full --trace-children=no --vgdb=no --num-callers=25"
> VALGRIND_OPTS="$VALGRIND_OPTS --log-file=/var/lib/pacemaker/valgrind-%p"
> VALGRIND_OPTS="$VALGRIND_OPTS --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions"
> VALGRIND_OPTS="$VALGRIND_OPTS --gen-suppressions=all"
> PCMK_node_start_state="online"
> [root@virt-257 ~]# systemctl restart pacemaker-remote

> [root@virt-256 ~]# vim /etc/sysconfig/pacemaker
> VALGRIND_OPTS="--leak-check=full --trace-children=no --vgdb=no --num-callers=25"
> VALGRIND_OPTS="$VALGRIND_OPTS --log-file=/var/lib/pacemaker/valgrind-%p"
> VALGRIND_OPTS="$VALGRIND_OPTS --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions"
> VALGRIND_OPTS="$VALGRIND_OPTS --gen-suppressions=all"
> PCMK_node_start_state="standby"
> [root@virt-256 ~]# systemctl restart pacemaker-remote

Enabling remote nodes:

> [root@virt-248:~]# pcs resource enable virt-256
> [root@virt-248:~]# pcs resource enable virt-261
> [root@virt-248:~]# pcs resource enable virt-257
> [root@virt-248:~]# pcs status
> Cluster name: STSRHTS8954
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: virt-249 (version 2.1.6-3.el8-6fdc9deea29) - partition with quorum
>   * Last updated: Tue Jul 18 00:10:48 2023 on virt-248
>   * Last change:  Tue Jul 18 00:10:42 2023 by hacluster via crmd on virt-248
>   * 5 nodes configured
>   * 8 resource instances configured
> 
> Node List:
>   * RemoteNode virt-256: standby
>   * Online: [ virt-248 virt-249 ]
>   * RemoteOnline: [ virt-257 virt-261 ]

RESULT: Works as expected for this option -> 2 remote nodes are online (the RNs with "default" and "online", since "default" uses the standby node attribute, which was not set for that node) and 1 remote node is in standby state (the RN with "standby").
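
One way to double-check which standby attributes actually ended up in the CIB after these tests is to query the nodes section directly, for example:

  cibadmin --query --scope nodes | grep -B2 'name="standby"'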

Comment 17 errata-xmlrpc 2023-11-14 15:32:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:6970

