Bug 1707851
| Summary: | unsatisfactory recovery from pacemaker-daemons stalled via SIGSTOP | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Klaus Wenninger <kwenning> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Priority: | high |
| Version: | 9.0 | CC: | ccaulfie, cluster-maint, jseunghw, kgaillot, msmazova, phagara |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 9.0 | Hardware: | Unspecified |
| OS: | Unspecified | Doc Type: | Enhancement |
| Fixed In Version: | pacemaker-2.1.2-3.el9 | Type: | Bug |
| Clones: | 2031865 (view as bug list) | Bug Depends On: | 2031865 |
| Last Closed: | 2022-05-17 12:20:40 UTC | | |

Doc Text:

Feature: Pacemaker now monitors its component subdaemons for IPC responsiveness.

Reason: Previously, if a daemon stopped being responsive (for example, after receiving a SIGSTOP signal), the cluster might not detect any problem.

Result: Now, Pacemaker will detect unresponsive subdaemons and recover them if necessary.
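As a quick illustration of the behavior described in the Doc Text, one might stall a single subdaemon and watch pacemakerd recover it. This is a sketch only: the choice of pacemaker-execd is illustrative, and exact log messages vary by version.

    # On a node running pacemaker-2.1.2-3.el9 or later, stall one
    # Pacemaker subdaemon (the local executor here) via SIGSTOP:
    killall -STOP pacemaker-execd

    # With the fix, pacemakerd should notice the subdaemon no longer
    # answers IPC checks and recover it; follow the logs to observe:
    journalctl -fu pacemaker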
Description
Klaus Wenninger
2019-05-08 14:49:16 UTC
Behaviour with a stalled corosync daemon is, by the way, a little different. The remaining nodes will form a new partition with a new DC, which then decides to fence the node whose corosync is stalled. Of course, stalling corosync breaks the path from the new DC's CIB back to the CIB of the node with corosync stalled. Thus, when using sbd with watchdog fencing, sbd's pacemaker watcher is not going to read the 'unclean' state from the CIB and therefore won't trigger self-fencing. This is where bz1702727 ("sbd doesn't detect non-responsive corosync-daemon") comes into the game.

It turns out this will require changes in libqb for a full fix. This bz might end up getting bumped to 8.7, or we might implement a partial fix for 8.6.

The fix for this depends on the libqb feature in Bug 2031865, which will likely land in 9.0 but not make RHEL 8 until 8.7, so this bz is being re-targeted to 9.0.

Fixed upstream as of commit 4b60aa100

before
======

> [root@virt-146 ~]# rpm -q pacemaker libqb
> pacemaker-2.1.0-8.el8.x86_64
> libqb-1.0.3-12.el8.x86_64
> [root@virt-146 ~]# pcs status
> Cluster name: STSRHTS15235
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-146 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
>   * Last updated: Fri Feb 25 10:44:32 2022
>   * Last change:  Fri Feb 25 10:24:02 2022 by root via cibadmin on virt-144
>   * 3 nodes configured
>   * 3 resource instances configured
>
> Node List:
>   * Online: [ virt-144 virt-145 virt-146 ]
>
> Full List of Resources:
>   * fence-virt-144 (stonith:fence_xvm): Started virt-144
>   * fence-virt-145 (stonith:fence_xvm): Started virt-145
>   * fence-virt-146 (stonith:fence_xvm): Started virt-146
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@virt-146 ~]# killall -STOP pacemakerd pacemaker-based pacemaker-fenced pacemaker-execd pacemaker-attrd pacemaker-schedulerd pacemaker-controld
> [root@virt-146 ~]# ps faux | grep pacemaker
> root      50480  0.0  0.2 134948 10236 ?  Ts  10:23  0:00 /usr/sbin/pacemakerd
> haclust+  50481  0.0  0.5 156104 22728 ?  Ts  10:23  0:00  \_ /usr/libexec/pacemaker/pacemaker-based
> root      50482  0.0  0.3 154060 15440 ?  Ts  10:23  0:00  \_ /usr/libexec/pacemaker/pacemaker-fenced
> root      50483  0.0  0.2 116964 10176 ?  Ts  10:23  0:00  \_ /usr/libexec/pacemaker/pacemaker-execd
> haclust+  50484  0.0  0.2 145064 12360 ?  Ts  10:23  0:00  \_ /usr/libexec/pacemaker/pacemaker-attrd
> haclust+  50485  0.0  0.6 160532 26332 ?  Ts  10:23  0:00  \_ /usr/libexec/pacemaker/pacemaker-schedulerd
> haclust+  50486  0.0  0.4 202912 17812 ?  Ts  10:23  0:00  \_ /usr/libexec/pacemaker/pacemaker-controld
> root      56689  0.0  0.0  25980  3352 ?  S   10:58  0:00  \_ sh -c ps faux | grep pacemaker
> root      56691  0.0  0.0  12136  1044 ?  S   10:58  0:00      \_ grep pacemaker

result: minutes pass, the stalled DC does not get fenced, and the other nodes log nothing at all.
after
=====

> [root@virt-499 ~]# rpm -q pacemaker libqb
> pacemaker-2.1.2-4.el9.x86_64
> libqb-2.0.3-7.el9.x86_64
> [root@virt-499 ~]# pcs status
> Cluster name: STSRHTS12845
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: virt-499 (version 2.1.2-4.el9-ada5c3b36e2) - partition with quorum
>   * Last updated: Fri Feb 25 11:21:58 2022
>   * Last change:  Fri Feb 25 10:20:41 2022 by root via cibadmin on virt-497
>   * 3 nodes configured
>   * 3 resource instances configured
>
> Node List:
>   * Online: [ virt-497 virt-498 virt-499 ]
>
> Full List of Resources:
>   * fence-virt-497 (stonith:fence_xvm): Started virt-497
>   * fence-virt-498 (stonith:fence_xvm): Started virt-498
>   * fence-virt-499 (stonith:fence_xvm): Started virt-499
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@virt-499 ~]# killall -STOP pacemakerd pacemaker-based pacemaker-fenced pacemaker-execd pacemaker-attrd pacemaker-schedulerd pacemaker-controld
> [root@virt-499 ~]# ps faux | grep pacemaker
> root      62034  0.0  0.0   6416  2208 pts/0  S+  11:22  0:00  \_ grep --color=auto pacemaker
> root      54199  0.0  0.2  32312 11580 ?      Ts  10:20  0:01 /usr/sbin/pacemakerd
> haclust+  54200  0.0  0.6  49468 24768 ?      Ts  10:20  0:01  \_ /usr/libexec/pacemaker/pacemaker-based
> root      54201  0.0  0.4  41588 17456 ?      Ts  10:20  0:01  \_ /usr/libexec/pacemaker/pacemaker-fenced
> root      54202  0.0  0.3  26632 12200 ?      Ts  10:20  0:01  \_ /usr/libexec/pacemaker/pacemaker-execd
> haclust+  54203  0.0  0.3  39464 15280 ?      Ts  10:20  0:01  \_ /usr/libexec/pacemaker/pacemaker-attrd
> haclust+  54204  0.0  0.7  62092 28464 ?      Ts  10:20  0:01  \_ /usr/libexec/pacemaker/pacemaker-schedulerd
> haclust+  54205  0.0  0.4  90128 20088 ?      Ts  10:20  0:01  \_ /usr/libexec/pacemaker/pacemaker-controld

result: same as before the fix; the rest of the cluster does not notice that the DC is stalled.

Only after unblocking the pacemakerd process (but not the other pacemaker-{base,fence,exec,attr,scheduler,control}d daemons) using `killall -CONT pacemakerd` is the DC finally fenced, with a delay of a few seconds. (Before this fix, unblocking pacemakerd on the DC had no effect; the cluster remained in the "zombie" state.)

Still, this seems like only a marginal improvement compared to the previous behavior. Peeking at the code changes, I'm surprised this was implemented in a way that requires the DC's pacemakerd to be alive and well in order to detect stalls of the other pacemaker-*d daemons.

@kgaillot, is there any way for the other nodes (i.e. not the DC itself) to detect that the DC's daemons are stalled? Or is the pacemakerd code considered simple enough (read: practically impossible to deadlock/stall due to e.g. disk/network/other blocking operations)?

> peeking at the code changes, i'm surprised this was implemented in a way
> that the DC's pacemakerd must be alive & well in order to detect the other
> pacemaker-*d daemon stalls.
>
> @kgaillot is there any way for the other nodes (ie. not the DC
> itself) to detect that the DC's daemons are stalled? or is the pacemakerd
> code considered simple enough (read: practically impossible to
> deadlock/stall due to eg. disk/network/other blocking operations)?
That's correct: this fix applies only to the subdaemons, not to pacemakerd itself. The idea is that clusters can use sbd to monitor pacemakerd, and of course systemd will respawn pacemakerd if it crashes.
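For completeness, a minimal sketch of the sbd side of this, assuming the sbd package is installed and a hardware watchdog device is available. The option names are from /etc/sysconfig/sbd; the timeout values are illustrative, and the general idea is that if sbd stops getting a healthy response from the local Pacemaker, it stops feeding the watchdog and the node self-fences.

    # /etc/sysconfig/sbd -- watchdog-based self-fencing:
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5
    # Let sbd's pacemaker watcher track the local Pacemaker instance:
    SBD_PACEMAKER=yes

    # Cluster property so the DC assumes a lost node has self-fenced
    # via its watchdog after this long:
    pcs property set stonith-watchdog-timeout=10s

On RHEL, `pcs stonith sbd enable` can set up most of this automatically.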
Moving to verified as per https://bugzilla.redhat.com/show_bug.cgi?id=1707851#c23 and https://bugzilla.redhat.com/show_bug.cgi?id=1707851#c24.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: pacemaker), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2293