Bug 1869728
Summary: | sbd always triggers a reboot, while with no-quorum-policy=stop ensuring that all resources are down within watchdog-timeout might be safe enough | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Klaus Wenninger <kwenning> |
Component: | sbd | Assignee: | Klaus Wenninger <kwenning> |
Status: | CLOSED WONTFIX | QA Contact: | cluster-qe <cluster-qe> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.3 | CC: | cfeist, cluster-maint, kgaillot |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 8.4 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-02-18 07:27:18 UTC | Type: | Bug |
Description
Klaus Wenninger
2020-08-18 14:08:02 UTC
Discussion revealed that rejoining such a node that has successfully stopped resources within the given timeout won't mimic a fence-reboot well enough for the rest of the cluster. Examples include leftover transient attributes, and in general this would probably impose some burden on testing, as we'd somehow have to ensure the behavior is comparable to that after a reboot.

More generally, we might think over the use cases where it really is desirable to prevent a reboot. A list for brainstorming could start with:

- certain server hardware is quite slow to reboot, while a quorum loss might go away quickly, so we could recover the cluster more quickly
- it is always a pain for an admin to find that a shell he was using to observe the node's behavior has been starved/closed because of a reboot
- the node might run services outside of pacemaker's control that would be unnecessarily affected
- ...

These arguments are valid for most cluster scenarios, but they might matter more with watchdog fencing, as we might expect such issues to happen more frequently. Should a cluster node run anything but services under pacemaker control? Maybe not; maybe there are reasons why it makes sense ...

Another possibility that came to my mind was the introduction of a new no-quorum-policy=shutdown (or whatever imposes less risk of misunderstanding) that would make the node attempt a graceful pacemaker shutdown. SBD would again allow watchdog-timeout for this to happen, and if it detects a graceful shutdown of pacemaker (with no resources running, i.e. not in maintenance mode) it would be content and not trigger an actual reboot. That way, from a testing perspective, we would have the same case as a manual service stop/start.
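For context, a minimal sketch of the kind of setup this discussion assumes: diskless (watchdog-only) sbd combined with no-quorum-policy=stop. The timeout values below are only illustrative, and the shutdown policy value at the end is the proposal from this comment, not an existing option.

```
# /etc/sysconfig/sbd -- diskless watchdog fencing (no shared block device)
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5                 # illustrative value

# Pacemaker cluster properties (illustrative values):
#   pcs property set stonith-enabled=true
#   pcs property set stonith-watchdog-timeout=10
#   pcs property set no-quorum-policy=stop
#
# Today, a node that loses quorum in this setup is still self-fenced
# (rebooted) by sbd, even though no-quorum-policy=stop only asks for
# resources to be stopped -- which is what this report questions.
#
# The proposal above would add something like:
#   pcs property set no-quorum-policy=shutdown   # hypothetical, not implemented
```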
(In reply to Klaus Wenninger from comment #2)
> Discussion revealed that rejoining such a node that has successfully
> stopped resources within the given timeout won't mimic a fence-reboot well
> enough for the rest of the cluster. Examples include leftover transient
> attributes, and in general this would probably impose some burden on
> testing, as we'd somehow have to ensure the behavior is comparable to that
> after a reboot.

It's a tough question. There's no way to mimic what happens with nodes not using sbd:

- If any nodes retain quorum, they will fence the nodes without quorum.
- Any node that loses quorum will stop resources if it's able, but leave pacemaker running so it can rejoin the cluster if quorum is regained before fencing is scheduled against it.

The problem, of course, is that sbd can't know whether any other nodes retain quorum, so it has to fence to be safe.

As you suggest, if we can absolutely guarantee that all resources are stopped, and pacemaker and corosync are restarted, then perhaps fencing should be considered unnecessary. On the other hand, sbd can't guarantee that pacemaker and corosync behave correctly once restarted, which may violate assumptions held by any surviving partition. We could stop pacemaker and corosync instead of restarting them, but then the node can't rejoin if quorum is regained, so the only practical benefit is less chance of losing logs.

> More generally, we might think over the use cases where it really is
> desirable to prevent a reboot. A list for brainstorming could start with:
> - certain server hardware is quite slow to reboot, while a quorum loss
>   might go away quickly, so we could recover the cluster more quickly

Just brainstorming, what about a separate quorum loss timeout? If pacemaker detects sbd running and sees a quorum loss timeout in the sbd sysconfig, it would wait that long before declaring fencing of the node successful. The timeout would have to be identical on all nodes. That would slow down quorum recovery in exchange for the chance of the node rejoining more quickly; users would have to balance the two concerns.

> - it is always a pain for an admin to find that a shell he was using to
>   observe the node's behavior has been starved/closed because of a reboot
> - the node might run services outside of pacemaker's control that would be
>   unnecessarily affected

I don't think that's an issue, since fencing is always a possibility, so the admin must already incorporate that into any policy regarding non-clustered services.

> - ...
>
> These arguments are valid for most cluster scenarios, but they might matter
> more with watchdog fencing, as we might expect such issues to happen more
> frequently. Should a cluster node run anything but services under pacemaker
> control? Maybe not; maybe there are reasons why it makes sense ...
>
> Another possibility that came to my mind was the introduction of a new
> no-quorum-policy=shutdown (or whatever imposes less risk of misunderstanding)
> that would make the node attempt a graceful pacemaker shutdown. SBD would
> again allow watchdog-timeout for this to happen, and if it detects a graceful
> shutdown of pacemaker (with no resources running, i.e. not in maintenance
> mode) it would be content and not trigger an actual reboot. That way, from a
> testing perspective, we would have the same case as a manual service
> stop/start.

Per the above, I think the problem is that either the node can't rejoin if quorum is regained, or we risk corosync/pacemaker operating without any observation or check from the quorate partition.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.