Bug 1449155 - Influence Fencing Direction Dynamically Instead of Static Fence Delay
Summary: Influence Fencing Direction Dynamically Instead of Static Fence Delay
Keywords:
Status: CLOSED DUPLICATE of bug 1476401
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Klaus Wenninger
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-09 10:03 UTC by Daniel Peess
Modified: 2020-12-14 08:38 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-11 15:56:03 UTC
Target Upstream Version:
Embargoed:



Description Daniel Peess 2017-05-09 10:03:27 UTC
Description of problem:

IHAC (I have a customer) who wants nodes to be fenced only by nodes that are still able to reach their service gateway.

2-node cluster, heartbeat split-brain: which node shall fence the other one first?
To prevent fence races, the delays have to be different.
You can use a random fence delay, but then you can't predict which node will fence first.

You can already define service gateways (pingd) to place your resources according to the (best) connectivity of each node,
but you can't use these scores as a replacement for static fence delays (yet).
delay='XY' only accepts static values; you can't pass something like an attribute-score variable, at least not with fence_sbd.
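For reference, the static alternatives look roughly like this in pcs; the device name is just a placeholder:

    # sketch only, device name assumed
    pcs stonith update fence-sbd-node1 delay=10           # fixed head start for one device
    pcs stonith update fence-sbd-node1 pcmk_delay_max=15  # random delay of up to 15s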

Instead, I've used the pingd resource to create a -INFINITY location constraint for the two individual fence agents (2x fence_sbd), one running on each node:
as long as the service gateway is gone, the fencing agent is stopped
(with some delay: monitoring interval, dampen).
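A minimal sketch of that kind of setup, with placeholder names, gateway IP and disk path (not the real configuration):

    # ping resource maintains the 'pingd' node attribute on every node
    pcs resource create ping-gw ocf:pacemaker:ping host_list=192.0.2.1 \
        dampen=5s op monitor interval=10s --clone
    # fence_sbd device used to fence node2 (a mirror device for node1 is set up the same way)
    pcs stonith create fence-sbd-node2 fence_sbd devices=/dev/sdx pcmk_host_list=node2
    # ban the device from any node that has lost the gateway, so a disconnected
    # node cannot use it to fence its partner
    pcs constraint location fence-sbd-node2 rule score=-INFINITY \
        pingd lt 1 or not_defined pingd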

When a node loses its connectivity it stops its fence agent,
and therefore does not fence the other node at all, great.
If both nodes lose their service gateway, they do not start fencing each other.

One can argue that a node losing its own service gateway might want to restart, as the loss might be related to its own malfunctioning integrated components.
But why should a node fence the other node when it knows it has lost its own connectivity?

Basically I've re-implemented qdisk + heuristics with fence_sbd + pingd, as long as qdevice heuristics are out of reach.
The downside is that you have to make sure the pingd monitor interval is lower than your stonith timeout, so that the cluster only starts its stonith procedure once the fencing agent has already been stopped.

The question is:
can I rely on stopped fencing agents to prevent their utilization?
It works for RHEL-HA 7.3, but is that intentional or pure luck?

Comment 10 Ken Gaillot 2017-05-10 19:42:30 UTC
According to Bug 1240330, as of 7.3, "pcs resource disable" prevents the cluster from using a fence device at all, and a negative location constraint for a fence device on a node prevents that node from using that device (which is what you're asking here).

However, I'm not sure offhand whether that's a firm guarantee that applies in all cases (e.g. external fencing by stonith_admin as well as cluster-initiated fencing). We'll have to confirm that.

I suspect a race condition in your approach of using node attributes to set location constraints. The monitor+dampening delay you mentioned means that if one node loses connectivity to both the other node and the outside world, it might decide to fence the other node before its next monitor disables the device.

One approach I had thought about long ago but never tried, was to have a fence agent that existed solely to insert a delay if needed, e.g.:

    pcs stonith level add 1 node1 fence_heuristics,fence_node1

The cluster would run fence_heuristics first. That agent would do whatever tests are desired (such as pinging an IP), and always return success, either immediately (allowing fence_node1 to be called without delay) or after sleeping a while (to give node1 a chance to win a race). That would avoid race conditions, as the ping would always be done exactly and only when needed.

The downsides I see are (1) there would always be some delay due to the test time, (2) people could misuse the agent by itself as a dummy device, and (3) it doesn't exist. :-)
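To make that concrete, such a delay-only agent could be sketched roughly as follows (heavily simplified: a real fence agent also has to provide metadata, and the gateway and delay values here are made up):

    #!/bin/sh
    # hypothetical fence_heuristics sketch - it never powers anything off
    GATEWAY=192.0.2.1   # placeholder test target
    DELAY=15            # placeholder head start in seconds
    ACTION=off
    # fence agents receive their options as name=value pairs on stdin
    while read -r line; do
        case "$line" in
            action=*) ACTION=${line#action=} ;;
        esac
    done
    case "$ACTION" in
        off|reboot)
            # gateway reachable: succeed immediately so the next device on this
            # level (e.g. fence_node1) runs without delay; otherwise sleep first
            # to give the better-connected node a chance to win the fence race
            ping -c 3 -W 2 "$GATEWAY" >/dev/null 2>&1 || sleep "$DELAY"
            exit 0
            ;;
        *)
            exit 0   # monitor/status/metadata etc. always "succeed" in this sketch
            ;;
    esac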

Comment 11 Daniel Peess 2017-05-11 08:55:52 UTC
hi ken,

(In reply to Ken Gaillot from comment #10)
> According to Bug 1240330, as of 7.3, "pcs resource disable" prevents the
> cluster from using a fence device at all, and a negative location constraint
> for a fence device on a node prevents that node from using that device
> (which is what you're asking here).

great, thank you for the reference, appreciated. 7.3, just in time.

> I suspect a race condition in your approach of using node attributes to set
> location constraints. The monitor+dampening delay you mentioned means that
> if one node loses connectivity to both the other node and the outside world,
> it might decide to fence the other node before its next monitor disables the
> device.

yes, i know, even more correlated timeouts you have to calculate correctly.
if there were *currently* anything better, i would gratefully grade up.

> One approach I had thought about long ago but never tried, was to have a
> fence agent that existed solely to insert a delay if needed, e.g.:
> 
>     pcs stonith level add 1 node1 fence_heuristics,fence_node1

AFAIK if a fence agent returns success, there's no reason for the cluster to go down and try the next level, so it would always have to return failure.
That way, if someone forgot to configure a deeper level, it wouldn't hurt.
No correlated timeouts, so way more elegant and robust than my solution.
But you can't get the 'do not fence others at all because I'm broken/unreachable' behaviour by using fence levels, hence my constraint approach.

solved.

Comment 12 Ken Gaillot 2017-05-11 14:52:38 UTC
(In reply to Daniel Peess from comment #11)
> hi ken,
> 
> (In reply to Ken Gaillot from comment #10)
> > One approach I had thought about long ago but never tried, was to have a
> > fence agent that existed solely to insert a delay if needed, e.g.:
> > 
> >     pcs stonith level add 1 node1 fence_heuristics,fence_node1
> 
> AFAIK if a fence agent returns success, there's no reason for the cluster to
> go down to try the next level. it would always have to return false.
> so if someone would forget to configure a deeper level it wouldn't hurt.
> no correlated timeouts, so way more elegant and robust then my solution.
> but you can't have the 'do not fence others at all because i'm
> broken/unreachable' by using fence levels, hence my constraint approach.

Good point, I was thinking of a level with two devices (which means both devices must return success). But if you did two levels, the heuristics agent could always fail, which would prevent it from being used alone as a dummy.

While that wouldn't prevent fencing entirely, it could put a long enough delay to ensure that some other node could fence it first. If you went with my first approach (two devices in the same level), you could make the heuristics agent fail instead of delay, and that would prevent fencing altogether.
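In pcs terms the two variants would look roughly like this:

    # (a) same level: both devices must succeed, so a failing heuristics
    #     agent prevents this node from fencing at all
    pcs stonith level add 1 node1 fence_heuristics,fence_node1
    # (b) separate levels: the heuristics agent always fails (after an optional
    #     delay), and the cluster then falls through to the real device
    pcs stonith level add 1 node1 fence_heuristics
    pcs stonith level add 2 node1 fence_node1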

Comment 13 Klaus Wenninger 2017-05-11 15:23:06 UTC
(In reply to Ken Gaillot from comment #12)

> 
> While that wouldn't prevent fencing entirely, it could put a long enough
> delay to ensure that some other node could fence it first. If you went with
> my first approach (two devices in the same level), you could make the
> heuristics agent fail instead of delay, and that would prevent fencing
> altogether.

Wouldn't it still fall back to the next lower-priority level?
That being watchdog fencing ...

Comment 14 Ken Gaillot 2017-05-11 15:33:07 UTC
(In reply to Klaus Wenninger from comment #13)
> (In reply to Ken Gaillot from comment #12)
> 
> > 
> > While that wouldn't prevent fencing entirely, it could put a long enough
> > delay to ensure that some other node could fence it first. If you went with
> > my first approach (two devices in the same level), you could make the
> > heuristics agent fail instead of delay, and that would prevent fencing
> > altogether.
> 
> Wouldn't it still fall back to the next lower-prio level?
> Being watchdog-fencing ...

My idea was fence_node1 would be a fence_sbd device in this case. So, fence_sbd would never be called if fence_heuristics failed, and there wouldn't be any other topology levels configured.

Would sbd fall back to watchdog-only in such a case? If so, that would be a problem, but I would think the proposed constraint-based solution would have the same issue.

Comment 15 Klaus Wenninger 2017-05-11 15:39:29 UTC
(In reply to Ken Gaillot from comment #14)

> 
> Would sbd fall back to watchdog-only in such a case? If so, that would be a
> problem, but I would think the proposed constraint-based solution would have
> the same issue.

I have to verify, but it is registered as an invisible stonith device ...

Comment 16 Klaus Wenninger 2017-05-16 13:29:34 UTC
(In reply to Ken Gaillot from comment #14)

> 
> Would sbd fall back to watchdog-only in such a case? If so, that would be a
> problem, but I would think the proposed constraint-based solution would have
> the same issue.

With the other fencing devices either disabled or banned from the available nodes, the current behaviour is definitely a fallback to watchdog fencing.
I haven't tried it with an explicit level, but I guess that shouldn't make a difference.

This fallback behaviour can be prevented by setting
  pcs property set stonith-watchdog-timeout=0
That works at least for 7.4, where the message
  "Relying on watchdog integration for fencing"
has been replaced by
  "Watchdog will be used via SBD if fencing is required".
Rather search for
  "Watchdog may be enabled but stonith-watchdog-timeout is disabled"


On testing I found an issue:

Once the fencing device has been started in the cluster at some point in the past, it is still being used even when, at the moment of fencing, ban rules are in effect that would keep it from being started right then.
That behaviour is consistent with the output of 'stonith_admin -L'.
That would probably prevent using a location rule based on a dynamically adapted attribute to control fencing.

Comment 17 Marek Grac 2017-05-17 17:46:20 UTC
@Klaus, @Ken: To which package should we re-assign this bug? It is clearly a problem on a higher level than the fence agent itself.

Comment 18 Ken Gaillot 2017-05-17 21:04:00 UTC
(In reply to Marek Grac from comment #17)
> @Klaus, @Ken: To which package we should re-assign this bug. It is clearly
> problem on a higher level than fence agent itself.

It depends on what we decide to do about it, but most likely pacemaker, if anything, so reassigning for now.

Comment 19 Klaus Wenninger 2017-10-11 15:56:03 UTC
A slight variation of Ken's level-based idea turned out to be successful in the use case that triggered this bug.

Development of a fencing-agent for that very use-case is handled by bz1476401.

Short description of the scenario: a 2-node cluster using SBD with a single disk (actually a replicated solution, but for SBD it is a single disk).
It is seen as unnecessary and even unwanted (imagine short networking hiccups) that a node that itself doesn't have proper networking connectivity to provide services would trigger the partner node to die.
So a connectivity check is done in a separate fencing agent which returns success if the node's own connectivity is OK and an error if not.
If SBD is added to the same fencing level, fencing via SBD is then only triggered if connectivity is OK.
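A rough sketch of that topology in pcs (the agent itself is being developed in bz1476401; device names, ping target and disk path are placeholders, and the option names should be checked against the agent's metadata):

    # connectivity check and SBD on the same fencing level for each node,
    # so fence_sbd is only attempted if the local gateway is reachable
    pcs stonith create fence-check fence_heuristics_ping ping_targets=192.0.2.1
    pcs stonith create fence-disk fence_sbd devices=/dev/disk/by-id/my-sbd-disk
    pcs stonith level add 1 node1 fence-check,fence-disk
    pcs stonith level add 1 node2 fence-check,fence-disk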

The basic principle behind that solution is actually generic:

- check my own fitness for providing a service first
- if I am able to provide the service, try to fence the partner node
- otherwise rather do not fence the partner node, to prevent an unnecessary reboot
  of a node that might well be able to provide the service right away
- if we expect the inability to provide a service to be short compared to
  the time it would take to reboot a node and bring up services, just wait
  for a certain time and delay the decision

Thus at the moment it seems desirable to encourage adding further fence agents like fence_heuristics_ping that do some kind of generic fitness check of a node.
It would even make sense to concatenate them on a fencing level to sort out the nodes that pass multiple fitness checks.

Particularly for situations where an additional node can be spared to run the qnetd service on, heuristics for corosync are a topic to consider (bz1413573).
As a "heuristics only" model is being considered there as well, that reference might be valuable for strict 2-node cases (no additional node of any kind available) alike.

*** This bug has been marked as a duplicate of bug 1476401 ***

