Bug 1449155
| Summary: | Influence Fencing Direction Dynamically Instead of Static Fence Delay | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Peess <dpeess> |
| Component: | pacemaker | Assignee: | Klaus Wenninger <kwenning> |
| Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | abeekhof, cfeist, cluster-maint, jruemker, kgaillot, kwenning, mreinke |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-11 15:56:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Daniel Peess
2017-05-09 10:03:27 UTC
According to Bug 1240330, as of 7.3, "pcs resource disable" prevents the cluster from using a fence device at all, and a negative location constraint for a fence device on a node prevents that node from using that device (which is what you're asking here). However, I'm not sure offhand whether that's a firm guarantee that applies in all cases (e.g. external fencing by stonith_admin as well as cluster-initiated fencing). We'll have to confirm that.

I suspect a race condition in your approach of using node attributes to set location constraints. The monitor+dampening delay you mentioned means that if one node loses connectivity to both the other node and the outside world, it might decide to fence the other node before its next monitor disables the device.

One approach I had thought about long ago but never tried, was to have a fence agent that existed solely to insert a delay if needed, e.g.:

    pcs stonith level add 1 node1 fence_heuristics,fence_node1

The cluster would run fence_heuristics first. That agent would do whatever tests are desired (such as pinging an IP), and always return success, either immediately (allowing fence_node1 to be called without delay) or after sleeping a while (to give node1 a chance to win a race).

That would avoid race conditions, as the ping would always be done exactly and only when needed. The downsides I see are (1) there would always be some delay due to the test time, (2) people could misuse the agent by itself as a dummy device, and (3) it doesn't exist. :-)

hi ken,

(In reply to Ken Gaillot from comment #10)
> According to Bug 1240330, as of 7.3, "pcs resource disable" prevents the
> cluster from using a fence device at all, and a negative location constraint
> for a fence device on a node prevents that node from using that device
> (which is what you're asking here).

great, thank you for the reference, appreciated. 7.3, just in time.
> I suspect a race condition in your approach of using node attributes to set
> location constraints. The monitor+dampening delay you mentioned means that
> if one node loses connectivity to both the other node and the outside world,
> it might decide to fence the other node before its next monitor disables the
> device.

yes, i know, even more correlated timeouts you have to calculate correctly. if there were *currently* anything better, i would gratefully grade up.

> One approach I had thought about long ago but never tried, was to have a
> fence agent that existed solely to insert a delay if needed, e.g.:
>
> pcs stonith level add 1 node1 fence_heuristics,fence_node1

AFAIK if a fence agent returns success, there's no reason for the cluster to go down to try the next level. it would always have to return false.
so if someone would forget to configure a deeper level it wouldn't hurt.
no correlated timeouts, so way more elegant and robust than my solution.
but you can't have the 'do not fence others at all because i'm broken/unreachable' by using fence levels, hence my constraint approach.

solved.

(In reply to Daniel Peess from comment #11)
> hi ken,
>
> (In reply to Ken Gaillot from comment #10)
> > One approach I had thought about long ago but never tried, was to have a
> > fence agent that existed solely to insert a delay if needed, e.g.:
> >
> > pcs stonith level add 1 node1 fence_heuristics,fence_node1
>
> AFAIK if a fence agent returns success, there's no reason for the cluster to
> go down to try the next level. it would always have to return false.
> so if someone would forget to configure a deeper level it wouldn't hurt.
> no correlated timeouts, so way more elegant and robust than my solution.
> but you can't have the 'do not fence others at all because i'm
> broken/unreachable' by using fence levels, hence my constraint approach.

Good point, I was thinking of a level with two devices (which means both devices must return success).
But if you did two levels, the heuristics agent could always fail, which would prevent it from being used alone as a dummy.

While that wouldn't prevent fencing entirely, it could put a long enough delay to ensure that some other node could fence it first. If you went with my first approach (two devices in the same level), you could make the heuristics agent fail instead of delay, and that would prevent fencing altogether.

(In reply to Ken Gaillot from comment #12)
>
> While that wouldn't prevent fencing entirely, it could put a long enough
> delay to ensure that some other node could fence it first. If you went with
> my first approach (two devices in the same level), you could make the
> heuristics agent fail instead of delay, and that would prevent fencing
> altogether.

Wouldn't it still fall back to the next lower-prio level?
Being watchdog-fencing ...

(In reply to Klaus Wenninger from comment #13)
> (In reply to Ken Gaillot from comment #12)
> >
> > While that wouldn't prevent fencing entirely, it could put a long enough
> > delay to ensure that some other node could fence it first. If you went with
> > my first approach (two devices in the same level), you could make the
> > heuristics agent fail instead of delay, and that would prevent fencing
> > altogether.
>
> Wouldn't it still fall back to the next lower-prio level?
> Being watchdog-fencing ...

My idea was fence_node1 would be a fence_sbd device in this case. So, fence_sbd would never be called if fence_heuristics failed, and there wouldn't be any other topology levels configured.

Would sbd fall back to watchdog-only in such a case? If so, that would be a problem, but I would think the proposed constraint-based solution would have the same issue.

(In reply to Ken Gaillot from comment #14)
>
> Would sbd fall back to watchdog-only in such a case? If so, that would be a
> problem, but I would think the proposed constraint-based solution would have
> the same issue.
I have to verify but it is registered as an invisible stonith device ...

(In reply to Ken Gaillot from comment #14)
>
> Would sbd fall back to watchdog-only in such a case? If so, that would be a
> problem, but I would think the proposed constraint-based solution would have
> the same issue.

With the other fencing devices either disabled or banned from the available nodes, the current behaviour is definitely a fallback to watchdog-fencing. Haven't tried with a level explicitly, but I guess that shouldn't make a difference.

This fallback behaviour can be prevented by setting

    pcs property set stonith-watchdog-timeout=0

That works at least for 7.4, where the message "Relying on watchdog integration for fencing" is replaced by "Watchdog will be used via SBD if fencing is required". To confirm the setting, rather search for "Watchdog may be enabled but stonith-watchdog-timeout is disabled".

On testing I found an issue:
Once a fencing device has been started in the cluster at some time in the past, it is still used even if, at the moment of fencing, ban rules are in effect that keep it from being started right then.
That behaviour is consistent with the output of 'stonith_admin -L'.
It would probably prevent using a location rule based on a dynamically adapted attribute to control fencing.

@Klaus, @Ken: To which package should we re-assign this bug? It is clearly a problem on a higher level than the fence agent itself.

(In reply to Marek Grac from comment #17)
> @Klaus, @Ken: To which package should we re-assign this bug? It is clearly a
> problem on a higher level than the fence agent itself.

It depends on what we decide to do about it, but most likely pacemaker if anything, so reassigning for now

A slight variation of the level-based idea Ken had turned out to be successful in the use-case that triggered this bug.
Development of a fencing-agent for that very use-case is handled by bz1476401.
Short description of the scenario: a 2-node cluster using SBD with a single disk (actually a replicated solution - but for SBD it is a single disk).
It is seen as unnecessary and even unwanted (imagine short networking hiccups) that a node that itself doesn't have proper networking connectivity to provide services would trigger the partner node to die.
So a connectivity check is done in a separate fencing-agent which returns success if its own connectivity is OK and ERROR if not.
If SBD is added to the same fencing level, fencing via SBD is then only triggered if connectivity is OK.

The basic principle behind that solution is actually generic:

- check my own fitness for providing a service first
- if I would be able to provide a service, try to fence the partner-node
- otherwise rather not fence the partner-node, to prevent an unnecessary reboot of a node that might possibly be able to provide the service right away
- if we expect the inability to provide a service to be short in comparison to the time it would take to reboot a node and bring up services, just wait for a certain time and delay the decision

Thus at the moment it seems desirable to encourage adding further fence-agents like fence_heuristics_ping that do some kind of generic fitness-check of a node. It would even make sense to concatenate them on a fencing-level to sort out the nodes that pass multiple fitness-checks.

Particularly for situations where an additional node can be spared to run the qnetd service on, heuristics for corosync are a topic to consider (bz1413573). As a "heuristics only" model is considered there as well, the reference might be valuable for strict 2-node cases (no additional node of any kind available) alike.

*** This bug has been marked as a duplicate of bug 1476401 ***
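
For readers following the level-based approach discussed in this thread, here is a minimal Python sketch of how a heuristics device and an SBD device behave when placed on the same fencing level. It is a toy model, not Pacemaker or fence-agents code: `run_topology`, `make_heuristics` and the stand-in `fence_sbd` function are hypothetical names introduced only for illustration. The model assumes what the comments describe: levels are tried in order, all devices in a level must succeed, and a device listed after a failed one is not called.

```python
from typing import Callable, Dict, List

# A fence device is modelled as a callable returning True on success.
Device = Callable[[], bool]


def run_topology(levels: List[List[str]], devices: Dict[str, Device]) -> bool:
    """Try levels in order; a level succeeds only if every device in it does.

    all() stops at the first failing device, so later devices in that
    level are never called.
    """
    for level in levels:
        if all(devices[name]() for name in level):
            return True  # the target counts as fenced
    return False  # no level succeeded, the target is not fenced


def make_heuristics(own_connectivity_ok: bool) -> Device:
    """Hypothetical stand-in for a heuristics agent (fitness check of this node)."""
    return lambda: own_connectivity_ok


calls: List[str] = []


def fence_sbd() -> bool:
    """Stand-in for the real fencing device; records that it was invoked."""
    calls.append("fence_sbd")
    return True


# One level holding both devices, as in the 2-node SBD scenario.
levels = [["fence_heuristics", "fence_sbd"]]

# Node with working connectivity: heuristics pass, so SBD fencing runs.
fit = run_topology(levels, {"fence_heuristics": make_heuristics(True),
                            "fence_sbd": fence_sbd})

# Isolated node: heuristics fail, so fence_sbd is never called.
isolated = run_topology(levels, {"fence_heuristics": make_heuristics(False),
                                 "fence_sbd": fence_sbd})

print(fit, isolated, calls)
```

The sketch deliberately leaves out the watchdog fallback and all timeouts. As noted in the comments, a real cluster may still fall back to watchdog-fencing unless stonith-watchdog-timeout is set to 0.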