Bug 437867
| Summary: | Fence storm with fence_egenera | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Bryn M. Reeves <bmr> | ||||
| Component: | fence | Assignee: | Jim Parsons <jparsons> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 4 | CC: | bstevens, cfeist, cluster-maint, cww, edamato, lpleiman, mgrac, nstraz, tao | ||||
| Target Milestone: | --- | Keywords: | ZStream | ||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | fence-1.32.65-1.el4 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2009-06-10 16:31:08 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 459501 | ||||||
| Attachments: |
Description
Bryn M. Reeves
2008-03-17 19:59:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Once a patch has been provided, the customer has agreed to test.

Oh, never mind! The above comment says they will test for us ;) Moving this out to 4.8, and we need a way to reproduce this issue.

As we know, the egenera has redundant control blades that can power the individual blades on and off. The cluster here has been configured to fence a machine twice, once from each cblade. This creates a race condition between the fence_egenera scripts running on each one. There are two solutions to this, the way that I see it.

1. Fix the configuration to fence from the second control blade only if the attempted fence failed on the first control blade.
2. Fix the issue in fence_egenera that causes the race condition in the first place.

It appears that the script does not understand what to do with the "Booting" status, so it reboots the node. This can be fixed either by waiting until the "Booting" status changes to something that it recognizes, or by just going ahead and forcing a reboot in the "Booting" stage. In either case it must return success at the end to prevent the current condition.

This event sent from IssueTracker by calvin_g_smith issue 164929

Attached is a patch for the fence_egenera script that recognizes if the machine is already booting up and returns success if that is the case.

This event sent from IssueTracker by calvin_g_smith issue 164929 it_file 147843

Created attachment 313993 [details]
test boot watch patch
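For readers without access to the attachment, the behaviour described above can be illustrated roughly as follows. This is only a sketch in Python, not the attached patch and not the real fence_egenera code: the names `fence_reboot`, `FakeBlade`, `get_status` and `power_cycle` are hypothetical, and "Running" is an assumed status string (only "Booting" is named in this report).

```python
from typing import Callable

# Only "Booting" is named in this report; any other status string is assumed.
BOOTING = "Booting"


def fence_reboot(get_status: Callable[[], str],
                 power_cycle: Callable[[], bool]) -> int:
    """Return 0 on success and 1 on failure, like a fence agent exit code.

    If the node is already booting, another agent (for example the one
    running against the other control blade) has just fenced it, so report
    success instead of rebooting the node a second time.
    """
    if get_status() == BOOTING:
        return 0
    return 0 if power_cycle() else 1


class FakeBlade:
    """Hypothetical stand-in for a blade, used only to show the race."""

    def __init__(self) -> None:
        self.state = "Running"  # assumed status name
        self.reboots = 0

    def status(self) -> str:
        return self.state

    def power_cycle(self) -> bool:
        self.reboots += 1
        self.state = BOOTING
        return True


if __name__ == "__main__":
    blade = FakeBlade()
    # Two fence requests for the same node, one per control blade.
    first = fence_reboot(blade.status, blade.power_cycle)
    second = fence_reboot(blade.status, blade.power_cycle)
    print(first, second, blade.reboots)
```

With this check the second request returns success without power cycling the node again, which is what stops the two agents from repeatedly rebooting it.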
This is fixed in RHEL4.8 in fence-1.32.65-1.el4 and beyond.