Bug 435189
| Summary: | fenced admin override does not update cman, preventing rgmanager recovery | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Lon Hohberger <lhh> | ||||
| Component: | cman | Assignee: | Lon Hohberger <lhh> | ||||
| Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 5.2 | CC: | cluster-maint, edamato, ricardo.arguello, teigland | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | RHBA-2008-0347 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2008-05-21 15:58:52 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Created attachment 296116: Fix
Test packages (sorry, x86_64 + srpms only) here:
http://people.redhat.com/lhh/cman-2.0.73-1.el5.5.1lhh.src.rpm
http://people.redhat.com/lhh/cman-2.0.73-1.el5.5.1lhh.x86_64.rpm

Another workaround is to configure manual fencing (blech) and add it as a second fence level for each cluster node.

Pushed to git in rhel5/master branches.

Mar 27 15:30:43 molly clurgmgrd[3942]: <info> Waiting for node #1 to be fenced
Mar 27 15:32:18 molly fenced[1603]: agent "fence_xvm" reports: Timed out waiting
for response
Mar 27 15:32:18 molly fenced[1603]: fence "frederick" failed
...
[root@molly ~]# echo frederick > /var/run/cluster/fenced_override
...
Mar 27 15:32:21 molly fenced[1603]: fence "frederick" overridden by
administrator intervention
Mar 27 15:32:21 molly clurgmgrd[3942]: <info> Node #1 fenced; continuing
Mar 27 15:32:22 molly clurgmgrd[3942]: <notice> Taking over service service:test
from down member frederick
Mar 27 15:32:22 molly clurgmgrd[3942]: <notice> Service service:test started
...
[root@molly ~]# cman_tool nodes
Node Sts Inc Joined Name
0 M 0 2008-03-27 14:17:26 /dev/xvdb1
1 X 1032 frederick
2 M 1020 2008-03-27 14:17:12 molly
[root@molly ~]# cman_tool nodes -f
Node Sts Inc Joined Name
0 M 0 2008-03-27 14:17:26 /dev/xvdb1
1 X 1032 frederick
Last fenced: 2008-03-27 15:27:55 by override
2 M 1020 2008-03-27 14:17:12 molly
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0347.html
Description of problem:
If you override failed fencing, the member is never considered fenced:

1 M 4 2008-02-27 16:10:38 node1.local
2 X 16 node2.local
Node has not been fenced since it went down

This prevents rgmanager from recovering, since it waits for fencing. It should say:

1 M 4 2008-02-27 16:10:38 node1.local
2 X 16 node2.local
Last fenced: 2008-02-27 15:24:16 by override

Version-Release number of selected component (if applicable):
Current

How reproducible:
100%

Steps to Reproduce:
1. Configure fencing
2. Take a node's fencing device down
3. Take the node down
4. Issue override

Actual results:
Override works, but rgmanager doesn't recover services. Restarting rgmanager will cause services to be started.

Expected results:
Rgmanager should recover services immediately after the override completes.
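
For anyone checking whether a cluster is in the state described here, the following is a minimal sketch (not part of the fix) that scans saved `cman_tool nodes -f` output and lists nodes that are down (status `X`) with no `Last fenced:` line under them. The column layout is assumed from the samples in this report only, and the function name `unfenced_down_nodes` and the `BEFORE`/`AFTER` strings are illustrative.

```python
import re

# Assumed row layout, taken from the samples in this report:
#   <node id> <status letter> <incarnation> [<join date> <join time>] <name>
NODE_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<sts>[A-Za-z])\s+(?P<inc>\d+)\s+"
    r"(?:(?P<joined>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+)?(?P<name>\S+)\s*$"
)


def unfenced_down_nodes(output):
    """Return names of nodes with status X and no 'Last fenced:' line."""
    nodes = []  # each entry: [name, status, fenced?]
    for line in output.splitlines():
        match = NODE_ROW.match(line)
        if match:
            nodes.append([match.group("name"), match.group("sts"), False])
        elif nodes and line.strip().startswith("Last fenced:"):
            # A 'Last fenced:' line belongs to the node row just above it.
            nodes[-1][2] = True
    return [name for name, sts, fenced in nodes if sts == "X" and not fenced]


# The two outputs quoted in the description of the problem.
BEFORE = """\
1 M 4 2008-02-27 16:10:38 node1.local
2 X 16 node2.local
Node has not been fenced since it went down
"""

AFTER = """\
1 M 4 2008-02-27 16:10:38 node1.local
2 X 16 node2.local
Last fenced: 2008-02-27 15:24:16 by override
"""

if __name__ == "__main__":
    print(unfenced_down_nodes(BEFORE))
    print(unfenced_down_nodes(AFTER))
```

A non-empty result after an override has been issued would suggest the unfixed behaviour, where rgmanager keeps waiting for fencing.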