Description of problem:

The management of "recovery sets" in groupd isn't smart enough to handle at least a couple situations where a node fails, returns and fails again before recovery is done for the first failure.

Example 1 observed on smoke cluster once:

salem killed from revolver
fenced fences salem
winston killed by revolver
fenced fences winston
merit killed by revolver
quorum lost, fencing delayed
winston rejoins cluster
quorum regained, fenced continues
fenced begins fencing merit
merit rejoins cluster
fence_apc against merit fails (reason unknown)
fence_apc against merit succeeds (on retry)
merit down, killed by fencing
fenced reports success against merit
recovery for merit is initiated while the earlier recovery for merit is still in progress

Example 2:

- groupd process (with no current groups) is killed, this causes a recovery set to be created (on another node) from the cpg callback, but since the node didn't die there will never be a cman nodedown callback to clear away the recovery set
- groupd is restarted
- groupd process is killed again, this tries to create a recovery set for the node again from the cpg callback, this triggers an assertion failure.

We need groupd to distinguish between groupd on a remote node failing due to the node going down vs just the groupd process exiting. If the process exits and the node wasn't in any groups we don't care about it and don't want a recovery set; if the node _was_ in any groups we need to kill the node via cman_kill_node() so we'll get a proper cman nodedown and can go through recovery for it.
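
The decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual groupd code: the names leave_cause, leave_action, node_state, on_groupd_leave(), on_cman_nodedown() and request_node_kill() are all hypothetical, and request_node_kill() merely records the request where a real implementation would call cman_kill_node(). Keeping an already pending recovery set instead of asserting is also an assumption; how overlapping recoveries such as the one in Example 1 should be merged is not covered here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Why a remote groupd left the cpg (hypothetical names). */
enum leave_cause { LEAVE_PROCESS_EXIT, LEAVE_NODE_DOWN };

/* What the local groupd decides to do about it. */
enum leave_action { ACT_IGNORE, ACT_KILL_NODE, ACT_RECOVER };

struct node_state {
    int  group_count;       /* groups the node was in when its groupd left */
    bool has_recovery_set;  /* recovery for this node is pending */
};

static struct node_state nodes[MAX_NODES];
static int kill_requests[MAX_NODES];

/* Stand-in for asking cman to kill the node; it only records the request. */
static void request_node_kill(int nodeid)
{
    kill_requests[nodeid]++;
}

/* Mark recovery as pending. Assumption: if a set is already pending it is
 * simply kept, rather than triggering an assertion failure. */
static void add_recovery_set(int nodeid)
{
    nodes[nodeid].has_recovery_set = true;
}

/* Called when a remote groupd disappears from the cpg. */
enum leave_action on_groupd_leave(int nodeid, enum leave_cause cause)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);

    if (cause == LEAVE_PROCESS_EXIT) {
        /* Process exited and the node was in no groups: nothing to recover. */
        if (nodes[nodeid].group_count == 0)
            return ACT_IGNORE;

        /* Node was in groups: force a real nodedown so recovery can run. */
        request_node_kill(nodeid);
        add_recovery_set(nodeid);
        return ACT_KILL_NODE;
    }

    /* The node itself went down: normal recovery path. */
    add_recovery_set(nodeid);
    return ACT_RECOVER;
}

/* Called on the cman nodedown callback; clears the pending recovery set. */
void on_cman_nodedown(int nodeid)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    nodes[nodeid].has_recovery_set = false;
}
```

With this shape, the sequence in Example 2 (groupd with no groups killed, restarted, killed again) returns ACT_IGNORE both times and never leaves a stale recovery set behind.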
A package has been built which should help the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.