Description of problem:
If the cluster quorum dissolves, all remaining nodes halt services (as expected). If one or more services fail to stop cleanly, resources could still be allocated on the former owners of those services. When a new quorum forms, it is conceivable that the service initialization (e.g. stop-before-start) would also fail, since the previous stop attempt failed. If a service has an ordered failover domain configured and, when the new quorum forms, a node with higher priority than the previous owner is part of the new quorum, the service will be started on the new, higher-priority node. At that point, there is a potential for resources to be allocated on multiple members.

Version-Release number of selected component (if applicable):
1.2.3, 1.2.9, 1.2.12, 1.2.16

How reproducible:
Theoretical problem.

Solution:
Reboot the node immediately if it cannot free clustered resources after a loss of quorum.
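For illustration only, here is a minimal C sketch of the self-fencing behavior described in the Solution above. The names stop_service(), service_count(), and handle_quorum_loss() are hypothetical stand-ins for clumanager internals, not the actual change; see attachment 103390 for the real fix.

/*
 * Sketch only, under the assumptions noted above.  The stubs below
 * stand in for clumanager's service-management code.  Requires root;
 * reboot(2) is Linux-specific.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/reboot.h>

/* Placeholder stubs (hypothetical): the real code walks the cluster
 * service list and runs each service's stop scripts. */
static int service_count(void) { return 0; }
static int stop_service(int svc) { (void)svc; return 0; }

/* Called when the node loses quorum: stop every service this node
 * owns.  If any stop fails, resources may still be allocated locally,
 * so reboot the node immediately rather than risk the same resources
 * being allocated again elsewhere once a new quorum starts the
 * service on a higher-priority member. */
void handle_quorum_loss(void)
{
        int svc, failed = 0;

        for (svc = 0; svc < service_count(); svc++) {
                if (stop_service(svc) != 0) {
                        fprintf(stderr,
                                "Failed to stop service %d after quorum loss\n",
                                svc);
                        failed = 1;
                }
        }

        if (failed) {
                /* Self-fence: flush disks, then reboot so no stale
                 * resource allocations survive on this node. */
                sync();
                reboot(RB_AUTOBOOT);
        }
}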
Created attachment 103390 [details]
Fix as described.
1.2.18pre1 patch (unsupported; test only, etc.):
http://people.redhat.com/lhh/clumanager-1.2.16-1.2.18pre1.patch
This patch includes the fix for this bug and a few others.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2004-491.html