Description of problem:
openais-0.80.3-22.el5_3.7

Nate found this bug running revolver on 5 nodes, when two or three nodes were killed at the same time. The groupd logs show that cpg delivers the confchg's for the killed nodes in different orders.

The first time, nodes 1,3,4 were killed, leaving 2,5.
nodeid 2 got confchg order: 4,3,1
nodeid 5 got confchg order: 1,3,4

The second time, nodes 2,5 were killed, leaving 1,3,4.
1 got 2,5
3 got 5,2
4 got 5,2

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
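To illustrate the mismatch reported in the description, here is a minimal Python sketch that checks whether every surviving node saw the killed nodes removed in the same order. The function name `confchg_orders_agree` and the dictionary layout are assumptions made for this illustration only; this is not code from groupd or openais, and the input is simply the orders quoted above.

```python
from typing import Dict, List


def confchg_orders_agree(orders: Dict[int, List[int]]) -> bool:
    """Return True if every surviving node saw the same confchg order.

    `orders` maps a surviving nodeid to the sequence of killed nodeids
    in the order that node received their confchg events.
    (Hypothetical helper, not part of openais or groupd.)
    """
    distinct_sequences = {tuple(seq) for seq in orders.values()}
    return len(distinct_sequences) <= 1


# Orders quoted in this report, keyed by surviving nodeid.
first_run = {2: [4, 3, 1], 5: [1, 3, 4]}
second_run = {1: [2, 5], 3: [5, 2], 4: [5, 2]}

for name, run in (("first run", first_run), ("second run", second_run)):
    status = "consistent" if confchg_orders_agree(run) else "inconsistent"
    print(f"{name}: {status}")
```

Both runs from the report come out as inconsistent under this check, which is the condition that leaves recovery unable to complete.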
The result of this bug is that after a cluster failure, all cluster services will be stuck because recovery cannot complete. The entire cluster needs to be rebooted to recover from this scenario.
changed to 5.4, all archs, urgent, urgent.
One-line patch in testing now. Assuming it fixes the problem, this is a serious regression in the 5.4 version.
5.4 regression.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1366.html