Bug 504195 - cpg confchg's delivered in different order
Summary: cpg confchg's delivered in different order
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 504867
Reported: 2009-06-04 17:14 UTC by David Teigland
Modified: 2016-04-26 14:26 UTC (History)

Fixed In Version: openais-0.80.6-3.el5_4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:30:12 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1366 0 normal SHIPPED_LIVE openais bug-fix and enhancement update 2009-09-01 11:00:17 UTC

Description David Teigland 2009-06-04 17:14:26 UTC
Description of problem:

openais-0.80.3-22.el5_3.7

Nate found this bug running revolver on 5 nodes, when two or three nodes were killed at the same time.

The groupd logs show that cpg delivers the confchg's for the killed nodes in different orders.

The first time, nodes 1,3,4 were killed, leaving 2,5.
nodeid 2 got confchg order: 4,3,1
nodeid 5 got confchg order: 1,3,4

The second time, nodes 2,5 were killed, leaving 1,3,4.
nodeid 1 got confchg order: 2,5
nodeid 3 got confchg order: 5,2
nodeid 4 got confchg order: 5,2
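
For illustration only, here is a minimal sketch of the consistency check that the groupd logs make possible: every surviving node should report the departed members in the same order. The function names and the dictionary layout are hypothetical and are not part of openais or groupd; the data is copied from the two runs above.

```python
# Hypothetical helper: group surviving nodes by the confchg order they observed.
# Nothing here calls the real cpg API; it only compares orders taken from logs.

def group_by_order(observed):
    """Map each distinct confchg order to the list of nodeids that saw it."""
    groups = {}
    for nodeid, order in observed.items():
        groups.setdefault(tuple(order), []).append(nodeid)
    return groups


def is_consistent(observed):
    """True when all surviving nodes saw the departures in one common order."""
    return len(group_by_order(observed)) <= 1


# Orders reported in this bug (surviving nodeid -> order of departed nodeids).
first_run = {2: [4, 3, 1], 5: [1, 3, 4]}
second_run = {1: [2, 5], 3: [5, 2], 4: [5, 2]}

for name, run in (("first", first_run), ("second", second_run)):
    status = "consistent" if is_consistent(run) else "DIVERGED"
    print(name, status, group_by_order(run))
```

Both runs are reported as diverged. In the second run, nodes 3 and 4 agree with each other but not with node 1.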


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Nate Straz 2009-06-04 20:09:54 UTC
The result of this bug is that after a cluster failure, all cluster services will be stuck because recovery can not complete.  The entire cluster needs to be rebooted to recover from this scenario.

Comment 2 Steven Dake 2009-06-04 21:32:19 UTC
Changed to version 5.4, all archs, priority urgent, severity urgent.

Comment 3 Steven Dake 2009-06-04 21:33:34 UTC
One-line patch in testing now. Assuming it fixes the problem, this is a serious regression in the 5.4 version.

Comment 4 Steven Dake 2009-06-05 16:45:56 UTC
5.4 regression.

Comment 10 errata-xmlrpc 2009-09-02 11:30:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html

