Bug 504195

Summary: cpg confchg's delivered in different order
Product: Red Hat Enterprise Linux 5
Component: openais
Version: 5.4
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: David Teigland <teigland>
Assignee: Steven Dake <sdake>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, edamato, nstraz, syeghiay
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Fixed In Version: openais-0.80.6-3.el5_4
Doc Type: Bug Fix
Last Closed: 2009-09-02 11:30:12 UTC
Bug Blocks: 504867

Description David Teigland 2009-06-04 17:14:26 UTC
Description of problem:

openais-0.80.3-22.el5_3.7

Nate found this bug running revolver on a 5-node cluster when two or three nodes were killed at the same time.

The groupd logs show that cpg delivers the confchgs (configuration change events) for the killed nodes in a different order on different surviving nodes.

The first time, nodes 1,3,4 were killed, leaving 2,5.
nodeid 2 got confchg order: 4,3,1
nodeid 5 got confchg order: 1,3,4

The second time, nodes 2,5 were killed, leaving 1,3,4.
nodeid 1 got confchg order: 2,5
nodeid 3 got confchg order: 5,2
nodeid 4 got confchg order: 5,2
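The invariant being violated can be stated as a simple check: assuming cpg is expected to deliver configuration changes in the same order to every member, all surviving nodes should report an identical confchg sequence. The minimal Python sketch below applies that check to the orders listed above. It is not part of openais or groupd; the function names are hypothetical and exist only for illustration.

```python
# Hypothetical sketch, not openais/groupd code: check whether every surviving
# node observed the same sequence of confchg events (ids of departed nodes),
# assuming cpg should deliver confchgs in the same order to all members.
from typing import Dict, List


def confchg_orders_agree(observed: Dict[int, List[int]]) -> bool:
    """Return True if all nodes report an identical confchg order."""
    orders = list(observed.values())
    return all(order == orders[0] for order in orders[1:])


def divergent_nodes(observed: Dict[int, List[int]]) -> List[int]:
    """Return node ids whose order differs from the lowest-numbered node's order.

    Using the lowest node id as the reference is an arbitrary choice; it does
    not say which node saw the "correct" order.
    """
    if not observed:
        return []
    reference = observed[min(observed)]
    return sorted(n for n, order in observed.items() if order != reference)


# Orders taken from the groupd logs in this report (surviving nodeid -> order).
first_failure = {2: [4, 3, 1], 5: [1, 3, 4]}
second_failure = {1: [2, 5], 3: [5, 2], 4: [5, 2]}

print(confchg_orders_agree(first_failure))   # False
print(confchg_orders_agree(second_failure))  # False
print(divergent_nodes(second_failure))       # [3, 4]
```

Both failures in this report fail the check, which is consistent with the recovery hang described in Comment 1.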

Comment 1 Nate Straz 2009-06-04 20:09:54 UTC
The result of this bug is that after a cluster failure, all cluster services will be stuck because recovery cannot complete. The entire cluster needs to be rebooted to recover from this scenario.

Comment 2 Steven Dake 2009-06-04 21:32:19 UTC
Changed to 5.4, all architectures, urgent severity, urgent priority.

Comment 3 Steven Dake 2009-06-04 21:33:34 UTC
One-line patch in testing now. Assuming it fixes the problem, this is a serious regression in the 5.4 version.

Comment 4 Steven Dake 2009-06-05 16:45:56 UTC
5.4 regression.

Comment 10 errata-xmlrpc 2009-09-02 11:30:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html