Bug 639018

Summary: [RHEL6] active cluster nodes with higher config version gets killed
Product: Red Hat Enterprise Linux 6 Reporter: Debbie Johnson <dejohnso>
Component: cluster Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.0 CC: bkahn, ccaulfie, cluster-maint, grimme, hlawatschek, lhh, ndoane, rpeterso, ssaha, teigland
Target Milestone: rc Keywords: RHELNAK, ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: cluster-3.0.12-24.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 12:54:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 639958    

Description Debbie Johnson 2010-09-30 16:00:42 UTC
Description of problem:
In a 2-node cluster configuration, we observed the following problem:

NodeA is an active cluster member with cluster config version X
NodeB has cluster config version X-1

If NodeB tries to join the cluster, NodeA gets killed and NodeB cannot join the cluster.

We assume that this is a critical bug. The expectation is that NodeA survives and NodeB receives the latest cluster configuration from NodeA and is able to join the cluster.

Version-Release number of selected component (if applicable):
RHEL6 BETA

How reproducible:


Steps to Reproduce:

2 node cluster (rhel6) 
lilc052a-ics0 192.168.51.1 
lilc052b-ics0 192.168.51.2

Date of the test: Sep 20 16:54:56 

Quoting customer: 

""" NodeA is an active cluster member with cluster config version X. NodeB has cluster config version X-1. If NodeB tries to join the cluster, NodeA gets killed and NodeB cannot join the cluster. We assume that this is a critical bug. The expectation is that NodeA survives and NodeB receives the latest cluster configuration from NodeA and is able to join the cluster. """

Expected results:


Additional info:

Comment 1 RHEL Program Management 2010-09-30 16:07:47 UTC
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 2 Lon Hohberger 2010-09-30 19:15:39 UTC
Wrong component.

Comment 10 Sayan Saha 2010-10-01 16:48:39 UTC
Agree with proposed solution 1 in Fabio's comment 7.

Comment 14 Nate Straz 2011-03-03 22:20:19 UTC
Verified with cman-3.0.12-34.el6.x86_64 using procedure in comment #8.

Joined buzz-02 to buzz-01 with older config version and buzz-02 refused to join.  buzz-01 remained alive and in the cluster.

Mar  3 16:15:39 buzz-02 corosync[29195]:   [CMAN  ] Node 1 conflict, remote config version id=2, local=1
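
For anyone checking logs for this condition, the following is a minimal sketch of how the conflict message above could be parsed to tell which side holds the older configuration. The regular expression is inferred only from the single log line quoted in this comment, so the exact wording on other cman builds is an assumption; the function names (`parse_conflict`, `local_is_stale`) are hypothetical helpers, not part of cman or corosync.

```python
import re

# Pattern inferred from the one log line quoted in this comment; other
# cman/corosync versions may word the message differently (assumption).
CONFLICT_RE = re.compile(
    r"\[CMAN\s*\]\s+Node (?P<node>\d+) conflict, "
    r"remote config version id=(?P<remote>\d+), local=(?P<local>\d+)"
)


def parse_conflict(line):
    """Return (node_id, remote_version, local_version), or None if no match."""
    match = CONFLICT_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("node")),
        int(match.group("remote")),
        int(match.group("local")),
    )


def local_is_stale(line):
    """True if the logging node has the older config; None if not a conflict line."""
    parsed = parse_conflict(line)
    if parsed is None:
        return None
    _node, remote, local = parsed
    return local < remote


# The log line from this comment, copied verbatim.
LOG_LINE = (
    "Mar  3 16:15:39 buzz-02 corosync[29195]:   [CMAN  ] "
    "Node 1 conflict, remote config version id=2, local=1"
)

if __name__ == "__main__":
    print(parse_conflict(LOG_LINE))
    print(local_is_stale(LOG_LINE))
```

Applied to the line above, the sketch reports that buzz-02 (local version 1) is behind node 1 (remote version 2), which matches the verified behaviour: the node with the older configuration refuses to join and the active member stays up.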

Comment 15 errata-xmlrpc 2011-05-19 12:54:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html