Bug 743813 - A CPG client can sometimes lockup if the local node is in the downlist
Summary: A CPG client can sometimes lockup if the local node is in the downlist
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 6.3
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 756082
Reported: 2011-10-06 07:32 UTC by Jan Friesse
Modified: 2012-06-20 12:22 UTC (History)

Fixed In Version: corosync-1.4.1-5.el6
Doc Type: Bug Fix
Doc Text:
Cause: Booting a large cluster all at once / starting many corosync instances at the same time. Consequence: Sometimes CPG events are not sent to the user. Fix: Properly check the leave reason in the cpg service. Result: CPG events are sent to the user.
Clone Of:
Environment:
Last Closed: 2012-06-20 12:22:46 UTC
Target Upstream Version:


Attachments (Terms of Use)
Proposed patch (1.32 KB, patch)
2011-10-06 07:32 UTC, Jan Friesse


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2012:0777 | 0 | normal | SHIPPED_LIVE | corosync bug fix and enhancement update | 2012-06-19 20:35:04 UTC

Description Jan Friesse 2011-10-06 07:32:58 UTC
Created attachment 526639
Proposed patch

Description of problem:
In a 10-node cluster where all nodes boot and start corosync at the same
time, corosync sometimes detects a node as leaving and rejoining the cluster
during startup.

Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears cpd->group_name. As a result,
it no longer sends CPG events to the CPG client.
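
The failure mode described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the attached patch and not the real corosync code: the struct layouts, the leave_reason enum, apply_left_list() and client_receives_events() are all hypothetical names invented for illustration. Only CPD_STATE_UNJOINED, the cpd state and group_name come from the report. The check_reason flag models the idea of "properly check leave reason": the local client is unjoined only when its entry records a voluntary leave, not when it merely appears in a downlist.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real corosync structures. */
enum cpd_state { CPD_STATE_UNJOINED, CPD_STATE_JOINED };

/* Hypothetical reasons: a voluntary leave vs. a node listed as down. */
enum leave_reason { REASON_LEAVE, REASON_NODEDOWN };

struct cpd {
    enum cpd_state state;
    char group_name[32];
    uint32_t nodeid;
};

struct left_entry {
    uint32_t nodeid;
    enum leave_reason reason;
};

/*
 * Apply a list of leave events to the local client state.
 * check_reason == false models the behaviour described in the report:
 * any entry matching the local node unjoins the client.
 * check_reason == true models a fix that unjoins only on a real leave.
 */
void apply_left_list(struct cpd *cpd, const struct left_entry *left,
                     size_t n, bool check_reason)
{
    for (size_t i = 0; i < n; i++) {
        if (left[i].nodeid != cpd->nodeid)
            continue;               /* entry is about another node */
        if (check_reason && left[i].reason != REASON_LEAVE)
            continue;               /* local node was only in the downlist */
        cpd->state = CPD_STATE_UNJOINED;
        memset(cpd->group_name, 0, sizeof(cpd->group_name));
    }
}

/* A client only gets CPG events while joined to a named group. */
bool client_receives_events(const struct cpd *cpd)
{
    return cpd->state == CPD_STATE_JOINED && cpd->group_name[0] != '\0';
}
```

In this model, feeding a downlist that contains the local node with check_reason set to false leaves the client unjoined with an empty group name, which matches the lockup symptom. With check_reason set to true the client stays joined and keeps receiving events.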

Version-Release number of selected component (if applicable):
1.4.1

Comment 3 Jan Friesse 2012-03-07 07:28:00 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
Booting a large cluster all at once / starting many corosync instances at the same time.

Consequence
Sometimes CPG events are not sent to the user.

Fix
Properly check the leave reason in the cpg service.

Result
CPG events are sent to the user.

Comment 7 errata-xmlrpc 2012-06-20 12:22:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0777.html

