Bug 727960 - corosync crashes with combo of lossy network and config changes
Summary: corosync crashes with combo of lossy network and config changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 722522 729081
Blocks:
Reported: 2011-08-03 19:19 UTC by RHEL Product and Program Management
Modified: 2011-10-12 08:36 UTC (History)
CC List: 7 users

Fixed In Version: corosync-1.2.3-21.el6_0.5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-12 08:36:04 UTC


Attachments
Patch from 6.2 (1.52 KB, patch)
2011-08-12 09:24 UTC, Jan Friesse
6.0.z-bz727960-2-Deliver-all-messages-from-my_high_seq_recieved-to-th (2.33 KB, patch)
2011-09-26 15:16 UTC, Jan Friesse


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1363 normal SHIPPED_LIVE corosync bug fix update 2011-10-12 08:35:55 UTC

Description RHEL Product and Program Management 2011-08-03 19:19:21 UTC
This bug has been copied from bug #722522 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 4 Jan Friesse 2011-08-12 09:24:58 UTC
Created attachment 518000
Patch from 6.2

Comment 6 Jan Friesse 2011-09-26 15:16:28 UTC
Created attachment 524926
6.0.z-bz727960-2-Deliver-all-messages-from-my_high_seq_recieved-to-th


Deliver all messages from my_high_seq_recieved to the last gap

This patch passes two test cases:

-------
Test #1
-------
Two node cluster - run cpgbench on each node

Modify totemsrp with the following defines:
Two test cases:

-------
Test #2
-------
5 node cluster

start 5 nodes randomly at about same time, start 5 nodes randomly at about
same time, wait 10 seconds and attempt to send a message.  If message blocks
on "TRY_AGAIN" likely a message loss has occurred.  Wait a few minutes without
cycling the nodes and see if the TRY_AGAIN state becomes unblocked.

If it doesn't, the test case has failed.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 2ec4ddb039b310b308a8748c88332155afd62608)
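
The pass/fail check in Test #2 of the commit message above (retry a send that reports TRY_AGAIN, and fail if it never unblocks) could be automated roughly as follows. This is only a sketch under stated assumptions, not part of the patch: `wait_until_unblocked`, the `send` callable, and the `TRY_AGAIN`/`OK` string constants are hypothetical names chosen for illustration, and the timeout is left to the caller.

```python
import time

# Hypothetical status values; stand-ins for whatever the real send call returns.
TRY_AGAIN = "TRY_AGAIN"
OK = "OK"


def wait_until_unblocked(send, timeout_s, interval_s=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call send() until it stops returning TRY_AGAIN or timeout_s elapses.

    Returns True if a send eventually succeeded (test case passes) and
    False if the sender stayed blocked or hit another error (test case fails).
    """
    deadline = clock() + timeout_s
    while True:
        status = send()
        if status != TRY_AGAIN:
            # Any non-retry status ends the wait; only OK counts as a pass.
            return status == OK
        if clock() >= deadline:
            # Still blocked after the waiting period: likely message loss.
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated sender: blocked twice, then unblocked.
    replies = iter([TRY_AGAIN, TRY_AGAIN, OK])
    passed = wait_until_unblocked(lambda: next(replies),
                                  timeout_s=5, interval_s=0.01)
    print("test case passed" if passed else "test case failed")
```

In a real run, `send` would wrap the actual cluster send call, for example a binding around libcpg's `cpg_mcast_joined`; that mapping is an assumption and is not stated in this bug.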

Comment 9 errata-xmlrpc 2011-10-12 08:36:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1363.html

