Bug 727962 - corosync crashes with combo of lossy network and config changes
Summary: corosync crashes with combo of lossy network and config changes
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.1
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Jan Friesse
QA Contact: Cluster QE
Depends On: 722522 729081
Reported: 2011-08-03 19:19 UTC by RHEL Product and Program Management
Modified: 2011-10-11 07:32 UTC

Fixed In Version: corosync-1.2.3-36.el6_1.3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-10-11 07:32:31 UTC

Attachments
Patch (1.52 KB, patch)
2011-08-12 09:27 UTC, Jan Friesse
6.1.z-bz727962-2-Deliver-all-messages-from-my_high_seq_recieved-to-th (2.33 KB, patch)
2011-09-26 15:26 UTC, Jan Friesse

External tracker:
System: Red Hat Product Errata
ID: RHBA-2011:1361
Priority: normal
Status: SHIPPED_LIVE
Summary: corosync bug fix update
Last Updated: 2011-10-11 07:32:21 UTC

Description RHEL Product and Program Management 2011-08-03 19:19:53 UTC
This bug has been copied from bug #722522 and has been proposed
to be backported to 6.1 z-stream (EUS).

Comment 7 Jan Friesse 2011-08-12 09:27:08 UTC
Created attachment 518004

Comment 11 Jan Friesse 2011-09-26 15:26:52 UTC
Created attachment 524929

Deliver all messages from my_high_seq_received to the last gap

This patch passes two test cases:

Test #1
Two node cluster - run cpgbench on each node

Modify totemsrp with the following defines:

Test #2
5 node cluster

Start 5 nodes randomly at about the same time, start 5 nodes randomly at about
the same time, wait 10 seconds, and attempt to send a message. If the message
blocks on "TRY_AGAIN", a message loss has likely occurred. Wait a few minutes
without cycling the nodes and see whether the TRY_AGAIN state becomes unblocked.

If it does not, the test case has failed.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 2ec4ddb039b310b308a8748c88332155afd62608)
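The patch title concerns delivering buffered messages to the application in sequence-number order when some sequence numbers are missing (gaps caused by a lossy network). As a generic illustration only, and not a reproduction of the totemsrp code or of this patch's logic, the sketch below walks from the last delivered sequence number up to an end point and either stops at the first gap (strict ordering) or steps over it. The names `recv_buffer`, `deliver_up_to`, `skip_gaps`, and `WINDOW` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define WINDOW 16 /* hypothetical receive-buffer size, indexed by sequence number */

/* Hypothetical receive buffer: present[seq] is true if message seq arrived. */
struct recv_buffer {
    bool present[WINDOW];
    int payload[WINDOW];
    unsigned int high_delivered; /* last sequence number handed to the app */
};

/* Stand-in for the application delivery callback. */
static void deliver(int payload)
{
    printf("deliver %d\n", payload);
}

/*
 * Deliver messages in order from high_delivered + 1 up to end_point.
 * skip_gaps == false: stop at the first missing message (strict order).
 * skip_gaps == true:  step over missing messages and keep going.
 * Returns the number of messages actually delivered.
 */
static unsigned int deliver_up_to(struct recv_buffer *rb,
                                  unsigned int end_point, bool skip_gaps)
{
    unsigned int delivered = 0;

    assert(end_point < WINDOW); /* stay inside the hypothetical buffer */

    for (unsigned int seq = rb->high_delivered + 1; seq <= end_point; seq++) {
        if (!rb->present[seq]) {
            if (!skip_gaps) {
                break; /* gap: nothing past it can be delivered in order */
            }
            rb->high_delivered = seq; /* mark the gap as passed over */
            continue;
        }
        deliver(rb->payload[seq]);
        rb->high_delivered = seq;
        delivered++;
    }
    return delivered;
}
```

With messages 1, 2 and 4 present, a strict call to `deliver_up_to(&rb, 4, false)` delivers 1 and 2 and stops at the gap at 3; a later call with `skip_gaps` set delivers 4. Whether a real protocol may skip a gap depends on its recovery rules, which this sketch does not model.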

Comment 14 errata-xmlrpc 2011-10-11 07:32:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHBA-2011:1361) and where to find the
updated files, see the advisory.

If the solution does not work for you, open a new bug report.

