Bug 727962

Summary: corosync crashes with combo of lossy network and config changes
Product: Red Hat Enterprise Linux 6
Reporter: RHEL Program Management <pm-rhel>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.1
CC: cluster-maint, djansa, jfriesse, jwest, mjuricek, pm-eus, rrajaram, sdake
Target Milestone: rc
Keywords: ZStream
Hardware: All   
OS: Linux   
Fixed In Version: corosync-1.2.3-36.el6_1.3
Doc Type: Bug Fix
Last Closed: 2011-10-11 07:32:31 UTC
Bug Depends On: 722522, 729081    
Attachments:
  Patch (flags: none)
  6.1.z-bz727962-2-Deliver-all-messages-from-my_high_seq_recieved-to-th (flags: none)

Description RHEL Program Management 2011-08-03 19:19:53 UTC
This bug has been copied from bug #722522 and has been proposed
to be backported to 6.1 z-stream (EUS).

Comment 7 Jan Friesse 2011-08-12 09:27:08 UTC
Created attachment 518004 [details]
Patch

Comment 11 Jan Friesse 2011-09-26 15:26:52 UTC
Created attachment 524929 [details]
6.1.z-bz727962-2-Deliver-all-messages-from-my_high_seq_recieved-to-th


Deliver all messages from my_high_seq_recieved to the last gap

This patch passes two test cases:

-------
Test #1
-------
2-node cluster: run cpgbench on each node.

Modify totemsrp with the following defines:

-------
Test #2
-------
5-node cluster

Start 5 nodes randomly at about the same time, then start 5 nodes randomly at
about the same time again, wait 10 seconds, and attempt to send a message. If
the send blocks on "TRY_AGAIN", a message loss has likely occurred. Wait a few
minutes without cycling the nodes and check whether the TRY_AGAIN state clears.

If it does not, the test case has failed.

Signed-off-by: Steven Dake <sdake>
Reviewed-by: Jan Friesse <jfriesse>
(cherry picked from commit 2ec4ddb039b310b308a8748c88332155afd62608)
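Test #2 ends with a manual pass/fail decision. You keep retrying a send that reports TRY_AGAIN, without cycling any node, and check whether it ever succeeds within a few minutes. The Python sketch below automates only that decision logic under stated assumptions. It does not talk to corosync. `send` is a hypothetical callable standing in for a CPG send such as `cpg_mcast_joined()`. `SendStatus` is a stand-in for the `CS_OK` / `CS_ERR_TRY_AGAIN` results. The `check_send_unblocks`, `FakeClock` and `make_sender` names, the 180-second default timeout and the 1-second retry interval are all illustrative assumptions, not values from the patch.

```python
import time
from enum import Enum
from typing import Callable


class SendStatus(Enum):
    """Hypothetical stand-ins for the CS_OK / CS_ERR_TRY_AGAIN send results."""
    OK = "ok"
    TRY_AGAIN = "try_again"


def check_send_unblocks(
    send: Callable[[], SendStatus],
    timeout_s: float = 180.0,       # assumed "a few minutes"
    retry_interval_s: float = 1.0,  # assumed pause between retries
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if a send stuck on TRY_AGAIN succeeds before the deadline.

    Mirrors the Test #2 check: keep retrying without restarting nodes and
    report failure if the TRY_AGAIN state never clears.
    """
    deadline = clock() + timeout_s
    while True:
        status = send()
        if status is SendStatus.OK:
            return True   # TRY_AGAIN cleared: the test case passes
        if status is not SendStatus.TRY_AGAIN:
            raise RuntimeError(f"unexpected send status: {status}")
        if clock() >= deadline:
            return False  # still blocked after waiting: the test case failed
        sleep(retry_interval_s)


class FakeClock:
    """Test double: a clock that only advances when sleep() is called."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def make_sender(blocked_attempts: int) -> Callable[[], SendStatus]:
    """Test double: report TRY_AGAIN for the first N calls, then OK."""
    calls = 0

    def send() -> SendStatus:
        nonlocal calls
        calls += 1
        return SendStatus.TRY_AGAIN if calls <= blocked_attempts else SendStatus.OK

    return send
```

Injecting `clock` and `sleep` lets the decision logic be exercised instantly with `FakeClock`. In a real run you would keep the defaults and pass a function that performs the actual send.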

Comment 14 errata-xmlrpc 2011-10-11 07:32:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1361.html