Bug 727960

Summary: corosync crashes with combo of lossy network and config changes
Product: Red Hat Enterprise Linux 6
Reporter: RHEL Program Management <pm-rhel>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 6.1
CC: cluster-maint, djansa, jfriesse, jwest, mjuricek, pm-eus, sdake
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Fixed In Version: corosync-1.2.3-21.el6_0.5
Doc Type: Bug Fix
Last Closed: 2011-10-12 08:36:04 UTC
Bug Depends On: 722522, 729081
Attachments:
  Patch from 6.2 (flags: none)
  6.0.z-bz727960-2-Deliver-all-messages-from-my_high_seq_recieved-to-th (flags: none)

Description RHEL Program Management 2011-08-03 19:19:21 UTC
This bug has been copied from bug #722522 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 4 Jan Friesse 2011-08-12 09:24:58 UTC
Created attachment 518000
Patch from 6.2

Comment 6 Jan Friesse 2011-09-26 15:16:28 UTC
Created attachment 524926
6.0.z-bz727960-2-Deliver-all-messages-from-my_high_seq_recieved-to-th


Deliver all messages from my_high_seq_recieved to the last gap

This patch passes two test cases:

-------
Test #1
-------
Two node cluster - run cpgbench on each node

modify totemsrp with following defines:
Two test cases:

-------
Test #2
-------
5 node cluster

Start 5 nodes randomly at about the same time, start 5 nodes randomly at about
the same time, wait 10 seconds, and attempt to send a message.  If the message
blocks on "TRY_AGAIN", a message loss has likely occurred.  Wait a few minutes
without cycling the nodes and see if the TRY_AGAIN state becomes unblocked.

If it doesn't, the test case has failed.

Signed-off-by: Steven Dake <sdake>
Reviewed-by: Jan Friesse <jfriesse>
(cherry picked from commit 2ec4ddb039b310b308a8748c88332155afd62608)
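
The pass/fail check described in Test #2 above could be automated roughly as
follows. This is only a sketch under stated assumptions, not part of the patch
or of corosync: `send` is a hypothetical callable that wraps whatever client
sends the message and returns a status string, `OK` is an assumed success
value, and the 180-second deadline is just one reading of "a few minutes".

```python
import time

TRY_AGAIN = "TRY_AGAIN"  # stand-in for the blocked-send status named in Test #2
OK = "OK"                # hypothetical success status, not a corosync constant


def send_unblocks(send, initial_wait=10.0, deadline=180.0, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Return True if send() stops returning TRY_AGAIN within `deadline` seconds.

    `send` is a hypothetical callable supplied by the tester; `sleep` and
    `clock` are injectable so the loop can be exercised without real delays.
    """
    sleep(initial_wait)           # "wait 10 seconds" after the nodes are started
    start = clock()
    while True:
        if send() != TRY_AGAIN:
            return True           # message accepted: the TRY_AGAIN state cleared
        if clock() - start >= deadline:
            return False          # still blocked at the deadline: test case failed
        sleep(interval)           # retry without cycling the nodes
```

A tester would pass in their own `send` wrapper; a `False` result corresponds
to the "test case has failed" outcome above.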

Comment 9 errata-xmlrpc 2011-10-12 08:36:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1363.html