Bug 729081 - openais crashes with combo of lossy network and config changes
Summary: openais crashes with combo of lossy network and config changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.8
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 818644
Depends On: 722522
Blocks: 727960 727962 731457 731458 731460
Reported: 2011-08-08 17:00 UTC by Steven Dake
Modified: 2018-12-01 18:30 UTC
CC List: 7 users

Fixed In Version: openais-0.80.6-34.el5
Doc Type: Bug Fix
Doc Text:
Previously, when OpenAIS was used in a lossy network, and a large number of configuration changes occurred, OpenAIS sometimes terminated unexpectedly. To solve this problem, the underlying source code has been modified, and OpenAIS no longer crashes in the scenario described.
Clone Of: 722522
Environment:
Last Closed: 2012-02-21 05:22:01 UTC


Attachments
Backported patch from Corosync (1.53 KB, patch)
2011-08-17 14:39 UTC, Jan Friesse
2011-09-27-0001-Deliver-all-messages-from-my_high_seq_recieved-to-th (2.41 KB, patch)
2011-09-27 09:02 UTC, Jan Friesse


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:0180 normal SHIPPED_LIVE openais bug fix and enhancement update 2012-02-20 14:54:52 UTC

Comment 2 Jan Friesse 2011-08-17 14:39:50 UTC
Created attachment 518697
Backported patch from Corosync

Backport of Corosync b8a061ae28e7c874b66fa1d35ab01f53d1d36b42

Comment 8 Jan Friesse 2011-09-20 11:58:07 UTC
Waiting for resolution of https://bugzilla.redhat.com/show_bug.cgi?id=722522

Comment 9 Jan Friesse 2011-09-27 09:02:03 UTC
Created attachment 525057
2011-09-27-0001-Deliver-all-messages-from-my_high_seq_recieved-to-th


Deliver all messages from my_high_seq_recieved to the last gap

Backport of corosync 2ec4ddb039b310b308a8748c88332155afd62608

This patch passes two test cases:

-------
Test #1
-------
Two node cluster - run cpgbench on each node

modify totemsrp with following defines:
Two test cases:

-------
Test #2
-------
5 node cluster

Start 5 nodes randomly at about the same time, start 5 nodes randomly at about
the same time, wait 10 seconds and attempt to send a message.  If the message
blocks on "TRY_AGAIN", a message loss has likely occurred.  Wait a few minutes
without cycling the nodes and see if the TRY_AGAIN state becomes unblocked.

If it doesn't, the test case has failed.
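
The pass/fail check in Test #2 can be sketched as a small polling loop. This is an illustration only, not part of the patch or of the OpenAIS test tooling: `wait_until_unblocked`, the `send` callable, and the `TRY_AGAIN`/`OK` constants are hypothetical names, and the 180-second default merely stands in for "a few minutes". In a real run, `send` would wrap whatever call the test uses to send a cluster message.

```python
import time

# Hypothetical status values; a real test would map the messaging
# API's return codes onto these.
TRY_AGAIN = "TRY_AGAIN"
OK = "OK"


def wait_until_unblocked(send, timeout_s=180.0, poll_s=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Retry send() until it stops returning TRY_AGAIN.

    Returns True if the send got through before the deadline (test
    passed) and False if it was still blocked when the deadline
    expired (test failed).
    """
    deadline = clock() + timeout_s
    while True:
        if send() != TRY_AGAIN:
            return True           # the TRY_AGAIN state became unblocked
        if clock() >= deadline:
            return False          # still blocked after the waiting period
        sleep(poll_s)             # nodes are not cycled while waiting
```

The clock and sleep functions are parameters so the loop can be exercised without actually waiting several minutes.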

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Comment 12 Miroslav Svoboda 2011-11-03 18:01:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, when OpenAIS was used in a lossy network, and a large number of configuration changes occurred, OpenAIS sometimes terminated unexpectedly. To solve this problem, the underlying source code has been modified, and OpenAIS no longer crashes in the scenario described.

Comment 14 errata-xmlrpc 2012-02-21 05:22:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0180.html

Comment 15 Jan Friesse 2012-05-07 15:16:20 UTC
*** Bug 818644 has been marked as a duplicate of this bug. ***

