Bug 729081
Summary: openais crashes with combo of lossy network and config changes
Product: Red Hat Enterprise Linux 5
Reporter: Steven Dake <sdake>
Component: openais
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 5.8
CC: cluster-maint, djansa, edamato, jkortus, jruemker, jwest, msvoboda
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: openais-0.80.6-34.el5
Doc Type: Bug Fix
Doc Text: Previously, when OpenAIS was used in a lossy network, and a large number of configuration changes occurred, OpenAIS sometimes terminated unexpectedly. To solve this problem, the underlying source code has been modified, and OpenAIS no longer crashes in the scenario described.
Story Points: ---
Clone Of: 722522
Last Closed: 2012-02-21 05:22:01 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 722522
Bug Blocks: 727960, 727962, 731457, 731458, 731460
Waiting for resolution of https://bugzilla.redhat.com/show_bug.cgi?id=722522

Created attachment 525057 [details]
2011-09-27-0001-Deliver-all-messages-from-my_high_seq_recieved-to-th
Deliver all messages from my_high_seq_recieved to the last gap
Backport of corosync 2ec4ddb039b310b308a8748c88332155afd62608
This patch passes two test cases:
-------
Test #1
-------
Two node cluster - run cpgbench on each node
modify totemsrp with following defines:
Two test cases:
-------
Test #2
-------
5 node cluster
start 5 nodes randomly at about the same time, wait 10 seconds, and attempt to
send a message. If the message blocks on "TRY_AGAIN", a message loss has likely
occurred. Wait a few minutes without cycling the nodes and see if the
TRY_AGAIN state becomes unblocked.
If it doesn't, the test case has failed.
Signed-off-by: Steven Dake <sdake>
Reviewed-by: Jan Friesse <jfriesse>
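
The pass/fail rule in Test #2 can be sketched as a bounded retry loop. This is only an illustration, not code from the patch: the result codes, the send callback type, and the names wait_for_unblock and flaky_send are all hypothetical stand-ins, and a real run would wrap the actual send call and sleep between attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; the real library's error enum is not used here. */
typedef enum { SEND_OK, SEND_TRY_AGAIN, SEND_ERROR } send_result_t;

/* Hypothetical callback that performs one send attempt. */
typedef send_result_t (*send_fn_t)(void *ctx);

typedef enum {
    TEST_PASSED,          /* the send eventually went through */
    TEST_FAILED_BLOCKED,  /* still TRY_AGAIN after every attempt */
    TEST_FAILED_ERROR     /* some other error was returned */
} test_verdict_t;

/*
 * Retry the send up to max_attempts times. pause_fn, if not NULL, is called
 * after each TRY_AGAIN (for example, to sleep a few seconds).
 */
test_verdict_t wait_for_unblock(send_fn_t send, void *ctx,
                                unsigned max_attempts, void (*pause_fn)(void))
{
    assert(send != NULL);
    for (unsigned i = 0; i < max_attempts; i++) {
        switch (send(ctx)) {
        case SEND_OK:
            return TEST_PASSED;
        case SEND_TRY_AGAIN:
            if (pause_fn != NULL)
                pause_fn();
            break;  /* leave the switch and try again */
        default:
            return TEST_FAILED_ERROR;
        }
    }
    return TEST_FAILED_BLOCKED;
}

/* Demo stand-in for a real sender: reports TRY_AGAIN until the counter hits 0. */
struct flaky_ctx { unsigned remaining_try_again; };

send_result_t flaky_send(void *ctx)
{
    struct flaky_ctx *f = ctx;
    if (f->remaining_try_again > 0) {
        f->remaining_try_again--;
        return SEND_TRY_AGAIN;
    }
    return SEND_OK;
}
```

With a sender that recovers after a few attempts the verdict is TEST_PASSED; with one that stays blocked for the whole attempt budget it is TEST_FAILED_BLOCKED, which corresponds to the failed test case described above.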

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Previously, when OpenAIS was used in a lossy network, and a large number of configuration changes occurred, OpenAIS sometimes terminated unexpectedly. To solve this problem, the underlying source code has been modified, and OpenAIS no longer crashes in the scenario described.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2012-0180.html

*** Bug 818644 has been marked as a duplicate of this bug. ***

Created attachment 518697 [details]
Backported patch from Corosync
Backport of Corosync b8a061ae28e7c874b66fa1d35ab01f53d1d36b42