Bug 480684 - certain types of messages can be ignored
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Steven Dake
QA Contact: Cluster QE
Keywords: ZStream
Depends On:
Blocks: 486388 509893
Reported: 2009-01-19 15:05 EST by Steven Dake
Modified: 2016-04-26 10:14 EDT

See Also:
Fixed In Version: openais-0.80.5-2
Doc Type: Bug Fix
Last Closed: 2009-09-02 07:29:23 EDT

Attachments: None
Description Steven Dake 2009-01-19 15:05:58 EST
Description of problem:
Messages record their source location with message_source_set, which uses a totem API to retrieve the current node id and stores it in the source location.  On receipt of a message, a local response is given to an ipc connection if the node id stored by message_source_set equals the current node id reported by totem.  The synchronization engine also relies heavily on message_source_set and checks for local messages when processing requests.
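
The tagging and the local check described above can be sketched roughly as follows. This is only an illustration: the struct layout, the my_nodeid variable, and the names totem_my_nodeid_get and message_source_is_local are assumptions made for the example, and this message_source_set is a simplified stand-in for the real openais function, not a copy of it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout; the real openais message source type may differ. */
struct message_source {
    unsigned int nodeid; /* node that originated the message */
    void *conn;          /* ipc connection waiting for a response */
};

/* Assumed stand-in for the node id that totem maintains for this node. */
unsigned int my_nodeid = 1;

/* Hypothetical accessor standing in for the totem API call. */
unsigned int totem_my_nodeid_get(void)
{
    return my_nodeid;
}

/* Simplified stand-in: tag a message with the node id current at send time. */
void message_source_set(struct message_source *source, void *conn)
{
    source->nodeid = totem_my_nodeid_get();
    source->conn = conn;
}

/* A message is treated as local when its tag matches the current node id. */
bool message_source_is_local(const struct message_source *source)
{
    return source->nodeid == totem_my_nodeid_get();
}
```

In this sketch, a message tagged while the node id is valid passes the local check, whereas a message whose stored node id is 0 fails it whenever the current node id is nonzero, so no response would be sent to its ipc connection.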

Once per second, totem checks whether an interface has gone up or down.  This state-change check zeroes the node id with a memset and then sets it again; the same node id is read by the totem function that retrieves the node id.  In rare circumstances, related to timing under heavy load, a message may be inadvertently ignored because its source node id is set to 0 during the window after the memset, but is later compared against the valid node id.  This race condition results in messages being ignored that should not be.
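
The window described above can be illustrated with a small single-threaded sketch. Everything here is hypothetical (the iface_state struct, the refresh functions, and the probe callback that stands in for a reader scheduled mid-update), and the report does not say how openais-0.80.5-2 actually fixes the problem; building the new state in a local copy before publishing it is just one possible way to avoid exposing the zeroed value.

```c
#include <string.h>

/* Hypothetical interface state holding the node id that readers consult. */
struct iface_state {
    unsigned int nodeid;
};

struct iface_state published;  /* state visible to readers */
unsigned int observed_nodeid;  /* what the simulated reader saw */

typedef void (*probe_fn)(void);

/* Simulated reader: records the node id it sees when it runs. */
void record_nodeid(void)
{
    observed_nodeid = published.nodeid;
}

/* Racy pattern: the shared state is zeroed before it is refilled. */
void refresh_unsafe(unsigned int nodeid, probe_fn probe)
{
    memset(&published, 0, sizeof published);
    if (probe)
        probe(); /* a reader running here sees node id 0 */
    published.nodeid = nodeid;
}

/* One possible alternative: build locally, then publish in one assignment. */
void refresh_safe(unsigned int nodeid, probe_fn probe)
{
    struct iface_state next;

    memset(&next, 0, sizeof next);
    next.nodeid = nodeid;
    if (probe)
        probe(); /* a reader running here still sees the previous state */
    published = next;
}
```

Calling refresh_unsafe(7, record_nodeid) while published.nodeid is already 7 leaves observed_nodeid at 0, which is the value that would then fail the comparison against the valid node id. The same call through refresh_safe leaves it at 7. In a genuinely concurrent program the final assignment would also need a lock or an atomic store; the sketch only shows the ordering problem.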


Version-Release number of selected component (if applicable):
openais-0.80.3-21

How reproducible:
Could not reproduce with the current ipc, which is slow.  With the IPC rework (higher performance, so more chances for the race to happen), the condition occurs after 10-15 seconds of load.

Comment 3 Steven Dake 2009-02-18 00:31:07 EST
fixed in openais-0.80.5-2
Comment 8 errata-xmlrpc 2009-09-02 07:29:23 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html
