Bug 506119 - [EVT ] Evt config msg from nodeid r(0) ip(10.15.89.14) , but not in membership change
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Steven Dake
QA Contact: Cluster QE
Keywords: ZStream
Duplicates: 507749
Blocks: 507749 508303
Reported: 2009-06-15 12:41 EDT by Nate Straz
Modified: 2016-04-26 09:33 EDT
CC List: 6 users

Fixed In Version: openais-0.80.6-8
Doc Type: Bug Fix
Last Closed: 2009-09-02 07:30:15 EDT


Attachments
patch to resolve problem (768 bytes, application/octet-stream)
2009-06-23 20:56 EDT, Steven Dake

Description Nate Straz 2009-06-15 12:41:08 EDT
Description of problem:

If I'm running openais in a mixed 32-bit/64-bit cluster and I kill aisexec on a 32-bit node, then when I restart aisexec on that node, cpg activity stops and cpg_join() fails on the 32-bit node with error 6.

If I fail the 32-bit node a second time and restart it, cpg activity resumes and cpg_join() succeeds on the 32-bit node.

Version-Release number of selected component (if applicable):
openais-0.80.3-22.el5_3.8

How reproducible:
Every time

Steps to Reproduce:
1. start openais on a set of 32-bit and 64-bit nodes
2. start cpgbench on all nodes
3. kill aisexec on the 32-bit node and restart it
4. check for additional messages from cpgbench on all nodes. Attempt to restart cpgbench on the failed node or on any other node.
  
Actual results:
cpg_join() fails with error 6.

Expected results:


Additional info:

I see this message on the 64-bit nodes when the 32-bit node rejoins and things hang:
Jun 15 11:33:53.157643 [EVT  ] Evt config msg from nodeid r(0) ip(10.15.89.14) , but not in membership change
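The warning suggests that the EVT service checks the sender of each configuration message against the membership change it is currently processing, and drops messages from nodes it does not find there. A minimal sketch of such a guard, assuming node IDs are plain integers; the function names are hypothetical, so this illustrates the check the log line implies rather than the actual openais evt.c code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: is nodeid part of the current membership change? */
static int in_membership_change(unsigned int nodeid,
                                const unsigned int *members, size_t n_members)
{
    assert(members != NULL || n_members == 0);
    for (size_t i = 0; i < n_members; i++) {
        if (members[i] == nodeid)
            return 1;
    }
    return 0;
}

/* Hypothetical handler: returns 1 if the config message is processed,
 * 0 if it is dropped with a warning similar to the one in the report. */
static int handle_config_msg(unsigned int sender,
                             const unsigned int *members, size_t n_members)
{
    if (!in_membership_change(sender, members, n_members)) {
        fprintf(stderr,
                "[EVT  ] Evt config msg from nodeid %u, but not in membership change\n",
                sender);
        return 0;
    }
    /* ... apply the configuration message here ... */
    return 1;
}
```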
Comment 3 Steven Dake 2009-06-23 20:56:39 EDT
Created attachment 349177
patch to resolve problem
Comment 4 Steven Dake 2009-06-23 20:58:02 EDT
------------------------------------------------------------------------
r1998 | sdake | 2009-06-23 17:54:32 -0700 (Tue, 23 Jun 2009) | 10 lines

Add assembly to free list when it is removed from a configuration change as
indicated by being in the left list.

This has side effect of clearing the assembly buffer the next time it is
referenced from the free list.  This fixes a defect that stops forward
processing of the message streams because sync fails to finish when receiving
a sync message from a restarted processor because it throws away the message.
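The commit message describes a per-node reassembly buffer (an "assembly") that is moved to a free list when its node appears in the left list of a configuration change, and cleared the next time it is taken from the free list. The sketch below models that behaviour with a fixed pool; the struct layout, pool size, and function names are hypothetical simplifications, not the openais source. Without the left-list step, a restarted node would be handed back its stale, partially filled buffer from before the restart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 8      /* hypothetical fixed pool size */
#define ASM_BUF_SIZE 64  /* hypothetical reassembly buffer size */

/* Hypothetical per-node reassembly state. */
struct assembly {
    unsigned int nodeid;
    int in_use;                      /* 1 = active, 0 = on the free list */
    size_t index;                    /* bytes of a partial message collected */
    unsigned char data[ASM_BUF_SIZE];
};

static struct assembly pool[MAX_NODES]; /* zero-initialized: all free */

/* Return the active assembly for nodeid. If there is none, take one from
 * the free list and clear it, so no state from a previous incarnation of
 * the node survives. */
static struct assembly *assembly_ref(unsigned int nodeid)
{
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (pool[i].in_use && pool[i].nodeid == nodeid)
            return &pool[i];
    }
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (!pool[i].in_use) {
            memset(&pool[i], 0, sizeof pool[i]);
            pool[i].nodeid = nodeid;
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* The fix, in outline: on a configuration change, return the assembly of
 * every node in the left list to the free list. */
static void confchg_left(const unsigned int *left, size_t n_left)
{
    assert(left != NULL || n_left == 0);
    for (size_t j = 0; j < n_left; j++) {
        for (size_t i = 0; i < MAX_NODES; i++) {
            if (pool[i].in_use && pool[i].nodeid == left[j])
                pool[i].in_use = 0;
        }
    }
}
```

Clearing on reuse rather than at removal time follows the commit's wording that the buffer is cleared "the next time it is referenced from the free list".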
Comment 8 Perry Myers 2009-06-24 14:36:41 EDT
*** Bug 507749 has been marked as a duplicate of this bug. ***
Comment 10 Steven Dake 2009-06-25 11:41:51 EDT
changed version to 5.4 bugzilla.
Comment 16 errata-xmlrpc 2009-09-02 07:30:15 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html
