Bug 506119 - [EVT ] Evt config msg from nodeid r(0) ip(10.15.89.14) , but not in membership change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 507749
Depends On:
Blocks: 507749 508303
Reported: 2009-06-15 16:41 UTC by Nate Straz
Modified: 2016-04-26 13:33 UTC

Fixed In Version: openais-0.80.6-8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:30:15 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to resolve problem (768 bytes, application/octet-stream)
2009-06-24 00:56 UTC, Steven Dake


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1366 0 normal SHIPPED_LIVE openais bug-fix and enhancement update 2009-09-01 11:00:17 UTC

Description Nate Straz 2009-06-15 16:41:08 UTC
Description of problem:

If I'm running openais in a mixed 32-bit/64-bit cluster and I kill aisexec on a 32-bit node, then when I restart aisexec on that node, cpg activity stops and cpg_join() fails on the 32-bit node with error 6.

If I fail the 32-bit node a second time and restart it, cpg activity resumes and cpg_join() succeeds on the 32-bit node.

Version-Release number of selected component (if applicable):
openais-0.80.3-22.el5_3.8

How reproducible:
Every time

Steps to Reproduce:
1. Start openais on a set of 32-bit and 64-bit nodes.
2. Start cpgbench on all nodes.
3. Kill aisexec on the 32-bit node and restart it.
4. Check for additional messages from cpgbench on all nodes. Attempt to restart cpgbench on the failed node or any other node.
  
Actual results:
cpg_join() fails with error 6.

Expected results:


Additional info:

I see this message on the 64-bit nodes when the 32-bit node rejoins and things hang:
Jun 15 11:33:53.157643 [EVT  ] Evt config msg from nodeid r(0) ip(10.15.89.14) , but not in membership change

Comment 3 Steven Dake 2009-06-24 00:56:39 UTC
Created attachment 349177
patch to resolve problem

Comment 4 Steven Dake 2009-06-24 00:58:02 UTC
------------------------------------------------------------------------
r1998 | sdake | 2009-06-23 17:54:32 -0700 (Tue, 23 Jun 2009) | 10 lines

Add assembly to free list when it is removed from a configuration change as
indicated by being in the left list.

This has side effect of clearing the assembly buffer the next time it is
referenced from the free list.  This fixes a defect that stops forward
processing of the message streams because sync fails to finish when receiving
a sync message from a restarted processor because it throws away the message.
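
The attached patch itself is not reproduced in this report. As a rough illustration of the idea in the commit message, here is a minimal model in C, assuming each node has a reassembly buffer ("assembly") that is either in use or free, and that taking a buffer from the free list clears it. All names here (struct assembly, assembly_ref, confchg_release_left, the fixed-size pool) are hypothetical and simplified for the sketch; they are not claimed to match the openais source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES    8
#define ASSEMBLY_BUF 64

/* Hypothetical, simplified per-node reassembly buffer. */
struct assembly {
    unsigned int nodeid;
    int in_use;                        /* 1 = in-use list, 0 = free list */
    size_t index;                      /* bytes of a partial message held */
    unsigned char data[ASSEMBLY_BUF];
};

static struct assembly pool[MAX_NODES];

/*
 * Return the in-use assembly for nodeid. If there is none, claim one
 * from the free list; claiming clears any stale contents.
 */
struct assembly *assembly_ref(unsigned int nodeid)
{
    size_t i;

    for (i = 0; i < MAX_NODES; i++) {
        if (pool[i].in_use && pool[i].nodeid == nodeid) {
            return &pool[i];
        }
    }
    for (i = 0; i < MAX_NODES; i++) {
        if (!pool[i].in_use) {
            memset(&pool[i], 0, sizeof pool[i]);
            pool[i].nodeid = nodeid;
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;                       /* pool exhausted */
}

/*
 * Model of the fix: on a configuration change, put the assembly of
 * every node in the left list back on the free list.
 */
void confchg_release_left(const unsigned int *left_list, size_t left_entries)
{
    size_t i, j;

    assert(left_list != NULL || left_entries == 0);
    for (i = 0; i < left_entries; i++) {
        for (j = 0; j < MAX_NODES; j++) {
            if (pool[j].in_use && pool[j].nodeid == left_list[i]) {
                pool[j].in_use = 0;
            }
        }
    }
}
```

In this model, if confchg_release_left() is never called for a node that left, the next assembly_ref() for that node returns the old buffer with its stale partial-message state, which is the kind of leftover state the commit message describes as causing the restarted processor's sync message to be thrown away.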

Comment 8 Perry Myers 2009-06-24 18:36:41 UTC
*** Bug 507749 has been marked as a duplicate of this bug. ***

Comment 10 Steven Dake 2009-06-25 15:41:51 UTC
changed version to 5.4 bugzilla.

Comment 16 errata-xmlrpc 2009-09-02 11:30:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html

