Description of problem:
If I'm running openais in a mixed 32-bit/64-bit cluster and I kill aisexec on a 32-bit node, then when I restart aisexec on that node, cpg activity stops and cpg_join() fails on the 32-bit node with error 6. If I fail the 32-bit node a second time and restart it, cpg activity resumes and cpg_join() succeeds on the 32-bit node.

Version-Release number of selected component (if applicable):
openais-0.80.3-22.el5_3.8

How reproducible:
Every time

Steps to Reproduce:
1. Start openais on a set of 32-bit and 64-bit nodes.
2. Start cpgbench on all nodes.
3. Kill aisexec on the 32-bit node and restart it.
4. Check for additional messages from cpgbench on all nodes. Attempt to restart cpgbench on the failed node or any other node.

Actual results:
cpg_join() fails with error 6.

Expected results:

Additional info:
I see this message on the 64-bit nodes when the 32-bit node rejoins and things hang:

Jun 15 11:33:53.157643 [EVT ] Evt config msg from nodeid r(0) ip(10.15.89.14) , but not in membership change
Created attachment 349177 [details] patch to resolve problem
------------------------------------------------------------------------
r1998 | sdake | 2009-06-23 17:54:32 -0700 (Tue, 23 Jun 2009) | 10 lines

Add assembly to free list when it is removed from a configuration change as indicated by being in the left list. This has side effect of clearing the assembly buffer the next time it is referenced from the free list.

This fixes a defect that stops forward processing of the message streams because sync fails to finish when receiving a sync message from a restarted processor because it throws away the message.
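For anyone reading the commit message without the source at hand, here is a rough, self-contained sketch of the idea it describes. It is not the openais code: the `struct assembly` layout, the fixed-size pool, and the names `assembly_ref`, `assembly_append`, and `confchg_release_left` are all hypothetical, chosen only to illustrate the pattern. The assumption is that each sending node has a reassembly buffer, that a node which restarts can leave a stale partial message in it, and that releasing the buffer when the node shows up in the left list causes it to be cleared the next time it is referenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 8
#define BUF_SIZE  64

/* Hypothetical, simplified reassembly buffer: one per sending node. */
struct assembly {
    unsigned int nodeid;
    char data[BUF_SIZE];
    size_t index;   /* bytes of a partial message collected so far */
    int in_use;     /* 0 means the entry is on the "free list" */
};

static struct assembly pool[MAX_NODES];

/*
 * Return the in-use assembly for nodeid. If there is none, take an entry
 * from the free entries and clear it before handing it out.
 */
struct assembly *assembly_ref(unsigned int nodeid)
{
    struct assembly *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_NODES; i++) {
        if (pool[i].in_use && pool[i].nodeid == nodeid) {
            return &pool[i];
        }
        if (!pool[i].in_use && free_slot == NULL) {
            free_slot = &pool[i];
        }
    }
    assert(free_slot != NULL);  /* sketch only: pool exhaustion not handled */
    memset(free_slot, 0, sizeof(*free_slot));
    free_slot->nodeid = nodeid;
    free_slot->in_use = 1;
    return free_slot;
}

/* Append one message fragment to a node's reassembly buffer. */
void assembly_append(struct assembly *a, const char *frag, size_t len)
{
    assert(a->index + len <= BUF_SIZE);
    memcpy(a->data + a->index, frag, len);
    a->index += len;
}

/*
 * On a configuration change, release the assembly of every node in the
 * left list so that stale partial data is discarded on the next reference.
 */
void confchg_release_left(const unsigned int *left_list, size_t left_entries)
{
    size_t i, j;

    for (i = 0; i < left_entries; i++) {
        for (j = 0; j < MAX_NODES; j++) {
            if (pool[j].in_use && pool[j].nodeid == left_list[i]) {
                pool[j].in_use = 0;
            }
        }
    }
}
```

In this model, if a node has left a partial message in its buffer and then appears in the left list, the next `assembly_ref()` for that node id returns a buffer with `index == 0`. Without the release step, the stale fragment would still be there when the restarted node sends its first message.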
*** Bug 507749 has been marked as a duplicate of this bug. ***
changed version to 5.4 bugzilla.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html