Bug 472815 - Message continuation doesn't match previous frag e: 0 - a: 242
Summary: Message continuation doesn't match previous frag e: 0 - a: 242
Keywords:
Status: CLOSED DUPLICATE of bug 261381
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-24 20:13 UTC by Nate Straz
Modified: 2016-04-26 15:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-12-05 21:01:59 UTC
Target Upstream Version:
Embargoed:


Attachments
core dump from tank-01, gzipped (71.62 KB, application/x-gzip)
2008-11-24 21:01 UTC, Nate Straz

Description Nate Straz 2008-11-24 20:13:19 UTC
Description of problem:

I saw the message in the summary on one node while I was running revolver.  Four nodes out of a six-node cluster had been shot.

[TOTEM] entering GATHER state from 8. 
[TOTEM] entering GATHER state from 11. 
[TOTEM] Saving state aru 0 high seq received 0 
[TOTEM] Storing new sequence id for ring e60 
[TOTEM] entering COMMIT state. 
[TOTEM] entering RECOVERY state. 
[TOTEM] position [0] member 10.15.89.61: 
[TOTEM] previous ring seq 3676 rep 10.15.89.61 
[TOTEM] aru 331 high delivered 287 received flag 1 
[TOTEM] position [1] member 10.15.89.63: 
[TOTEM] previous ring seq 3676 rep 10.15.89.61 
[TOTEM] aru 331 high delivered 287 received flag 1 
[TOTEM] position [2] member 10.15.89.64: 
[TOTEM] previous ring seq 3676 rep 10.15.89.61 
[TOTEM] aru 331 high delivered 287 received flag 1 
[TOTEM] position [3] member 10.15.89.91: 
[TOTEM] previous ring seq 3660 rep 10.15.89.91 
[TOTEM] aru 0 high delivered 0 received flag 1 
[TOTEM] position [4] member 10.15.89.93: 
[TOTEM] previous ring seq 3676 rep 10.15.89.61 
[TOTEM] aru 331 high delivered 287 received flag 1 
[TOTEM] position [5] member 10.15.89.94: 
[TOTEM] previous ring seq 3676 rep 10.15.89.61 
[TOTEM] aru 331 high delivered 287 received flag 1 
[TOTEM] Did not need to originate any messages in recovery. 
[CLM  ] CLM CONFIGURATION CHANGE 
[CLM  ] New Configuration: 
[CLM  ] Members Left: 
[CLM  ] Members Joined: 
[CLM  ] CLM CONFIGURATION CHANGE 
[CLM  ] New Configuration: 
[CLM  ]  r(0) ip(10.15.89.61)  
[CLM  ]  r(0) ip(10.15.89.63)  
[CLM  ]  r(0) ip(10.15.89.64)  
[CLM  ]  r(0) ip(10.15.89.91)  
[CLM  ]  r(0) ip(10.15.89.93)  
[CLM  ]  r(0) ip(10.15.89.94)  
[CLM  ] Members Left: 
[CLM  ] Members Joined: 
[CLM  ]  r(0) ip(10.15.89.61)  
[CLM  ]  r(0) ip(10.15.89.63)  
[CLM  ]  r(0) ip(10.15.89.64)  
[CLM  ]  r(0) ip(10.15.89.91)  
[CLM  ]  r(0) ip(10.15.89.93)  
[CLM  ]  r(0) ip(10.15.89.94)  
[SYNC ] This node is within the primary component and will provide service. 
[TOTEM] entering OPERATIONAL state. 
[CMAN ] quorum regained, resuming activity 
[CMAN ] quorum lost, blocking activity 
[TOTEM] Message continuation doesn't match previous frag e: 0 - a: 242 
[TOTEM] Throwing away broken message: continuation 0, index 0 

After this, aisexec was not running on the system.  The cman init script failed trying to start cman.
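
The member list printed in the RECOVERY state above can be checked mechanically for a node whose previous ring differs from the others. The following is a minimal Python sketch that assumes the log format shown in this report; `parse_recovery_members` and `ring_outliers` are hypothetical helper names, not part of openais, and the sample text is copied from the log above.

```python
import re
from collections import Counter

# Sample copied from the RECOVERY-state section of the log in this report.
LOG = """\
[TOTEM] position [0] member 10.15.89.61:
[TOTEM] previous ring seq 3676 rep 10.15.89.61
[TOTEM] aru 331 high delivered 287 received flag 1
[TOTEM] position [1] member 10.15.89.63:
[TOTEM] previous ring seq 3676 rep 10.15.89.61
[TOTEM] aru 331 high delivered 287 received flag 1
[TOTEM] position [2] member 10.15.89.64:
[TOTEM] previous ring seq 3676 rep 10.15.89.61
[TOTEM] aru 331 high delivered 287 received flag 1
[TOTEM] position [3] member 10.15.89.91:
[TOTEM] previous ring seq 3660 rep 10.15.89.91
[TOTEM] aru 0 high delivered 0 received flag 1
[TOTEM] position [4] member 10.15.89.93:
[TOTEM] previous ring seq 3676 rep 10.15.89.61
[TOTEM] aru 331 high delivered 287 received flag 1
[TOTEM] position [5] member 10.15.89.94:
[TOTEM] previous ring seq 3676 rep 10.15.89.61
[TOTEM] aru 331 high delivered 287 received flag 1
"""

MEMBER_RE = re.compile(r"position \[(\d+)\] member ([\d.]+):")
RING_RE = re.compile(r"previous ring seq (\d+) rep ([\d.]+)")
ARU_RE = re.compile(r"aru (\d+) high delivered (\d+) received flag (\d+)")


def parse_recovery_members(text):
    """Collect one dict per 'position [N] member' entry and its two detail lines."""
    members = []
    current = None
    for line in text.splitlines():
        m = MEMBER_RE.search(line)
        if m:
            current = {"position": int(m.group(1)), "member": m.group(2)}
            members.append(current)
            continue
        if current is None:
            continue  # detail lines before any member entry are ignored
        m = RING_RE.search(line)
        if m:
            current["ring_seq"] = int(m.group(1))
            current["rep"] = m.group(2)
            continue
        m = ARU_RE.search(line)
        if m:
            current["aru"] = int(m.group(1))
            current["high_delivered"] = int(m.group(2))
            current["received_flag"] = int(m.group(3))
    return members


def ring_outliers(members):
    """Return members whose (previous ring seq, rep) differs from the most common pair."""
    counts = Counter((m["ring_seq"], m["rep"]) for m in members)
    majority, _ = counts.most_common(1)[0]
    return [m for m in members if (m["ring_seq"], m["rep"]) != majority]


for m in ring_outliers(parse_recovery_members(LOG)):
    print(m["member"], m["ring_seq"], m["aru"])
```

On the log above this singles out 10.15.89.91, the only member reporting previous ring seq 3660 and aru 0. Whether that difference is related to the fragment mismatch is not established here.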

Version-Release number of selected component (if applicable):
openais-0.80.3-21.el5
cman-2.0.97-1.el5

How reproducible:
Unknown

Comment 1 Nate Straz 2008-11-24 20:16:55 UTC
On other nodes I did see messages like this:

morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.93 
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.94 
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.61 
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.63 
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.64 
morph-03 openais[2707]: [EVT  ] Can't find cluster node at r(0) ip(10.15.89.91)  
morph-03 openais[2707]: [CPG  ] got joinlist message from node 4 
morph-03 openais[2707]: [CPG  ] got joinlist message from node 6 
morph-03 openais[2707]: [CPG  ] got joinlist message from node 7 
morph-03 openais[2707]: [CPG  ] got joinlist message from node 2
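
To see which address the EVT lookup failed for without a matching nodejoin, the lines above can be cross-checked with a short script. This is only a sketch in Python, assuming the syslog format shown in this comment; `unjoined_lookups` is a hypothetical helper name.

```python
import re

# Sample copied from the morph-03 messages in this comment.
LOG = """\
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.93
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.94
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.61
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.63
morph-03 openais[2707]: [CLM  ] got nodejoin message 10.15.89.64
morph-03 openais[2707]: [EVT  ] Can't find cluster node at r(0) ip(10.15.89.91)
"""

JOIN_RE = re.compile(r"\[CLM\s*\] got nodejoin message ([\d.]+)")
MISSING_RE = re.compile(
    r"\[EVT\s*\] Can't find cluster node at r\(\d+\) ip\(([\d.]+)\)"
)


def unjoined_lookups(text):
    """Addresses EVT could not find that also have no CLM nodejoin line."""
    joined = set(JOIN_RE.findall(text))
    missing = set(MISSING_RE.findall(text))
    return sorted(missing - joined)


print(unjoined_lookups(LOG))
```

For the excerpt above the only such address is 10.15.89.91.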

Comment 3 Nate Straz 2008-11-24 21:01:44 UTC
Created attachment 324536
core dump from tank-01, gzipped

Here's the core dump from tank-01.  It's an i386 core of aisexec from package openais-0.80.3-21.el5.

Comment 4 Steven Dake 2008-12-05 21:01:59 UTC
This is a duplicate of bug 261381.

*** This bug has been marked as a duplicate of bug 261381 ***

