Bug 466221 - Corosync randomly asserts in ckpt when a peer disconnects/reconnects
Product: Fedora
Classification: Fedora
Component: openais
All Other
Priority: medium, Severity: medium
Assigned To: Steven Dake
Fedora Extras Quality Assurance
Reported: 2008-10-09 05:06 EDT by Mathieu Virbel
Modified: 2016-04-26 09:36 EDT (History)
Fixed In Version: corosync-1.0.0 openais-1.0.0
Doc Type: Bug Fix
Last Closed: 2009-08-18 19:42:57 EDT
Description Mathieu Virbel 2008-10-09 05:06:58 EDT
Description of problem:
Corosync randomly hits an assertion in the Checkpoint service (ckpt) when a peer node disconnects from and reconnects to the cluster. Here is the backtrace:

#0  0x28100c0b in thr_kill () from /lib/libc.so.7
#1  0x280b45b6 in _thr_send_sig () from /lib/libthr.so.3
#2  0x280b2163 in raise () from /lib/libthr.so.3
#3  0x28192e6a in abort () from /lib/libc.so.7
#4  0x2817a2f6 in __assert () from /lib/libc.so.7
#5  0x281f9477 in message_handler_req_exec_ckpt_sync_checkpoint_section (message=0x3fbf5c10, nodeid=33597632) at ckpt.c:3726
#6  0x0804da0f in deliver_fn (nodeid=33597632, iovec=0x3fbf5f00, iov_len=1, endian_conversion_required=0) at main.c:457
#7  0x0805a88e in app_deliver_fn (nodeid=33597632, iovec=0x3fbf5ef4, iov_len=1, endian_conversion_required=0) at totempg.c:433
#8  0x0805a59b in totempg_deliver_fn (nodeid=33597632, iovec=0x829fdc8, iov_len=1, endian_conversion_required=0) at totempg.c:592
#9  0x08065887 in totemmrp_deliver_fn (nodeid=33597632, iovec=0x829fdc8, iov_len=3, endian_conversion_required=0) at totemmrp.c:83
#10 0x080632fd in messages_deliver_to_app (instance=0x8236000, skip=0, end_point=119) at totemsrp.c:3524
#11 0x0805e8bd in memb_state_operational_enter (instance=0x8236000) at totemsrp.c:1606
#12 0x08062ebe in message_handler_orf_token (instance=0x8236000, msg=0x82a562c, msg_len=70, endian_conversion_needed=0) at totemsrp.c:3378
#13 0x080656bb in main_deliver_fn (context=0x8236000, msg=0x82a562c, msg_len=70) at totemsrp.c:4107
#14 0x08065c62 in none_token_recv (rrp_instance=0x82350a0, iface_no=0, context=0x8236000, msg=0x82a562c, msg_len=70, token_seq=6) at totemrrp.c:506
#15 0x080676cd in rrp_deliver_fn (context=0x820b1f0, msg=0x82a562c, msg_len=70) at totemrrp.c:1308
#16 0x0806904b in net_deliver_fn (handle=0, fd=6, revents=1, data=0x82a5000) at totemnet.c:675
#17 0x08058f4c in poll_run (handle=0) at coropoll.c:382
#18 0x0804e98d in main (argc=2, argv=0x3fbfee1c) at main.c:733

And the code at the failing assertion (ckpt.c:3726, frame #5 above):

3721     checkpoint = checkpoint_find_specific (
3722         &sync_checkpoint_list_head,
3723         &req_exec_ckpt_sync_checkpoint_section->checkpoint_name,
3724         req_exec_ckpt_sync_checkpoint_section->ckpt_id);
3726     assert (checkpoint != NULL);   /* <-- assertion fails here */
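
The excerpt above aborts the daemon whenever the lookup returns NULL, that is, whenever a section-sync message arrives for a checkpoint that is not in the local sync list. The report does not say how the problem was actually fixed upstream, so the following is only a minimal sketch of the failure mode and of one defensive alternative (log and drop the message instead of aborting). The struct `ckpt` and the functions `ckpt_find` and `handle_sync_section` are hypothetical, simplified stand-ins and are not the openais data structures or API; whether dropping the message is safe for the real sync protocol is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a checkpoint record. */
struct ckpt {
    const char *name;
    unsigned int ckpt_id;
    struct ckpt *next;
};

/* Walk the list and return the entry matching both name and id, or NULL. */
static struct ckpt *ckpt_find(struct ckpt *head, const char *name,
                              unsigned int ckpt_id)
{
    assert(name != NULL); /* a NULL name is a caller bug, not a runtime event */
    for (struct ckpt *c = head; c != NULL; c = c->next) {
        if (c->ckpt_id == ckpt_id && strcmp(c->name, name) == 0) {
            return c;
        }
    }
    return NULL;
}

/*
 * Handle a section-sync message for the given checkpoint.
 * Returns 0 if the checkpoint was found, -1 if the message was dropped.
 * A missing checkpoint is treated as a runtime condition that can occur
 * around membership changes, so it is reported rather than asserted on.
 */
static int handle_sync_section(struct ckpt *sync_list, const char *name,
                               unsigned int ckpt_id)
{
    struct ckpt *checkpoint = ckpt_find(sync_list, name, ckpt_id);
    if (checkpoint == NULL) {
        fprintf(stderr, "sync: no checkpoint '%s' (id %u), dropping section\n",
                name, ckpt_id);
        return -1;
    }
    /* ... the section data would be attached to the checkpoint here ... */
    return 0;
}
```

With a one-element list holding ("ckpt-a", 1), a call for ("ckpt-a", 1) returns 0, while a call for ("ckpt-a", 2) or a call against an empty list returns -1 and logs a message instead of aborting the process.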

Version-Release number of selected component (if applicable):
corosync/openais trunk, rev 1667
Comment 1 Bug Zapper 2008-11-25 22:41:41 EST
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
Comment 2 Steven Dake 2009-03-18 17:08:48 EDT
This is fixed in corosync/openais in current f10/f11.
Comment 3 Steven Dake 2009-08-18 19:42:57 EDT
F10 has been updated with the fix. If the problem persists, please reopen.
