Bug 981111 - Corosync assertion failure
Status: CLOSED DUPLICATE of bug 854216
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.4
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jan Friesse
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2013-07-04 00:12 EDT by Andrew Beekhof
Modified: 2013-07-09 02:52 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-04 01:14:30 EDT
Type: Bug


Attachments: None
Description Andrew Beekhof 2013-07-04 00:12:11 EDT
Description of problem:

# gdb corosync /var/lib/corosync/core.23123
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
...
(gdb) where
#0  0x000000307d2328a5 in raise () from /lib64/libc.so.6
#1  0x000000307d234085 in abort () from /lib64/libc.so.6
#2  0x000000307d22ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000307d22bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00000030c9013996 in memb_consensus_agreed (instance=0x7fdb2d4bd010) at totemsrp.c:1243
#5  0x00000030c901792f in memb_join_process (instance=0x7fdb2d4bd010, memb_join=0xcc065c) at totemsrp.c:4059
#6  0x00000030c9017cd9 in message_handler_memb_join (instance=0x7fdb2d4bd010, msg=<value optimized out>, msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:4304
#7  0x00000030c9011728 in rrp_deliver_fn (context=<value optimized out>, msg=0xcc065c, msg_len=333) at totemrrp.c:1747
#8  0x00000030c900c104 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0xcbff90) at totemudp.c:1284
#9  0x00000030c90073c2 in poll_run (handle=5450352153329664) at coropoll.c:513
#10 0x0000000000407009 in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at main.c:1882


Version-Release number of selected component (if applicable):

corosync-1.4.1-15.el6_4.1.x86_64 

How reproducible:

Reasonably often

Steps to Reproduce:
1. All four nodes running in CTS
2. Execute StopOnebyOne test 

Actual results:

corosync dies at:  

1243		assert (token_memb_entries >= 1);

Expected results:

corosync does not die :)

Additional info:

cluster.conf:

<?xml version="1.0"?>
<cluster config_version="1" name="r6">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="pcmk-5" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pcmk-6" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-6"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pcmk-7" nodeid="3">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pcmk-8" nodeid="4">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-8"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>
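
For reading the membership lines in the logs, it helps to map each nodeid to its hostname. The following is a minimal sketch that does this with Python's standard XML parser. The embedded config is an abridged copy of the cluster.conf above (fence sections omitted), and the helper name node_names is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the cluster.conf from this report (fence sections omitted).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="1" name="r6">
  <clusternodes>
    <clusternode name="pcmk-5" nodeid="1"/>
    <clusternode name="pcmk-6" nodeid="2"/>
    <clusternode name="pcmk-7" nodeid="3"/>
    <clusternode name="pcmk-8" nodeid="4"/>
  </clusternodes>
</cluster>"""


def node_names(conf_text):
    """Return a {nodeid: hostname} mapping from cluster.conf text."""
    root = ET.fromstring(conf_text)
    return {
        int(node.get("nodeid")): node.get("name")
        for node in root.iter("clusternode")
    }


names = node_names(CLUSTER_CONF)
# "Members[3]: 2 3 4" in the logs therefore means nodeid 1 has left.
print(names[1])
```

With this mapping, the node that drops out at 12:57:58 (nodeid 1) is pcmk-5, and the node that wrote the logs and dumped core (nodeid 2) is pcmk-6.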


Logs showing the last corosync node starting and then the first one leaving:

Jul  4 12:54:24 pcmk-6 corosync[23123]:   [QUORUM] This node is within the primary component and will provide service.
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [QUORUM] Members[3]: 1 2 3
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [QUORUM] Members[3]: 1 2 3
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.122.105) ; members(old:2 left:0)
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [QUORUM] Members[4]: 1 2 3 4
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [QUORUM] Members[4]: 1 2 3 4
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.122.105) ; members(old:3 left:0)
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [QUORUM] Members[3]: 2 3 4
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.122.106) ; members(old:4 left:1)
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  4 13:00:43 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: f0 f1 f2 
Jul  4 13:00:43 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: f0 f1 f2 

The retransmit list keeps growing until these lines repeat over and over, followed by "boom" (FAILED TO RECEIVE, then the core dump):

Jul  4 13:26:31 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 100 101 102 103 107 108 f0 f1 f2 fc fd fe ff 104 105 106 109 10a 10b 10c 
Jul  4 13:26:32 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 109 10a ec ed ee ef f0 f1 f2 fc fd fe ff 100 101 102 103 104 105 106 107 108 10b 10c 
Jul  4 13:26:33 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 104 105 106 107 108 ef f0 f1 f2 fc fd fe ff 100 101 102 103 109 10a 10b 10c 
Jul  4 13:26:34 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 100 101 102 103 109 10a f0 f1 f2 fc fd fe ff 104 105 106 107 108 10b 10c 
Jul  4 13:26:35 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 107 108 ec ed ee ef f0 f1 f2 fc fd fe ff 100 101 102 103 104 105 106 109 10a 10b 10c 
Jul  4 13:26:36 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 104 105 106 109 10a ef f0 f1 f2 fc fd fe ff 100 101 102 103 107 108 10b 10c 
Jul  4 13:26:37 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 100 101 102 103 107 108 f0 f1 f2 fc fd fe ff 104 105 106 109 10a 10b 10c 
Jul  4 13:26:37 pcmk-6 corosync[23123]:   [TOTEM ] FAILED TO RECEIVE
Jul  4 13:26:49 pcmk-6 abrtd: Directory 'ccpp-2013-07-04-13:26:49-23123' creation detected
Jul  4 13:26:49 pcmk-6 abrt[28436]: Saved core dump of pid 23123 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-2013-07-04-13:26:49-23123 (40853504 bytes)
Jul  4 13:26:49 pcmk-6 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-07-04-13:26:49-23123' exited with 1
Jul  4 13:26:49 pcmk-6 abrtd: Corrupted or bad directory '/var/spool/abrt/ccpp-2013-07-04-13:26:49-23123', deleting
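
To put a number on "keeps growing", a rough sketch like the one below counts the sequence numbers in each "Retransmit List" line. The regular expression only assumes the syslog layout shown in this report, and retransmit_sizes is a hypothetical helper name, not part of corosync.

```python
import re

# Matches the TOTEM retransmit lines as they appear in the syslog excerpt.
RETRANS = re.compile(r"\[TOTEM \] Retransmit List: (?P<seqs>[0-9a-f ]+)")


def retransmit_sizes(lines):
    """Return the number of sequence numbers in each Retransmit List line."""
    sizes = []
    for line in lines:
        match = RETRANS.search(line)
        if match:
            sizes.append(len(match.group("seqs").split()))
    return sizes


# Three lines copied from the logs in this report.
sample = [
    "Jul  4 13:00:43 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: f0 f1 f2 ",
    "Jul  4 13:26:31 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: "
    "100 101 102 103 107 108 f0 f1 f2 fc fd fe ff 104 105 106 109 10a 10b 10c ",
    "Jul  4 13:26:37 pcmk-6 corosync[23123]:   [TOTEM ] FAILED TO RECEIVE",
]

print(retransmit_sizes(sample))  # [3, 20]
```

Run over the full log, the same function would show how quickly the list grew between 13:00:43 and the failure at 13:26:37.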

Various gdb variables:


(gdb) p token_memb_entries
$1 = <value optimized out>


(gdb) p *token_memb
$2 = {addr = {{nodeid = 1407323704, family = 8502, addr = "m\302Kʊ\344Ly\264\206\"\260XqL\237"}, {nodeid = 1506751852, family = 43124, addr = "k\vH\214g\326!\263\375\314-\225D9\001", <incomplete sequence \311>}}}
(gdb) p token_memb[0]
$3 = {addr = {{nodeid = 1407323704, family = 8502, addr = "m\302Kʊ\344Ly\264\206\"\260XqL\237"}, {nodeid = 1506751852, family = 43124, addr = "k\vH\214g\326!\263\375\314-\225D9\001", <incomplete sequence \311>}}}
(gdb) p token_memb[1]
$4 = {addr = {{nodeid = 48, family = 39653, addr = "p\336\373\254\362P\001\000\000\000\003\000\000\000TZ"}, {nodeid = 2145070412, family = 0, addr = "\320\334a\311\377\177\000\000\003\000\000\000\002\000\300\250"}}}
(gdb) p token_memb[2]
$5 = {addr = {{nodeid = 551802, family = 4, addr = "\300\250z\377\t\000\003\000\000\000\000\000\000\000\000"}, {nodeid = 0, family = 0, addr = "\000\000\000\000\000\000\000\000\004\000\000\000\002\000\300\250"}}}
(gdb) p token_memb[3]
$6 = {addr = {{nodeid = 552058, family = 4, addr = "\300\250z\377\t\000\003\000\000\000\000\000\000\000\000"}, {nodeid = 0, family = 0, addr = "\000\000\000\000\000\000\000\000\207\307ƖN$4", <incomplete sequence \331>}}}
(gdb) p token_memb[4]
$7 = {addr = {{nodeid = 1365481784, family = 45312, addr = "\372@\270\262\222\206\377\177\000\000 d\002\311\060"}, {nodeid = 4456448, family = 0, addr = "\000\000\000\000 \337a\311\377\177\000\000\304\341a", <incomplete sequence \311>}}}
(gdb) p token_memb[5]
$8 = {addr = {{nodeid = 32767, family = 58052, addr = "a\311\377\177\000\000\203ܧ\362\264Ǔ1M&"}, {nodeid = 38411, family = 0, addr = "\240\340a\311\377\177\000\000\260\341a\311\377\177\000"}}}
(gdb) p token_memb[6]
$9 = {addr = {{nodeid = 3378635204, family = 32767, addr = "\000\000\364\000\000\000\000\000\000\000 \337a\311\377\177"}, {nodeid = 3788767232, family = 51553, addr = "\377\177\000\000L\305\000\311\060\000\000\000\240\341a", <incomplete sequence \311>}}}
(gdb) p token_memb[7]
$10 = {addr = {{nodeid = 32767, family = 0, addr = "\000\000\000\000\000\000\030\001\000\000\000\000\000\000\320", <incomplete sequence \335>}, {nodeid = 2147469665, family = 0, addr = "\340\065J\336N\256\225\205\365\t\374\213HĤt"}}}
(gdb) p token_memb[8]
$11 = {addr = {{nodeid = 1113511162, family = 49256, addr = "\240f_I\\\352:`r\021\r\341\345Y\315P"}, {nodeid = 3596543632, family = 17199, addr = "@)\177\342\310ܚ\364\244\v\023\220", <incomplete sequence \355>}}}
(gdb) p token_memb[9]
$12 = {addr = {{nodeid = 712195769, family = 34816, addr = "\363\344\301ѐ\200dev\252}\002\\\322\313\067"}, {nodeid = 678181885, family = 6573, addr = "\323\353\310\331\006a\274\230\240\216\377\033\221\000\215\257"}}}
(gdb) p token_memb[10]
$13 = {addr = {{nodeid = 3154467245, family = 55000, addr = "W\355\370ҭXM\365\321\353<\314\354\070Ѻ"}, {nodeid = 1411350569, family = 11550, addr = "W\245s\374\ta\213\326\000\000\000\000\000\000\000"}}}
(gdb) p token_memb[11]
$14 = {addr = {{nodeid = 0, family = 1, addr = "\000\000\000\000\000\000\314\004_3\203i\355\071ՙ"}, {nodeid = 4283034587, family = 37840, addr = "\253f\177\372m>{\341\230\021M\240B\022\251X"}}}


(gdb) p *instance
$16 = {iface_changes = 1, failed_to_recv = 1, fcc_remcast_last = 0, fcc_mcast_last = 0, fcc_remcast_current = 16, consensus_list = {{addr = {addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, 
            family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {
        addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {addr = {{nodeid = 3, family = 2, 
            addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, 
            addr = '\000' <repeats 15 times>}}}, set = 0} <repeats 380 times>}, consensus_list_entries = 1, my_id = {addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
        addr = '\000' <repeats 15 times>}}}, my_proc_list = {{addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, 
          addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 381 times>}, my_failed_list = {{addr = {{nodeid = 2, family = 2, 
          addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 381 times>}, my_new_memb_list = {{addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, 
          family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, 
          addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 380 times>}, my_trans_memb_list = {{addr = {{nodeid = 2, family = 2, 
          addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 381 times>}, my_memb_list = {{addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, 
          family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, 
          addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 380 times>}, my_deliver_memb_list = {{addr = {{nodeid = 0, 
          family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 384 times>}, my_left_memb_list = {{addr = {{nodeid = 1, family = 2, addr = "\300\250zi\b\000\004\000\300\250z\377\t\000\003"}, {
          nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 383 times>}, my_proc_list_entries = 3, 
  my_failed_list_entries = 3, my_new_memb_entries = 3, my_trans_memb_entries = 3, my_memb_entries = 3, my_deliver_memb_entries = 0, my_left_memb_entries = 1, my_ring_id = {rep = {nodeid = 2, family = 2, 
      addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, seq = 4516}, my_old_ring_id = {rep = {nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, seq = 4516}, my_aru_count = 2501, 
  my_merge_detect_timeout_outstanding = 0, my_last_aru = 229, my_seq_unchanged = 739, my_received_flg = 1, my_high_seq_received = 266, my_install_seq = 0, my_rotation_counter = 0, my_set_retrans_flg = 0, my_retrans_flg_count = 0, 
  my_high_ring_delivered = 0, heartbeat_timeout = 0, new_message_queue = {head = 379, tail = 368, used = 10, usedhw = 38, size = 783, items = 0xcc5530, size_per_item = 16, iterator = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
        __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, new_message_queue_trans = {head = 59, tail = 58, used = 0, usedhw = 3, size = 783, items = 0xcc8630, 
    size_per_item = 16, iterator = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, retrans_message_queue = {
    head = 0, tail = 16383, used = 0, usedhw = 0, size = 16384, items = 0x7fdb2d47c010, size_per_item = 16, iterator = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, 
          __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, regular_sort_queue = {head = 230, size = 16384, items = 0x7fdb2d43b010, items_inuse = 0xc7fa70, items_miss_count = 0xc8fa80, size_per_item = 16, head_seqid = 230, 
    item_count = 16384, pos_max = 266}, recovery_sort_queue = {head = 0, size = 16384, items = 0x7fdb2d3fa010, items_inuse = 0xc9fa90, items_miss_count = 0xcafaa0, size_per_item = 16, head_seqid = 0, item_count = 16384, pos_max = 0}, my_aru = 229, 
  my_high_delivered = 229, token_callback_received_listhead = {next = 0xc75bd0, prev = 0xc7e1e0}, token_callback_sent_listhead = {next = 0xc75ba0, prev = 0xc75ba0}, 
  orf_token_retransmit = "\000\000\"\377\002\000\000\000\f\001\000\000Y.\000\000\345\000\000\000\002\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\n\000\000\000!\000\000\000\000\000\000\000\023\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\000\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\001\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\002\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\003\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\a\001"..., orf_token_retransmit_size = 716, my_token_seq = 11864, timer_pause_timeout = 0xd06170, timer_orf_token_timeout = 0x0, 
  timer_orf_token_retransmit_timeout = 0x0, timer_orf_token_hold_retransmit_timeout = 0x0, timer_merge_detect_timeout = 0x0, memb_timer_state_gather_join_timeout = 0xccc4a0, memb_timer_state_gather_consensus_timeout = 0xcd25c0, 
  memb_timer_state_commit_timeout = 0x0, timer_heartbeat_timeout = 0x0, totemsrp_log_level_security = 4, totemsrp_log_level_error = 3, totemsrp_log_level_warning = 4, totemsrp_log_level_notice = 5, totemsrp_log_level_debug = 7, totemsrp_subsys_id = 6, 
  totemsrp_log_printf = 0x403ca0 <_logsys_log_printf@plt>, memb_state = MEMB_STATE_GATHER, totemsrp_poll_handle = 5450352153329664, mcast_address = {nodeid = 0, family = 2, addr = "\357\300\367B", '\000' <repeats 11 times>}, 
  totemsrp_deliver_fn = 0x30c901af50 <totemmrp_deliver_fn>, totemsrp_confchg_fn = 0x30c901af60 <totemmrp_confchg_fn>, totemsrp_service_ready_fn = 0x4054e0 <main_service_ready>, 
  totemsrp_waiting_trans_ack_cb_fn = 0x30c901b180 <totempg_waiting_trans_ack_cb>, global_seqno = 428, my_token_held = 1, token_ring_id_seq = 4516, last_released = 229, set_aru = 4294967295, old_ring_state_saved = 0, old_ring_state_aru = 585, 
  old_ring_state_high_seq_received = 585, my_last_seq = 268, tv_old = {tv_sec = 0, tv_usec = 0}, totemrrp_context = 0xcbfab0, totem_config = 0x7fffc962b8a0, use_heartbeat = 0, my_trc = 16, my_pbl = 10, my_cbl = 10, pause_timestamp = 37728268192834, 
  commit_token = 0x7fdb2d4ee928, stats = {hdr = {handle = 1222976449883930708, is_dirty = 0, last_updated = 0}, rrp = 0x0, orf_token_tx = 2, orf_token_rx = 11299, memb_merge_detect_tx = 1936, memb_merge_detect_rx = 2077, memb_join_tx = 180, 
    memb_join_rx = 191, mcast_tx = 438, mcast_retx = 35180, mcast_rx = 36683, memb_commit_token_tx = 10, memb_commit_token_rx = 10, token_hold_cancel_tx = 74, token_hold_cancel_rx = 191, operational_entered = 5, operational_token_lost = 0, 
    gather_entered = 7, gather_token_lost = 0, commit_entered = 5, commit_token_lost = 0, recovery_entered = 5, recovery_token_lost = 0, consensus_timeouts = 1, rx_msg_dropped = 0, continuous_gather = 0, continuous_sendmsg_failures = 0, 
    earliest_token = 95, latest_token = 94, token = {{rx = 37627067, tx = 37627068, backlog_calc = 10}, {rx = 37627447, tx = 37627448, backlog_calc = 10}, {rx = 37628974, tx = 37628975, backlog_calc = 10}, {rx = 37629353, tx = 37629354, 
        backlog_calc = 10}, {rx = 37630880, tx = 37630881, backlog_calc = 10}, {rx = 37631260, tx = 37631261, backlog_calc = 10}, {rx = 37632787, tx = 37632788, backlog_calc = 10}, {rx = 37633166, tx = 37633167, backlog_calc = 10}, {rx = 37634693, 
        tx = 37634694, backlog_calc = 10}, {rx = 37635074, tx = 37635074, backlog_calc = 10}, {rx = 37636600, tx = 37636601, backlog_calc = 10}, {rx = 37636981, tx = 37636982, backlog_calc = 10}, {rx = 37638506, tx = 37638507, backlog_calc = 10}, {
        rx = 37638888, tx = 37638889, backlog_calc = 10}, {rx = 37640412, tx = 37640413, backlog_calc = 10}, {rx = 37640795, tx = 37640796, backlog_calc = 10}, {rx = 37642319, tx = 37642319, backlog_calc = 10}, {rx = 37642703, tx = 37642704, 
        backlog_calc = 10}, {rx = 37644225, tx = 37644226, backlog_calc = 10}, {rx = 37644610, tx = 37644611, backlog_calc = 10}, {rx = 37646132, tx = 37646133, backlog_calc = 10}, {rx = 37646516, tx = 37646517, backlog_calc = 10}, {rx = 37648040, 
        tx = 37648040, backlog_calc = 10}, {rx = 37648423, tx = 37648424, backlog_calc = 10}, {rx = 37649945, tx = 37649945, backlog_calc = 10}, {rx = 37650331, tx = 37650331, backlog_calc = 10}, {rx = 37651851, tx = 37651852, backlog_calc = 10}, {
        rx = 37652238, tx = 37652238, backlog_calc = 10}, {rx = 37653758, tx = 37653759, backlog_calc = 10}, {rx = 37654144, tx = 37654145, backlog_calc = 10}, {rx = 37655665, tx = 37655666, backlog_calc = 10}, {rx = 37656051, tx = 37656052, 
        backlog_calc = 10}, {rx = 37657572, tx = 37657573, backlog_calc = 10}, {rx = 37657958, tx = 37657959, backlog_calc = 10}, {rx = 37659480, tx = 37659480, backlog_calc = 10}, {rx = 37659865, tx = 37659866, backlog_calc = 10}, {rx = 37661386, 
        tx = 37661387, backlog_calc = 10}, {rx = 37661772, tx = 37661773, backlog_calc = 10}, {rx = 37663292, tx = 37663293, backlog_calc = 10}, {rx = 37663679, tx = 37663680, backlog_calc = 10}, {rx = 37665199, tx = 37665200, backlog_calc = 10}, {
        rx = 37665586, tx = 37665587, backlog_calc = 10}, {rx = 37667105, tx = 37667106, backlog_calc = 10}, {rx = 37667493, tx = 37667494, backlog_calc = 10}, {rx = 37669012, tx = 37669012, backlog_calc = 10}, {rx = 37669400, tx = 37669401, 
        backlog_calc = 10}, {rx = 37670918, tx = 37670919, backlog_calc = 10}, {rx = 37671307, tx = 37671307, backlog_calc = 10}, {rx = 37672825, tx = 37672826, backlog_calc = 10}, {rx = 37673212, tx = 37673212, backlog_calc = 10}, {rx = 37674732, 
        tx = 37674733, backlog_calc = 10}, {rx = 37675118, tx = 37675119, backlog_calc = 10}, {rx = 37676639, tx = 37676640, backlog_calc = 10}, {rx = 37677025, tx = 37677025, backlog_calc = 10}, {rx = 37678547, tx = 37678547, backlog_calc = 10}, {
        rx = 37678931, tx = 37678932, backlog_calc = 10}, {rx = 37680453, tx = 37680454, backlog_calc = 10}, {rx = 37680838, tx = 37680839, backlog_calc = 10}, {rx = 37682360, tx = 37682361, backlog_calc = 10}, {rx = 37682745, tx = 37682746, 
        backlog_calc = 10}, {rx = 37684266, tx = 37684267, backlog_calc = 10}, {rx = 37684652, tx = 37684653, backlog_calc = 10}, {rx = 37686173, tx = 37686174, backlog_calc = 10}, {rx = 37686559, tx = 37686560, backlog_calc = 10}, {rx = 37688074, 
        tx = 37688074, backlog_calc = 10}, {rx = 37688466, tx = 37688466, backlog_calc = 10}, {rx = 37689980, tx = 37689980, backlog_calc = 10}, {rx = 37690373, tx = 37690374, backlog_calc = 10}, {rx = 37691886, tx = 37691887, backlog_calc = 10}, {
        rx = 37692280, tx = 37692281, backlog_calc = 10}, {rx = 37693793, tx = 37693794, backlog_calc = 10}, {rx = 37694187, tx = 37694189, backlog_calc = 10}, {rx = 37695700, tx = 37695701, backlog_calc = 10}, {rx = 37696094, tx = 37696095, 
        backlog_calc = 10}, {rx = 37697607, tx = 37697608, backlog_calc = 10}, {rx = 37698000, tx = 37698000, backlog_calc = 10}, {rx = 37699515, tx = 37699515, backlog_calc = 10}, {rx = 37699905, tx = 37699906, backlog_calc = 10}, {rx = 37701421, 
        tx = 37701421, backlog_calc = 10}, {rx = 37701812, tx = 37701812, backlog_calc = 10}, {rx = 37703327, tx = 37703328, backlog_calc = 10}, {rx = 37703718, tx = 37703718, backlog_calc = 10}, {rx = 37705235, tx = 37705236, backlog_calc = 10}, {
        rx = 37705624, tx = 37705625, backlog_calc = 10}, {rx = 37707142, tx = 37707142, backlog_calc = 10}, {rx = 37707531, tx = 37707532, backlog_calc = 10}, {rx = 37709049, tx = 37709049, backlog_calc = 10}, {rx = 37709439, tx = 37709439, 
        backlog_calc = 10}, {rx = 37710956, tx = 37710957, backlog_calc = 10}, {rx = 37711345, tx = 37711346, backlog_calc = 10}, {rx = 37712862, tx = 37712863, backlog_calc = 10}, {rx = 37713252, tx = 37713253, backlog_calc = 10}, {rx = 37714769, 
        tx = 37714769, backlog_calc = 10}, {rx = 37715160, tx = 37715160, backlog_calc = 10}, {rx = 37716675, tx = 0, backlog_calc = 10}, {rx = 0, tx = 0, backlog_calc = 0}, {rx = 37623253, tx = 37623254, backlog_calc = 10}, {rx = 37623633, tx = 37623634, 
        backlog_calc = 10}, {rx = 37625161, tx = 37625162, backlog_calc = 10}, {rx = 37625540, tx = 37625541, backlog_calc = 10}}}, orf_token_discard = 1, waiting_trans_ack = 0, token_recv_event_handle = 0xc7e1e0, token_sent_event_handle = 0xc75ba0, 
  commit_token_storage = "\004\000\"\377\002\000\000\000\001\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\000\000\000\000\001\000\000\000\003\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003", '\000' <repeats 23 times>, "\003\000\000\000\002\000\300\250zk\b\000\004\000\300\250z\377\t\000\003", '\000' <repeats 23 times>, "\004\000\000\000\002\000\300\250zl\b\000\004\000\300\250z\377\t\000\003", '\000' <repeats 23 times>, "\001\000\000\000\002\000\300\250zi\b\000\004\000\300\250z\377\t\000\003\000\240\021\000\000\000\000\000\000I\002\000\000I\002\000\000\001", '\000' <repeats 39774 times>}
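
One possible reading of the dump above, offered as a sketch and not as a confirmed root cause: my_proc_list and my_failed_list both hold nodeids 2, 3 and 4 (my_proc_list_entries = 3, my_failed_list_entries = 3). If memb_consensus_agreed builds token_memb by subtracting the failed list from the processor list (that is an assumption based on my recollection of upstream totemsrp.c; nothing in this report shows the function body), the result is empty and the assert at line 1243 fires. The Python below only models that set subtraction. The helper is illustrative, not corosync code.

```python
def memb_set_subtract(full_list, subtract_list):
    """Return entries of full_list that are not in subtract_list (order kept)."""
    return [node for node in full_list if node not in subtract_list]


# Node IDs taken from the `p *instance` output in this report.
my_proc_list = [2, 3, 4]
my_failed_list = [2, 3, 4]

# ASSUMPTION: token_memb = my_proc_list minus my_failed_list.
token_memb = memb_set_subtract(my_proc_list, my_failed_list)
token_memb_entries = len(token_memb)

print(token_memb_entries)  # 0, which would violate token_memb_entries >= 1
```

This would also be consistent with the token_memb[] entries printed earlier looking like uninitialized stack contents, since nothing would have been written to the array. Note that my_failed_list includes the local node (nodeid 2) under this reading.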
Comment 2 Jan Friesse 2013-07-04 01:14:30 EDT
Andrew, this is a known problem (the assert). A growing retransmit list is a sign of a network problem.

*** This bug has been marked as a duplicate of bug 854216 ***
Comment 3 Andrew Beekhof 2013-07-09 02:52:46 EDT
If the root cause is a network problem, I would expect the other cluster (running RHEL 7) to be affected too. That is not the case here.

Also, this has happened on two unrelated clusters (different underlying hardware).
