Bug 981111 - Corosync assertion failure
Summary: Corosync assertion failure
Keywords:
Status: CLOSED DUPLICATE of bug 854216
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-07-04 04:12 UTC by Andrew Beekhof
Modified: 2013-07-09 06:52 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-04 05:14:30 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Andrew Beekhof 2013-07-04 04:12:11 UTC
Description of problem:

# gdb corosync /var/lib/corosync/core.23123
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
...
(gdb) where
#0  0x000000307d2328a5 in raise () from /lib64/libc.so.6
#1  0x000000307d234085 in abort () from /lib64/libc.so.6
#2  0x000000307d22ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000307d22bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00000030c9013996 in memb_consensus_agreed (instance=0x7fdb2d4bd010) at totemsrp.c:1243
#5  0x00000030c901792f in memb_join_process (instance=0x7fdb2d4bd010, memb_join=0xcc065c) at totemsrp.c:4059
#6  0x00000030c9017cd9 in message_handler_memb_join (instance=0x7fdb2d4bd010, msg=<value optimized out>, msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:4304
#7  0x00000030c9011728 in rrp_deliver_fn (context=<value optimized out>, msg=0xcc065c, msg_len=333) at totemrrp.c:1747
#8  0x00000030c900c104 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0xcbff90) at totemudp.c:1284
#9  0x00000030c90073c2 in poll_run (handle=5450352153329664) at coropoll.c:513
#10 0x0000000000407009 in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at main.c:1882
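For readers without the source handy, the sketch below models the invariant behind the assert in frame #4. It is not totemsrp.c: set_subtract() and plain-int node ids are stand-ins for memb_set_subtract() and struct srp_addr, and the state is lifted from the instance dump later in this report (my_proc_list = nodes {2,3,4}, my_failed_list = the same three nodes, failed_to_recv = 1). With every known processor also on the failed list, the candidate token membership comes out empty and the assert fires.

/* Illustrative model only -- NOT the corosync source. */
#include <assert.h>
#include <stdio.h>

#define PROCESSOR_COUNT_MAX 384  /* matches the 384-entry arrays in the dump */

/* Stand-in for memb_set_subtract(): out = a minus b, returns entry count. */
static int set_subtract(const int *a, int a_n,
                        const int *b, int b_n, int *out)
{
    int n = 0;
    for (int i = 0; i < a_n; i++) {
        int in_b = 0;
        for (int j = 0; j < b_n; j++)
            if (a[i] == b[j])
                in_b = 1;
        if (!in_b)
            out[n++] = a[i];
    }
    return n;
}

int main(void)
{
    /* From the instance dump: three known processors, all marked failed. */
    int my_proc_list[]   = { 2, 3, 4 };  /* my_proc_list_entries   = 3 */
    int my_failed_list[] = { 2, 3, 4 };  /* my_failed_list_entries = 3 */
    int token_memb[PROCESSOR_COUNT_MAX];

    int token_memb_entries = set_subtract(my_proc_list, 3,
                                          my_failed_list, 3, token_memb);

    printf("token_memb_entries = %d\n", token_memb_entries);  /* prints 0 */
    assert(token_memb_entries >= 1);  /* aborts, mirroring totemsrp.c:1243 */
    return 0;
}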


Version-Release number of selected component (if applicable):

corosync-1.4.1-15.el6_4.1.x86_64 

How reproducible:

Reasonably often

Steps to Reproduce:
1. Run all four nodes in CTS
2. Execute the StopOnebyOne test

Actual results:

corosync dies at:  

1243		assert (token_memb_entries >= 1);

Expected results:

corosync does not die :)

Additional info:

cluster.conf:

<?xml version="1.0"?>
<cluster config_version="1" name="r6">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="pcmk-5" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pcmk-6" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-6"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pcmk-7" nodeid="3">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pcmk-8" nodeid="4">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="pcmk-8"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>


Logs showing the last corosync node starting and then the first one leaving:

Jul  4 12:54:24 pcmk-6 corosync[23123]:   [QUORUM] This node is within the primary component and will provide service.
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [QUORUM] Members[3]: 1 2 3
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [QUORUM] Members[3]: 1 2 3
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.122.105) ; members(old:2 left:0)
Jul  4 12:54:24 pcmk-6 corosync[23123]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [QUORUM] Members[4]: 1 2 3 4
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [QUORUM] Members[4]: 1 2 3 4
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.122.105) ; members(old:3 left:0)
Jul  4 12:56:55 pcmk-6 corosync[23123]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [QUORUM] Members[3]: 2 3 4
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.122.106) ; members(old:4 left:1)
Jul  4 12:57:58 pcmk-6 corosync[23123]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul  4 13:00:43 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: f0 f1 f2 
Jul  4 13:00:43 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: f0 f1 f2 

The retransmit list keeps growing until you get these lines over and over followed by "boom":

Jul  4 13:26:31 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 100 101 102 103 107 108 f0 f1 f2 fc fd fe ff 104 105 106 109 10a 10b 10c 
Jul  4 13:26:32 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 109 10a ec ed ee ef f0 f1 f2 fc fd fe ff 100 101 102 103 104 105 106 107 108 10b 10c 
Jul  4 13:26:33 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 104 105 106 107 108 ef f0 f1 f2 fc fd fe ff 100 101 102 103 109 10a 10b 10c 
Jul  4 13:26:34 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 100 101 102 103 109 10a f0 f1 f2 fc fd fe ff 104 105 106 107 108 10b 10c 
Jul  4 13:26:35 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 107 108 ec ed ee ef f0 f1 f2 fc fd fe ff 100 101 102 103 104 105 106 109 10a 10b 10c 
Jul  4 13:26:36 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 104 105 106 109 10a ef f0 f1 f2 fc fd fe ff 100 101 102 103 107 108 10b 10c 
Jul  4 13:26:37 pcmk-6 corosync[23123]:   [TOTEM ] Retransmit List: 100 101 102 103 107 108 f0 f1 f2 fc fd fe ff 104 105 106 109 10a 10b 10c 
Jul  4 13:26:37 pcmk-6 corosync[23123]:   [TOTEM ] FAILED TO RECEIVE
Jul  4 13:26:49 pcmk-6 abrtd: Directory 'ccpp-2013-07-04-13:26:49-23123' creation detected
Jul  4 13:26:49 pcmk-6 abrt[28436]: Saved core dump of pid 23123 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-2013-07-04-13:26:49-23123 (40853504 bytes)
Jul  4 13:26:49 pcmk-6 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-07-04-13:26:49-23123' exited with 1
Jul  4 13:26:49 pcmk-6 abrtd: Corrupted or bad directory '/var/spool/abrt/ccpp-2013-07-04-13:26:49-23123', deleting
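The "FAILED TO RECEIVE" line is the pivot point. A minimal sketch of that trigger, assuming the semantics documented for the totem option fail_recv_const (default 2500) and not taken from the corosync source: if the delivered-sequence high-water mark stops advancing while messages are outstanding for more than fail_recv_const token rotations, the processor gives up. The instance dump below shows exactly this state: my_aru = 229 stuck below my_high_seq_received = 266, my_aru_count = 2501, failed_to_recv = 1.

/* Illustrative sketch only -- NOT the corosync source. */
#include <stdio.h>

#define FAIL_RECV_CONST 2500  /* corosync.conf totem { fail_recv_const } default */

struct totem_state {
    unsigned int my_aru;               /* highest contiguously delivered seq */
    unsigned int my_high_seq_received; /* highest seq seen on the ring       */
    unsigned int my_aru_count;         /* rotations with my_aru unchanged    */
    int failed_to_recv;
};

/* Hypothetical helper: called once per token rotation. */
static void on_token_rotation(struct totem_state *s)
{
    if (s->my_aru < s->my_high_seq_received) {
        /* Gaps remain -> retransmit requests keep going out, counter grows. */
        s->my_aru_count++;
        if (s->my_aru_count > FAIL_RECV_CONST)
            s->failed_to_recv = 1;     /* logged as "FAILED TO RECEIVE" */
    } else {
        s->my_aru_count = 0;
    }
}

int main(void)
{
    struct totem_state s = { .my_aru = 229, .my_high_seq_received = 266 };
    for (int rot = 0; rot < 2501 && !s.failed_to_recv; rot++)
        on_token_rotation(&s);
    printf("failed_to_recv = %d, my_aru_count = %u\n",
           s.failed_to_recv, s.my_aru_count);  /* 1, 2501 -- as in the dump */
    return 0;
}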

Various gdb variables:


(gdb) p token_memb_entries
$1 = <value optimized out>


(gdb) p *token_memb
$2 = {addr = {{nodeid = 1407323704, family = 8502, addr = "m\302Kʊ\344Ly\264\206\"\260XqL\237"}, {nodeid = 1506751852, family = 43124, addr = "k\vH\214g\326!\263\375\314-\225D9\001", <incomplete sequence \311>}}}
(gdb) p token_memb[0]
$3 = {addr = {{nodeid = 1407323704, family = 8502, addr = "m\302Kʊ\344Ly\264\206\"\260XqL\237"}, {nodeid = 1506751852, family = 43124, addr = "k\vH\214g\326!\263\375\314-\225D9\001", <incomplete sequence \311>}}}
(gdb) p token_memb[1]
$4 = {addr = {{nodeid = 48, family = 39653, addr = "p\336\373\254\362P\001\000\000\000\003\000\000\000TZ"}, {nodeid = 2145070412, family = 0, addr = "\320\334a\311\377\177\000\000\003\000\000\000\002\000\300\250"}}}
(gdb) p token_memb[2]
$5 = {addr = {{nodeid = 551802, family = 4, addr = "\300\250z\377\t\000\003\000\000\000\000\000\000\000\000"}, {nodeid = 0, family = 0, addr = "\000\000\000\000\000\000\000\000\004\000\000\000\002\000\300\250"}}}
(gdb) p token_memb[3]
$6 = {addr = {{nodeid = 552058, family = 4, addr = "\300\250z\377\t\000\003\000\000\000\000\000\000\000\000"}, {nodeid = 0, family = 0, addr = "\000\000\000\000\000\000\000\000\207\307ƖN$4", <incomplete sequence \331>}}}
(gdb) p token_memb[4]
$7 = {addr = {{nodeid = 1365481784, family = 45312, addr = "\372@\270\262\222\206\377\177\000\000 d\002\311\060"}, {nodeid = 4456448, family = 0, addr = "\000\000\000\000 \337a\311\377\177\000\000\304\341a", <incomplete sequence \311>}}}
(gdb) p token_memb[5]
$8 = {addr = {{nodeid = 32767, family = 58052, addr = "a\311\377\177\000\000\203ܧ\362\264Ǔ1M&"}, {nodeid = 38411, family = 0, addr = "\240\340a\311\377\177\000\000\260\341a\311\377\177\000"}}}
(gdb) p token_memb[6]
$9 = {addr = {{nodeid = 3378635204, family = 32767, addr = "\000\000\364\000\000\000\000\000\000\000 \337a\311\377\177"}, {nodeid = 3788767232, family = 51553, addr = "\377\177\000\000L\305\000\311\060\000\000\000\240\341a", <incomplete sequence \311>}}}
(gdb) p token_memb[7]
$10 = {addr = {{nodeid = 32767, family = 0, addr = "\000\000\000\000\000\000\030\001\000\000\000\000\000\000\320", <incomplete sequence \335>}, {nodeid = 2147469665, family = 0, addr = "\340\065J\336N\256\225\205\365\t\374\213HĤt"}}}
(gdb) p token_memb[8]
$11 = {addr = {{nodeid = 1113511162, family = 49256, addr = "\240f_I\\\352:`r\021\r\341\345Y\315P"}, {nodeid = 3596543632, family = 17199, addr = "@)\177\342\310ܚ\364\244\v\023\220", <incomplete sequence \355>}}}
(gdb) p token_memb[9]
$12 = {addr = {{nodeid = 712195769, family = 34816, addr = "\363\344\301ѐ\200dev\252}\002\\\322\313\067"}, {nodeid = 678181885, family = 6573, addr = "\323\353\310\331\006a\274\230\240\216\377\033\221\000\215\257"}}}
(gdb) p token_memb[10]
$13 = {addr = {{nodeid = 3154467245, family = 55000, addr = "W\355\370ҭXM\365\321\353<\314\354\070Ѻ"}, {nodeid = 1411350569, family = 11550, addr = "W\245s\374\ta\213\326\000\000\000\000\000\000\000"}}}
(gdb) p token_memb[11]
$14 = {addr = {{nodeid = 0, family = 1, addr = "\000\000\000\000\000\000\314\004_3\203i\355\071ՙ"}, {nodeid = 4283034587, family = 37840, addr = "\253f\177\372m>{\341\230\021M\240B\022\251X"}}}


(gdb) p *instance
$16 = {iface_changes = 1, failed_to_recv = 1, fcc_remcast_last = 0, fcc_mcast_last = 0, fcc_remcast_current = 16, consensus_list = {{addr = {addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, 
            family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {
        addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {addr = {{nodeid = 3, family = 2, 
            addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, set = 1}, {addr = {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, 
            addr = '\000' <repeats 15 times>}}}, set = 0} <repeats 380 times>}, consensus_list_entries = 1, my_id = {addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
        addr = '\000' <repeats 15 times>}}}, my_proc_list = {{addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, 
          addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 381 times>}, my_failed_list = {{addr = {{nodeid = 2, family = 2, 
          addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 381 times>}, my_new_memb_list = {{addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, 
          family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, 
          addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 380 times>}, my_trans_memb_list = {{addr = {{nodeid = 2, family = 2, 
          addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 381 times>}, my_memb_list = {{addr = {{nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, 
          family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 3, family = 2, addr = "\300\250zk\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, 
          addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 4, family = 2, addr = "\300\250zl\b\000\004\000\300\250z\377\t\000\003"}, {nodeid = 0, family = 0, 
          addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 380 times>}, my_deliver_memb_list = {{addr = {{nodeid = 0, 
          family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 384 times>}, my_left_memb_list = {{addr = {{nodeid = 1, family = 2, addr = "\300\250zi\b\000\004\000\300\250z\377\t\000\003"}, {
          nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}}, {addr = {{nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}, {nodeid = 0, family = 0, addr = '\000' <repeats 15 times>}}} <repeats 383 times>}, my_proc_list_entries = 3, 
  my_failed_list_entries = 3, my_new_memb_entries = 3, my_trans_memb_entries = 3, my_memb_entries = 3, my_deliver_memb_entries = 0, my_left_memb_entries = 1, my_ring_id = {rep = {nodeid = 2, family = 2, 
      addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, seq = 4516}, my_old_ring_id = {rep = {nodeid = 2, family = 2, addr = "\300\250zj\b\000\004\000\300\250z\377\t\000\003"}, seq = 4516}, my_aru_count = 2501, 
  my_merge_detect_timeout_outstanding = 0, my_last_aru = 229, my_seq_unchanged = 739, my_received_flg = 1, my_high_seq_received = 266, my_install_seq = 0, my_rotation_counter = 0, my_set_retrans_flg = 0, my_retrans_flg_count = 0, 
  my_high_ring_delivered = 0, heartbeat_timeout = 0, new_message_queue = {head = 379, tail = 368, used = 10, usedhw = 38, size = 783, items = 0xcc5530, size_per_item = 16, iterator = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
        __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, new_message_queue_trans = {head = 59, tail = 58, used = 0, usedhw = 3, size = 783, items = 0xcc8630, 
    size_per_item = 16, iterator = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, retrans_message_queue = {
    head = 0, tail = 16383, used = 0, usedhw = 0, size = 16384, items = 0x7fdb2d47c010, size_per_item = 16, iterator = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, 
          __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, regular_sort_queue = {head = 230, size = 16384, items = 0x7fdb2d43b010, items_inuse = 0xc7fa70, items_miss_count = 0xc8fa80, size_per_item = 16, head_seqid = 230, 
    item_count = 16384, pos_max = 266}, recovery_sort_queue = {head = 0, size = 16384, items = 0x7fdb2d3fa010, items_inuse = 0xc9fa90, items_miss_count = 0xcafaa0, size_per_item = 16, head_seqid = 0, item_count = 16384, pos_max = 0}, my_aru = 229, 
  my_high_delivered = 229, token_callback_received_listhead = {next = 0xc75bd0, prev = 0xc7e1e0}, token_callback_sent_listhead = {next = 0xc75ba0, prev = 0xc75ba0}, 
  orf_token_retransmit = "\000\000\"\377\002\000\000\000\f\001\000\000Y.\000\000\345\000\000\000\002\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\n\000\000\000!\000\000\000\000\000\000\000\023\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\000\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\001\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\002\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\003\001\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\a\001"..., orf_token_retransmit_size = 716, my_token_seq = 11864, timer_pause_timeout = 0xd06170, timer_orf_token_timeout = 0x0, 
  timer_orf_token_retransmit_timeout = 0x0, timer_orf_token_hold_retransmit_timeout = 0x0, timer_merge_detect_timeout = 0x0, memb_timer_state_gather_join_timeout = 0xccc4a0, memb_timer_state_gather_consensus_timeout = 0xcd25c0, 
  memb_timer_state_commit_timeout = 0x0, timer_heartbeat_timeout = 0x0, totemsrp_log_level_security = 4, totemsrp_log_level_error = 3, totemsrp_log_level_warning = 4, totemsrp_log_level_notice = 5, totemsrp_log_level_debug = 7, totemsrp_subsys_id = 6, 
  totemsrp_log_printf = 0x403ca0 <_logsys_log_printf@plt>, memb_state = MEMB_STATE_GATHER, totemsrp_poll_handle = 5450352153329664, mcast_address = {nodeid = 0, family = 2, addr = "\357\300\367B", '\000' <repeats 11 times>}, 
  totemsrp_deliver_fn = 0x30c901af50 <totemmrp_deliver_fn>, totemsrp_confchg_fn = 0x30c901af60 <totemmrp_confchg_fn>, totemsrp_service_ready_fn = 0x4054e0 <main_service_ready>, 
  totemsrp_waiting_trans_ack_cb_fn = 0x30c901b180 <totempg_waiting_trans_ack_cb>, global_seqno = 428, my_token_held = 1, token_ring_id_seq = 4516, last_released = 229, set_aru = 4294967295, old_ring_state_saved = 0, old_ring_state_aru = 585, 
  old_ring_state_high_seq_received = 585, my_last_seq = 268, tv_old = {tv_sec = 0, tv_usec = 0}, totemrrp_context = 0xcbfab0, totem_config = 0x7fffc962b8a0, use_heartbeat = 0, my_trc = 16, my_pbl = 10, my_cbl = 10, pause_timestamp = 37728268192834, 
  commit_token = 0x7fdb2d4ee928, stats = {hdr = {handle = 1222976449883930708, is_dirty = 0, last_updated = 0}, rrp = 0x0, orf_token_tx = 2, orf_token_rx = 11299, memb_merge_detect_tx = 1936, memb_merge_detect_rx = 2077, memb_join_tx = 180, 
    memb_join_rx = 191, mcast_tx = 438, mcast_retx = 35180, mcast_rx = 36683, memb_commit_token_tx = 10, memb_commit_token_rx = 10, token_hold_cancel_tx = 74, token_hold_cancel_rx = 191, operational_entered = 5, operational_token_lost = 0, 
    gather_entered = 7, gather_token_lost = 0, commit_entered = 5, commit_token_lost = 0, recovery_entered = 5, recovery_token_lost = 0, consensus_timeouts = 1, rx_msg_dropped = 0, continuous_gather = 0, continuous_sendmsg_failures = 0, 
    earliest_token = 95, latest_token = 94, token = {{rx = 37627067, tx = 37627068, backlog_calc = 10}, {rx = 37627447, tx = 37627448, backlog_calc = 10}, {rx = 37628974, tx = 37628975, backlog_calc = 10}, {rx = 37629353, tx = 37629354, 
        backlog_calc = 10}, {rx = 37630880, tx = 37630881, backlog_calc = 10}, {rx = 37631260, tx = 37631261, backlog_calc = 10}, {rx = 37632787, tx = 37632788, backlog_calc = 10}, {rx = 37633166, tx = 37633167, backlog_calc = 10}, {rx = 37634693, 
        tx = 37634694, backlog_calc = 10}, {rx = 37635074, tx = 37635074, backlog_calc = 10}, {rx = 37636600, tx = 37636601, backlog_calc = 10}, {rx = 37636981, tx = 37636982, backlog_calc = 10}, {rx = 37638506, tx = 37638507, backlog_calc = 10}, {
        rx = 37638888, tx = 37638889, backlog_calc = 10}, {rx = 37640412, tx = 37640413, backlog_calc = 10}, {rx = 37640795, tx = 37640796, backlog_calc = 10}, {rx = 37642319, tx = 37642319, backlog_calc = 10}, {rx = 37642703, tx = 37642704, 
        backlog_calc = 10}, {rx = 37644225, tx = 37644226, backlog_calc = 10}, {rx = 37644610, tx = 37644611, backlog_calc = 10}, {rx = 37646132, tx = 37646133, backlog_calc = 10}, {rx = 37646516, tx = 37646517, backlog_calc = 10}, {rx = 37648040, 
        tx = 37648040, backlog_calc = 10}, {rx = 37648423, tx = 37648424, backlog_calc = 10}, {rx = 37649945, tx = 37649945, backlog_calc = 10}, {rx = 37650331, tx = 37650331, backlog_calc = 10}, {rx = 37651851, tx = 37651852, backlog_calc = 10}, {
        rx = 37652238, tx = 37652238, backlog_calc = 10}, {rx = 37653758, tx = 37653759, backlog_calc = 10}, {rx = 37654144, tx = 37654145, backlog_calc = 10}, {rx = 37655665, tx = 37655666, backlog_calc = 10}, {rx = 37656051, tx = 37656052, 
        backlog_calc = 10}, {rx = 37657572, tx = 37657573, backlog_calc = 10}, {rx = 37657958, tx = 37657959, backlog_calc = 10}, {rx = 37659480, tx = 37659480, backlog_calc = 10}, {rx = 37659865, tx = 37659866, backlog_calc = 10}, {rx = 37661386, 
        tx = 37661387, backlog_calc = 10}, {rx = 37661772, tx = 37661773, backlog_calc = 10}, {rx = 37663292, tx = 37663293, backlog_calc = 10}, {rx = 37663679, tx = 37663680, backlog_calc = 10}, {rx = 37665199, tx = 37665200, backlog_calc = 10}, {
        rx = 37665586, tx = 37665587, backlog_calc = 10}, {rx = 37667105, tx = 37667106, backlog_calc = 10}, {rx = 37667493, tx = 37667494, backlog_calc = 10}, {rx = 37669012, tx = 37669012, backlog_calc = 10}, {rx = 37669400, tx = 37669401, 
        backlog_calc = 10}, {rx = 37670918, tx = 37670919, backlog_calc = 10}, {rx = 37671307, tx = 37671307, backlog_calc = 10}, {rx = 37672825, tx = 37672826, backlog_calc = 10}, {rx = 37673212, tx = 37673212, backlog_calc = 10}, {rx = 37674732, 
        tx = 37674733, backlog_calc = 10}, {rx = 37675118, tx = 37675119, backlog_calc = 10}, {rx = 37676639, tx = 37676640, backlog_calc = 10}, {rx = 37677025, tx = 37677025, backlog_calc = 10}, {rx = 37678547, tx = 37678547, backlog_calc = 10}, {
        rx = 37678931, tx = 37678932, backlog_calc = 10}, {rx = 37680453, tx = 37680454, backlog_calc = 10}, {rx = 37680838, tx = 37680839, backlog_calc = 10}, {rx = 37682360, tx = 37682361, backlog_calc = 10}, {rx = 37682745, tx = 37682746, 
        backlog_calc = 10}, {rx = 37684266, tx = 37684267, backlog_calc = 10}, {rx = 37684652, tx = 37684653, backlog_calc = 10}, {rx = 37686173, tx = 37686174, backlog_calc = 10}, {rx = 37686559, tx = 37686560, backlog_calc = 10}, {rx = 37688074, 
        tx = 37688074, backlog_calc = 10}, {rx = 37688466, tx = 37688466, backlog_calc = 10}, {rx = 37689980, tx = 37689980, backlog_calc = 10}, {rx = 37690373, tx = 37690374, backlog_calc = 10}, {rx = 37691886, tx = 37691887, backlog_calc = 10}, {
        rx = 37692280, tx = 37692281, backlog_calc = 10}, {rx = 37693793, tx = 37693794, backlog_calc = 10}, {rx = 37694187, tx = 37694189, backlog_calc = 10}, {rx = 37695700, tx = 37695701, backlog_calc = 10}, {rx = 37696094, tx = 37696095, 
        backlog_calc = 10}, {rx = 37697607, tx = 37697608, backlog_calc = 10}, {rx = 37698000, tx = 37698000, backlog_calc = 10}, {rx = 37699515, tx = 37699515, backlog_calc = 10}, {rx = 37699905, tx = 37699906, backlog_calc = 10}, {rx = 37701421, 
        tx = 37701421, backlog_calc = 10}, {rx = 37701812, tx = 37701812, backlog_calc = 10}, {rx = 37703327, tx = 37703328, backlog_calc = 10}, {rx = 37703718, tx = 37703718, backlog_calc = 10}, {rx = 37705235, tx = 37705236, backlog_calc = 10}, {
        rx = 37705624, tx = 37705625, backlog_calc = 10}, {rx = 37707142, tx = 37707142, backlog_calc = 10}, {rx = 37707531, tx = 37707532, backlog_calc = 10}, {rx = 37709049, tx = 37709049, backlog_calc = 10}, {rx = 37709439, tx = 37709439, 
        backlog_calc = 10}, {rx = 37710956, tx = 37710957, backlog_calc = 10}, {rx = 37711345, tx = 37711346, backlog_calc = 10}, {rx = 37712862, tx = 37712863, backlog_calc = 10}, {rx = 37713252, tx = 37713253, backlog_calc = 10}, {rx = 37714769, 
        tx = 37714769, backlog_calc = 10}, {rx = 37715160, tx = 37715160, backlog_calc = 10}, {rx = 37716675, tx = 0, backlog_calc = 10}, {rx = 0, tx = 0, backlog_calc = 0}, {rx = 37623253, tx = 37623254, backlog_calc = 10}, {rx = 37623633, tx = 37623634, 
        backlog_calc = 10}, {rx = 37625161, tx = 37625162, backlog_calc = 10}, {rx = 37625540, tx = 37625541, backlog_calc = 10}}}, orf_token_discard = 1, waiting_trans_ack = 0, token_recv_event_handle = 0xc7e1e0, token_sent_event_handle = 0xc75ba0, 
  commit_token_storage = "\004\000\"\377\002\000\000\000\001\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003\000\244\021\000\000\000\000\000\000\000\000\000\000\001\000\000\000\003\000\000\000\002\000\000\000\002\000\300\250zj\b\000\004\000\300\250z\377\t\000\003", '\000' <repeats 23 times>, "\003\000\000\000\002\000\300\250zk\b\000\004\000\300\250z\377\t\000\003", '\000' <repeats 23 times>, "\004\000\000\000\002\000\300\250zl\b\000\004\000\300\250z\377\t\000\003", '\000' <repeats 23 times>, "\001\000\000\000\002\000\300\250zi\b\000\004\000\300\250z\377\t\000\003\000\240\021\000\000\000\000\000\000I\002\000\000I\002\000\000\001", '\000' <repeats 39774 times>}

Comment 2 Jan Friesse 2013-07-04 05:14:30 UTC
Andrew, this is a known problem (the assert). A growing retransmit list is a sign of a network problem.

*** This bug has been marked as a duplicate of bug 854216 ***
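For readers following the duplicate, a hypothetical sketch of how such an assert could be defused; this is explicitly not claimed to be the actual fix applied for bug 854216. The idea: when failed_to_recv is set, every peer is on the failed list by construction, so the node can legitimately continue as a singleton ring instead of asserting that the candidate set is non-empty.

/* Hypothetical sketch only -- not the corosync fix. */
#include <assert.h>
#include <stdio.h>

struct node { int id; };

/* Hypothetical consensus check; candidates = proc list minus failed list. */
static int consensus_agreed(const struct node *self,
                            int candidates, int failed_to_recv,
                            struct node *token_memb)
{
    if (candidates == 0 && failed_to_recv) {
        /* FAILED TO RECEIVE: everyone, including us, is flagged failed.
         * Fall back to a singleton ring with only ourselves. */
        token_memb[0] = *self;
        return 1;
    }
    assert(candidates >= 1);  /* the totemsrp.c:1243 invariant */
    return candidates;
}

int main(void)
{
    struct node self = { 2 };  /* nodeid 2 = pcmk-6, per the dump */
    struct node token_memb[4];
    int n = consensus_agreed(&self, /*candidates=*/0, /*failed_to_recv=*/1,
                             token_memb);
    printf("token_memb_entries = %d (node %d continues alone)\n",
           n, token_memb[0].id);
    return 0;
}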

Comment 3 Andrew Beekhof 2013-07-09 06:52:46 UTC
If the root cause were a network problem, I would expect the other cluster (running RHEL 7) to be affected too. That is not the case here.

Also, this has happened on two unrelated clusters (different underlying hardware).

