Bug 706050 - openais aborts in memb_consensus_agreed() with an assertion at totemsrp.c:1114
Summary: openais aborts in memb_consensus_agreed() with an assertion at totemsrp.c:1114
Keywords:
Status: CLOSED DUPLICATE of bug 671575
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-05-19 10:23 UTC by Petr Matousek
Modified: 2016-04-26 14:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 12:00:41 UTC
Target Upstream Version:



Description Petr Matousek 2011-05-19 10:23:35 UTC
Description of problem:

While testing MRG Messaging Bug 681026 by running the attached reproducer in a loop, openais aborted after 2 hours of test execution.

The backtrace from the core dump can be found under Additional info.

Version-Release number of selected component (if applicable):
openais-0.80.6-28.el5_6.1.i386

How reproducible:
Unknown

Steps to Reproduce:
1.
2.
3.
  
Actual results:
openais aborted

Expected results:
no abort

Additional info:

Core was generated by `/usr/sbin/aisexec'.
Program terminated with signal 6, Aborted.
#0  0x0063c410 in __kernel_vsyscall ()
(gdb) info thread
  5 Thread 19370  0x0063c410 in __kernel_vsyscall ()
  4 Thread 19372  0x0063c410 in __kernel_vsyscall ()
  3 Thread 15277  0x0063c410 in __kernel_vsyscall ()
  2 Thread 15313  0x0063c410 in __kernel_vsyscall ()
* 1 Thread 19369  0x0063c410 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 5 (Thread 19370):
#0  0x0063c410 in __kernel_vsyscall ()
#1  0x00491753 in poll () from /lib/libc.so.6
#2  0x08066e7e in prioritized_timer_thread (data=0x0) at timer.c:125
#3  0x0055b832 in start_thread () from /lib/libpthread.so.0
#4  0x0049b45e in clone () from /lib/libc.so.6

Thread 4 (Thread 19372):
#0  0x0063c410 in __kernel_vsyscall ()
#1  0x0056281c in __lll_unlock_wake () from /lib/libpthread.so.0
#2  0x0055fdbb in __condvar_w_cleanup () from /lib/libpthread.so.0
#3  0x080610be in worker_thread (thread_data_in=0x99b0aec) at wthread.c:73
#4  0x0055b832 in start_thread () from /lib/libpthread.so.0
#5  0x0049b45e in clone () from /lib/libc.so.6

Thread 3 (Thread 15277):
#0  0x0063c410 in __kernel_vsyscall ()
#1  0x0049cd1b in semop () from /lib/libc.so.6
#2  0x08065884 in pthread_ipc_consumer (conn=0xb6900500) at ipc.c:412
#3  0x0055b832 in start_thread () from /lib/libpthread.so.0
#4  0x0049b45e in clone () from /lib/libc.so.6

Thread 2 (Thread 15313):
#0  0x0063c410 in __kernel_vsyscall ()
#1  0x0049cd1b in semop () from /lib/libc.so.6
#2  0x08065884 in pthread_ipc_consumer (conn=0xb69007a8) at ipc.c:412
#3  0x0055b832 in start_thread () from /lib/libpthread.so.0
#4  0x0049b45e in clone () from /lib/libc.so.6

Thread 1 (Thread 19369):
#0  0x0063c410 in __kernel_vsyscall ()
#1  0x003f1df0 in raise () from /lib/libc.so.6
#2  0x003f3701 in abort () from /lib/libc.so.6
#3  0x003eb26b in __assert_fail () from /lib/libc.so.6
#4  0x080557d8 in memb_consensus_agreed (instance=0xb7599008) at totemsrp.c:1114
#5  0x08055e7d in memb_join_process (instance=0xb7599008, memb_join=0x99f7bac) at totemsrp.c:3789
#6  0x0805617a in message_handler_memb_join (instance=0xb7599008, msg=0x99f7bac, msg_len=244, endian_conversion_needed=0) at totemsrp.c:4026
#7  0x08053602 in main_deliver_fn (context=0xb7599008, msg=0x99f7bac, msg_len=19369) at totemsrp.c:4180
#8  0x08051030 in none_mcast_recv (rrp_instance=0x99b5510, iface_no=0, context=0xb7599008, msg=0x99f7bac, msg_len=244) at totemrrp.c:461
#9  0x08051164 in rrp_deliver_fn (context=0x99b3d00, msg=0x99f7bac, msg_len=244) at totemrrp.c:1273
#10 0x0804f715 in net_deliver_fn (handle=0, fd=1, revents=1, data=0x99f7580) at totemnet.c:686
#11 0x0804c892 in poll_run (handle=0) at aispoll.c:402
#12 0x08061e25 in main (argc=Cannot access memory at address 0x4ba9
) at main.c:628

Comment 1 Frantisek Reznicek 2011-05-19 11:56:08 UTC
The openais log shows this:
May 18 12:00:01.410147 [ipc.c:0883] connection received from libais client 6.
...
May 18 12:00:32.941746 [ipc.c:0883] connection received from libais client 6.
May 18 12:00:34.776880 [TOTEM] Retransmit List: 51a9da
...
May 18 12:00:34.784165 [TOTEM] Retransmit List: 51a9da
May 18 12:00:34.784399 [TOTEM] Retransmit List: 51a9da
May 18 12:00:34.784634 [TOTEM] Retransmit List: 51a9da
May 18 12:00:34.973978 [TOTEM] Retransmit List: 51a9da
May 18 12:00:35.163959 [TOTEM] Retransmit List: 51a9da
May 18 12:00:35.545239 [TOTEM] FAILED TO RECEIVE
May 18 12:00:35.545305 [TOTEM] entering GATHER state from 6.
May 18 12:00:36.354703 [TOTEM] entering GATHER state from 0.
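
The pattern in the excerpt above (the same sequence number requested for retransmission repeatedly, followed by FAILED TO RECEIVE and a GATHER transition) can be looked for in a longer log with a small script. This is only a sketch: it assumes every line has the same shape as the lines quoted here, and the function name `summarize` and the `sample` list are illustrative, not part of openais. The sample contains only the lines shown above; the entries elided by "..." are not reconstructed.

```python
import re

# Assumed line shape, copied from the excerpt in this comment:
#   "May 18 12:00:34.776880 [TOTEM] Retransmit List: 51a9da"
RETRANSMIT = re.compile(r"\[TOTEM\] Retransmit List: (?P<seqs>[0-9a-f ]+)")
FAILED = "[TOTEM] FAILED TO RECEIVE"


def summarize(lines):
    """Count retransmit requests per sequence number before each failure.

    Returns one dict per "FAILED TO RECEIVE" line, mapping each sequence
    number to how often it appeared in a Retransmit List since the
    previous failure (or since the start of the input).
    """
    counts = {}
    events = []
    for line in lines:
        match = RETRANSMIT.search(line)
        if match:
            for seq in match.group("seqs").split():
                counts[seq] = counts.get(seq, 0) + 1
        elif FAILED in line:
            events.append(dict(counts))
            counts = {}
    return events


# Only the lines visible in the excerpt; elided lines are not included.
sample = [
    "May 18 12:00:34.776880 [TOTEM] Retransmit List: 51a9da",
    "May 18 12:00:34.784165 [TOTEM] Retransmit List: 51a9da",
    "May 18 12:00:34.784399 [TOTEM] Retransmit List: 51a9da",
    "May 18 12:00:34.784634 [TOTEM] Retransmit List: 51a9da",
    "May 18 12:00:34.973978 [TOTEM] Retransmit List: 51a9da",
    "May 18 12:00:35.163959 [TOTEM] Retransmit List: 51a9da",
    "May 18 12:00:35.545239 [TOTEM] FAILED TO RECEIVE",
    "May 18 12:00:35.545305 [TOTEM] entering GATHER state from 6.",
]

if __name__ == "__main__":
    for event in summarize(sample):
        print(event)
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize(open(path))`, where `path` is wherever the openais log is written on the affected node.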

Comment 2 Jan Friesse 2011-05-19 12:00:41 UTC
Closing as duplicate of OpenAIS bug
https://bugzilla.redhat.com/show_bug.cgi?id=671575
and corosync bug
https://bugzilla.redhat.com/show_bug.cgi?id=636583

*** This bug has been marked as a duplicate of bug 671575 ***

Comment 3 Frantisek Reznicek 2011-05-19 12:19:32 UTC
Possible workaround:
https://lists.linux-foundation.org/pipermail/openais/2011-February/015696.html

