Description of problem:
There is an invalid assertion in totemsrp. It expects that the membership count will always be 2 or greater, which in some cases is not true.

Version-Release number of selected component (if applicable):
openais-0.80.6-8.el5_4

How reproducible:
rare

Steps to Reproduce:
1. Not sure what was done to reproduce the issue.

Actual results:
An assertion occurred in the openais executive.

Expected results:
No assertion occurs in the openais executive.

Additional info:
Stack backtrace was found on Frantisek Reznicek's machine. The backtrace is as follows:

Thread 1 (process 11529):
#0  0x0000003238630265 in raise () from /lib64/libc.so.6
#1  0x0000003238631d10 in abort () from /lib64/libc.so.6
#2  0x00000032386296e6 in __assert_fail () from /lib64/libc.so.6
#3  0x000000000040d867 in memb_state_gather_enter (instance=0x2aaaaaaad010, gather_from=12) at totemsrp.c:1691
#4  0x000000000040e4b1 in message_handler_memb_join (instance=0x2aaaaaaad010, msg=0x1c706284, msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:3971
#5  0x0000000000409f6e in rrp_deliver_fn (context=0x1c705bc0, msg=0x1c706284, msg_len=112) at totemrrp.c:1319
#6  0x00000000004084fb in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0x1c705c00) at totemnet.c:695
#7  0x0000000000405d10 in poll_run (handle=0) at aispoll.c:402
#8  0x0000000000418834 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:620
The assertion at line 1691 (and also the one in the memb_join_message_send function) is invalid because both rely on the membership count being 2 or greater. The assertion can trigger with only 1 member in the cluster: in that situation my_proc_list[1] is not a valid data entry, so it may contain the same entry as my_proc_list[0].
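To illustrate the failure mode described above, here is a minimal sketch. It is not the totemsrp code: the report does not show the exact assertion, so the sketch assumes it compares the first two entries of the list. The names struct node_addr, struct membership, PROC_LIST_MAX, unguarded_first_two_differ and first_two_differ are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define PROC_LIST_MAX 8 /* hypothetical capacity, not taken from openais */

/* Hypothetical stand-in for a node address entry. */
struct node_addr {
    unsigned int nodeid;
};

/* Hypothetical stand-in for the membership state:
 * only proc_list[0 .. entries-1] holds valid data. */
struct membership {
    struct node_addr proc_list[PROC_LIST_MAX];
    size_t entries;
};

/* Assumed shape of the problematic check: it compares slots 0 and 1
 * without looking at the entry count. With a single member, slot 1 is
 * stale and may equal slot 0, so an assert on this result would fire. */
int unguarded_first_two_differ(const struct membership *m)
{
    assert(m != NULL);
    return m->proc_list[0].nodeid != m->proc_list[1].nodeid;
}

/* Guarded variant: returns -1 when fewer than two entries are valid,
 * meaning the comparison is not meaningful; otherwise 1 or 0. */
int first_two_differ(const struct membership *m)
{
    assert(m != NULL);
    if (m->entries < 2) {
        return -1;
    }
    return m->proc_list[0].nodeid != m->proc_list[1].nodeid;
}
```

With one valid entry and a stale copy of it in slot 1, the unguarded check reports the entries as equal, while the guarded variant reports that there is nothing to compare. The attached patch takes the simpler route of removing the assertions; the guard is shown only to make the invalid precondition visible.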
Created attachment 378598: patch to remove assertions.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0180.html