Bug 547828

Summary: There is an invalid assertion in totemsrp which can cause a SIGABRT.
Product: Red Hat Enterprise Linux 5 Reporter: Steven Dake <sdake>
Component: openais Assignee: Steven Dake <sdake>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 5.4 CC: cluster-maint, edamato, freznice
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openais-0.80.6-12.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-30 07:48:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patch to remove assertions. none

Description Steven Dake 2009-12-15 19:08:57 UTC
Description of problem:
There is an invalid assertion in totemsrp. It expects that the membership will always be 2 or greater, which in some cases is not true.

Version-Release number of selected component (if applicable):
openais-0.80.6-8.el5_4

How reproducible:
rare

Steps to Reproduce:
1. not sure what was done to reproduce the issue
  
Actual results:
assertion occurred in openais executive

Expected results:
no assertion occurs in openais executive

Additional info:

Comment 1 Steven Dake 2009-12-15 19:10:25 UTC
Stack backtrace was found on Frantisek Reznicek's machine.  The backtrace is as follows:

Thread 1 (process 11529):
#0  0x0000003238630265 in raise () from /lib64/libc.so.6
#1  0x0000003238631d10 in abort () from /lib64/libc.so.6
#2  0x00000032386296e6 in __assert_fail () from /lib64/libc.so.6
#3  0x000000000040d867 in memb_state_gather_enter (instance=0x2aaaaaaad010, 
    gather_from=12) at totemsrp.c:1691
#4  0x000000000040e4b1 in message_handler_memb_join (instance=0x2aaaaaaad010, 
    msg=0x1c706284, msg_len=<value optimized out>, 
    endian_conversion_needed=<value optimized out>) at totemsrp.c:3971
#5  0x0000000000409f6e in rrp_deliver_fn (context=0x1c705bc0, msg=0x1c706284, 
    msg_len=112) at totemrrp.c:1319
#6  0x00000000004084fb in net_deliver_fn (handle=<value optimized out>, 
    fd=<value optimized out>, revents=<value optimized out>, data=0x1c705c00)
    at totemnet.c:695
#7  0x0000000000405d10 in poll_run (handle=0) at aispoll.c:402
#8  0x0000000000418834 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at main.c:620

Comment 2 Steven Dake 2009-12-15 19:12:53 UTC
The assertion at line 1691, and the matching one in the memb_join_message_send
function, are invalid because they rely on the membership count being 2 or
greater.  It is possible for this assertion to trigger with only 1 member in
the cluster: in that situation my_proc_list[1] is not a valid data entry, so
it may contain the same entry as is contained in my_proc_list[0].
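
To illustrate the failure mode described above, here is a minimal sketch in C. It is not the openais source: the struct layouts, the capacity constant, and both function names are hypothetical, and the exact expression asserted at totemsrp.c:1691 is assumed here to be a comparison of the first two my_proc_list slots. Only the names my_proc_list and my_proc_list_entries are meant to echo the discussion; the entry counter in particular is an assumption about how the valid length is tracked.

```c
#include <assert.h>
#include <string.h>

#define PROC_LIST_MAX_SKETCH 8 /* hypothetical capacity, not from openais */

/* Hypothetical stand-in for one address record in my_proc_list. */
struct addr_sketch {
    unsigned int nodeid;
};

/* Hypothetical stand-in for the part of the instance discussed above. */
struct instance_sketch {
    struct addr_sketch my_proc_list[PROC_LIST_MAX_SKETCH];
    int my_proc_list_entries; /* number of valid slots (assumed field) */
};

/*
 * Compares slot 0 with slot 1 without looking at the entry count.
 * Returns 1 if the two slots differ, 0 if they are identical.
 * With one member, slot 1 holds leftover data, so the result is
 * meaningless and an assert() on it can fire spuriously.
 */
static int first_two_differ_unchecked(const struct instance_sketch *inst)
{
    return memcmp(&inst->my_proc_list[0], &inst->my_proc_list[1],
                  sizeof(struct addr_sketch)) != 0;
}

/*
 * Same comparison, but only performed when slot 1 is actually valid.
 * Returns 1 (nothing wrong) when there are fewer than two entries.
 */
static int proc_list_head_ok(const struct instance_sketch *inst)
{
    if (inst->my_proc_list_entries < 2) {
        return 1; /* slot 1 is not valid data, so there is nothing to check */
    }
    return first_two_differ_unchecked(inst);
}
```

With a single-member list whose second slot still holds a copy of the first entry, first_two_differ_unchecked() returns 0, which is the condition under which an assert() on it would abort, while proc_list_head_ok() returns 1. Note that the attached patch is described as removing the assertions rather than guarding them; the guarded variant is shown only to make the membership-count dependency explicit.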

Comment 3 Steven Dake 2009-12-15 19:31:39 UTC
Created attachment 378598 [details]
patch to remove assertions.

Comment 6 errata-xmlrpc 2010-03-30 07:48:19 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0180.html