
Bug 531795

Summary: Occasionally a node within a 5-node RHEL 5.4 cluster is fenced for no apparent reason resulting in a core dump of aisexec.
Product: Red Hat Enterprise Linux 5
Reporter: Everett Bennett <everett.bennett>
Component: openais
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: low
Version: 5.4
CC: adas, bmarzins, cluster-maint, edamato, jesse.marlin, John.Hadad, mwaite, nikb321, redhat, sdake, sghosh, stefan.hurwitz, swhiteho, teigland
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clone Of: 525280
Last Closed: 2010-04-12 08:42:15 UTC

Description Everett Bennett 2009-10-29 14:22:51 UTC
+++ This bug was initially created as a clone of Bug #525280 +++


Description of problem:
-----------------------           
Occasionally a node within a 5-node RHEL 5.4 cluster is fenced for no apparent reason, resulting in a core dump of aisexec.

Version-Release number of selected component (if applicable):

root@med1:/var/lib/openais> uname -a
Linux med1 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:42 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
root@med1:/var/lib/openais> rpm -qa|grep openais
openais-0.80.6-8.el5_4.1
openais-debuginfo-0.80.6-8.el5_4.1

root@med1:/var/lib/openais> ls -ld core*
-rw------- 1 root root 29044736 Oct 22 16:25 core.14301

How reproducible:

Intermittent, but it has happened more than once. Two core traces are listed below.
  
Actual results:

root@med1:/var/lib/openais> alias ls=ls
root@med1:/var/lib/openais> rpm -ivh /root/openais-debuginfo-0.80.6-8.el5_4.1.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:openais-debuginfo      ########################################### [100%]
root@med1:/var/lib/openais> gdb /usr/sbin/aisexec core.14301
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/libexec/lcrso/objdb.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/objdb.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/objdb.lcrso
Reading symbols from /usr/libexec/lcrso/service_cman.lcrso...done.
Loaded symbols for /usr/libexec/lcrso/service_cman.lcrso
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/libexec/lcrso/service_evs.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_evs.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_evs.lcrso
Reading symbols from /usr/libexec/lcrso/service_clm.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_clm.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_clm.lcrso
Reading symbols from /usr/libexec/lcrso/service_amf.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_amf.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_amf.lcrso
Reading symbols from /usr/libexec/lcrso/service_ckpt.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_ckpt.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_ckpt.lcrso
Reading symbols from /usr/libexec/lcrso/service_evt.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_evt.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_evt.lcrso
Reading symbols from /usr/libexec/lcrso/service_lck.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_lck.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_lck.lcrso
Reading symbols from /usr/libexec/lcrso/service_msg.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_msg.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_msg.lcrso
Reading symbols from /usr/libexec/lcrso/service_cfg.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_cfg.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_cfg.lcrso
Reading symbols from /usr/libexec/lcrso/service_cpg.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_cpg.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_cpg.lcrso
Reading symbols from /usr/libexec/lcrso/service_confdb.lcrso...Reading symbols from /usr/lib/debug/usr/libexec/lcrso/service_confdb.lcrso.debug...done.
done.
Loaded symbols for /usr/libexec/lcrso/service_confdb.lcrso
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1

warning: Can't read pathname for load map: Input/output error.

warning: Can't read pathname for load map: Input/output error.
Core was generated by `aisexec'.
Program terminated with signal 11, Segmentation fault.
[New process 14301]
[New process 14362]
[New process 14337]
[New process 14333]
[New process 14314]
[New process 14302]
#0  0x00002aaaaaeb614b in unbind_con () from /usr/libexec/lcrso/service_cman.lcrso
(gdb) bt
#0  0x00002aaaaaeb614b in unbind_con () from /usr/libexec/lcrso/service_cman.lcrso
#1  0x00002aaaaaeb3728 in ?? () from /usr/libexec/lcrso/service_cman.lcrso
#2  0x00002aaaaaeb38db in ?? () from /usr/libexec/lcrso/service_cman.lcrso
#3  0x00002aaaaaeb3a61 in send_status_return () from /usr/libexec/lcrso/service_cman.lcrso
#4  0x00002aaaaaeb6f55 in send_to_userport () from /usr/libexec/lcrso/service_cman.lcrso
#5  0x00002aaaaaeb4c49 in ?? () from /usr/libexec/lcrso/service_cman.lcrso
#6  0x0000000000415165 in app_deliver_fn (nodeid=1, iovec=<value optimized out>, iov_len=1, endian_conversion_required=0)
    at totempg.c:460
#7  0x00000000004155ec in totempg_deliver_fn (nodeid=1, iovec=<value optimized out>, iov_len=<value optimized out>, 
    endian_conversion_required=0) at totempg.c:604
#8  0x0000000000410418 in messages_deliver_to_app (instance=0x2aaaaaaae010, skip=0, end_point=<value optimized out>)
    at totemsrp.c:3548
#9  0x000000000041203c in message_handler_orf_token (instance=0x2aaaaaaae010, msg=<value optimized out>, 
    msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:3420
#10 0x0000000000409f43 in rrp_deliver_fn (context=0x1a537430, msg=0x1a550e28, msg_len=70) at totemrrp.c:1308
#11 0x00000000004084fb in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, 
    revents=<value optimized out>, data=0x1a550780) at totemnet.c:695
#12 0x0000000000405d10 in poll_run (handle=0) at aispoll.c:402
#13 0x0000000000418834 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:620
(gdb) 
#0  0x00002aaaaaeb614b in unbind_con () from /usr/libexec/lcrso/service_cman.lcrso
#1  0x00002aaaaaeb3728 in ?? () from /usr/libexec/lcrso/service_cman.lcrso
#2  0x00002aaaaaeb38db in ?? () from /usr/libexec/lcrso/service_cman.lcrso
#3  0x00002aaaaaeb3a61 in send_status_return () from /usr/libexec/lcrso/service_cman.lcrso
#4  0x00002aaaaaeb6f55 in send_to_userport () from /usr/libexec/lcrso/service_cman.lcrso
#5  0x00002aaaaaeb4c49 in ?? () from /usr/libexec/lcrso/service_cman.lcrso
#6  0x0000000000415165 in app_deliver_fn (nodeid=1, iovec=<value optimized out>, iov_len=1, endian_conversion_required=0)
    at totempg.c:460
#7  0x00000000004155ec in totempg_deliver_fn (nodeid=1, iovec=<value optimized out>, iov_len=<value optimized out>, 
    endian_conversion_required=0) at totempg.c:604
#8  0x0000000000410418 in messages_deliver_to_app (instance=0x2aaaaaaae010, skip=0, end_point=<value optimized out>)
    at totemsrp.c:3548
#9  0x000000000041203c in message_handler_orf_token (instance=0x2aaaaaaae010, msg=<value optimized out>, 
    msg_len=<value optimized out>, endian_conversion_needed=<value optimized out>) at totemsrp.c:3420
#10 0x0000000000409f43 in rrp_deliver_fn (context=0x1a537430, msg=0x1a550e28, msg_len=70) at totemrrp.c:1308
#11 0x00000000004084fb in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, 
    revents=<value optimized out>, data=0x1a550780) at totemnet.c:695
#12 0x0000000000405d10 in poll_run (handle=0) at aispoll.c:402
#13 0x0000000000418834 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:620
(gdb) quit

Comment 1 Everett Bennett 2009-10-29 14:26:08 UTC
I cloned the bug report that we saw at a RHEL 5.3 site for another site, which is running RHEL 5.4. If you have a patch to test, please advise and note this report.

Regards

Everett

Comment 2 Steven Dake 2009-10-29 15:12:34 UTC
The core dump appears to be in cman. Reassigning to Chrissie for further investigation.

Comment 3 Christine Caulfield 2009-10-29 15:43:14 UTC
Is this a very busy system? It looks like it could possibly be a large number of queued messages from CMAN. However, it's a very large number, so I'm not really sure at the moment. It would account for the different core dumps from 5.3 and 5.4, though.

Is it possible to add this to cluster.conf, inside the <cluster> section?

<cman max_queued="1024"/>

thanks
Chrissie
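
For reference, a minimal sketch of where the element suggested in comment 3 would sit in /etc/cluster/cluster.conf. Only the <cman max_queued="1024"/> line comes from this report; the cluster name, the config_version value, and the elided sections are placeholders and are not taken from the reporter's configuration.

```xml
<?xml version="1.0"?>
<!-- name and config_version are placeholders, not the reporter's values -->
<cluster name="example" config_version="2">
  <!-- element suggested in comment 3 -->
  <cman max_queued="1024"/>
  <!-- existing clusternodes, fencedevices and rm sections stay unchanged -->
</cluster>
```

If the file already contains a <cman> element, the max_queued attribute would presumably be added to that element rather than introducing a second one. The config_version value normally has to be incremented from its current value whenever cluster.conf is edited, so that the change is accepted across the cluster.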