Bug 499767 - groupd segfaults on start
Summary: groupd segfaults on start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.3
Hardware: ppc64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 501062
Depends On:
Blocks: 500914
 
Reported: 2009-05-08 03:57 UTC by Ray Van Dolson
Modified: 2018-10-20 00:00 UTC (History)

Fixed In Version: openais-0.80.6-1.el5_4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:06:45 UTC
Target Upstream Version:
Embargoed:


Attachments
cluster configuration file in use (734 bytes, text/plain), 2009-05-08 03:59 UTC, Ray Van Dolson
groupd core (7.25 MB, application/octet-stream), 2009-05-12 15:49 UTC, John Ruemker
another groupd core (7.25 MB, application/octet-stream), 2009-05-12 15:50 UTC, John Ruemker


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1366 | 0 | normal | SHIPPED_LIVE | openais bug-fix and enhancement update | 2009-09-01 11:00:17 UTC
Red Hat Product Errata | RHSA-2009:1341 | 0 | normal | SHIPPED_LIVE | Low: cman security, bug fix, and enhancement update | 2009-09-01 10:43:16 UTC

Description Ray Van Dolson 2009-05-08 03:57:58 UTC
Description of problem:
I'm trying to set up a simple 2 node cluster on RHEL 5.3 ppc64 (IBM JS20 blades).  I'm using only manual fencing for simplicity's sake and have no resources defined.

Running service cman start produced errors, and invoking the startup commands manually shows that groupd segfaults instead of starting up properly.

Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.1

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with the attached cluster.conf file on a RHEL 5.3 ppc64 machine.
2. Execute the following commands:

     # mount -t configfs none /sys/kernel/config
     # ccsd
     # cman_tool join
     # groupd

3. Execute ps afx | grep groupd
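
A minimal sketch of the steps above as a script, not part of the original report. The function names is_running and reproduce are hypothetical, the two-second pause is an arbitrary guess, and pgrep -x is swapped in for ps afx | grep so that the grep process itself is never matched. It assumes root access on a configured cluster node.

```shell
#!/bin/sh
# Returns 0 if a process whose name is exactly "$1" exists, non-zero otherwise.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Hypothetical wrapper around the reproduction steps; run as root on a cluster node.
reproduce() {
    mount -t configfs none /sys/kernel/config
    ccsd
    cman_tool join
    groupd
    sleep 2   # arbitrary pause so the daemon has time to fork (or crash)
    if is_running groupd; then
        echo "groupd is running"
    else
        echo "groupd is NOT running"
    fi
}
```

Calling reproduce on an affected ppc64 node would be expected to report that groupd is not running; the function is only defined here, not invoked.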
  
Actual results:
groupd isn't running

Expected results:
groupd should be running

Additional info:
The following shows up in /var/log/groupd:

1241754782 cman: our nodeid 2 name domusB.esri.com quorum 1
1241754782 groupd segfault log follows:

(Nothing further)

A gdb run and backtrace on the groupd process is as follows:

# gdb groupd 
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu"...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /sbin/groupd 

Program received signal SIGSEGV, Segmentation fault.
[Switching to process 671]
0x0ff238b0 in __new_semctl (semid=<value optimized out>, semnum=<value optimized out>, cmd=<value optimized out>) at ../sysdeps/unix/sysv/linux/semctl.c:122
122	      arg = va_arg (ap, union semun);
(gdb) bt
#0  0x0ff238b0 in __new_semctl (semid=<value optimized out>, semnum=<value optimized out>, cmd=<value optimized out>) at ../sysdeps/unix/sysv/linux/semctl.c:122
#1  0x0fe1314c in openais_service_connect (service=<value optimized out>, shmseg=<value optimized out>) at util.c:339
#2  0x0fe13d60 in cpg_initialize (handle=<value optimized out>, callbacks=<value optimized out>) at cpg.c:108
#3  0x1000c96c in setup_cpg () at cpg.c:638
#4  0x10012930 in loop () at main.c:816
#5  0x1001357c in main (argc=1, argv=0xff9efb14) at main.c:1054
(gdb) bt full
#0  0x0ff238b0 in __new_semctl (semid=<value optimized out>, semnum=<value optimized out>, cmd=<value optimized out>) at ../sysdeps/unix/sysv/linux/semctl.c:122
	arg = Could not find the frame base for "__new_semctl".
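
The interactive session above could probably be captured in one non-interactive command, which is handy when attaching a backtrace to a report. This is a sketch only: capture_bt is a hypothetical helper name, the GDB variable is an assumption added so the command can be dry-run, and it presumes a gdb build that accepts -batch and -ex.

```shell
#!/bin/sh
# Hypothetical helper: run program "$1" under gdb without a prompt and
# print a full backtrace if it crashes.
# GDB may be overridden (for example GDB=echo for a dry run); it defaults to gdb.
capture_bt() {
    "${GDB:-gdb}" -batch \
        -ex 'set follow-fork-mode child' \
        -ex 'run' \
        -ex 'bt full' \
        "$1"
}
```

For example, capture_bt /sbin/groupd > groupd-bt.txt 2>&1 would save the output to a file. The follow-fork-mode setting mirrors the session above, since the crash occurs in the forked child.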

Comment 1 Ray Van Dolson 2009-05-08 03:59:20 UTC
Created attachment 342986
cluster configuration file in use

Comment 2 David Teigland 2009-05-08 14:45:58 UTC
Thanks for debugging this for us.  Looks like more regression from openais ipc.

Comment 3 Ray Van Dolson 2009-05-08 19:32:23 UTC
(In reply to comment #2)
> Thanks for debugging this for us.  Looks like more regression from openais ipc.  

Would an older version potentially work for me in the interim?

(Perhaps the dependency chain hassle would make this not worth the effort.)

Comment 4 Ray Van Dolson 2009-05-08 21:15:45 UTC
FYI, reverting to openais-0.80.3-22.el5 appears to have resolved the problem.  I guess the component on this bz should be changed.

Comment 5 Steven Dake 2009-05-08 22:15:19 UTC
Ray,

which version of openais provides this error?   openais-0.80.3-22.el5_3.4?  Works for me on x86_64, i386, maybe a ppc64 only issue.

Regards
-steve

Comment 6 Ray Van Dolson 2009-05-08 22:35:03 UTC
(In reply to comment #5)
> Ray,
> 
> which version of openais provides this error?   openais-0.80.3-22.el5_3.4? 
> Works for me on x86_64, i386, maybe a ppc64 only issue.
> 
> Regards
> -steve  

openais-0.80.3-22.el5_3.4 
  - x86_64: Works fine
  -  ppc64: segfaults

openais-0.80.3-22.el5
  - x86_64: Works fine
  -  ppc64: Works fine


I've reproduced on two ppc64 systems (both JS20's).
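
As a rough illustration of the matrix above, the following sketch maps an installed openais version-release to the ppc64 result reported in this comment. The function name openais_ppc64_result is hypothetical, and only the two builds listed here are covered; anything else is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: map an openais version-release string to the
# ppc64 outcome reported in this comment.
openais_ppc64_result() {
    case "$1" in
        0.80.3-22.el5)     echo "works" ;;
        0.80.3-22.el5_3.4) echo "segfaults" ;;
        *)                 echo "unknown" ;;
    esac
}
```

On a node it could be fed from rpm, for example openais_ppc64_result "$(rpm -q --qf '%{VERSION}-%{RELEASE}' openais)".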

Comment 7 John Ruemker 2009-05-12 15:49:38 UTC
Created attachment 343606
groupd core

Comment 8 John Ruemker 2009-05-12 15:50:11 UTC
Created attachment 343607
another groupd core

Comment 9 John Ruemker 2009-05-12 15:52:20 UTC
The customer who opened this BZ has a support case open with us as well and has provided us with 2 core files.  These were generated on

openais-0.80.3-22.el5_3.4-ppc
cman-2.0.98-1.el5_3.1-ppc

The groupd log for both shows a message such as:

1241755835 cman: our nodeid 1 name domusA.esri.com quorum 1
1241755835 groupd segfault log follows:

Comment 14 Steven Dake 2009-05-17 11:08:03 UTC
*** Bug 501062 has been marked as a duplicate of this bug. ***

Comment 18 Nate Straz 2009-06-23 21:18:38 UTC
Verified with openais-0.80.6-7.el5 and cman-2.0.108-1.el5 on ppc.

Comment 19 Ray Van Dolson 2009-07-06 18:19:54 UTC
Updated to openais-0.80.3-22.el5_3.8 and this appears to be working now.  Thanks.

Comment 20 Steven Dake 2009-07-07 00:39:05 UTC
happy to be of service.  thanks.

Comment 22 errata-xmlrpc 2009-09-02 11:06:45 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html

