Bug 505594 - semaphore leak during cluster startup/shutdown cycle
Summary: semaphore leak during cluster startup/shutdown cycle
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 506778
 
Reported: 2009-06-12 14:38 UTC by Nate Straz
Modified: 2016-04-26 14:49 UTC (History)

Fixed In Version: openais-0.80.6-6 and cman-2.0.108-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 506778
Environment:
Last Closed: 2009-09-02 11:10:35 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to resolve problem (9.76 KB, application/octet-stream)
2009-06-18 04:08 UTC, Steven Dake


Links
System ID: Red Hat Product Errata RHSA-2009:1341
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: cman security, bug fix, and enhancement update
Last Updated: 2009-09-01 10:43:16 UTC

Description Nate Straz 2009-06-12 14:38:21 UTC
Description of problem:

While trying to verify another bug I found that after starting and stopping the cluster 40-45 times, groupd and gfs_controld would get stuck in a busy loop.

semget(0x330681cf, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0
semget(0x94fea7e, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0


Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.4
openais-0.80.3-22.el5_3.8


How reproducible:
Unknown

Steps to Reproduce:
1. while true; do service cman start; service cman stop; done
  
Actual results:
groupd and gfs_controld get stuck in a busy loop because they can't get a semaphore

Expected results:
1. Whoever is leaking semaphores should stop leaking them.
2. Daemons should not get stuck when they can't get a semaphore.
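
A minimal sketch of the second expectation, written in Python for brevity. It assumes the busy loop comes from retrying exclusive creation with a fresh key after every failure, which is what the strace output suggests but the report does not confirm. `create_exclusive` and the `create` callable are hypothetical names, not part of openais or cman; `create(key)` stands in for a wrapper around a call like semget(key, 3, IPC_CREAT|IPC_EXCL|0600) that raises OSError on failure. The idea is to retry only on a key collision (EEXIST) and to give up at once on ENOSPC, since a different key cannot free up an array slot.

```python
import errno
import random


def create_exclusive(create, max_attempts=8, rng=random):
    """Create an exclusive IPC object under a random key, with bounded retries.

    `create` is a hypothetical callable: it takes an integer key and either
    returns an identifier or raises OSError with errno set.
    """
    for _ in range(max_attempts):
        key = rng.getrandbits(31)
        try:
            return create(key)
        except OSError as exc:
            if exc.errno == errno.EEXIST:
                # Key collision: another key may still succeed.
                continue
            # ENOSPC and anything else: a new key cannot help, so report it.
            raise
    raise OSError(errno.EEXIST, "no free key after %d attempts" % max_attempts)
```

With this shape, a caller that hits the system-wide array limit sees a single ENOSPC error it can log and act on, rather than spinning.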

Additional info:

Comment 1 Nate Straz 2009-06-12 15:06:03 UTC
I was able to reproduce this on the first try.  It took 43 runs of `service cman start` to get to this point.

[root@z3 ~]# ipcs -s -u

------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
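
A quick arithmetic check on these figures, as a sketch only. It assumes every one of the 128 arrays was left behind by the start/stop cycles and that nothing else on the machine holds semaphore arrays, so the per-cycle figure is an upper bound on the average.

```python
used_arrays = 128      # "used arrays" from ipcs -s -u
allocated_sems = 384   # "allocated semaphores" from ipcs -s -u
cycles = 43            # cman starts before the failure appeared

sems_per_array = allocated_sems / used_arrays
arrays_per_cycle = used_arrays / cycles

print("semaphores per array: %.0f" % sems_per_array)              # 3
print("arrays per cycle (upper bound): %.2f" % arrays_per_cycle)  # 2.98
```

Three semaphores per array matches the nsems argument of 3 in the failing semget calls, and roughly three arrays per cycle would exhaust a 128-array limit in the observed 40-45 cycles.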

Comment 2 Nate Straz 2009-06-12 15:07:58 UTC
[root@z3 ~]# ipcs -s -l

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
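
To watch for this condition while running the reproduction loop, the two ipcs reports can be compared mechanically. The following is a sketch that parses the "label = number" lines as they appear in this report; the function names are hypothetical, and ipcs output on other util-linux versions may be laid out differently.

```python
import re


def parse_ipcs(text):
    """Return {label: int} for every 'label = number' line in ipcs output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def arrays_remaining(status_text, limits_text):
    """Semaphore arrays still available before semget returns ENOSPC."""
    used = parse_ipcs(status_text)["used arrays"]
    limit = parse_ipcs(limits_text)["max number of arrays"]
    return limit - used


STATUS = """------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
"""

LIMITS = """------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
"""

print(arrays_remaining(STATUS, LIMITS))  # 0
```

A result of 0 means the array limit is exhausted, which is the state in which semget fails with ENOSPC even though the system-wide semaphore count (384 of 32000) is nowhere near its limit.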

Comment 5 Steven Dake 2009-06-16 04:17:05 UTC
Patch looks good, Chrissie.  I'll give it a go with the latest openais.

Comment 6 Steven Dake 2009-06-16 04:20:36 UTC
Changed the version to 5.4; this bug doesn't exist in 5.3, only in 5.3.z.

Comment 7 Steven Dake 2009-06-18 04:08:31 UTC
Created attachment 348375
patch to resolve problem

Comment 8 Christine Caulfield 2009-06-18 07:32:48 UTC
cman patch:

commit 36ac90224ae2668723df9679d1faecfebd3975c6
Author: Christine Caulfield <ccaulfie>
Date:   Thu Jun 18 08:04:50 2009 +0100

    cman: Use the new openais exit APIs

Comment 11 errata-xmlrpc 2009-09-02 11:10:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html

