Bug 505594

Summary: semaphore leak during cluster startup/shutdown cycle
Product: Red Hat Enterprise Linux 5
Reporter: Nate Straz <nstraz>
Component: cman
Assignee: Steven Dake <sdake>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 5.4
CC: ccaulfie, cfeist, cluster-maint, edamato, larry.arnaud, sghosh
Target Milestone: rc
Keywords: Regression
Hardware: All
OS: Linux
Fixed In Version: openais-0.80.6-6 and cman-2.0.108-1.el5
Doc Type: Bug Fix
Clones: 506778
Last Closed: 2009-09-02 11:10:35 UTC
Bug Blocks: 506778
Attachments:
patch to resolve problem (Flags: none)

Description Nate Straz 2009-06-12 14:38:21 UTC
Description of problem:

While trying to verify another bug, I found that after starting and stopping the cluster 40-45 times, groupd and gfs_controld would get stuck in a busy loop, repeatedly calling semget() and failing with ENOSPC:

semget(0x330681cf, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0
semget(0x94fea7e, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0


Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.4
openais-0.80.3-22.el5_3.8


How reproducible:
Unknown

Steps to Reproduce:
1. while true; do service cman start; service cman stop; done
  
Actual results:
groupd and gfs_controld get stuck in a busy loop because they can't get a semaphore

Expected results:
1. Whoever is leaking semaphores stops leaking them
2. Daemons should not get stuck when they can't get a semaphore
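The strace output in the description suggests the daemons pick a key, call `semget()` with `IPC_CREAT|IPC_EXCL`, and simply try another key when that fails, so an `ENOSPC` caused by an exhausted system limit is retried forever. A minimal sketch of expected result 2, assuming the caller wants a new 3-semaphore set under a random key: retry only on `EEXIST` (key already taken) and return any other error, such as `ENOSPC`, to the caller. The function name `alloc_sem_set` and the retry bound are hypothetical; this is not the groupd, gfs_controld or openais code.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical helper: create a new semaphore set of nsems semaphores
 * under a random key.  A key collision (EEXIST) is worth retrying with
 * a different key; any other error, including ENOSPC when the system
 * has run out of semaphore arrays, is returned to the caller instead
 * of being retried forever.  Returns the semid, or -1 with errno set. */
int alloc_sem_set(int nsems, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        key_t key = (key_t)rand();
        if (key == IPC_PRIVATE)      /* IPC_PRIVATE is 0; skip it */
            continue;
        int semid = semget(key, nsems, IPC_CREAT | IPC_EXCL | 0600);
        if (semid >= 0)
            return semid;
        if (errno != EEXIST)
            return -1;               /* ENOSPC, EACCES, ...: give up */
    }
    errno = EEXIST;                  /* every key tried was already taken */
    return -1;
}
```

A daemon's startup path would then log the error and exit (or back off) when `alloc_sem_set()` returns -1 with `errno == ENOSPC`, rather than spinning.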

Additional info:

Comment 1 Nate Straz 2009-06-12 15:06:03 UTC
I was able to reproduce this on the first try. It took 43 iterations of `service cman start` to reach this point.

[root@z3 ~]# ipcs -s -u

------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
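`ipcs -s -u` reports how many semaphore arrays and semaphores are currently allocated system-wide. A program can read the same two counters through the Linux-specific `SEM_INFO` command of `semctl()`, which fills a `struct seminfo` with the set count in `semusz` and the semaphore count in `semaem`. This sketch assumes Linux with glibc (where the caller must define `union semun` itself); the helper name `sem_usage` is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* semctl() expects the caller to define union semun (see semctl(2)). */
union semun {
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
    struct seminfo  *__buf;   /* Linux-specific, used by IPC_INFO/SEM_INFO */
};

/* Hypothetical helper: report how many semaphore sets ("used arrays")
 * and semaphores ("allocated semaphores") exist system-wide, the same
 * figures printed by `ipcs -s -u`.  Returns 0 on success, -1 on error. */
int sem_usage(int *arrays, int *sems)
{
    struct seminfo info;
    union semun arg;
    arg.__buf = &info;
    if (semctl(0, 0, SEM_INFO, arg) == -1)
        return -1;
    *arrays = info.semusz;    /* number of semaphore sets in existence */
    *sems   = info.semaem;    /* total semaphores across all sets */
    return 0;
}
```

Calling this before and after each `service cman start`/`stop` cycle would show whether the array count keeps growing.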

Comment 2 Nate Straz 2009-06-12 15:07:58 UTC
[root@z3 ~]# ipcs -s -l

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
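On Linux the first four limits are also exposed as one line in `/proc/sys/kernel/sem`, in the order SEMMSL (max semaphores per array), SEMMNS (max semaphores system wide), SEMOPM (max ops per semop call) and SEMMNI (max number of arrays). Comparing Comment 1 with these limits explains the `ENOSPC`: 128 arrays are in use and SEMMNI is 128, so no further array can be created, even though only 384 of the 32000 semaphores are allocated (128 arrays × 3 semaphores, matching the `nsems` of 3 in the strace). The sketch below performs that check on the values above; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for the four values in /proc/sys/kernel/sem. */
struct sem_limits {
    int semmsl;   /* max semaphores per array */
    int semmns;   /* max semaphores system wide */
    int semopm;   /* max ops per semop call */
    int semmni;   /* max number of arrays */
};

/* Parse a line such as "250\t32000\t32\t128".  Returns 0 on success. */
int parse_sem_limits(const char *line, struct sem_limits *lim)
{
    int n = sscanf(line, "%d %d %d %d",
                   &lim->semmsl, &lim->semmns, &lim->semopm, &lim->semmni);
    return n == 4 ? 0 : -1;
}

/* Would one more array of nsems semaphores fit under the limits,
 * given the current usage reported by `ipcs -s -u`? */
bool can_create_array(const struct sem_limits *lim,
                      int used_arrays, int used_sems, int nsems)
{
    return nsems <= lim->semmsl &&
           used_arrays + 1 <= lim->semmni &&
           used_sems + nsems <= lim->semmns;
}
```

With the Comment 1 figures, `can_create_array(&lim, 128, 384, 3)` returns false because the array count (SEMMNI), not the semaphore count, is the exhausted limit.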

Comment 5 Steven Dake 2009-06-16 04:17:05 UTC
Patch looks good, Chrissie. I'll give it a go with the latest openais.

Comment 6 Steven Dake 2009-06-16 04:20:36 UTC
Changed the version to 5.4; this bug doesn't exist in 5.3, only in 5.3.z.

Comment 7 Steven Dake 2009-06-18 04:08:31 UTC
Created attachment 348375
patch to resolve problem

Comment 8 Christine Caulfield 2009-06-18 07:32:48 UTC
cman patch:

commit 36ac90224ae2668723df9679d1faecfebd3975c6
Author: Christine Caulfield <ccaulfie>
Date:   Thu Jun 18 08:04:50 2009 +0100

    cman: Use the new openais exit APIs
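The commit message only says that cman now uses the new openais exit APIs; the patch itself is not reproduced here. As general background, System V semaphore sets are not freed when the creating process exits: they persist until removed with `semctl(semid, 0, IPC_RMID)` (or until reboot), so a daemon that skips that step at shutdown would leave arrays behind on every start/stop cycle, consistent with the symptom above. The sketch below shows that cleanup pattern with an `IPC_PRIVATE` set; it is not the openais or cman code, and both function names are hypothetical.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical: create a private set of nsems semaphores.
 * Returns the semid, or -1 with errno set. */
int create_private_sem_set(int nsems)
{
    return semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
}

/* Hypothetical shutdown hook: remove the set so it does not outlive the
 * process.  Returns 0 on success, or -1 with errno set on failure (for
 * example, when the set has already been removed). */
int release_sem_set(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

Pairing every successful `create_private_sem_set()` with a `release_sem_set()` on the exit path keeps `ipcs -s -u` flat across repeated start/stop cycles.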

Comment 11 errata-xmlrpc 2009-09-02 11:10:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html