Bug 505594 - semaphore leak during cluster startup/shutdown cycle
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assigned To: Steven Dake
QA Contact: Cluster QE
Keywords: Regression
Blocks: 506778
Reported: 2009-06-12 10:38 EDT by Nate Straz
Modified: 2016-04-26 10:49 EDT
CC List: 6 users

Fixed In Version: openais-0.80.6-6 and cman-2.0.108-1.el5
Doc Type: Bug Fix
Clones: 506778
Last Closed: 2009-09-02 07:10:35 EDT


Attachments
patch to resolve problem (9.76 KB, application/octet-stream)
2009-06-18 00:08 EDT, Steven Dake


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2009:1341
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: cman security, bug fix, and enhancement update
Last Updated: 2009-09-01 06:43:16 EDT

Description Nate Straz 2009-06-12 10:38:21 EDT
Description of problem:

While trying to verify another bug, I found that after starting and stopping the cluster 40-45 times, groupd and gfs_controld would get stuck in a busy loop, calling semget() over and over with a new key each time:

semget(0x330681cf, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0
semget(0x94fea7e, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0
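System V semaphore sets are kernel-persistent: a set created with semget() stays allocated after the creating process exits, until something calls semctl() with IPC_RMID (or the system reboots). A daemon that exits without removing its sets therefore leaves each one occupying a slot of the system-wide limit on semaphore arrays. The minimal, Linux-only C sketch below illustrates that mechanism; it is not taken from cman or openais, and the function name sem_set_outlives_creator is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* glibc leaves union semun for the caller to define (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
    struct seminfo *__buf;
};

/* Hypothetical demo: returns 1 if a semaphore set created by a child that
 * has already exited still exists, 0 if it does not, -1 on error.
 * The set is removed before returning, so the demo itself leaks nothing. */
static int sem_set_outlives_creator(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: create a 3-semaphore set (the nsems seen in the strace),
         * send its id to the parent, and exit WITHOUT calling IPC_RMID. */
        close(fds[0]);
        int id = semget(IPC_PRIVATE, 3, IPC_CREAT | 0600);
        ssize_t n = write(fds[1], &id, sizeof id);
        _exit(n == (ssize_t)sizeof id ? 0 : 1);
    }

    /* Parent: collect the id, then wait until the child has exited. */
    close(fds[1]);
    int id = -1;
    ssize_t n = read(fds[0], &id, sizeof id);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof id || id == -1)
        return -1;

    /* IPC_STAT still succeeds: the set outlived the process that made it. */
    struct semid_ds ds;
    union semun arg;
    arg.buf = &ds;
    int still_exists = semctl(id, 0, IPC_STAT, arg) != -1;

    /* Only an explicit IPC_RMID releases the set. */
    semctl(id, 0, IPC_RMID);
    return still_exists;
}
```

On Linux the function is expected to return 1. The explicit IPC_RMID at the end is the only thing that keeps the demonstration itself from leaking a set.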


Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.4
openais-0.80.3-22.el5_3.8


How reproducible:
Unknown

Steps to Reproduce:
1. while true; do service cman start; service cman stop; done
  
Actual results:
groupd and gfs_controld get stuck in a busy loop because they cannot create a semaphore set (semget() fails with ENOSPC)

Expected results:
1. Whoever is leaking semaphores stops leaking them
2. Daemons should not get stuck when they can't get a semaphore
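For the second expected result, one option is to act on the specific error semget() returns instead of retrying unconditionally. With IPC_CREAT | IPC_EXCL, EEXIST only means another set already uses that key, so trying a fresh key makes sense; ENOSPC means a kernel limit is exhausted, and retrying cannot succeed until sets are removed. The C sketch below assumes a caller that, like the daemons in the strace above, picks random keys; create_sem_set_bounded and its attempt cap are hypothetical and do not describe the code that shipped in cman or openais.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical helper: create a NEW semaphore set with nsems semaphores
 * under a random key, making at most max_attempts tries.
 * Returns the set id, or -1 with errno set. */
static int create_sem_set_bounded(int nsems, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        key_t key = (key_t)random();
        if (key == IPC_PRIVATE)
            continue;               /* key 0 means IPC_PRIVATE; pick another */

        int id = semget(key, nsems, IPC_CREAT | IPC_EXCL | 0600);
        if (id != -1)
            return id;

        /* EEXIST: another set already owns this key, so a fresh key may
         * work. Anything else (ENOSPC when the array or semaphore limit is
         * exhausted, EINVAL, ...) will not be fixed by retrying. */
        if (errno != EEXIST)
            return -1;
    }
    errno = EEXIST;                 /* every key we tried was already taken */
    return -1;
}
```

A daemon using something like this would log the failure and exit or back off when it gets -1, rather than spinning; the cap also bounds the unlikely case where every random key collides.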

Additional info:
Comment 1 Nate Straz 2009-06-12 11:06:03 EDT
I was able to reproduce this on the first try. It took 43 runs of `service cman start` to get to this point.

[root@z3 ~]# ipcs -s -u

------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
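Instead of running ipcs by hand between start/stop cycles, a test harness could read the same counters programmatically. On Linux, semctl() with SEM_INFO fills a struct seminfo whose semusz field is the number of semaphore sets that currently exist and whose semaem field is the total number of semaphores in them, corresponding to the "used arrays" and "allocated semaphores" lines above. A minimal C sketch, assuming glibc (which leaves union semun for the caller to define); sem_usage is a hypothetical name:

```c
#define _GNU_SOURCE
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* glibc leaves union semun for the caller to define (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
    struct seminfo *__buf;
};

/* Hypothetical helper: store the number of existing semaphore sets in
 * *arrays and the total number of semaphores in *sems.
 * Linux-specific (SEM_INFO). Returns 0 on success, -1 on error. */
static int sem_usage(int *arrays, int *sems)
{
    struct seminfo info;
    union semun arg;
    arg.__buf = &info;

    /* With SEM_INFO the semid and semnum arguments are ignored. */
    if (semctl(0, 0, SEM_INFO, arg) == -1)
        return -1;

    *arrays = info.semusz;   /* semaphore sets currently in existence */
    *sems = info.semaem;     /* semaphores across all of those sets */
    return 0;
}
```

Recording the used-array count before `service cman start` and after `service cman stop` should show whether a single cycle leaves sets behind.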
Comment 2 Nate Straz 2009-06-12 11:07:58 EDT
[root@z3 ~]# ipcs -s -l

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
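The figures in comments 1 and 2 can be cross-checked with a little arithmetic. The per-cycle estimate assumes no semaphore sets existed before the first start and that every set counted was left over from earlier cycles, so it is approximate:

```latex
\begin{aligned}
\frac{384\ \text{allocated semaphores}}{128\ \text{used arrays}} &= 3\ \text{semaphores per array} && (\text{matches } \texttt{nsems} = 3 \text{ in the strace})\\
128\ \text{used arrays} &= 128\ \text{max number of arrays} && (\text{array limit reached})\\
384\ \text{allocated semaphores} &\ll 32000\ \text{max semaphores system wide} && (\text{not the bottleneck})\\
\frac{128\ \text{arrays}}{43\ \text{start/stop cycles}} &\approx 2.98 \approx 3\ \text{arrays left behind per cycle}
\end{aligned}
```

The exhausted limit is therefore the number of arrays, not the number of semaphores, which is consistent with semget() failing with ENOSPC while only 384 of 32000 semaphores are allocated.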
Comment 5 Steven Dake 2009-06-16 00:17:05 EDT
Patch looks good, Chrissie. I'll give it a go with the latest openais.
Comment 6 Steven Dake 2009-06-16 00:20:36 EDT
Changed to 5.4; this bug doesn't exist in 5.3, only in 5.3.z.
Comment 7 Steven Dake 2009-06-18 00:08:31 EDT
Created attachment 348375
patch to resolve problem
Comment 8 Christine Caulfield 2009-06-18 03:32:48 EDT
cman patch:

commit 36ac90224ae2668723df9679d1faecfebd3975c6
Author: Christine Caulfield <ccaulfie@redhat.com>
Date:   Thu Jun 18 08:04:50 2009 +0100

    cman: Use the new openais exit APIs
Comment 11 errata-xmlrpc 2009-09-02 07:10:35 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html
