505594 – semaphore leak during cluster startup/shutdown cycle

Summary: semaphore leak during cluster startup/shutdown cycle

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	cman
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Steven Dake
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	506778

Reported:	2009-06-12 14:38 UTC by Nate Straz
Modified:	2016-04-26 14:49 UTC
CC List:	6 users
Fixed In Version:	openais-0.80.6-6 and cman-2.0.108-1.el5
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	506778
Environment:
Last Closed:	2009-09-02 11:10:35 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
patch to resolve problem (9.76 KB, application/octet-stream) 2009-06-18 04:08 UTC, Steven Dake

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1341	0	normal	SHIPPED_LIVE	Low: cman security, bug fix, and enhancement update	2009-09-01 10:43:16 UTC

Description Nate Straz 2009-06-12 14:38:21 UTC

Description of problem:

While trying to verify another bug I found that after starting and stopping the cluster 40-45 times, groupd and gfs_controld would get stuck in a busy loop.

semget(0x330681cf, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0
semget(0x94fea7e, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid()                               = 0


Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.4
openais-0.80.3-22.el5_3.8


How reproducible:
Unknown

Steps to Reproduce:
1. while true; do service cman start; service cman stop; done
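
To see how fast the leak grows, the loop above can be bounded and sampled after every cycle. The following is only a sketch: it assumes a Python 3.7+ interpreter on the node, and the helper names (`used_arrays`, `ipcs_summary`, `cman_cycle`, `track_leak`) are made up for illustration. It also assumes `service cman start` returns rather than hanging once the limit is reached, which may not hold on an affected system.

```python
import re
import subprocess


def used_arrays(ipcs_output):
    """Extract the 'used arrays = N' value from `ipcs -s -u` output."""
    match = re.search(r"used arrays\s*=\s*(\d+)", ipcs_output)
    if match is None:
        raise ValueError("no 'used arrays' line found")
    return int(match.group(1))


def ipcs_summary():
    """Run `ipcs -s -u` and return its text output."""
    result = subprocess.run(
        ["ipcs", "-s", "-u"], capture_output=True, text=True, check=True
    )
    return result.stdout


def cman_cycle():
    """One start/stop cycle, as in the reproduction step."""
    subprocess.run(["service", "cman", "start"], check=True)
    subprocess.run(["service", "cman", "stop"], check=True)


def track_leak(cycles, cycle=cman_cycle, summary=ipcs_summary):
    """Return the used-array count before the first cycle and after each one."""
    counts = [used_arrays(summary())]
    for _ in range(cycles):
        cycle()
        counts.append(used_arrays(summary()))
    return counts
```

Calling `track_leak(45)` as root would return a list of counts; a steadily rising sequence points at arrays that are not released on shutdown.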
  
Actual results:
groupd and gfs_controld get stuck in a busy loop because they can't get a semaphore

Expected results:
1. Whichever component is leaking semaphores stops leaking them
2. Daemons should not get stuck in a busy loop when they can't get a semaphore
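
For the second expectation, one possible shape of the fix is a bounded retry with backoff instead of an immediate retry. This is a sketch in Python rather than the daemons' C code, and it is not what the attached patch does. `acquire` is a hypothetical callable standing in for a semget-style allocation that raises `OSError` with `errno.ENOSPC` when no array is available.

```python
import errno
import time


def acquire_with_backoff(acquire, attempts=5, first_delay=0.1, sleep=time.sleep):
    """Call acquire() until it succeeds, backing off on ENOSPC.

    `acquire` is a hypothetical stand-in for a semget-style allocation.
    After `attempts` failures the last error is re-raised so the caller
    can report it instead of spinning forever.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return acquire()
        except OSError as exc:
            # Only ENOSPC is treated as retryable; anything else propagates.
            if exc.errno != errno.ENOSPC or attempt == attempts:
                raise
            sleep(delay)  # yield the CPU instead of retrying immediately
            delay *= 2    # exponential backoff between attempts
```

The `sleep` parameter is injectable only so the behaviour can be checked without waiting; a real caller would leave the default.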

Additional info:

Comment 1 Nate Straz 2009-06-12 15:06:03 UTC

I was able to reproduce this on the first try.  It took 43 runs of `service cman start` to get to this point.

[root@z3 ~]# ipcs -s -u

------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
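
A quick check of these numbers, as a sketch: 384 semaphores across 128 arrays is 3 per array, which matches the `nsems` argument of 3 in the failing `semget` calls. The per-start figure below is only an upper bound, since it assumes every array on the system was leaked by the start/stop cycle and none were in use beforehand.

```python
used_arrays = 128           # from `ipcs -s -u` above
allocated_semaphores = 384  # from `ipcs -s -u` above
starts = 43                 # `service cman start` runs reported above

sems_per_array = allocated_semaphores / used_arrays
# Upper bound: assumes all 128 arrays came from the start/stop cycles.
arrays_per_start = used_arrays / starts

print(f"semaphores per array: {sems_per_array:g}")
print(f"arrays per start (upper bound): {arrays_per_start:.2f}")
```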

Comment 2 Nate Straz 2009-06-12 15:07:58 UTC

[root@z3 ~]# ipcs -s -l

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
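
`semget` returns ENOSPC when either the maximum number of arrays (SEMMNI) or the system-wide semaphore maximum (SEMMNS) would be exceeded. Comparing the usage from Comment 1 with the limits above shows which one applies here. The dictionary keys and the `exhausted` helper are names chosen for this sketch only.

```python
usage = {"arrays": 128, "semaphores": 384}      # `ipcs -s -u`, Comment 1
limits = {"arrays": 128, "semaphores": 32000}   # `ipcs -s -l` above


def exhausted(usage, limits):
    """Return the names of limits that current usage has reached."""
    return sorted(name for name, used in usage.items() if used >= limits[name])


print(exhausted(usage, limits))
```

Only the array count has reached its limit; the system-wide semaphore count is far below its maximum.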

Comment 5 Steven Dake 2009-06-16 04:17:05 UTC

Patch looks good, Chrissie.  I'll give it a go with the latest openais.

Comment 6 Steven Dake 2009-06-16 04:20:36 UTC

Changed to 5.4; this bug doesn't exist in 5.3, only in 5.3.z.

Comment 7 Steven Dake 2009-06-18 04:08:31 UTC

Created attachment 348375
patch to resolve problem

Comment 8 Christine Caulfield 2009-06-18 07:32:48 UTC

cman patch:

commit 36ac90224ae2668723df9679d1faecfebd3975c6
Author: Christine Caulfield <ccaulfie>
Date:   Thu Jun 18 08:04:50 2009 +0100

    cman: Use the new openais exit APIs

Comment 11 errata-xmlrpc 2009-09-02 11:10:35 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html
