Description of problem:
While trying to verify another bug I found that after starting and stopping the cluster 40-45 times, groupd and gfs_controld would get stuck in a busy loop.

semget(0x330681cf, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid() = 0
semget(0x94fea7e, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
geteuid() = 0

Version-Release number of selected component (if applicable):
cman-2.0.98-1.el5_3.4
openais-0.80.3-22.el5_3.8

How reproducible:
Unknown

Steps to Reproduce:
1. while true; do service cman start; service cman stop; done

Actual results:
groupd and gfs_controld get stuck in a busy loop because they can't get a semaphore.

Expected results:
1. Whoever is leaking semaphores doesn't.
2. Daemons should not get stuck when they can't get a semaphore.

Additional info:
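To illustrate expected result 2, here is a minimal Python sketch of a caller that gives up after a bounded number of ENOSPC failures instead of spinning. It is not the code used by groupd, gfs_controld, or openais; `allocate_with_bounded_retry` and `always_full` are hypothetical names, and the allocator is a stand-in that simulates the failing semget() call seen in the strace output.

```python
import errno
import time


def allocate_with_bounded_retry(allocate, attempts=5, delay=0.0):
    """Call allocate() until it succeeds or the attempts run out.

    Only ENOSPC is retried; any other OSError propagates immediately.
    After the last failed attempt the ENOSPC error is re-raised so the
    caller can report it instead of looping forever.
    """
    for attempt in range(attempts):
        try:
            return allocate()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            if attempt == attempts - 1:
                raise
            # Exponential backoff between attempts (hypothetical policy).
            time.sleep(delay * (2 ** attempt))


# Hypothetical stand-in for a semget() call on a system with no free arrays.
calls = []


def always_full():
    calls.append(1)
    raise OSError(errno.ENOSPC, "No space left on device")


try:
    allocate_with_bounded_retry(always_full, attempts=3, delay=0.0)
    outcome = "allocated"
except OSError as exc:
    outcome = "gave up: " + errno.errorcode[exc.errno]

print(len(calls), outcome)
```

The design point is only that the failure surfaces to the caller after a fixed number of tries; the retry count and backoff are assumptions.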
I was able to reproduce this on the first try. It took 43 runs of `service cman start` to get to this point.

[root@z3 ~]# ipcs -s -u

------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
[root@z3 ~]# ipcs -s -l

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
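The two ipcs listings can be compared mechanically. The following Python sketch (the helper `parse_ipcs` is a hypothetical name, and the input is the output pasted above rather than a live `ipcs` call) shows that the array limit is exhausted while the system-wide semaphore count is nowhere near its limit, which is consistent with semget() returning ENOSPC.

```python
import re


def parse_ipcs(text):
    """Collect 'name = value' lines from ipcs -s -u / -l output into a dict."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


STATUS = """\
------ Semaphore Status --------
used arrays = 128
allocated semaphores = 384
"""

LIMITS = """\
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
"""

status = parse_ipcs(STATUS)
limits = parse_ipcs(LIMITS)

# Remaining headroom against each limit.
arrays_left = limits["max number of arrays"] - status["used arrays"]
sems_left = limits["max semaphores system wide"] - status["allocated semaphores"]
# Average array size; matches the nsems argument of 3 in the semget() calls.
sems_per_array = status["allocated semaphores"] // status["used arrays"]

print(arrays_left, sems_left, sems_per_array)
```

With these numbers all 128 arrays are in use, each holding 3 semaphores, so any further semget() with IPC_CREAT fails regardless of how many individual semaphores remain.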
Patch looks good, Chrissie. I'll give it a go with the latest openais.
Changed to 5.4; this bug doesn't exist in 5.3, only in 5.3.z.
Created attachment 348375: patch to resolve problem
cman patch:

commit 36ac90224ae2668723df9679d1faecfebd3975c6
Author: Christine Caulfield <ccaulfie>
Date:   Thu Jun 18 08:04:50 2009 +0100

    cman: Use the new openais exit APIs
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html