from linux-cluster:

On Tue, Aug 14, 2007 at 10:19:07AM -0500, Chris Harms wrote:
> We installed the 5.1 Beta RPMs of the cluster suite and have left our
> cluster running unfettered for over a week. It now appears modclusterd
> has a slow memory leak. It's consuming 1.5% and climbing of our 16GB of
> RAM, which is up from 1.3% yesterday. I would be happy to do some tests
> and send along the results. Please advise.
Same issue for us with RHEL AS 4.5... quad AMD64, 16GB. One node could show as little as 46MB while others may show as much as 2.4GB after 28 days of running. Restarting the service 'appears' to clear its memory usage.
Do you happen to have ps output for a modclusterd process using that much memory?
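If it helps to standardize what gets collected, here is one way to grab the relevant ps output plus the kernel's per-process memory counters. This is only a sketch; modclusterd may already have been stopped on your nodes, so the snippet falls back to the current shell's PID purely for illustration:

```shell
# Snapshot the memory footprint of modclusterd so it can be attached
# to this bug. Falls back to the current shell's PID if the daemon is
# not running (illustration only).
pid=$(pidof modclusterd || echo $$)
ps -o pid,user,vsz,rss,cmd -p "$pid"

# Kernel's own view of the process: VmSize, VmRSS, VmData, etc.
# A steadily growing VmData usually points at heap growth.
grep -i '^vm' "/proc/$pid/status"
```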
Sorry, I already shut off their services once I saw the increase in memory usage. I only have top output, sorted by memory (M):

top - 11:34:48 up 17 days, 22:33,  4 users,  load average: 0.33, 0.28, 0.31
Tasks: 279 total,   1 running, 278 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1% us,  0.1% sy,  0.0% ni, 99.6% id,  0.2% wa,  0.0% hi,  0.0% si
Mem:   7988824k total,  7137008k used,   851816k free,   227116k buffers
Swap:   524280k total,    19356k used,   504924k free,  4596640k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21482 root      18   2 2463m 2.1g 2.1g S    0 27.2  31:38.95 cache
19421 root      15  -1 2463m 2.0g 2.0g S    0 26.7   2:52.00 cache
21893 root      18   2 2463m 1.8g 1.8g S    0 24.2   1:58.18 cache
16071 root      14  -1 1870m 1.7g 1760 S    0 22.3   1263:22 modclusterd
19408 root      12  -4 2460m 145m 144m S    0  1.9   0:00.29 cache
21478 root      17   0 2463m 138m 138m S    0  1.8  68:15.40 cache
19567 cacheusr  16  -1 2463m 118m 117m S    0  1.5   5:24.04 cache
19962 cacheusr  15  -1 2463m  83m  82m S    0  1.1   0:00.21 cache
13656 root      16   0 91200  79m  48m S    0  1.0   0:01.33 clvmd

And another node:

rhurst@db5 [~]$ top
top - 11:07:27 up 17 days,  3:32,  9 users,  load average: 0.59, 0.52, 0.41
Tasks: 302 total,   1 running, 301 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7% us,  1.1% sy,  0.0% ni, 97.0% id,  1.1% wa,  0.0% hi,  0.1% si
Mem:   7988824k total,  7966128k used,    22696k free,   201088k buffers
Swap:   524280k total,    49828k used,   474452k free,  4786172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14203 rhurst    16   0  7464 1176  768 R    0  0.0   0:00.06 top
16333 root      14  -1 2766m 2.4g 1760 S    0 31.1   1273:33 modclusterd
19937 root       5 -10     0    0    0 S    0  0.0   0:26.63 lock_dlm1
19938 root       5 -10     0    0    0 S    0  0.0   0:26.50 lock_dlm2
    1 root      16   0  4756  556  460 S    0  0.0   0:23.19 init
Would it be possible to attach your cluster conf (with any confidential info blanked out, of course)?
Created attachment 162046: Copy of our cluster.conf
Anyone still seeing this? I can't reproduce it here.
Closing after over 6 months in NEEDINFO
Yes, one operating cluster hits the memory leak problem regularly, meaning once a week or more often. Surprisingly, our second cluster with the same setup does not. The main difference between the (Xen) clusters is that the one showing the problem is a mix of 32-bit and 64-bit nodes, while the second one is 64-bit only. Which information/logs do you need?
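Since the leak takes days to show, a periodic RSS log would capture the growth rate between restarts. A minimal sketch; the log path, sample count, and interval here are arbitrary choices for illustration (in practice you would run it as an endless loop with a sleep of several minutes):

```shell
# Record modclusterd's RSS (in KB) at intervals so the growth rate
# can be attached to the report. Three quick samples for illustration;
# real use would be: while true; do ...; sleep 600; done
log=/tmp/modclusterd-rss.log
for i in 1 2 3; do
    rss=$(ps -o rss= -C modclusterd | head -1)
    echo "$(date '+%F %T') rss_kb=${rss:-not-running}" >> "$log"
    sleep 1
done
```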