I have the following system: Pentium II, 256 MB RAM, Red Hat 6.2 with kernel 2.2.14-5.0. I run LVS (piranha) and IP masquerading. Running vmstat, I see the amount of free (real) memory decrease by 1 MB every half hour until LVS dies (and mostly everything else with it). I do not see the memory usage reflected in ps or top listings, so I assume LVS and/or masquerading use kernel memory. The issue is multiplied when I put the load balancer under high load. Terry.
The follow-on. Thinking it might be a kernel issue, I attempted to upgrade to the latest available kernel, but LVS did not want to start under 2.2.16-5smp (or the non-SMP build). Remaining on the existing kernel, I wrote a script to monitor free memory and restart pulse (LVS) if memory ran out. The system died after about 9 hours of operation; the message on screen was from the VFS: file-max limit 4096 reached. I have tripled the inode and file-handle limits and am waiting to see how things go. Is pulse not able to reap its used hash table entries? Is there any limit on the hash table entries? Any input from Red Hat would be appreciated.
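The watchdog script mentioned above could look something like this. This is a minimal sketch: the threshold, polling interval, and pulse init-script path are assumptions, not details from the report.

```shell
#!/bin/sh
# Sketch of a free-memory watchdog that restarts pulse when free RAM
# runs low. THRESHOLD_KB, the 60s interval, and the init-script path
# are assumed values, not taken from the original report.
THRESHOLD_KB=4096

free_kb() {
    # Free physical memory in kB, as the kernel reports it
    awk '/^MemFree:/ {print $2}' /proc/meminfo
}

watchdog() {
    while true; do
        if [ "$(free_kb)" -lt "$THRESHOLD_KB" ]; then
            /etc/rc.d/init.d/pulse restart   # SysV init path on Red Hat 6.x
        fi
        sleep 60
    done
}

# Only loop when explicitly asked, so the functions can be sourced
# or tested without blocking.
if [ "${1:-}" = "--run" ]; then
    watchdog
fi
```

A cron-started or rc.local-started copy invoked with --run would poll once a minute; anything that parses /proc/meminfo the same way would serve equally well.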
This is going to take some looking into. I am not aware of a memory leak in piranha, and it passes all Electric Fence type testing, but it is always possible. It would be interesting to know if you get different results with different load-balancing options. Also, what do you mean by LVS not working with the latest kernel? In what way? Your later entry certainly makes it sound like it's external to piranha. Pulse does nothing complex or clever -- it's just a socket I/O daemon that performs forks. It is more likely that LVS itself is consuming the memory. What kind of setup are you using (i.e., config file)? Is persistence involved, for example?
I have done a heap more testing; these are my findings.

Not configuring 'rsh uptime' on the www servers causes the system to lose a small chunk of memory each time nanny tries to gauge the remote load. This memory is only returned by restarting LVS. Setting up rsh (bleh) does stop this memory depletion.

For a system under _HEAVY_ load, 4096 file descriptors is not enough. The Linux kernel does not relinquish allocated file descriptors (nor inodes, for that matter), so under really high load it's only a matter of time... I've tripled the file descriptors and the system is still happy - for now.

Sorry, I meant to say LVS instead of pulse. The config file is non-persistent, a standard config file as per the FAQ:

primary = x.x.x.x
nat_router = 192.168.1.254 eth1:1
service = lvs
virtual www {
    address = x.x.x.x eth0:1
    active = 1
    server 3of9 {
        address = 192.168.1.13
        active = 1
        weight = 2000
    }
    server 4of9 {
        address = 192.168.1.14
        active = 1
        weight = 2000
    }
    [....trunc'd]
}

As for the latest-kernel issue: when I start LVS with no www servers in sight, LVS starts and tries to do its thing. However, if I start LVS with web servers in place, LVS goes defunct. Very odd... I have no other info than that. (Rebooting with the old kernel and everything works!!) Terry
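Tripling the descriptor limits on a 2.2 kernel is done through /proc; this is a config sketch to be run as root. The numbers are illustrative: the report says "tripled" against the old 4096 file-max default but gives no exact values.

```shell
# Raise the 2.2-kernel limits via /proc (run as root). Values are
# illustrative: 3x the old 4096 file-max default, with inode-max kept
# at the conventional 3-4x file-max ratio.
echo 12288 > /proc/sys/fs/file-max
echo 49152 > /proc/sys/fs/inode-max
# These settings do not survive a reboot; add the same two lines to a
# boot script (e.g. /etc/rc.d/rc.local) to make them persistent.
```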
LVS (the kernel code, not the program) will consume memory based on the number of connections that have occurred. Because of caching (and persistence, if enabled) this memory will hang around for quite a while, but I would not expect it to be permanently lost. Saying that restarting the lvs program frees up the memory is certainly an indication that it may be within the program itself, or a child. Hmm. This will take some time to check into, so I will apologize in advance that you won't hear back on this anytime soon.

If you are comfortable playing with source code, here are some things you can try. Perhaps you can pin the problem down and tell us :-) There are some memory testing tools that should be either on the Red Hat CDs or on www.freshmeat.net. These are tools and libraries that, if you link them into your program ahead of the main Linux libraries, will intercept memory calls and provide running output. Information can be found in the "Linux Application Development" book, or in the tool documentation.

Electric Fence (libefence.a) -- finds buffer overruns and underruns on malloc'd memory; can also find memory alignment problems; works well on strings.
Checker (checkergcc) -- finds memory leaks and overflows.
mcheck (mpr.a) -- finds memory corruptions but cannot show where they occur.
mpr -- finds memory leaks.
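For reference, here is a sketch of how the first of those tools is typically hooked in. The binary name "nanny" and the library paths are assumptions for illustration; they are not from the report.

```shell
# Electric Fence usage sketch; the binary name and paths are assumed.
# Option 1: link the static library ahead of libc at build time.
gcc -g -o nanny nanny.o -lefence
# Option 2: preload the shared library at run time, no relink needed.
LD_PRELOAD=libefence.so ./nanny
# Either way, an overrun of a malloc'd buffer triggers a fault at the
# offending instruction instead of corrupting memory silently, so a
# debugger or core dump points straight at the bad access.
```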
After some investigation, it is possible that this problem was corrected in the current release. Can you try the latest RPMs on http://people.redhat.com/kbarrett/HA/software.html ?
Have not heard back. Are you still having this problem?
To date I have had no more issues with it after 1) setting up rsh to allow the dynamic weighting to happen, and 2) increasing the file descriptors. You can close the bug report; I will reopen/append if it happens again.