I have the following system: Pentium II, 256 MB RAM, Red Hat 6.2 with kernel 2.2.14-5.0. I run LVS (piranha) and IP masquerading. Running vmstat, I see the amount of free (real) memory decrease by 1 MB every half hour until LVS dies (and mostly everything else with it). I do not see the memory usage reflected in ps or top listings, so I assume LVS and/or masquerading use kernel memory. The issue is multiplied when I put the load balancer under high load. Terry.
The follow-on. Thinking it might be a kernel issue, I attempted to upgrade to the latest available kernel, but LVS did not want to start under 2.2.16-5smp (or the non-SMP build). Remaining on the existing kernel, I wrote a script to monitor free memory and restart pulse (LVS) if memory ran out. The system died after about 9 hours of operation; the message on screen was from the VFS: file-max limit 4096 reached. I have tripled the inode and file-handle limits and am waiting to see how things go. Is pulse not able to reap its used hash table entries? Is there any limit on the hash table entries? Any input from Red Hat would be appreciated.
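The watchdog script mentioned above could look something like this. This is a minimal sketch: the threshold, polling interval, and pulse init-script path are assumptions, not details from the report.

```shell
#!/bin/sh
# Sketch of a free-memory watchdog that restarts pulse when free RAM
# runs low. THRESHOLD_KB, the 60s interval, and the init-script path
# are assumed values, not taken from the original report.
THRESHOLD_KB=4096

free_kb() {
    # Free physical memory in kB, as the kernel reports it
    awk '/^MemFree:/ {print $2}' /proc/meminfo
}

watchdog() {
    while true; do
        if [ "$(free_kb)" -lt "$THRESHOLD_KB" ]; then
            /etc/rc.d/init.d/pulse restart   # SysV init path on Red Hat 6.x
        fi
        sleep 60
    done
}

# Only loop when explicitly asked, so the functions can be sourced
# or tested without blocking.
if [ "${1:-}" = "--run" ]; then
    watchdog
fi
```

A cron-started or rc.local-started copy invoked with --run would poll once a minute; anything that parses /proc/meminfo the same way would serve equally well.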
This is going to take some looking into. I am not aware of a memory leak in piranha, and it passes all Electric Fence type testing, but it is always possible. It would be interesting to know if you get different results with different load-balancing options. Also, what do you mean by LVS not working with the latest kernel? In what way? Your later entry certainly makes it sound like it's external to piranha. Pulse does nothing complex or clever -- it's just a socket I/O daemon that performs forks. It is more likely that LVS itself is consuming the memory. What kind of setup are you using (i.e., config file)? Is persistence involved, for example?
I have done a heap more testing; these are my findings.

Not configuring 'rsh uptime' on the www servers causes the system to lose a small chunk of memory each time nanny tries to gauge the remote load. This memory is only returned by restarting LVS. Setting up rsh (bleh) does stop this memory depletion.

For a system under _HEAVY_ load, 4096 file descriptors is not enough. The Linux kernel does not relinquish allocated file descriptors (nor inodes, for that matter), so under really high load it's only a matter of time... I've tripled the file descriptors and the system is still happy - for now.

Sorry, I meant to say LVS instead of pulse. The config file is non-persistent, a standard config file as per the FAQ:

primary = x.x.x.x
nat_router = 192.168.1.254 eth1:1
service = lvs
virtual www {
    address = x.x.x.x eth0:1
    active = 1
    server 3of9 {
        address = 192.168.1.13
        active = 1
        weight = 2000
    }
    server 4of9 {
        address = 192.168.1.14
        active = 1
        weight = 2000
    }
    [....trunc'd]
}

As for the latest-kernel issue: when I start LVS with no www servers in sight, LVS starts and tries to do its thing. However, if I start LVS with web servers in place, LVS goes defunct. Very odd... I have no other info than that. (Rebooting with the old kernel and everything works!!) Terry
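Tripling the descriptor limits on a 2.2 kernel is done through /proc; this is a config sketch to be run as root. The numbers are illustrative: the report says "tripled" against the old 4096 file-max default but gives no exact values.

```shell
# Raise the 2.2-kernel limits via /proc (run as root). Values are
# illustrative: 3x the old 4096 file-max default, with inode-max kept
# at the conventional 3-4x file-max ratio.
echo 12288 > /proc/sys/fs/file-max
echo 49152 > /proc/sys/fs/inode-max
# These settings do not survive a reboot; add the same two lines to a
# boot script (e.g. /etc/rc.d/rc.local) to make them persistent.
```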
LVS (the kernel code, not the program) will consume memory based on the number of connections that have occurred. Because of caching (and persistence, if enabled) this memory will hang around for quite a while, but I would not expect it to be permanently lost. Saying that restarting the lvs program frees up the memory is certainly an indication that it may be within the program itself, or a child. Hmm. This will take some time to check into, so I will apologize in advance that you won't hear back on this anytime soon.

If you are comfortable playing with source code, here are some things you can try. Perhaps you can pin the problem down and tell us :-) There are some memory testing tools that should be either on the Red Hat CDs or on www.freshmeat.net. These are tools and libraries that, if you link them into your program ahead of the main Linux libraries, will intercept memory calls and provide running output. Information can be found in the "Linux Application Development" book, or in the tool documentation.

Electric Fence (libefence.a) -- finds buffer overruns and underruns on malloc'd memory; can also find memory alignment problems; works well on strings.
Checker (checkergcc) -- finds memory leaks and overflows.
mcheck (mpr.a) -- finds memory corruptions but cannot show where they occur.
mpr -- finds memory leaks.
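For reference, here is a sketch of how the first of those tools is typically hooked in. The binary name "nanny" and the library paths are assumptions for illustration; they are not from the report.

```shell
# Electric Fence usage sketch; the binary name and paths are assumed.
# Option 1: link the static library ahead of libc at build time.
gcc -g -o nanny nanny.o -lefence
# Option 2: preload the shared library at run time, no relink needed.
LD_PRELOAD=libefence.so ./nanny
# Either way, an overrun of a malloc'd buffer triggers a fault at the
# offending instruction instead of corrupting memory silently, so a
# debugger or core dump points straight at the bad access.
```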
After some investigation, it is possible that this problem was corrected in the current release. Can you try the latest RPMs on http://people.redhat.com/kbarrett/HA/software.html ?
Have not heard back. Are you still having this problem?
To date I have had no more issues with it after 1) setting up rsh to allow the dynamic weighting to happen, and 2) increasing the file descriptors. You can close the bug report; I will reopen/append if it happens again.