175134 – Memory leak in nanny.c

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	piranha
Sub Component:
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Stanko Kupcevic
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-12-06 21:14 UTC by Lon Hohberger
Modified:	2009-04-16 20:13 UTC
CC List:	1 user
Fixed In Version:	0.8.3-1
Clone Of:
Environment:
Last Closed:	2007-05-10 18:57:18 UTC
Embargoed:

Description Lon Hohberger 2005-12-06 21:14:33 UTC

+++ This bug was initially created as a clone of Bug #174315 +++

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050524
Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
When using an external send program (-e), nanny fails to deallocate the result
buffer and leaks memory upon every external invocation.



Version-Release number of selected component (if applicable):
piranha-0.7.0 through piranha-0.8.1

How reproducible:
Always

Steps to Reproduce:
1. Start a nanny instance with a verbose external check program that runs frequently (every 1 s):

nanny -c -h 192.168.0.0 -p 1234 -e /sbin/lspci -x BLAH -q -t 1 --lvs&

2. Watch the process memory footprint (PID is the process ID of the nanny instance):

watch -n1 "cat /proc/$PID/status"
  

Actual Results:  The memory footprint continues to grow indefinitely.

Expected Results:  The memory footprint should stabilize.
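
To compare the actual and expected results numerically rather than by eye, the
resident set size can be sampled from the same status file. The following is
only a minimal sketch: the helper names parse_vmrss_kb() and read_vmrss_kb()
are hypothetical and not part of piranha, and it assumes a Linux /proc
filesystem that reports a "VmRSS:" line in kB.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts the VmRSS value (in kB) from text in
 * /proc/<pid>/status format. Returns -1 if no VmRSS line is found. */
static long parse_vmrss_kb(const char *status_text)
{
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        long kb;

        /* Whitespace in the format matches the tab and padding spaces. */
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            return kb;

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: reads /proc/<pid>/status and returns VmRSS in kB,
 * or -1 if the file cannot be read or has no VmRSS line. */
static long read_vmrss_kb(long pid)
{
    char path[64];
    char buf[4096];
    FILE *fp;
    size_t n;

    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    return parse_vmrss_kb(buf);
}
```

Calling read_vmrss_kb() once per second with the nanny PID and logging the
value should show steady growth on an affected build and a flat line once the
leak is fixed.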

Additional info:

The actual leak is in nanny.c::external_check(), which fails to deallocate the
"result" buffer that getExecOutput() allocates with strdup. Both return paths
below exit without freeing it:

        result = getExecOutput (flags, argv, timeout);

        if (expect_str != NULL) {
                if (strcmp (expect_str, result) != 0) {
                        piranha_log (flags, (char *)
                                     "Trouble. Recieved results are not what we expected from (%s)\n",
                                     inet_ntoa (*remoteAddr));
                        return 1;
                } else {
                        return 0;
                }

        }

A patch will be available shortly.
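
One way to close a leak of this shape is to record the comparison outcome,
free the buffer, and only then return. The sketch below is an illustration
under stated assumptions, not the attached patch: fake_exec_output() is a
hypothetical stand-in for getExecOutput(), check_result() is a hypothetical
reduction of external_check(), and returning 0 when expect_str is NULL is an
assumption, since that path is not shown in the excerpt.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for getExecOutput(): returns a heap-allocated copy
 * of the program output, which the caller owns and must free. */
static char *fake_exec_output(const char *output)
{
    size_t len = strlen(output) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, output, len);
    return copy;
}

/* Hypothetical reduction of external_check(): returns 0 when the output
 * matches expect_str and 1 otherwise. The result buffer is released on
 * every path that allocated it. */
static int check_result(const char *expect_str, const char *output)
{
    char *result = fake_exec_output(output);
    int rc = 0;

    if (result == NULL)
        return 1;   /* assumption: allocation failure counts as a failed check */

    /* Assumption: with no expected string, the check passes (rc stays 0). */
    if (expect_str != NULL && strcmp(expect_str, result) != 0)
        rc = 1;

    free(result);   /* the step missing from the leaking code */
    return rc;
}
```

Using a single exit point keeps the free() in one place, so a return path
added later cannot reintroduce the leak.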

-- Additional comment from fmalita on 2005-11-27 14:04 EST --
Created an attachment (id=121522)
Fix for the nanny memory leak.


-- Additional comment from lhh on 2005-11-28 09:39 EST --
Patch looks correct.

-- Additional comment from lhh on 2005-11-28 15:51 EST --
Bump


-----------------------------
Duplicate copy for RHCS3

Comment 1 Lon Hohberger 2006-01-04 18:05:15 UTC

Patch already in CVS.

Comment 2 Albert Graham 2006-02-05 04:10:28 UTC

This bug is present in RHEL 4 as well as RHEL 3. Can you make sure this fix and
the others mentioned below are included in the next release?

The whole system hangs (but is still pingable) soon after starting on every
system I've tried. I think this could be an IP_VS bug. There is a patch, but I
don't think it has been applied to the latest RHEL 4 updates. It is available
here:

http://lkml.org/lkml/2004/11/24/375

This hang is a showstopper, as it locks up the primary and then locks up the
failover server, and that is with just a few users.

My tests included 500 connections/s, which worked fine, but that was from a
single source IP address. I think the problem kicks in when there are many
different source IP addresses, e.g. for a web server.

I was hoping to replace a few hardware load balancers and have been trying to
get piranha to work reliably for weeks, without success.

Could I also bring your attention to the following messages on the piranha
mailing list, which describe this hang in more detail:

https://www.redhat.com/archives/piranha-list/2005-December/thread.html

Please note that one suggested solution was to use the UP kernel, on the theory
that IP_VS/ipvsadm or piranha was not SMP safe. However, I can confirm the same
problems on both UP and SMP kernels (2.6.9-22.0.2.EL). I've tried all previous
kernels with the same results.
