Bug 552211

Summary: dhcpd memory leak when failover configured
Product: Red Hat Enterprise Linux 5
Component: dhcp
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Target Milestone: rc
Target Release: ---
Keywords: ZStream
Reporter: RHEL Program Management <pm-rhel>
Assignee: Jiri Popelka <jpopelka>
QA Contact: Alexander Todorov <atodorov>
CC: atodorov, borgan, herrold, jpopelka, mpoole, ovasik, pm-eus
Doc Type: Bug Fix
Last Closed: 2010-01-14 10:12:16 UTC
Bug Depends On: 534117    
Bug Blocks:    
Attachments:
Description                   Flags
valgrind.log                  none
valgrind.log with dhcpd -f    none

Description RHEL Program Management 2010-01-04 11:29:08 UTC
This bug has been copied from bug #534117 and has been proposed
to be backported to 5.4 z-stream (EUS).

Comment 5 Alexander Todorov 2010-01-04 18:49:06 UTC
Jiri,
I've tested with dhcp-3.0.5-21.el5_4.1 (from brew) on RHEL 5.4, using 3 virtual systems on an isolated network.

1 primary DHCP server, 2 secondary server and one client. 

On the client I have a script which runs dhclient several hundred times, removing any .leases files between each run. On the primary server I'm continuously running top and monitoring the free memory field. It is decreasing steadily with every loop of the test case and is several megabytes off from the initial value after 2 executions of the script below. 
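
The free memory field in top covers the whole system, so sampling the resident set size (RSS) of the dhcpd process itself can make the trend easier to attribute. The following is only a minimal sketch, assuming a `ps` that supports `-o rss=` (as procps does); the function names `rss_kb` and `sample_rss` are made up for illustration and are not part of the dhcp package.

```shell
#!/bin/sh
# Sketch: log the resident set size (RSS) of one process at a fixed interval.
# rss_kb and sample_rss are illustrative names, not part of dhcp.

# rss_kb PID: print the RSS of PID in kB (empty if the process is gone).
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# sample_rss PID COUNT INTERVAL: print "epoch_seconds rss_kb" COUNT times,
# sleeping INTERVAL seconds between samples.
sample_rss() {
    pid=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date +%s) $(rss_kb "$pid")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_rss "$(pidof dhcpd)" 60 30 > dhcpd-rss.log` would record one sample every 30 seconds for about half an hour, assuming a single dhcpd process is running.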

Is this the correct NVR which fixes this for 5.4.z? It looks so because the patch is applied in the spec file. Is dhcpd leaking memory somewhere else?


Test script:
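#!/bin/sh
# Force a fresh DHCPDISCOVER on every iteration: kill dhclient,
# delete its lease files, then start it again.

for i in $(seq 200); do
   echo "----- $i -----"
   killall -9 dhclient
   sleep 1
   # -f: do not fail when no lease files exist yet
   /bin/rm -f /var/lib/dhclient/dhclient*
   dhclient
   sleep 1
done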

Comment 6 Jiri Popelka 2010-01-05 09:32:45 UTC
Alexander,
your script looks good to me.
dhcpd leaks memory for each DHCPDISCOVER packet it receives.
A client broadcasts a DHCPDISCOVER message when it is trying to discover available DHCP servers.
Running dhclient and then removing the .leases files shortly afterwards should be sufficient to test this problem, because a client that still has a valid lease skips the DHCPDISCOVER message and sends a DHCPREQUEST message directly.
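
To confirm that a test run really produces DHCPDISCOVER traffic, the server's log can be checked for the "DHCPDISCOVER from ..." lines that dhcpd writes through syslog. This is a rough sketch only: the helper name `count_discovers` is hypothetical, and the log location (commonly /var/log/messages on RHEL) is an assumption that depends on the syslog configuration.

```shell
#!/bin/sh
# Sketch: count DHCPDISCOVER messages that dhcpd logged to a file.
# count_discovers is an illustrative name; the log path is passed in
# because it depends on the local syslog configuration.

# count_discovers LOGFILE: print the number of "DHCPDISCOVER from" lines.
count_discovers() {
    grep -c 'DHCPDISCOVER from' "$1"
}
```

Calling `count_discovers /var/log/messages` before and after one run of the test script should show the count growing by roughly the number of loop iterations if every iteration triggered a DHCPDISCOVER.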

dhcp-3.0.5-21.el5_4.1 is the correct NVR.

Comment 7 Alexander Todorov 2010-01-05 13:56:34 UTC
Created attachment 381758 [details]
valgrind.log

Memory leak test, run with:
# valgrind -v --log-file=valgrind.log --tool=memcheck --trace-children=yes --leak-check=full --show-reachable=yes /usr/sbin/dhcpd

There are 110 references to omapi-related functions, which makes me think that we're missing another patch from upstream:

Changes since 3.1.1b1
 * A memory leak when using omapi has been fixed. 

This is with the hotfix from https://bugzilla.redhat.com/show_bug.cgi?id=534117#c1

I ran tail -f on the valgrind.log file while my test was executing and no further errors were reported, even though the DHCP server was running and answering client requests.

Comment 8 Alexander Todorov 2010-01-05 15:50:35 UTC
Created attachment 381780 [details]
valgrind.log with dhcpd -f

Memory leak test, run with:
# valgrind -v --log-file=valgrind.log --tool=memcheck --trace-children=yes --leak-check=full --show-reachable=yes /usr/sbin/dhcpd -f

This starts dhcpd with -f so that it stays in the foreground. Now that we know how to find some leaks, I'll re-test with the older package version, which should have the bug present, and compare the logs.
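
One way to compare the logs is to pull the "definitely lost" figure out of each LEAK SUMMARY. The sketch below assumes memcheck's usual summary line of the form `==PID==    definitely lost: N bytes in M blocks`; the helper name `definitely_lost` is made up for illustration. With --trace-children=yes a log can hold one summary per process, so the output is one line per PID.

```shell
#!/bin/sh
# Sketch: extract "definitely lost" byte counts from a valgrind memcheck log.
# definitely_lost is an illustrative name. Assumes summary lines such as:
#   ==12345==    definitely lost: 1,024 bytes in 4 blocks

# definitely_lost LOGFILE: print "PID BYTES" for each leak summary found.
definitely_lost() {
    awk '/definitely lost:/ {
        pid = $1
        gsub(/=/, "", pid)            # "==12345==" -> "12345"
        for (i = 2; i < NF; i++) {
            if ($i == "lost:") {
                bytes = $(i + 1)
                gsub(/,/, "", bytes)  # "1,024" -> "1024"
                print pid, bytes
            }
        }
    }' "$1"
}
```

Running it on the log from each package version, for example `definitely_lost valgrind-old.log` and `definitely_lost valgrind-new.log` (file names are placeholders), gives numbers that can be compared directly.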

Comment 9 Alexander Todorov 2010-01-07 16:44:44 UTC
QE has performed some more testing and the results are:

The old version, dhcp-3.0.5-21.el5, leaks approximately 20 MB in 30 minutes.
The patched version, dhcp-3.0.5-21.el5_4.1, appears to leak considerably less, and the amount differs between the primary and secondary servers: in our test environment the primary server leaked about 7 MB in 30 minutes and the secondary server about 4 MB in 30 minutes.
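
For comparison, the figures above can be scaled to an hourly rate. This is plain arithmetic on the numbers reported in this comment, and the helper name `mb_per_hour` is only illustrative.

```shell
#!/bin/sh
# Sketch: scale a leak of MB megabytes observed over MINUTES to one hour.
# mb_per_hour is an illustrative name.
mb_per_hour() {
    awk -v mb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", mb * 60 / min }'
}

mb_per_hour 20 30   # old package, dhcp-3.0.5-21.el5         -> 40.0
mb_per_hour 7 30    # patched package, primary server       -> 14.0
mb_per_hour 4 30    # patched package, secondary server     -> 8.0
```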

We believe that the leak in load_balance_mine() function has been properly fixed and we're moving this bug to VERIFIED. 

There may, however, be other memory leaks depending on your setup, and patches for them may not have been pulled from upstream. If you experience further issues, please file a separate bug.

Comment 12 errata-xmlrpc 2010-01-14 10:12:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0042.html