Bug 734213

Summary: memory leaks in negotiator - call to param_without_default()
Product: Red Hat Enterprise MRG Reporter: Erik Erlandson <eerlands>
Component: condorAssignee: Erik Erlandson <eerlands>
Status: CLOSED ERRATA QA Contact: Tomas Rusnak <trusnak>
Severity: medium Docs Contact:
Priority: high    
Version: 2.0CC: jthomas, ltoscano, matt, mkudlej, trusnak
Target Milestone: 2.1   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 7.6.4-0.3 Doc Type: Bug Fix
Doc Text:
Previously, calls to the param_without_default() function that did not free returned memory resulted in memory leaks in the Negotiator grid daemon. With this update, the calls have been replaced with new, memory-safe param_defined() function, and the memory leaks no longer occur in the described scenario.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-23 17:28:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 743350    

Description Erik Erlandson 2011-08-29 18:38:30 UTC
Description of problem:

(from upstream)
I noticed an idiom used several times in the HGQ code to check if a parameter is defined:

	if (NULL != param_without_default("GROUP_QUOTA_MAX_ALLOCATION_ROUNDS"))

This leaks memory if the parameter is defined, because the return value of param_without_default() is not freed.

I suggest adding a new function param_is_defined(name). It could probably be implemented more efficiently than by calling param_without_default(), but until there is need to optimize, a simple implementation is probably best.

Additional info:
upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2299

Comment 3 Erik Erlandson 2011-09-12 21:53:25 UTC
REPRO/TEST

There is no functional change made by this fix. The negotiator should work exactly as before. I exercised the fix with the following functional test for HGQ (also attached as a tarball):

Configuration:

NEGOTIATOR_DEBUG = D_FULLDEBUG
SCHEDD_INTERVAL	= 15
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE

NUM_CPUS = 40

# turn on round-robin for this test, since we're testing
# an overlapping-effective-pool scenario
HFS_ROUND_ROBIN_RATE = 1.0

# turn on multiple allocation rounds, since we're testing
# slot rejection scenario as well
HFS_MAX_ALLOCATION_ROUNDS = 3

GROUP_NAMES = a, b, a.a, a.b, b.a, b.b

GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5
GROUP_QUOTA_DYNAMIC_a.a = 0.5
GROUP_QUOTA_DYNAMIC_a.b = 0.5
GROUP_QUOTA_DYNAMIC_b.a = 0.5
GROUP_QUOTA_DYNAMIC_b.b = 0.5

# Have "a" and "b" subtrees manage their quota independently
GROUP_AUTOREGROUP = TRUE
GROUP_AUTOREGROUP_a = FALSE
GROUP_AUTOREGROUP_b = FALSE

Submit file:

universe = vanilla
cmd = /bin/sleep
arguments = 10m
# group "a.a" should be able to use slots unused by "a.b"
# set up requirements to stay out of lower slots (used by b.a and b.b)
Requirements = (SlotID > 10)
+AccountingGroup = "a.a.user"
queue 25
# set up jobs for "a.b" to fail:
Requirements = False
+AccountingGroup = "a.b.user"
queue 25
# set up "b.a" and "b.b" to compete
Requirements = (SlotID <= 10)
+AccountingGroup = "b.a.user"
queue 25
+AccountingGroup = "b.b.user"
queue 25

Expected negotiator results (copied from original test README)

This test is designed to test behavior in two cases of negotiation failure:
1) A group cannot use all the slots it is allocated (in which case other should be able to use surplus)
2) Two (or more) groups are competing for same subset of slots, in which case neither should starve.

Best to begin this by nuking your accounting log.
Install "neg_failures.config" into your configuration, then start up condor.

Submit "neg_failures.submit":
% condor_submit neg_failures.submit
Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 2.

You should see the following distribution: "b.a" and "b.b" are sharing their subset, and neither is starved.  "a.a" is able to u
se the surplus that "a.b" cannot use.  (note, there are 10 of the total 40 slots unused here, by design):

% qvhist AccountingGroup -c 'JobStatus == 2'
     20 a.a.user
      5 b.a.user
      5 b.b.user
     30 total

Comment 4 Erik Erlandson 2011-09-12 21:56:47 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Calls to param_without_default() that did not free returned memory.

Consequence:
Memory leaks in the Negotiator.

Fix:
Replaced calls to param_without_default() with new memory-safe param_defined().

Result:
Memory leaks now plugged.

Comment 6 Tomas Rusnak 2011-10-21 15:19:42 UTC
Retested over all supported platforms x86,x86_64/RHEL5,RHEL6 with:

$CondorVersion: 7.6.4 Oct 14 2011 BuildID: RH-7.6.4-0.8

# qvhist AccountingGroup -c 'JobStatus == 2'
     20 a.a.user
      5 b.a.user
      5 b.b.user
     30 total

Same results on all platforms.Negotiator functionality compared to old condor:

# condor -v
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $

# qvhist AccountingGroup -c 'JobStatus == 2'
     20 a.a.user
      5 b.a.user
      5 b.b.user
     30 total

Memory consumption before and after patch are too small in difference to compare them.

>>> VERIFIED

Comment 7 Tomas Capek 2011-11-16 15:11:45 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+Previously, calls to the param_without_default() function that did not free returned memory resulted in memory leaks in the Negotiator grid daemon. With this update, the calls have been replaced with new, memory-safe param_defined() function, and the memory leaks no longer occur in the described scenario.-Calls to param_without_default() that did not free returned memory.
-
-Consequence:
-Memory leaks in the Negotiator.
-
-Fix:
-Replaced calls to param_without_default() with new memory-safe param_defined().
-
-Result:
-Memory leaks now plugged.

Comment 8 errata-xmlrpc 2012-01-23 17:28:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html