Bug 734213
Summary: | memory leaks in negotiator - call to param_without_default() | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Erik Erlandson <eerlands> |
Component: | condor | Assignee: | Erik Erlandson <eerlands> |
Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 2.0 | CC: | jthomas, ltoscano, matt, mkudlej, trusnak |
Target Milestone: | 2.1 | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 7.6.4-0.3 | Doc Type: | Bug Fix |
Doc Text: |
Previously, calls to the param_without_default() function that did not free returned memory resulted in memory leaks in the Negotiator grid daemon. With this update, the calls have been replaced with new, memory-safe param_defined() function, and the memory leaks no longer occur in the described scenario.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-01-23 17:28:55 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 743350 |
Description
Erik Erlandson
2011-08-29 18:38:30 UTC
REPRO/TEST There is no functional change made by this fix. The negotiator should work exactly as before. I exercised the fix with the following functional test for HGQ (also attached as a tarball): Configuration: NEGOTIATOR_DEBUG = D_FULLDEBUG SCHEDD_INTERVAL = 15 NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE NUM_CPUS = 40 # turn on round-robin for this test, since we're testing # an overlapping-effective-pool scenario HFS_ROUND_ROBIN_RATE = 1.0 # turn on multiple allocation rounds, since we're testing # slot rejection scenario as well HFS_MAX_ALLOCATION_ROUNDS = 3 GROUP_NAMES = a, b, a.a, a.b, b.a, b.b GROUP_QUOTA_DYNAMIC_a = 0.5 GROUP_QUOTA_DYNAMIC_b = 0.5 GROUP_QUOTA_DYNAMIC_a.a = 0.5 GROUP_QUOTA_DYNAMIC_a.b = 0.5 GROUP_QUOTA_DYNAMIC_b.a = 0.5 GROUP_QUOTA_DYNAMIC_b.b = 0.5 # Have "a" and "b" subtrees manage their quota independently GROUP_AUTOREGROUP = TRUE GROUP_AUTOREGROUP_a = FALSE GROUP_AUTOREGROUP_b = FALSE Submit file: universe = vanilla cmd = /bin/sleep arguments = 10m # group "a.a" should be able to use slots unused by "a.b" # set up requirements to stay out of lower slots (used by b.a and b.b) Requirements = (SlotID > 10) +AccountingGroup = "a.a.user" queue 25 # set up jobs for "a.b" to fail: Requirements = False +AccountingGroup = "a.b.user" queue 25 # set up "b.a" and "b.b" to compete Requirements = (SlotID <= 10) +AccountingGroup = "b.a.user" queue 25 +AccountingGroup = "b.b.user" queue 25 Expected negotiator results (copied from original test README) This test is designed to test behavior in two cases of negotiation failure: 1) A group cannot use all the slots it is allocated (in which case other should be able to use surplus) 2) Two (or more) groups are competing for same subset of slots, in which case neither should starve. Best to begin this by nuking your accounting log. Install "neg_failures.config" into your configuration, then start up condor. Submit "neg_failures.submit": % condor_submit neg_failures.submit Submitting job(s).................................................................................................... 100 job(s) submitted to cluster 2. You should see the following distribution: "b.a" and "b.b" are sharing their subset, and neither is starved. "a.a" is able to u se the surplus that "a.b" cannot use. (note, there are 10 of the total 40 slots unused here, by design): % qvhist AccountingGroup -c 'JobStatus == 2' 20 a.a.user 5 b.a.user 5 b.b.user 30 total Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: Calls to param_without_default() that did not free returned memory. Consequence: Memory leaks in the Negotiator. Fix: Replaced calls to param_without_default() with new memory-safe param_defined(). Result: Memory leaks now plugged. Retested over all supported platforms x86,x86_64/RHEL5,RHEL6 with:
$CondorVersion: 7.6.4 Oct 14 2011 BuildID: RH-7.6.4-0.8
# qvhist AccountingGroup -c 'JobStatus == 2'
20 a.a.user
5 b.a.user
5 b.b.user
30 total
Same results on all platforms.Negotiator functionality compared to old condor:
# condor -v
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $
# qvhist AccountingGroup -c 'JobStatus == 2'
20 a.a.user
5 b.a.user
5 b.b.user
30 total
Memory consumption before and after patch are too small in difference to compare them.
>>> VERIFIED
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,11 +1 @@ -Cause: +Previously, calls to the param_without_default() function that did not free returned memory resulted in memory leaks in the Negotiator grid daemon. With this update, the calls have been replaced with new, memory-safe param_defined() function, and the memory leaks no longer occur in the described scenario.-Calls to param_without_default() that did not free returned memory. - -Consequence: -Memory leaks in the Negotiator. - -Fix: -Replaced calls to param_without_default() with new memory-safe param_defined(). - -Result: -Memory leaks now plugged. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html |