Red Hat Bugzilla – Bug 850392
RFE Update Hunting+Splitting+Defaults algorithm
Last modified: 2013-03-06 13:45:14 EST
The system defaults for the 7.8 series have been updated in an effort to better utilize system resources around a job's memory constraints.
The following areas have been affected:
1.) Job submission will still use RequestMemory and default it to ImageSize on the first pass (to not break backwards compatibility)
2.) After the job's first run, MemoryUsage will be updated in the JobAd so that subsequent matching uses the RSS value (the default). MemoryUsage can also be an expression, which is set in the case of jobs that would prefer PSS, but this requires extra validation.
3.) During matchmaking the negotiator will use either RequestMemory or MemoryUsage, whichever one is greater.
4.) When the claim is being activated, the startd will evaluate MODIFY_REQUEST_EXPR_* to modify the incoming requests so that they are quantized into reusable-sized chunks to maximize claim reuse (http://research.cs.wisc.edu/condor/manual/v7.9/3_12Setting_Up.html#36604)
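For context on item 4, the startd-side quantization is driven by configuration along these lines. The knob names and the quantize() ClassAd function are from the manual section linked above; the quanta chosen here are illustrative examples, not the shipped defaults:

```
# Illustrative condor_config fragment (example values, not shipped defaults):
# round RequestMemory up to the nearest 128 MB so that split dynamic slots
# come in reusable sizes; quantize disk and CPU requests similarly.
MODIFY_REQUEST_EXPR_REQUESTMEMORY = quantize(RequestMemory, {128})
MODIFY_REQUEST_EXPR_REQUESTDISK   = quantize(RequestDisk, {1024})
MODIFY_REQUEST_EXPR_REQUESTCPUS   = quantize(RequestCpus, {1})
```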
An upstream parent ticket has been added with a breakout of the sub-elements listed above.
The testing for this is quite involved.
It requires testing the following according to version(s):
001 - old submit, old schedd, new exec
010 - old submit, new schedd, old exec
011 - old submit, new schedd, new exec
100 - new submit, old schedd, old exec (not realistic)
101 - new submit(remote), old schedd, new exec
110 - new submit, new schedd, old exec
111 - all new.
++ It requires potential feedback from the field on examples of existing RequestMemory expressions to ensure compatibility.
++ MemoryUsage = expr referencing PSS
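As a hypothetical sketch of the second item, a job preferring PSS might carry an expression like the following in its ad. ProportionalSetSizeKb is the attribute name from the manual; the exact expression is an assumption to be validated, not a shipped default:

```
# Sketch of a PSS-based MemoryUsage expression in a submit description file.
# ProportionalSetSizeKb is reported in KiB, so round up to MiB.
+MemoryUsage = ((ProportionalSetSizeKb + 1023) / 1024)
```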
*** Bug 845569 has been marked as a duplicate of this bug. ***
Can you please provide a few examples (>=2) where the behavior changed (what changed, configuration, etc.)?
The request is for the use-case differences from an administrator and daemon perspective.
From the administrator's perspective, the behavior has changed in the following way:

When a job is initially submitted, everything appears much as it did before, with slight modifications to the auto-filled data (https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2835); it is advised that the user specify the Request* variables and not overwrite the Requirements expr. Once the job is matched, the startd evaluates the RequestMemory expression according to its MODIFY_REQUEST_EXPR_* settings and modifies the initial request to better carve the partitionable slot for reuse. In this way the behavior has changed: RequestMemory is usually a lower bound on what will be supplied. Once the job has completed its run, it updates the attribute MemoryUsage with the value of RSS, which will be an upper bound. This MemoryUsage is referenced in the default Requirements expression.
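The lifecycle described above can be modeled in a few lines. This is only a sketch of the described behavior (round the request up to a quantum when carving the slot, then match on the greater of RequestMemory and the recorded MemoryUsage), with the 128 MB quantum chosen arbitrarily for illustration:

```python
def quantize(request_mb, quantum_mb=128):
    """Round a memory request up to the next quantum, as the startd's
    MODIFY_REQUEST_EXPR_* rewrite does when carving a partitionable slot."""
    return ((request_mb + quantum_mb - 1) // quantum_mb) * quantum_mb

def effective_request(request_mb, memory_usage_mb=None):
    """Matching uses RequestMemory or MemoryUsage, whichever is greater;
    MemoryUsage is unset before the job's first run."""
    if memory_usage_mb is None:
        return request_mb
    return max(request_mb, memory_usage_mb)

# First match: the user asked for 100 MB, so the slot is carved at 128 MB.
carved = quantize(effective_request(100))
# After the run, MemoryUsage records RSS (say 300 MB); the next match
# uses the greater value and the slot is carved at 384 MB.
recarved = quantize(effective_request(100, 300))
```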
Re comment #7, methods for verification could include:
1. After job submission, check the Requirements expression via condor_q -long and verify the differences between the two versions.
2. Once the jobs have been matched and have landed on a partitionable slot, run condor_status -long and verify that the slot has been split on quantized boundaries.
3. Once a job has completed verify via condor_q -long that MemoryUsage has been updated with RSS.
Verified (thanks to mkudlej for most of the work) on supported configurations (RHEL5.9/6.4 beta, i386/x86_64):
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.