Bug 850392

Summary: RFE Update Hunting+Splitting+Defaults algorithm
Product: Red Hat Enterprise MRG
Reporter: Timothy St. Clair <tstclair>
Component: condor
Assignee: Timothy St. Clair <tstclair>
Status: CLOSED ERRATA
QA Contact: Luigi Toscano <ltoscano>
Severity: high
Priority: high
Version: Development
CC: dkinon, dryan, ltoscano, matt, mkudlej, rrati, tstclair
Target Milestone: 2.3
Keywords: FutureFeature, TestOnly
Target Release: ---
Hardware: All
OS: All
Fixed In Version: condor-7.8.2-0.1
Doc Type: Rebase: Enhancements Only
Doc Text: Rebase package(s) to version: condor-7.8 series. Highlights and notable enhancements: the default settings have been updated to allow more accurate matching and reuse of partitionable slots. Defaults have been changed for job submission and execute resources. The enhancements enable more accurate memory tracking of jobs and higher resource utilization.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-03-06 13:45:14 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Bug Depends On:
Bug Blocks: 845292

Description Timothy St. Clair 2012-08-21 10:21:54 EDT
The system defaults for the 7.8 series have been updated in an effort to better utilize system resources around a job's memory constraints.

The following areas have been affected: 
1.) Job submission will still use RequestMemory, defaulting it to ImageSize on the first pass (to preserve backwards compatibility).
2.) After the first run, MemoryUsage will be updated in the JobAd so subsequent matches use the RSS value (the default). MemoryUsage can also be an expression, set for jobs that would prefer PSS, but this requires extra validation.
3.) During matchmaking the negotiator will use either RequestMemory or MemoryUsage, whichever is greater.
4.) When the claim is being activated, the startd will evaluate MODIFY_REQUEST_EXPR_* to modify the incoming requests so that they are quantized into reusable sized chunks to maximize claim reuse (http://research.cs.wisc.edu/condor/manual/v7.9/3_12Setting_Up.html#36604).
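The quantization step in 4.) can be illustrated with a small Python sketch. The 128 MB memory granularity mirrors the upstream default for MODIFY_REQUEST_EXPR_REQUESTMEMORY (quantize(RequestMemory,{128})); treat the exact granularity as an assumption about the defaults, not a statement of any particular build's configuration:

```python
def quantize(request, granularity):
    """Round a resource request up to the next multiple of `granularity`,
    mimicking condor's quantize() ClassAd function for a single interval."""
    return ((request + granularity - 1) // granularity) * granularity

# A 900 MB request is carved as 1024 MB; an exact multiple is unchanged.
print(quantize(900, 128))   # 1024
print(quantize(1024, 128))  # 1024
```

Rounding every request up to a common boundary is what lets a freed dynamic slot be reused by the next job of similar size instead of leaving an odd-sized fragment.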

The upstream parent ticket has been added and has a breakout for the sub-elements listed above.
Comment 3 Timothy St. Clair 2012-08-22 12:23:38 EDT
The testing for this is quite involved.  

It requires testing the following according to version(s):

state space:
001 - old submit, old schedd, new exec
010 - old submit, new schedd, old exec
011 - old submit, new schedd, new exec
100 - new submit, old schedd, old exec (not realistic) 
101 - new submit(remote), old schedd, new exec
110 - new submit, new schedd, old exec
111 - all new.

++ It requires potential feedback from the field on examples of existing RequestMemory expressions to ensure compatibility.  

++ MemoryUsage = expr referencing PSS
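A hedged sketch of what such an override might look like in a submit description (the exact attribute name and scaling below are illustrative assumptions, not the shipped default):

```
# Hypothetical submit-file fragment: report PSS instead of RSS, in MB.
+MemoryUsage = ((ProportionalSetSizeKb + 1023) / 1024)
```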
Comment 4 Timothy St. Clair 2012-09-06 12:46:18 EDT
*** Bug 845569 has been marked as a duplicate of this bug. ***
Comment 5 Luigi Toscano 2012-09-18 08:53:12 EDT
Can you please provide a few examples (>=2) where the behavior changed (what changed, configuration, etc.)?
Comment 6 Timothy St. Clair 2012-09-18 10:59:45 EDT
The request is to come up with use-case differences from an administration and daemon perspective.
Comment 7 Timothy St. Clair 2012-09-19 11:00:36 EDT
From the administrators perspective, the behavior has changed in the following way:

When a job is initially submitted, everything appears much as it did before, with slight modifications to the auto-filled data (https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2835); it is advised that the user specify the Request* variables and not overwrite the Requirements expression. Once the job is matched, the startd will evaluate the RequestMemory expression according to its MODIFY_REQUEST_EXPR_* settings and modify the initial request to better carve the partitionable slot for reuse. In this way the behavior has changed: RequestMemory is now usually a lower bound on what will be supplied. Once the job has completed a run, it will update the attribute MemoryUsage with the RSS value, which acts as an upper bound. This MemoryUsage is referenced in the default Requirements expression.
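As an example of a submit description that follows this advice, stating the Request* variables explicitly and leaving Requirements to the schedd (the executable name and values are illustrative only):

```
# Illustrative submit file: state resource requests explicitly and let
# the schedd build the default Requirements expression.
universe       = vanilla
executable     = my_job
request_memory = 900   # MB; may be quantized upward by the startd
request_cpus   = 1
queue
```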
Comment 8 Timothy St. Clair 2012-10-01 11:00:30 EDT
Re comment #7, methods for verification could include:

1. Post job submission, check the Requirements expression via condor_q -long and verify the differences between the two versions.

2. Once the jobs have been matched and have landed on a partitionable slot, run condor_status -long and verify that the slot has been split on quantized boundaries.

3. Once a job has completed, verify via condor_q -long that MemoryUsage has been updated with RSS.
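The split expected in step 2 can be sketched numerically. This is a sketch assuming the default 128 MB quantum, not output from condor_status:

```python
def split_partitionable(total_mb, request_mb, quantum=128):
    """Carve a dynamic slot out of a partitionable slot, rounding the
    request up to the quantization boundary as MODIFY_REQUEST_EXPR_* would."""
    carved = ((request_mb + quantum - 1) // quantum) * quantum
    if carved > total_mb:
        raise ValueError("request exceeds remaining slot memory")
    return carved, total_mb - carved

# An 8192 MB partitionable slot serving a 900 MB request:
carved, remaining = split_partitionable(8192, 900)
print(carved, remaining)  # 1024 7168
```

In condor_status output this would appear as a dynamic slot with Memory on a 128 MB boundary and the parent partitionable slot reduced by the same amount.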
Comment 17 Luigi Toscano 2013-02-08 09:37:35 EST
Verified (thanks to mkudlej for most of the work) on supported configurations (RHEL 5.9/6.4 beta, i386/x86_64):
condor-7.8.8-0.4.1
Comment 19 errata-xmlrpc 2013-03-06 13:45:14 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html