Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 618959

Summary: Number of Concurrency limit for parallel universe jobs
Product: Red Hat Enterprise MRG
Component: condor
Version: 1.0
Target Milestone: 2.0
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Lubos Trilety <ltrilety>
Assignee: Matthew Farrellee <matt>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: matt
Doc Type: Bug Fix
Last Closed: 2011-02-08 14:27:18 UTC
Attachments: SchedLog

Description Lubos Trilety 2010-07-28 08:04:29 UTC
Created attachment 434944 [details]
SchedLog

Description of problem:
The concurrency limit should be counted per job, not per resource request.
E.g., at the moment a parallel universe job with 'concurrency_limits = license1:2' and 'machine_count = 2' consumes 4 units of license1.

Version-Release number of selected component (if applicable):
condor-7.4.4-0.7

How reproducible:
100%

Steps to Reproduce:
1. Set LICENSE1_LIMIT = 2 in the configuration and configure the dedicated scheduler
2. Submit a parallel universe job with 'concurrency_limits = license1:2' and 'machine_count = 2'
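A minimal reproduction might look like the following sketch (the file name and executable are hypothetical; the config limit name must match the name used in concurrency_limits):

```
# Local configuration: allow at most 2 concurrent uses of "license1"
LICENSE1_LIMIT = 2

# repro.sub -- parallel universe job; each of the 2 requested slots
# counts the license1:2 limit, so the job needs 4 units in total
universe = parallel
executable = /bin/sleep
arguments = 60
machine_count = 2
concurrency_limits = license1:2
queue
```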
  
Actual results:
The job does not run; the attached SchedLog shows a 'concurrency limit reached' error.

Expected results:
Job should run successfully 


Additional info:

Comment 1 Matthew Farrellee 2011-02-08 14:27:18 UTC
Under the current semantics, limits are summed across every slot running a job. A parallel universe (PU) job therefore counts its requested limits once per claimed slot, i.e. the limits are multiplied by machine_count (really the slot count).

Special case code would be required, probably in condor_submit, to make this behave differently.

Options -
 1) Give the concurrency limits to a single job in the cluster.
  - This won't work well, since matching should be consistent across all jobs in the cluster: if the limit were hit, one job could fail to match while all the others matched.
 2) Use the floating-point nature of limits and give each job a fraction of the requested total.
  - This can also be done without condor_submit assistance, e.g. concurrency_limits=license:1

The primary use case of concern is a cluster of jobs that needs only a single license. That is already achievable, because limits can be fractional: e.g. machine_count = 4 with concurrency_limits=license:0.25 gives the job cluster a single license in total. The scheduling interaction here is potentially complex.
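The per-slot accounting and the fractional-limit workaround reduce to simple arithmetic. The sketch below is illustrative only (total_limit_usage is a hypothetical helper, not Condor code): each claimed slot counts the job's requested limit once, so total usage is machine_count times the per-slot value.

```python
# Illustrative model of per-slot concurrency-limit accounting.
# total_limit_usage is a hypothetical helper, not actual Condor code.

def total_limit_usage(machine_count: int, per_slot_limit: float) -> float:
    """Each claimed slot counts the job's limit once, so usage sums
    to machine_count * per_slot_limit."""
    return machine_count * per_slot_limit

# The reported case: 2 slots each counting license1:2 -> 4 licenses needed,
# which exceeds LICENSE1_LIMIT = 2, so the job cannot start.
print(total_limit_usage(2, 2.0))   # 4.0

# The workaround: fractional limits. 4 slots at license:0.25 consume
# exactly one license in total.
print(total_limit_usage(4, 0.25))  # 1.0
```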

I'm going to close this as NOTABUG: the behavior is as intended, and a user can specify fractional limits for their PU jobs.

If documentation isn't clear on this, please file a Doc bug.

If you'd really like to see condor_submit do the limit math for you, file an RFE.