Bug 794660
| Summary: | Partitionable slots can create more dynamic slots than CPUs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec> |
| Component: | condor | Assignee: | Timothy St. Clair <tstclair> |
| Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1 | CC: | jneedle, ltoscano, ltrilety, matt, mkudlej, tstclair |
| Target Milestone: | 2.2 | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | done | | |
| Fixed In Version: | condor-7.6.5-0.15 | Doc Type: | Bug Fix |
| Doc Text: | C: Under certain conditions a partitionable slot can split into too many dynamic slots.<br>C: The machine could potentially be oversubscribed.<br>F: Add logic to prevent a partitionable slot from splitting into more dynamic slots than its available resources allow.<br>R: The machine should not be oversubscribed. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-09-19 17:42:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 828434 | | |
| Attachments: | | | |

Description
Pavel Moravec
2012-02-17 08:50:25 UTC
Is the scenario really unknown? Any new clue about the conditions when this bug can show up? Best insight is in the dedicated scheduler.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
C: Under certain conditions a partitionable slot can split into too many dynamic slots.
C: The machine could potentially be oversubscribed.
F: Add logic to prevent a partitionable slot from splitting into more dynamic slots than its available resources allow.
R: The machine should not be oversubscribed.
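The fix described in the technical note can be pictured with a minimal Python sketch. This is a toy model, not HTCondor's startd code: the `PartitionableSlot` class, its fields, and the `split` method are hypothetical names chosen for illustration. It only shows the general idea of refusing a split when the request exceeds what the partitionable slot has left.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy model of a partitionable slot (hypothetical, not HTCondor code)."""
    cpus: int
    memory_mb: int
    dynamic_slots: list = field(default_factory=list)

    def split(self, req_cpus: int, req_memory_mb: int) -> bool:
        """Carve out a dynamic slot only if the remaining resources cover the request."""
        if req_cpus <= 0 or req_memory_mb <= 0:
            return False  # reject meaningless requests
        if req_cpus > self.cpus or req_memory_mb > self.memory_mb:
            return False  # refusing here is what prevents oversubscription
        self.cpus -= req_cpus
        self.memory_mb -= req_memory_mb
        self.dynamic_slots.append((req_cpus, req_memory_mb))
        return True


slot = PartitionableSlot(cpus=2, memory_mb=2048)
results = [slot.split(1, 512) for _ in range(3)]
print(results)  # [True, True, False]
```

With two CPUs available, the third one-CPU request is refused even though memory remains, so the number of dynamic slots never exceeds the CPU count.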
If I understand condor ticket #2816, the issue seems to be 100% reproducible. According to condor ticket #204 the problem was seen "sporadically". What is the realistic expectation of how reproducible it is?

"This is because the requirements expression in the slot ad is not properly evaluated." One would need to construct a slot_ad such that it caused a match but failed to evaluate after the claim has been given and during the split process. The only thing I can think of is to insert an if-then clause in the requirements expression which causes it to fail *only* when it's evaluated on the startd.

Could you please specify more precisely how to reproduce this bug? Exactly what type of ifThenElse clause can cause the bug to happen?

An if-then-else on an attribute which only exists on the startd, but is not present in the ad published to the collector.

(In reply to comment #15)
> if then else on an attribute which only exists on the startd, but is not
> present in the ad published to the collector.

OK, that much was clear. But I am aware only of those attributes which are published, and I don't want to parse source code for others. Could you please write a specific example of the if-then-else clause which fulfils these requirements?

in submission:
    Requirements = ifThenElse( PithyRetort =!= UNDEFINED, FALSE, TRUE)
only on startd:
    PithyRetort = TRUE

And make sure that PithyRetort is not part of: http://research.cs.wisc.edu/condor/manual/v7.8/3_3Configuration.html#18154

The suggested scenario doesn't reproduce the bug. Currently we aren't able to reproduce it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1278.html
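
The reproduction attempt suggested in the comments relies on one expression giving different answers depending on which copy of the machine ad it is evaluated against. A minimal Python sketch of that idea follows. It is an illustration only, not a ClassAd evaluator: the `UNDEFINED` sentinel, the `job_requirements` function, and the two example ads are assumptions, and the thread reports that this scenario did not actually reproduce the bug.

```python
UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value (assumption)


def job_requirements(machine_ad: dict) -> bool:
    """Model of: ifThenElse(PithyRetort =!= UNDEFINED, FALSE, TRUE)."""
    pithy = machine_ad.get("PithyRetort", UNDEFINED)
    # `=!=` is the ClassAd "is not identical to" operator, so this asks
    # whether the attribute is defined in the ad at all.
    is_defined = pithy is not UNDEFINED
    return False if is_defined else True


# Ad as published to the collector: PithyRetort is absent.
collector_ad = {"Cpus": 4}
# Ad as seen on the startd: PithyRetort is set locally.
startd_ad = {"Cpus": 4, "PithyRetort": True}

print(job_requirements(collector_ad))  # True
print(job_requirements(startd_ad))     # False
```

In this model the job matches against the published ad but fails when the same expression is re-evaluated on the startd, which is the mismatch the commenters were trying to provoke.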