Bug 1358497
| Summary: | [RFE] Ability to define global resource quotas for builders | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Brennan Vincello <bvincell> |
| Component: | RFE | Assignee: | Ben Parees <bparees> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Wang Haoran <haowang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.2.1 | CC: | aos-bugs, bparees, cewong, erich, jokerman, kurktchiev, mmccomas |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-06-29 01:00:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Brennan Vincello
2016-07-20 20:24:15 UTC
Just to add a bit more context to this bug. Here is the problem we are facing:

A) Set a default project template which includes our desired limits (min/max/default requests).
B) Builder pod starts up and grabs the default request sizes.
C) Builder pod never expands its resources to the min/max, which means that the measly default requests we set are now being used to build JBoss applications (as an example).
D) Builds either fail or complete in astronomical times (think 15 minutes for a build that takes 20 seconds in an unlimited project), due to resource starvation.

Our default requests are set to 100 MB of RAM and 100m for CPU (roughly the equivalent of 100 MHz). As far as I can see, the two solutions are:

1. Allow a special builder pod section in the limits spec.
2. Make the builder pod use the min/max out of the box.

As Brennan has pointed out in the SC attached to the bug, there are API ways to possibly force the builder pod to get more resources, but I am unable to sit and monitor any/all projects that are created in our environment and then give them the needed bits to make the builders work as expected; neither can I just give everyone unlimited resource access.

One possible solution would be to have the BuildDefaults plugin set the resource request/limit specifically for build pods. This can be configured globally and would only apply to builds.

(In reply to Cesar Wong from comment #2)
> One possible solution would be to have the BuildDefaults plugin set the
> resource request/limit specifically for build pods.

https://docs.openshift.org/latest/install_config/build_defaults_overrides.html does not show how to set resource limits. However, I assume it is possible?

Any reason we are not re-looking at https://trello.com/c/C847S5AQ/791-3-enforce-cgroup-constraints-on-build-containers for this?

That card has no bearing on this.
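For illustration, Cesar's BuildDefaults suggestion would live in the master configuration's admission plugin config. This is only a sketch of the *proposed* enhancement, not a feature that existed when this bug was filed; the `resources` stanza under `BuildDefaultsConfig` is the assumed shape (later OpenShift 3.x releases added it in roughly this form), and the values are illustrative.

```yaml
# /etc/origin/master/master-config.yaml (sketch of the proposed enhancement;
# the resources stanza under BuildDefaultsConfig is an assumption here)
admissionConfig:
  pluginConfig:
    BuildDefaults:
      configuration:
        apiVersion: v1
        kind: BuildDefaultsConfig
        resources:
          requests:
            cpu: "500m"
            memory: "256Mi"
          limits:
            cpu: "1"
            memory: "512Mi"
```

Because an admission plugin applies these defaults only to build pods, ordinary application pods would still be governed by the project's LimitRange.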
We've already addressed enforcing resource constraints on builds; we just did it differently than how that card proposes doing it. But that's unrelated to the idea of having a separate quota for resources consumed by builds vs. resources consumed by long-running pods.

BuildDefaults does not offer a way to set resource limits currently; I think Cesar was suggesting that as an enhancement. Would that satisfy the requirement here? I'm not sure how you're going to universally set an appropriate resource limit that's low enough to solve the concern (too many unused resources being assigned to builds) and high enough that it works for most people's builds (because it's not; everyone's going to have to manually override the default with the limit they need).

There is also the separate issue of having a separate quota for builds (or more generally for short-lived pods), which is something you can already do. Any pod with an activeDeadlineSeconds (which includes any build that has a completionDeadlineSeconds set) can be made subject to a distinct quota. So another path is to make it possible to provide a default completionDeadlineSeconds for builds.

It's not clear which of those two would satisfy the request here; it's likely both are useful.

Well, this is also why in comment #1 I was asking if it was possible for the build pod to actually respect and use the limits that are set globally outside of the default section, e.g.:

    default:
      cpu: "100m"
      memory: "100Mi"
    defaultRequest:
      cpu: "100m"
      memory: "100Mi"

Then even though container and pod have larger limits to work with, the build pod only grabs the above default allocations and never grows/shrinks. So to me this is a mix of bug/RFE. Ultimately I would like to set the limit spec and have all pods (builder/deployer/long-running ones) respect said limits. Though there is something to be said about having a defaultRequest section that does or doesn't overwrite an actual min/max section.
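The distinct-quota path described above can be sketched with a standard Kubernetes ResourceQuota using the `Terminating` scope, which matches pods that have `spec.activeDeadlineSeconds` set (builds whose BuildConfig sets `completionDeadlineSeconds` produce such pods). The name and hard limits here are illustrative.

```yaml
# A quota that applies only to pods with activeDeadlineSeconds set,
# e.g. build pods from a BuildConfig with completionDeadlineSeconds.
# Name and hard limits are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: terminating-pod-quota
spec:
  hard:
    limits.cpu: "4"
    limits.memory: 4Gi
  scopes:
    - Terminating
```

With this in a project, build pods draw against this quota while long-running pods draw against any unscoped (or `NotTerminating`-scoped) quota, keeping the two budgets independent.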
Maybe that's more to do with the explanation and usage of each one of those... Anyway, hopefully that makes more sense from a "needs" perspective; from a "feasibility" perspective I've got nada.

I don't follow your statement about "container and pod have larger limits to work with". Build pods get the same limits assigned as any other pod which doesn't itself define a resource limit, and they respect those limits. If you want the build pods to get a different resource limit than what a "normal" pod would get, then that is the build defaulter solution Cesar proposed.

Ok, so here are the limits I impose across the cluster:
```yaml
- apiVersion: "v1"
  kind: "LimitRange"
  metadata:
    name: "default-limits"
  spec:
    limits:
      - type: "Pod"
        max:
          cpu: "1"
          memory: "512Mi"
        min:
          cpu: "40m"
          memory: "6Mi"
      - type: "Container"
        max:
          cpu: "1"
          memory: "512Mi"
        min:
          cpu: "40m"
          memory: "6Mi"
        default:
          cpu: "100m"
          memory: "100Mi"
        defaultRequest:
          cpu: "100m"
          memory: "100Mi"
        maxLimitRequestRatio:
          cpu: "10"
```
When a build pod gets created and gets going, it requests 100m CPU and 100Mi memory and never switches/expands to the full 1 CPU or 512Mi allowed.

Sorry, hit enter too soon. So if it truly did expand and use the resources it is allowed to use, then I wouldn't be running into the problems described in comment 1.

That's not how limit range works.

Limit range enforces a limit on how much memory/CPU a user can request when they define a pod. What your limit range is declaring is:

1) Pods that specify no limits get 100Mi / 100 millicores as their limits.
2) Pods that specify anything between 6Mi / 40 millicores and 512Mi / 1 core will be accepted.
3) Pods that specify anything outside that range will be rejected.

It does not mean "start the pod with a limit of 100Mi and let it grow to 512Mi"; that's what a limit of 512Mi does. (Just because you request a limit of 512Mi doesn't mean you are immediately using 512Mi; it just means that's the most you can use.)

(In reply to Ben Parees from comment #11)
> That's not how limit range works.

Then I have been reading the documentation all wrong for the last year and a half :/ But even then, the build pod should be able to be told to use the max available by default, or do what we have been talking about and have special limits that only apply to them.

I've updated the trello card tracking this RFE to propose adding resource limits to the set of things the build defaulter can default.

That would enable a cluster admin to set default resource limits for all builds in the cluster, which would operate independent of cluster/project default limits for other pods.

Feel free to open a separate bug proposing updates to the limit range documentation to make it clearer what they do.

Ryan Howe opened a bug for me on the documentation aspect of this: https://bugzilla.redhat.com/show_bug.cgi?id=1376214

The trello card update seems to be what I am requesting as far as build pod limits are concerned, so that works for me on that front as well. Thanks Ben... sorry for my documentation misreading/misunderstanding :/
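For reference, the manual per-build override discussed in this thread (each build specifying the limit it needs, overriding the LimitRange default while staying within the min/max bounds) can be expressed directly on a BuildConfig. The name and values below are illustrative; `spec.resources` and `spec.completionDeadlineSeconds` are standard BuildConfig fields.

```yaml
# Illustrative: a BuildConfig that requests larger limits for its build pod
# than the project's LimitRange default, while staying within the max.
apiVersion: v1
kind: BuildConfig
metadata:
  name: example-build
spec:
  # Also makes the build pod subject to Terminating-scoped quotas.
  completionDeadlineSeconds: 1800
  resources:
    limits:
      cpu: "1"
      memory: "512Mi"
  # source/strategy/output sections omitted for brevity
```

This is the workaround Ben alludes to: without a build-specific defaulting mechanism, every BuildConfig whose build outgrows the project default has to carry an override like this.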