Bug 1829447
Summary: | Build pods falling under NotTerminating quota scope | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Sergio G. <sgarciam> | |
Component: | Build | Assignee: | Gabe Montero <gmontero> | |
Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 4.3.z | CC: | aos-bugs, gmontero, mfojtik, sbhavsar, scuppett, wzheng | |
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: with the RunOnceDuration admission plugin disabled in 4.x, there is no longer automatic applying of activeDeadlineSeconds to build pods
Consequence: Incorrect behavior with quotas can occur, namely quota shouldn't be required as build pods have the RestartPolicy to Never and that makes them fall outside the NotTerminating scope.
Fix: the build controller is updated to provide an acceptable default for activeDeadlineSeconds given its design of being a run once pod that always terminates.
Result: NotTerminating scope with quotas are handled properly for builds, namely the quota shouldn't be required as build pods have the RestartPolicy to Never and that makes them fall outside the NotTerminating scope.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1835913 (view as bug list) | Environment: | ||
Last Closed: | 2020-07-13 17:32:51 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1835913 |
Description
Sergio G.
2020-04-29 14:39:03 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate. I believe ResourceQuota is behaving as expected, though I do not fully understand what "the RestartNever policy is applied" means. I do not believe this refers to the standard restart policy that is assigned to pods. To work around this issue, you can set a cluster-wide BuildDefault that sets the memory limit to 2Gi [1]. Unfortunately this is a cluster-wide setting - for a specific namespace you can use LimitRanges [2]. [1] https://docs.openshift.com/container-platform/4.3/builds/build-configuration.html#builds-configuration-parameters_build-configuration [2] https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/ Hi Adam. As far as I understand, the "the RestartNever policy is applied" means a RestartPolicy=Never in the pods. Following the breadcrumbs in our documentation, I found that there was a plugin named RunOnceDuration [1] which was used to include the default active deadline for run-once pods, like the build pods. Thats why the documentation was probably back in the time but it's not correct nowadays. What I'd like to know is: is this a bug in the product or in the documentation? If it's a bug in the documentation and we do require a limit in the buildConfigs, then let's do it. If it's a bug in the product which is not applying an activeDeadlineSeconds to build pods, then let's fix it. [1] https://docs.openshift.com/container-platform/3.6/admin_guide/managing_pods.html#configuring-the-run-once-duration-plug-in RunOnceDuration configuration is disabled since 4.0, the "RestartNever" is standard container restart policy (Always, Never, etc.). Moving back to devex team to decide whether they want to ship the RunOnceDuration admission as webhook admission plugin, which will involve creating new operator. >> RunOnceDuration configuration is disabled since 4.0, the "RestartNever" is standard container restart policy (Always, Never, etc.).
This means that, until the RunOnceDuration admission is back (or any other mean which makes the buildPods have a very long activeDeadlineSeconds) the documentation is wrong as build pods are falling under NotTerminating.
I'll probably document it in a KCS until you guys make a decission:
- implement any change to have such activeDeadlineSeconds in build pods
- decide that build pods aren't special at all and they must have a limit/quota as other pods, which can be set per build or cluster-wide using the build overrides.
Adding KCS for future cases https://access.redhat.com/solutions/5049021 Adding a long activeDeadlineSeconds makes the most sense IMO - build pods are designed to terminate. I propose 1 week as a fixed active deadline (604800 seconds). talked with Corey ... I'm going to take this one Verified in version: 4.5.0-0.nightly-2020-05-13-202437 Steps: $ cat <<EOF | oc create -f - apiVersion: v1 kind: ResourceQuota metadata: name: limit-memory spec: hard: limits.memory: 2Gi scopes: - NotTerminating EOF $ oc new-app https://github.com/openshift/nodejs-ex $ oc get builds NAME TYPE FROM STATUS STARTED DURATION nodejs-ex-1 Source Git@a096bd2 Complete About a minute ago 1m4s Will the fix backported to 4.4 and/or 4.3 ? Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409 |