Description of problem:
According to the documentation [1]:
- The NotTerminating scope in quotas matches pods where spec.activeDeadlineSeconds is nil.
- Build pods will fall under NotTerminating unless the RestartNever policy is applied.

Version-Release number of selected component (if applicable):
4.3

How reproducible:
Always

Steps to Reproduce:
$ oc new-project test-quota
$ cat <<EOF | oc create -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: limit-memory
spec:
  hard:
    limits.memory: 2Gi
  scopes:
  - NotTerminating
EOF
$ oc new-app https://github.com/openshift/nodejs-ex

Actual results:
$ oc get builds
NAME          TYPE     FROM   STATUS                       STARTED   DURATION
nodejs-ex-1   Source   Git    New (CannotCreateBuildPod)

$ oc describe build/nodejs-ex-1
Events:
  Type     Reason        Age                 From              Message
  ----     ------        ----                ----              -------
  Warning  FailedCreate  16s (x13 over 36s)  build-controller  Error creating build pod: pods "nodejs-ex-4-build" is forbidden: failed quota: limit-memory: must specify limits.memory

Expected results:
The quota shouldn't be required, as build pods have their RestartPolicy set to Never, which makes them fall outside the NotTerminating scope.

References:
[1] https://docs.openshift.com/container-platform/4.3/applications/quotas/quotas-setting-per-project.html
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.
I believe ResourceQuota is behaving as expected, though I do not fully understand what "the RestartNever policy is applied" means. I do not believe this refers to the standard restart policy that is assigned to pods.

To work around this issue, you can set a cluster-wide BuildDefault that sets the memory limit to 2Gi [1]. Unfortunately this is a cluster-wide setting - for a specific namespace you can use LimitRanges [2].

[1] https://docs.openshift.com/container-platform/4.3/builds/build-configuration.html#builds-configuration-parameters_build-configuration
[2] https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/
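For the per-namespace workaround, a LimitRange along the following lines might be enough. This is only a sketch: the object name `build-memory-default`, the file name, and the 512Mi/256Mi values are illustrative assumptions and do not come from this report. Note that the defaults apply to every container in the namespace that does not set its own limit, not only to build pods.

```shell
# Write a LimitRange manifest that gives containers a default memory limit,
# so pods created without explicit limits can still satisfy a limits.memory quota.
# The object name, file name, and sizes are illustrative assumptions.
cat > limitrange-build-memory.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: build-memory-default
spec:
  limits:
  - type: Container
    default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
EOF
```

The manifest could then be created in the affected project with something like `oc create -f limitrange-build-memory.yaml -n test-quota`.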
Hi Adam.

As far as I understand, "the RestartNever policy is applied" means RestartPolicy=Never in the pods. Following the breadcrumbs in our documentation, I found that there used to be a plugin named RunOnceDuration [1], which set a default active deadline for run-once pods such as build pods. That is probably why the documentation was correct at the time, but it is no longer correct.

What I'd like to know is: is this a bug in the product or in the documentation? If it's a bug in the documentation and we do require a limit in the buildConfigs, then let's fix the documentation. If it's a bug in the product, which is not applying an activeDeadlineSeconds to build pods, then let's fix the product.

[1] https://docs.openshift.com/container-platform/3.6/admin_guide/managing_pods.html#configuring-the-run-once-duration-plug-in
RunOnceDuration configuration has been disabled since 4.0; "RestartNever" is the standard container restart policy (Always, Never, etc.). Moving back to the devex team to decide whether they want to ship the RunOnceDuration admission as a webhook admission plugin, which would involve creating a new operator.
>> RunOnceDuration configuration is disabled since 4.0, the "RestartNever" is standard container restart policy (Always, Never, etc.).

This means that, until the RunOnceDuration admission is back (or some other means of giving build pods a very long activeDeadlineSeconds is in place), the documentation is wrong, as build pods fall under NotTerminating. I'll probably document it in a KCS until you make a decision:
- implement a change so that build pods have such an activeDeadlineSeconds, or
- decide that build pods aren't special at all and must have a limit/quota like other pods, which can be set per build or cluster-wide using the build overrides.
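To make the scope rule under discussion concrete, here is a rough shell sketch of how the quota scopes classify a pod based on spec.activeDeadlineSeconds, following the documentation quoted in the description. The function name `quota_scope` is hypothetical; it only models the rule and does not query a cluster.

```shell
# quota_scope VALUE
# VALUE is the pod's spec.activeDeadlineSeconds, or empty when the field is unset.
# Hypothetical helper: models the scope rule only, no cluster access.
quota_scope() {
  if [ -z "${1:-}" ]; then
    # No deadline set: the pod is matched by the NotTerminating scope.
    echo "NotTerminating"
  else
    # A deadline is set: the pod is matched by the Terminating scope.
    echo "Terminating"
  fi
}

quota_scope ""      # build pod created without a deadline, as reported here
quota_scope 3600    # pod with an active deadline set
```

On a live cluster, the value to feed in could be read with `oc get pod <pod-name> -o jsonpath='{.spec.activeDeadlineSeconds}'`, which prints nothing when the field is unset.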
Adding KCS for future cases https://access.redhat.com/solutions/5049021
Adding a long activeDeadlineSeconds makes the most sense IMO - build pods are designed to terminate. I propose 1 week as a fixed active deadline (604800 seconds).
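As a quick check of the proposed value, one week works out to 604800 seconds. The variable name below is purely illustrative.

```shell
# One week in seconds: 7 days * 24 hours * 60 minutes * 60 seconds.
one_week_seconds=$((7 * 24 * 60 * 60))
echo "$one_week_seconds"
```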
talked with Corey ... I'm going to take this one
Verified in version: 4.5.0-0.nightly-2020-05-13-202437

Steps:
$ cat <<EOF | oc create -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: limit-memory
spec:
  hard:
    limits.memory: 2Gi
  scopes:
  - NotTerminating
EOF
$ oc new-app https://github.com/openshift/nodejs-ex
$ oc get builds
NAME          TYPE     FROM          STATUS     STARTED              DURATION
nodejs-ex-1   Source   Git@a096bd2   Complete   About a minute ago   1m4s
Will the fix be backported to 4.4 and/or 4.3?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409