Description of problem:
When you try to schedule a build and hit your pod limit, the build status will show "New" forever. The "forbidden" quota error never percolates up to the build status.

Version-Release number of selected component (if applicable):
tuned-profiles-openshift-node-3.0.1.0-1.git.525.eddc479.el7ose.x86_64
openshift-node-3.0.1.0-1.git.525.eddc479.el7ose.x86_64
openshift-3.0.1.0-1.git.525.eddc479.el7ose.x86_64
openshift-master-3.0.1.0-1.git.525.eddc479.el7ose.x86_64
openshift-sdn-ovs-3.0.1.0-1.git.525.eddc479.el7ose.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up a project with a quota limit on pods.
2. Try to schedule a build such that the pod limit is exceeded.

Actual results:
The build shows "New":

[root@ose3-master ~]# oc get build
NAME             TYPE     STATUS     POD
ruby-example-1   Source   Complete   ruby-example-1-build
ruby-example-3   Source   New        ruby-example-3-build

Expected results:
The build should show Failed or some other status.

Additional info:

[root@ose3-master ~]# oc start-build ruby-example; oc describe build ruby-example-3
ruby-example-3
Name:             ruby-example-3
Created:          1 seconds ago
Labels:           app=ruby-example,buildconfig=ruby-example
Build Config:     ruby-example
Status:           New
Duration:         waiting for 1s
Build Pod:        ruby-example-3-build
Strategy:         Source
Image Reference:  DockerImage registry.access.redhat.com/openshift3/ruby-20-rhel7:latest
Source Type:      Git
URL:              https://github.com/openshift/simple-openshift-sinatra-sti.git
Output to:        ImageStreamTag ruby-example:latest
Push Secret:      builder-dockercfg-5sky8
Events:
  FirstSeen                         LastSeen                          Count   From                  SubobjectPath   Reason         Message
  Mon, 14 Sep 2015 10:31:04 -0400   Mon, 14 Sep 2015 10:31:04 -0400   1       {build-controller }                   failedCreate   Error creating: Pod "ruby-example-3-build" is forbidden: Limited to 3 pods
Does it ultimately get created once your quota is available?
I have no idea. I assume so?
Assuming so, I don't know what the bug is here. Your build is in the New state because there are no resources available to run it. Describing the build tells you why it's not running. If you don't want to free up resources or wait for resources to free up, you can cancel the build. Failing the build immediately is not the right approach.
As a user, how would I ever know to describe the build? Are we expecting a user to say "gee, it's been several minutes and my build hasn't started, I should go look"? This is currently a sub-optimal user experience. We provide no feedback to the user, either when they start the build (new-build) or when they ask about the build (get-build), that indicates there is a problem with the quota. Additionally, any external automation tool is going to have a hard time figuring out what's going on, because the tool would have to use an unexpected API call (describe; is this even an API call?) to look through the event history and see that a particular event occurred (forbidden). While you may not like the approach of failing the build, this is still a bug: the build's status ("can never be scheduled due to quota") is not accurately reflected in the "get" command. "New" may be true, but it is insufficient information.
I've changed the subject of the bug to more accurately reflect what's going on: the information provided to the user by "get" is insufficient. We are expecting too much of the user to figure out what's going on in this particular case.
If/when this is resolved, we should solve the entirety of this issue: https://github.com/openshift/origin/issues/3847
This PR should fix this BZ: https://github.com/openshift/origin/pull/4872
I've put some comments in the BZ.
@Erik, the error info now also shows up in the output of `oc get builds`: https://github.com/openshift/origin/pull/4909 build.status.Message and build.status.Reason now reflect the error condition (including the one reported here), and they can be seen in `oc describe build` and `oc get build`. Please take a look.
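To make the automation point from earlier concrete: with Reason and Message populated, a tool can poll the build object itself instead of digging through events. A minimal Go sketch, assuming a simplified, hypothetical Build type that carries only the phase/reason/message fields discussed here (this is not the real origin API type, and the helper names blockedOnPodCreation and statusColumn are made up for illustration). Only the "CannotCreateBuildPod" reason string comes from this report.

```go
package main

import "fmt"

// BuildPhase mirrors the phase strings shown by `oc get build`.
// Hypothetical, simplified type for illustration only.
type BuildPhase string

const (
	PhaseNew      BuildPhase = "New"
	PhasePending  BuildPhase = "Pending"
	PhaseRunning  BuildPhase = "Running"
	PhaseComplete BuildPhase = "Complete"
	PhaseFailed   BuildPhase = "Failed"
)

// BuildStatus holds the fields an external tool would poll.
type BuildStatus struct {
	Phase   BuildPhase
	Reason  string
	Message string
}

// Build is a hypothetical stand-in for the real build object.
type Build struct {
	Name   string
	Status BuildStatus
}

// reasonCannotCreateBuildPod is the reason string seen in this report.
const reasonCannotCreateBuildPod = "CannotCreateBuildPod"

// blockedOnPodCreation reports whether the build has not started because
// its build pod could not be created (for example, a pod quota was hit).
func blockedOnPodCreation(b Build) bool {
	notStarted := b.Status.Phase == PhaseNew || b.Status.Phase == PhasePending
	return notStarted && b.Status.Reason == reasonCannotCreateBuildPod
}

// statusColumn renders a STATUS column like the one in this report,
// e.g. "Pending (CannotCreateBuildPod)".
func statusColumn(b Build) string {
	if b.Status.Reason == "" {
		return string(b.Status.Phase)
	}
	return fmt.Sprintf("%s (%s)", b.Status.Phase, b.Status.Reason)
}

func main() {
	b := Build{
		Name: "ruby-example-3",
		Status: BuildStatus{
			Phase:   PhaseNew,
			Reason:  reasonCannotCreateBuildPod,
			Message: `Pod "ruby-example-3-build" is forbidden: Limited to 3 pods`,
		},
	}
	fmt.Println(b.Name, statusColumn(b), blockedOnPodCreation(b))
}
```

The check accepts both New and Pending because which phase the build sits in while blocked depends on how the controller handles the pod-creation error.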
Yeah that PR looks pretty good. I think it will address this BZ.
After hitting the project pod limit, the build status is marked Pending with reason (CannotCreateBuildPod). But after I free up resources, the pending build stays Pending (CannotCreateBuildPod). IMO, that's not acceptable, right?

# oc get builds
NAME                  TYPE     FROM   STATUS                           STARTED          DURATION
ruby-sample-build-2   Source   Git    Failed                           39 minutes ago   2m2s
ruby-sample-build-3   Source   Git    Complete                         36 minutes ago   3m1s
ruby-sample-build-4   Source   Git    Pending (CannotCreateBuildPod)

# oc describe project xiuwangquota
Name:           xiuwangquota
Created:        About an hour ago
Labels:         <none>
Annotations:    openshift.io/description=
                openshift.io/display-name=
                openshift.io/sa.scc.mcs=s0:c13,c12
                openshift.io/sa.scc.supplemental-groups=1000180000/10000
                openshift.io/sa.scc.uid-range=1000180000/10000
Display Name:   <none>
Description:    <none>
Status:         Active
Node Selector:  <none>
Quota:
    Name:                     quota
    Resource                  Used    Hard
    --------                  ----    ----
    cpu                       600m    1
    memory                    300Mi   750Mi
    pods                      2       4
    replicationcontrollers    2       10
    resourcequotas            1       1
    services                  2       10
Resource limits:
    Name:         limits
    Type          Resource    Min     Max     Default
    ----          --------    ---     ---     ---
    Pod           cpu         10m     500m    -
    Pod           memory      5Mi     750Mi   -
    Container     memory      5Mi     750Mi   100Mi
    Container     cpu         10m     500m    100m
Did the build ever enter the Running state? I would expect the message to remain "Pending (CannotCreateBuildPod)" until the system recognizes that resources are available and starts running the pod, at which point the state should change to "Running". If the pod never entered the "Running" state after resources became available, I think that's a scheduler issue.
The Pending (CannotCreateBuildPod) build never entered the "Running" state after resources became available. However, a newly triggered build does run when resources are available.
Derek, who does this bug need to go to? Sounds like a node/scheduling problem.
Ben, I read the bug and it's not clear to me that we have shown that the build pod was ever actually created. Is there output that shows the build pod was in fact created? If so, we could look at events around that pod to know why it may not have been scheduled. Right now, best I can tell in the discussion is that no build pod was created for builds that were in this state. Thanks, Derek
Thanks Derek, you're right. We have a bug here in which we:
1) update the build phase to Pending,
2) hit an error creating the build pod (in this case a quota limit), and
3) still end up committing the updated build object, because we're trying to reflect the error from (2), but this also ends up persisting the phase change from (1).
We need to not update the build phase if an error occurs creating the build pod.
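Roughly, the fix means the phase change is committed only once the pod has actually been created. A minimal Go sketch, assuming hypothetical Build/BuildStatus types, a made-up podCreator interface and a fake quotaPods creator (the real build controller in origin is structured differently; this only illustrates the ordering of "create pod, then update phase"):

```go
package main

import "fmt"

// Hypothetical, simplified build types for illustration only.
type BuildPhase string

const (
	PhaseNew     BuildPhase = "New"
	PhasePending BuildPhase = "Pending"
)

type BuildStatus struct {
	Phase   BuildPhase
	Reason  string
	Message string
}

type Build struct {
	Name   string
	Status BuildStatus
}

// podCreator abstracts creating the build pod (hypothetical interface).
type podCreator interface {
	CreateBuildPod(b *Build) error
}

// handleNewBuild tries to create the build pod and returns the build object
// that should be committed. The phase moves to Pending only after the pod
// exists; on error the build stays New, so a later retry still picks it up.
func handleNewBuild(b Build, pods podCreator) (Build, error) {
	if b.Status.Phase != PhaseNew {
		return b, nil // already handled, nothing to do
	}
	if err := pods.CreateBuildPod(&b); err != nil {
		// Record why the pod could not be created, but leave the phase alone.
		b.Status.Reason = "CannotCreateBuildPod"
		b.Status.Message = err.Error()
		return b, err
	}
	b.Status.Phase = PhasePending
	b.Status.Reason = ""
	b.Status.Message = ""
	return b, nil
}

// quotaPods is a fake podCreator that enforces a simple pod limit.
type quotaPods struct {
	used, hard int
}

func (q *quotaPods) CreateBuildPod(b *Build) error {
	if q.used >= q.hard {
		return fmt.Errorf("Pod %q is forbidden: Limited to %d pods", b.Name+"-build", q.hard)
	}
	q.used++
	return nil
}

func main() {
	pods := &quotaPods{used: 3, hard: 3}
	b := Build{Name: "ruby-example-3", Status: BuildStatus{Phase: PhaseNew}}

	// Quota exhausted: the build records the reason but stays New.
	b, err := handleNewBuild(b, pods)
	fmt.Println(b.Status.Phase, b.Status.Reason, err)

	// Quota frees up: a retry creates the pod and moves the build to Pending.
	pods.used = 2
	b, err = handleNewBuild(b, pods)
	fmt.Println(b.Status.Phase, err)
}
```

Keeping the phase at New on failure is what lets a retry after quota frees up attempt pod creation again, instead of leaving the build stuck in Pending with no pod behind it.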
Turns out Cesar already has a pull that should fix this: https://github.com/openshift/origin/pull/5743
Incidentally, the "never schedules" portion of this bug is really a dupe of: https://bugzilla.redhat.com/show_bug.cgi?id=1278232
Dupe bug, already verified in https://bugzilla.redhat.com/show_bug.cgi?id=1278232#c10