Bug 1278232

Summary: if build fails to schedule because of quota, and pod count is reduced, build never automatically starts
Product: OpenShift Container Platform
Component: Build
Version: 3.1.0
Severity: high
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Erik M Jacobs <ejacobs>
Assignee: Cesar Wong <cewong>
QA Contact: Wenjing Zheng <wzheng>
CC: aos-bugs, bleanhar, bparees, dmcphers, haowang, jokerman, mmccomas, pruan
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-01-26 19:16:51 UTC

Description Erik M Jacobs 2015-11-05 02:06:36 UTC
atomic-openshift-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-clients-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-master-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-node-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-sdn-ovs-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64
atomic-openshift-utils-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-filter-plugins-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-lookup-plugins-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-playbooks-3.0.7-1.git.48.75d357c.el7aos.noarch
openshift-ansible-roles-3.0.7-1.git.48.75d357c.el7aos.noarch
tuned-profiles-atomic-openshift-node-3.0.2.906-1.git.0.7bfe56b.el7aos.x86_64



[joe@ose3-master ~]$ oc get build
NAME        TYPE      FROM         STATUS                           STARTED              DURATION
example-1   Source    Git@master   Complete                         About a minute ago   44s
example-2   Source    Git@master   Pending (CannotCreateBuildPod)                        
[joe@ose3-master ~]$ oc describe build example-2
Name:           example-2
Created:        8 seconds ago
Labels:         app=example,buildconfig=example,openshift.io/build-config.name=example
Annotations:    openshift.io/build.number=2
Build Config:   example
Duration:       waiting for 8s
Build Pod:      example-2-build
Strategy:       Source
Source Type:    Git
URL:            https://github.com/openshift/sinatra-example
Ref:            master
From Image:     DockerImage registry.access.redhat.com/openshift3/ruby-20-rhel7:latest
Output to:      ImageStreamTag example:latest
Push Secret:    builder-dockercfg-tl2ta
Status:         Pending (Failed to create build pod: Pod "example-2-build" is forbidden: limited to 3 pods.)
Events:
  FirstSeen     LastSeen        Count   From                    SubobjectPath   Reason                  Message
  ---------     --------        -----   ----                    -------------   ------                  -------
  8s            8s              1       {build-controller }                     failedCreate            Error creating: Pod "example-2-build" is forbidden: limited to 3 pods
  8s            8s              1       {build-controller }                     HandleBuildError        Build has error: failed to create build pod: Pod "example-2-build" is forbidden: limited to 3 pods

Comment 1 Erik M Jacobs 2015-11-05 02:07:30 UTC
[joe@ose3-master ~]$ oc get pod
NAME              READY     STATUS      RESTARTS   AGE
example-1-build   0/1       Completed   0          5m
example-1-m9b2a   1/1       Running     0          3m
[joe@ose3-master ~]$ oc describe quota sinatra-quota
Name:                   sinatra-quota
Namespace:              sinatra
Resource                Used    Hard
--------                ----    ----
cpu                     10m     500m
memory                  100Mi   512Mi
pods                    1       3
replicationcontrollers  1       3
resourcequotas          1       1
services                1       3


So, the quota currently allows 3 pods but only 1 exists, yet the build has been pending for 3+ minutes and never starts.
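
For reference, a ResourceQuota matching the describe output above can be created like this (a minimal sketch; the project and quota names simply mirror the output):

$ cat <<EOF | oc create -n sinatra -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: sinatra-quota
spec:
  hard:
    cpu: 500m
    memory: 512Mi
    pods: "3"
    replicationcontrollers: "3"
    resourcequotas: "1"
    services: "3"
EOF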

Comment 3 Erik M Jacobs 2015-11-05 02:45:36 UTC
In a related issue, if my pod limit is 3 and I have a database+frontend application (for example, the quickstart-keyvalue application from Origin), the following scenario occurs:

* Instantiate template
* Database comes up
* Build starts
* Build completes
* Deployment tries to happen
* Deployer fails due to pod limit

3m          3m         1         ruby-sample-build-1-build   Pod                     implicitly required container POD           Killing            {kubelet ose3-node1.example.com}   Killing with docker id 87ec4b3d6e9d
3m          3m         1         frontend-1-deploy           Pod                     spec.containers{deployment}                 Started            {kubelet ose3-node2.example.com}   Started with docker id 698c6f6c91c5
2m          2m         1         frontend-1-h27dz            Pod                     implicitly required container POD           Created            {kubelet ose3-node1.example.com}   Created with docker id f63b79a0fed7
2m          2m         1         frontend-1                  ReplicationController                                               SuccessfulCreate   {replication-controller }          Created pod: frontend-1-h27dz
2m          2m         1         frontend-1-h27dz            Pod                                                                 Scheduled          {scheduler }                       Successfully assigned frontend-1-h27dz to ose3-node1.example.com
2m          2m         1         frontend-1-h27dz            Pod                     implicitly required container POD           Pulled             {kubelet ose3-node1.example.com}   Container image "openshift3/ose-pod:v3.0.2.906" already present on machine
2m          2m         1         frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}            Pulling            {kubelet ose3-node1.example.com}   pulling image "172.30.129.155:5000/quickstart/ruby-sample@sha256:e838ff8ae3ae89f69f1cd647c0fa8793e8a344cfcb59fb49e88b2bd52fa82c18"
2m          2m         1         frontend-1-h27dz            Pod                     implicitly required container POD           Started            {kubelet ose3-node1.example.com}   Started with docker id f63b79a0fed7
2m          2m         1         frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}            Created            {kubelet ose3-node1.example.com}   Created with docker id 64eddf6a4638
2m          2m         1         frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}            Pulled             {kubelet ose3-node1.example.com}   Successfully pulled image "172.30.129.155:5000/quickstart/ruby-sample@sha256:e838ff8ae3ae89f69f1cd647c0fa8793e8a344cfcb59fb49e88b2bd52fa82c18"
2m          2m         1         frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}            Started            {kubelet ose3-node1.example.com}   Started with docker id 64eddf6a4638
2m          1m         6         frontend-1                  ReplicationController                                               FailedCreate       {replication-controller }          Error creating: Pod "frontend-1-" is forbidden: limited to 3 pods
46s         46s        1         frontend-1-deploy           Pod                                                                 FailedSync         {kubelet ose3-node2.example.com}   Error syncing pod, skipping: failed to delete containers ([exit status 1])
46s         46s        1         frontend-1-deploy           Pod                     implicitly required container POD           Killing            {kubelet ose3-node2.example.com}   Killing with docker id 1e73b8b04de0
43s         43s        1         frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}            Killing            {kubelet ose3-node1.example.com}   Killing with docker id 64eddf6a4638
43s         43s        1         frontend-1-h27dz            Pod                     implicitly required container POD           Killing            {kubelet ose3-node1.example.com}   Killing with docker id f63b79a0fed7
^C[joe@ose3-master ~]$ oc get pod
NAME                        READY     STATUS      RESTARTS   AGE
database-1-qnppj            1/1       Running     0          5m
frontend-1-deploy           0/1       Error       0          3m
ruby-sample-build-1-build   0/1       Completed   0          5m


I feel like this should've worked (database + deployer + frontend = 3), but it didn't... so something else is going on here, too.

Either way, because the deployer failed, we seem to be in the same situation as the build. We don't retry.
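
For what it's worth, a failed deployment can be retried by hand once there is quota headroom; assuming the oc deploy --retry behavior of this release, something like:

$ oc deploy frontend --retry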

Comment 4 Paul Weil 2015-11-05 14:29:17 UTC
I wouldn't expect the failed deployment to keep retrying. 

Derek, what am I missing about the quota piece that is causing it to deny here?

Comment 5 Andy Goldstein 2015-11-05 16:31:57 UTC
This is an issue with the build code. It's setting the phase to Pending when the build pod fails to create, which it should not be doing. Reassigning to Ben.
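
Concretely, that is why the build above reports Pending even though its pod was never created. A quick way to confirm the stored phase (a sketch; any YAML filter would do):

$ oc get build example-2 -o yaml | grep -m1 'phase:'
  phase: Pending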

Comment 6 Ben Parees 2015-11-05 16:35:45 UTC
Marking for the upcoming release; given where we are in this release there is no room to get this in, and you can work around it by just creating a new build (see the sketch below).
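
For the workaround, starting a fresh build from the same build config is enough once quota headroom exists (the printed build name is illustrative):

$ oc start-build example
example-3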

Comment 7 Erik M Jacobs 2015-11-05 17:37:28 UTC
Is this the kind of thing that should go in release notes as a "known issue" with documented work-around?

Comment 8 Ben Parees 2015-11-05 17:56:05 UTC
Yeah, I'll add an entry.

Comment 9 Ben Parees 2016-01-04 15:30:38 UTC
fixed here:
https://github.com/openshift/origin/pull/5743

Comment 10 Wang Haoran 2016-01-05 05:09:54 UTC
QE verified with origin:
devenv-rhel7_3063 
and OSE version:
openshift v3.1.1.0
kubernetes v1.1.0-origin-1107-g4c8e6f4

1. When the pod fails to schedule because of quota, the build status is New:
[root@ip-172-18-3-90 ~]# oc get build
NAME                  TYPE      FROM      STATUS                       STARTED   DURATION
ruby-sample-build-1   Source    Git       New (CannotCreateBuildPod)
2. Update the quota to increase the pod count (see the sketch below the output), and the build is handled correctly:
[root@ip-172-18-3-90 ~]# oc get pod
NAME                        READY     STATUS    RESTARTS   AGE
ruby-sample-build-1-build   1/1       Running   0          2m
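
The quota bump in step 2 can be done in place (a sketch; the quota name is illustrative):

$ oc edit quota my-quota    # raise spec.hard.pods, e.g. "3" -> "10"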

Comment 12 errata-xmlrpc 2016-01-26 19:16:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070