Bug 1278232
| Summary: | if build fails to schedule because of quota, and pod count is reduced, build never automatically starts | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Erik M Jacobs <ejacobs> |
| Component: | Build | Assignee: | Cesar Wong <cewong> |
| Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.1.0 | CC: | aos-bugs, bleanhar, bparees, dmcphers, haowang, jokerman, mmccomas, pruan |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-01-26 19:16:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Erik M Jacobs
2015-11-05 02:06:36 UTC
```
[joe@ose3-master ~]$ oc get pod
NAME              READY     STATUS      RESTARTS   AGE
example-1-build   0/1       Completed   0          5m
example-1-m9b2a   1/1       Running     0          3m
[joe@ose3-master ~]$ oc describe quota sinatra-quota
Name:                   sinatra-quota
Namespace:              sinatra
Resource                Used    Hard
--------                ----    ----
cpu                     10m     500m
memory                  100Mi   512Mi
pods                    1       3
replicationcontrollers  1       3
resourcequotas          1       1
services                1       3
```

So the quota currently allows 3 pods but only 1 exists, yet the build has been pending for 3+ minutes. (A sketch of an equivalent quota definition appears after the list below.)

In a related issue, if my pod limit is 3 and I have a database + frontend application (for example, the quickstart-keyvalue application from Origin), the following scenario occurs:
* Instantiate template
* Database comes up
* Build starts
* Build completes
* Deployment tries to happen
* Deployer fails due to pod limit
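For reference, a quota with the limits shown in the `oc describe quota` output above could be defined roughly as follows. This is a minimal sketch, not the actual object used in this report; the name, namespace, and values are simply copied from that output.

```
# Sketch of a ResourceQuota matching the limits shown in the
# `oc describe quota sinatra-quota` output above (not the original object).
oc create -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: sinatra-quota
  namespace: sinatra
spec:
  hard:
    cpu: 500m
    memory: 512Mi
    pods: "3"
    replicationcontrollers: "3"
    resourcequotas: "1"
    services: "3"
EOF
```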
```
3m 3m 1 ruby-sample-build-1-build Pod implicitly required container POD Killing {kubelet ose3-node1.example.com} Killing with docker id 87ec4b3d6e9d
3m 3m 1 frontend-1-deploy Pod spec.containers{deployment} Started {kubelet ose3-node2.example.com} Started with docker id 698c6f6c91c5
2m 2m 1 frontend-1-h27dz Pod implicitly required container POD Created {kubelet ose3-node1.example.com} Created with docker id f63b79a0fed7
2m 2m 1 frontend-1 ReplicationController SuccessfulCreate {replication-controller } Created pod: frontend-1-h27dz
2m 2m 1 frontend-1-h27dz Pod Scheduled {scheduler } Successfully assigned frontend-1-h27dz to ose3-node1.example.com
2m 2m 1 frontend-1-h27dz Pod implicitly required container POD Pulled {kubelet ose3-node1.example.com} Container image "openshift3/ose-pod:v3.0.2.906" already present on machine
2m 2m 1 frontend-1-h27dz Pod spec.containers{ruby-helloworld} Pulling {kubelet ose3-node1.example.com} pulling image "172.30.129.155:5000/quickstart/ruby-sample@sha256:e838ff8ae3ae89f69f1cd647c0fa8793e8a344cfcb59fb49e88b2bd52fa82c18"
2m 2m 1 frontend-1-h27dz Pod implicitly required container POD Started {kubelet ose3-node1.example.com} Started with docker id f63b79a0fed7
2m 2m 1 frontend-1-h27dz Pod spec.containers{ruby-helloworld} Created {kubelet ose3-node1.example.com} Created with docker id 64eddf6a4638
2m 2m 1 frontend-1-h27dz Pod spec.containers{ruby-helloworld} Pulled {kubelet ose3-node1.example.com} Successfully pulled image "172.30.129.155:5000/quickstart/ruby-sample@sha256:e838ff8ae3ae89f69f1cd647c0fa8793e8a344cfcb59fb49e88b2bd52fa82c18"
2m 2m 1 frontend-1-h27dz Pod spec.containers{ruby-helloworld} Started {kubelet ose3-node1.example.com} Started with docker id 64eddf6a4638
2m 1m 6 frontend-1 ReplicationController FailedCreate {replication-controller } Error creating: Pod "frontend-1-" is forbidden: limited to 3 pods
46s 46s 1 frontend-1-deploy Pod FailedSync {kubelet ose3-node2.example.com} Error syncing pod, skipping: failed to delete containers ([exit status 1])
46s 46s 1 frontend-1-deploy Pod implicitly required container POD Killing {kubelet ose3-node2.example.com} Killing with docker id 1e73b8b04de0
43s 43s 1 frontend-1-h27dz Pod spec.containers{ruby-helloworld} Killing {kubelet ose3-node1.example.com} Killing with docker id 64eddf6a4638
43s 43s 1 frontend-1-h27dz Pod implicitly required container POD Killing {kubelet ose3-node1.example.com} Killing with docker id f63b79a0fed7
^C[joe@ose3-master ~]$ oc get pod
NAME                        READY     STATUS      RESTARTS   AGE
database-1-qnppj            1/1       Running     0          5m
frontend-1-deploy           0/1       Error       0          3m
ruby-sample-build-1-build   0/1       Completed   0          5m
```
I feel like this should've worked (database + deployer + frontend = 3), but it didn't... so something else is going on here, too.
Either way, because the deployer failed, we seem to be in the same situation as the build. We don't retry.
I wouldn't expect the failed deployment to keep retrying. Derek, what am I missing about the quota piece that is causing it to deny here?

This is an issue with the build code. It's setting the phase to Pending when the build pod fails to create, which it should not be doing. Reassigning to Ben.

Marking for the upcoming release; given where we're at in the current release, there is no room to get this in, and you can work around it by just creating a new build.

Is this the kind of thing that should go in the release notes as a "known issue" with a documented workaround?

Yeah, I'll add an entry.

Fixed here: https://github.com/openshift/origin/pull/5743

QE verified with origin devenv-rhel7_3063 and OSE version openshift v3.1.1.0 (kubernetes v1.1.0-origin-1107-g4c8e6f4):

1. When the pod fails to schedule because of quota, the build status stays New:

```
[root@ip-172-18-3-90 ~]# oc get builds
NAME                  TYPE      FROM      STATUS                       STARTED   DURATION
ruby-sample-build-1   Source    Git       New (CannotCreateBuildPod)
```

2. After updating the quota to increase the pod count, the build is handled correctly:

```
[root@ip-172-18-3-90 ~]# oc get pod
NAME                        READY     STATUS    RESTARTS   AGE
ruby-sample-build-1-build   1/1       Running   0          2m
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070
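For anyone hitting this on an unfixed version, here is a rough sketch of the two recovery paths mentioned above: re-running the build, or raising the pod limit on the quota. The build config name (`ruby-sample-build`) and quota name (`sinatra-quota`) are taken from the examples in this report; adjust both, and double-check the commands against your `oc` version.

```
# Workaround sketch 1: once there is quota headroom, start a new build manually.
oc start-build ruby-sample-build

# Workaround sketch 2: raise the pod limit on the quota so the pending
# build/deployer pods can be admitted (the value "5" is just an example).
oc patch resourcequota sinatra-quota -p '{"spec":{"hard":{"pods":"5"}}}'
```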