Bug 1278232
Summary: | if build fails to schedule because of quota, and pod count is reduced, build never automatically starts | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Erik M Jacobs <ejacobs> |
Component: | Build | Assignee: | Cesar Wong <cewong> |
Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.1.0 | CC: | aos-bugs, bleanhar, bparees, dmcphers, haowang, jokerman, mmccomas, pruan |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-01-26 19:16:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Erik M Jacobs
2015-11-05 02:06:36 UTC
```
[joe@ose3-master ~]$ oc get pod
NAME              READY     STATUS      RESTARTS   AGE
example-1-build   0/1       Completed   0          5m
example-1-m9b2a   1/1       Running     0          3m
[joe@ose3-master ~]$ oc describe quota sinatra-quota
Name:                   sinatra-quota
Namespace:              sinatra
Resource                Used    Hard
--------                ----    ----
cpu                     10m     500m
memory                  100Mi   512Mi
pods                    1       3
replicationcontrollers  1       3
resourcequotas          1       1
services                1       3
```

So the quota is currently 3 pods but only 1 exists, and the build has been pending for 3+ minutes.

In a related issue, if my pod limit is 3 and I have a database+frontend application (for example, the quickstart-keyvalue application from Origin), the following scenario occurs:

* Instantiate the template
* Database comes up
* Build starts
* Build completes
* Deployment tries to happen
* Deployer fails due to the pod limit

```
3m   3m   1   ruby-sample-build-1-build   Pod                     implicitly required container POD   Killing            {kubelet ose3-node1.example.com}   Killing with docker id 87ec4b3d6e9d
3m   3m   1   frontend-1-deploy           Pod                     spec.containers{deployment}         Started            {kubelet ose3-node2.example.com}   Started with docker id 698c6f6c91c5
2m   2m   1   frontend-1-h27dz            Pod                     implicitly required container POD   Created            {kubelet ose3-node1.example.com}   Created with docker id f63b79a0fed7
2m   2m   1   frontend-1                  ReplicationController                                       SuccessfulCreate   {replication-controller }          Created pod: frontend-1-h27dz
2m   2m   1   frontend-1-h27dz            Pod                                                         Scheduled          {scheduler }                       Successfully assigned frontend-1-h27dz to ose3-node1.example.com
2m   2m   1   frontend-1-h27dz            Pod                     implicitly required container POD   Pulled             {kubelet ose3-node1.example.com}   Container image "openshift3/ose-pod:v3.0.2.906" already present on machine
2m   2m   1   frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}    Pulling            {kubelet ose3-node1.example.com}   pulling image "172.30.129.155:5000/quickstart/ruby-sample@sha256:e838ff8ae3ae89f69f1cd647c0fa8793e8a344cfcb59fb49e88b2bd52fa82c18"
2m   2m   1   frontend-1-h27dz            Pod                     implicitly required container POD   Started            {kubelet ose3-node1.example.com}   Started with docker id f63b79a0fed7
2m   2m   1   frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}    Created            {kubelet ose3-node1.example.com}   Created with docker id 64eddf6a4638
2m   2m   1   frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}    Pulled             {kubelet ose3-node1.example.com}   Successfully pulled image "172.30.129.155:5000/quickstart/ruby-sample@sha256:e838ff8ae3ae89f69f1cd647c0fa8793e8a344cfcb59fb49e88b2bd52fa82c18"
2m   2m   1   frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}    Started            {kubelet ose3-node1.example.com}   Started with docker id 64eddf6a4638
2m   1m   6   frontend-1                  ReplicationController                                       FailedCreate       {replication-controller }          Error creating: Pod "frontend-1-" is forbidden: limited to 3 pods
46s  46s  1   frontend-1-deploy           Pod                                                         FailedSync         {kubelet ose3-node2.example.com}   Error syncing pod, skipping: failed to delete containers ([exit status 1])
46s  46s  1   frontend-1-deploy           Pod                     implicitly required container POD   Killing            {kubelet ose3-node2.example.com}   Killing with docker id 1e73b8b04de0
43s  43s  1   frontend-1-h27dz            Pod                     spec.containers{ruby-helloworld}    Killing            {kubelet ose3-node1.example.com}   Killing with docker id 64eddf6a4638
43s  43s  1   frontend-1-h27dz            Pod                     implicitly required container POD   Killing            {kubelet ose3-node1.example.com}   Killing with docker id f63b79a0fed7
^C[joe@ose3-master ~]$ oc get pod
NAME                        READY     STATUS      RESTARTS   AGE
database-1-qnppj            1/1       Running     0          5m
frontend-1-deploy           0/1       Error       0          3m
ruby-sample-build-1-build   0/1       Completed   0          5m
```

I feel like this should have worked (database + deployer + frontend = 3), but it didn't, so something else is going on here, too. Either way, because the deployer failed, we seem to be in the same situation as the build: we don't retry. I wouldn't expect the failed deployment to keep retrying.

---

Derek, what am I missing about the quota piece that is causing it to deny here?

---

This is an issue with the build code. It's setting the phase to Pending when the build pod fails to create, which it should not be doing. Reassigning to Ben.
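The diagnosis above (build stuck in Pending after pod creation is rejected) suggests the shape of the fix: leave the build in a retryable state instead. A minimal, hypothetical Go sketch of that control-loop decision follows; the `Build`, `handleBuild`, and `errForbidden` names are illustrative, not the actual origin controller code, though the phase and reason strings mirror the `New (CannotCreateBuildPod)` status shown in the QE verification later in this report.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified build phases, mirroring the names seen in this bug report.
type BuildPhase string

const (
	PhaseNew     BuildPhase = "New"
	PhasePending BuildPhase = "Pending"
)

type Build struct {
	Name   string
	Phase  BuildPhase
	Reason string
}

// errForbidden stands in for the quota rejection returned by the API server.
var errForbidden = errors.New(`pods "ruby-sample-build-1-build" is forbidden: limited to 3 pods`)

// handleBuild sketches the corrected controller behavior: only move the
// build to Pending once its pod has actually been created. If creation is
// rejected (e.g. by quota), the build stays in New with a reason, so a
// later sync retries instead of stranding the build in Pending forever.
func handleBuild(b *Build, createPod func() error) {
	if err := createPod(); err != nil {
		b.Phase = PhaseNew // the buggy behavior set Pending here
		b.Reason = "CannotCreateBuildPod"
		return
	}
	b.Phase = PhasePending
	b.Reason = ""
}

func main() {
	b := &Build{Name: "ruby-sample-build-1", Phase: PhaseNew}

	// First sync: quota is exhausted, pod creation fails.
	handleBuild(b, func() error { return errForbidden })
	fmt.Printf("%s  %s (%s)\n", b.Name, b.Phase, b.Reason)

	// Later sync: quota was raised, pod creation succeeds.
	handleBuild(b, func() error { return nil })
	fmt.Printf("%s  %s\n", b.Name, b.Phase)
}
```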
---

Marking for an upcoming release. Given where we are in this release, there is no room to get this in, and you can work around it by just creating a new build.

---

Is this the kind of thing that should go in the release notes as a "known issue" with a documented work-around?

---

Yeah, I'll add an entry.

---

Fixed here: https://github.com/openshift/origin/pull/5743

---

QE verified with origin devenv-rhel7_3063 and OSE version:

```
openshift v3.1.1.0
kubernetes v1.1.0-origin-1107-g4c8e6f4
```

1. When the pod fails to schedule because of quota, the build status is New:

```
[root@ip-172-18-3-90 ~]# oc get builds
NAME                  TYPE      FROM      STATUS                       STARTED   DURATION
ruby-sample-build-1   Source    Git       New (CannotCreateBuildPod)
```

2. Update the quota to increase the pod count; the build is handled correctly:

```
[root@ip-172-18-3-90 ~]# oc get pod
NAME                        READY     STATUS    RESTARTS   AGE
ruby-sample-build-1-build   1/1       Running   0          2m
```

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070
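For reference, the `forbidden: limited to 3 pods` rejection seen in the events above is a simple hard-cap admission check. The sketch below is an illustration of that decision only, with hypothetical names (`checkPodQuota`); the real Kubernetes quota admission also tracks cpu and memory and updates usage server-side.

```go
package main

import "fmt"

// checkPodQuota mimics the admission decision from the events in this
// report: a new pod is allowed only if the resulting count stays within
// the hard cap. Illustrative only, not the actual Kubernetes quota code.
func checkPodQuota(used, hard int, podName string) error {
	if used+1 > hard {
		return fmt.Errorf("Pod %q is forbidden: limited to %d pods", podName, hard)
	}
	return nil
}

func main() {
	// Three pods already count against the quota, so a fourth is rejected
	// even though fewer than three may still be running.
	if err := checkPodQuota(3, 3, "frontend-1-"); err != nil {
		fmt.Println(err)
	}

	// With headroom (e.g. after the quota is raised), creation is admitted.
	if err := checkPodQuota(1, 5, "frontend-1-"); err == nil {
		fmt.Println("pod admitted")
	}
}
```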