Bug 1415196
| Summary: | Rolling deployments fail when Quota Limit is reached | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Will Gordon <wgordon> |
| Component: | openshift-controller-manager | Assignee: | Michal Fojtik <mfojtik> |
| Status: | CLOSED EOL | QA Contact: | zhou ying <yinzhou> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | unspecified | CC: | aos-bugs, mfojtik, wgordon |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-23 12:50:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Will Gordon
2017-01-20 14:27:23 UTC
Can you please share more details about your deployment? Did your deployment complete or error? How did the pods scale down (by one?)? Can you also provide events from the namespace/deployment?

My deployment consists of a custom docker image with httpd serving static files. The image is already stored in the project, and scaling up/down works without issue. My understanding of rolling deployments was that whenever a new deployment is triggered, pods are supposed to scale down (within the limits, in this case only 25% at a time) before the new pods scale up. However, this does not seem to be the case when hitting quota limits. When the quota limit is reached, the rolling deployment attempts to deploy the new pods without scaling down the available pods.

Actual results:

8 "a" pods (8) -> 8 "a" pods & 2 "b" pods (10) -> 6 "a" pods & 2 "b" pods (8)

Expected results:

8 "a" pods (8) -> 6 "a" pods (6) -> 6 "a" pods & 2 "b" pods (8)

The deployment (after the config change) fails with an error. The associated log messages:

--> Scaling up openshift-5 from 0 to 8, scaling down openshift-4 from 8 to 0 (keep 6 pods available, don't exceed 10 pods)
    Scaling openshift-5 up to 2
--> FailedCreate: openshift-5 Error creating: pods "openshift-5-" is forbidden: exceeded quota: compute-resources, requested: limits.cpu=490m,limits.memory=251Mi, used: limits.cpu=3904m,limits.memory=2000Mi, limited: limits.cpu=4,limits.memory=2Gi
--> FailedCreate: openshift-5 Error creating: pods "openshift-5-" is forbidden: exceeded quota: compute-resources, requested: limits.cpu=490m,limits.memory=251Mi, used: limits.cpu=3904m,limits.memory=2000Mi, limited: limits.cpu=4,limits.memory=2Gi
--> FailedCreate: openshift-4 Error creating: pods "openshift-4-" is forbidden: exceeded quota: compute-resources, requested: limits.cpu=488m,limits.memory=250Mi, used: limits.cpu=3904m,limits.memory=2000Mi, limited: limits.cpu=4,limits.memory=2Gi
--> FailedCreate: openshift-4 Error creating: pods "openshift-4-" is forbidden: exceeded quota: compute-resources, requested: limits.cpu=488m,limits.memory=250Mi, used: limits.cpu=3904m,limits.memory=2000Mi, limited: limits.cpu=4,limits.memory=2Gi
error: timed out waiting for "openshift-5" to be synced

// @kargakis Deployments have always worked this way. The problem seems to be that we always scale up first when the user uses maxSurge: the new replication controller is scaled up, the pods cannot be created, and the RC observed generation is not updated.

QE: Can you please verify whether this is still an issue on the latest Origin?

The deployment will always scale up first if you use maxSurge. When you are constrained by quota, it is recommended to set maxSurge to zero and rely solely on maxUnavailable.

Correction:
> the rc observed generation is not updated.
The RC observedGeneration is synced; the problem seems to be that the underlying scaler is waiting for the created replicas to match the desired replicas.
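The recommendation above (set maxSurge to zero and rely solely on maxUnavailable when quota-bound) maps onto the rolling strategy of the DeploymentConfig. A minimal sketch, assuming a DeploymentConfig named `openshift` (the name is inferred from the RC names in the logs) and the OpenShift 3.x `rollingParams` schema:

```yaml
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: openshift        # hypothetical name, taken from "openshift-4"/"openshift-5" in the logs
spec:
  replicas: 8
  strategy:
    type: Rolling
    rollingParams:
      # With maxSurge: 0 the deployer never creates pods above the desired
      # count, so old pods are scaled down before new ones are created and
      # the rollout stays within a tight ResourceQuota.
      maxSurge: 0
      # Allow up to 25% of pods (2 of 8) to be unavailable at a time:
      # 8 old -> 6 old -> 6 old + 2 new -> ... -> 8 new.
      maxUnavailable: 25%
```

This sequence matches the "Expected results" above; the default (maxSurge: 25%, here 2 extra pods) produces the "keep 6 pods available, don't exceed 10 pods" behavior seen in the failing logs.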