Bug 1307004
| Summary: | Rolling strategy scaling down pods before new pods pass ready check | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexander Koksharov <akokshar> |
| Component: | openshift-controller-manager | Assignee: | Michail Kargakis <mkargaki> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.1.0 | CC: | akokshar, aos-bugs, asogukpi, dmace, erich, mfojtik, mkargaki, pep, pweil, tdawson, wmeng, yinzhou |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | The rolling updater wasn't ignoring pods marked for deletion and was counting them as ready. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-12 19:04:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1267746 | ||
Description
Alexander Koksharov
2016-02-12 12:27:02 UTC
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1281286

Agree with Michail that it's related. The deployer has to trust the ready state of pods reported by the Kubelet, and if Terminating pods are considered "Ready", then the deployer has no real choice but to consider them ready. If Terminating pods shouldn't be ready, the issue is with the Kubelet (which is what sets the ready state for the pods). We need to follow up with Kubernetes to sort out the relationship between readiness and Terminating. I wouldn't say this is a bug with deployments directly, but deployments are certainly affected in a way that seems surprising, so let's leave this bug open for now.

Spoke with Clayton, and decided that the updater should not count terminating pods towards the minimum even though they're "Ready". Ignoring them is also consistent with how RCs handle pods. This will require an upstream fix. Since the issue has existed since the introduction of the rolling updater, I'm marking the issue UpcomingRelease.

Can you post the output of the following command?

    oc get dc/pret -o yaml

Also note that there are not just 2 pods running and in the ready state: you have 4 more pods that may be in the terminating state and no longer under the service, but they still serve live connections.

Alexander, bump.

Bump #2 :-)

Still not fixed. I will get to this once 1.3 is out.

Michal, can you take a look at this and move the discussion upstream? We will need to do it for both the rolling updater and Deployments, if we end up doing it.
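The decision described above (do not count terminating pods towards the minimum, even when they report "Ready") can be illustrated with a small sketch. This is not the upstream patch. The `Pod` class, its fields, and both function names are hypothetical, chosen only to show the counting rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    """Hypothetical, simplified view of a pod for illustration only."""
    name: str
    ready: bool
    # Set once the pod has been marked for deletion (shown as Terminating).
    deletion_timestamp: Optional[str] = None


def count_available(pods: List[Pod]) -> int:
    """Count pods that are ready and NOT marked for deletion."""
    return sum(1 for p in pods if p.ready and p.deletion_timestamp is None)


def allowed_scale_down(pods: List[Pod], min_available: int) -> int:
    """How many more pods may be removed without dropping below the minimum."""
    return max(0, count_available(pods) - min_available)


pods = [
    Pod("old-a", ready=True, deletion_timestamp="2016-02-12T12:00:00Z"),
    Pod("old-b", ready=True, deletion_timestamp="2016-02-12T12:00:05Z"),
    Pod("old-c", ready=True),
    Pod("new-d", ready=True),
    Pod("new-e", ready=False),
]

# Counting every "Ready" pod, as the buggy behavior did, would report 4 here.
naive_ready = sum(1 for p in pods if p.ready)
print(naive_ready, count_available(pods), allowed_scale_down(pods, 3))
```

With a minimum of 3, the naive count of 4 would permit one more scale-down even though only 2 pods are going to keep serving, which is the symptom reported in this bug.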
I have a fix for this upstream: https://github.com/kubernetes/kubernetes/pull/39150

Needs testing.

Hi Michail Kargakis: I tested the Rolling strategy with 4 replicas, maxSurge: 25% and maxUnavailable: 25%. During the deployment, 3 pods were kept available and the total did not exceed 5 pods; the 5 pods include the pods marked for deletion, like this:

    [root@zhouy ~]# oc get po
    NAME               READY     STATUS              RESTARTS   AGE
    ruby-ex-5-fb443    1/1       Terminating         0          33s
    ruby-ex-5-lkv19    1/1       Terminating         0          16s
    ruby-ex-5-sb61z    1/1       Running             0          34s
    ruby-ex-6-czdmp    0/1       ContainerCreating   0          <invalid>
    ruby-ex-6-deploy   1/1       Running             0          <invalid>
    ruby-ex-6-fkjjr    1/1       Running             0          <invalid>
    ruby-ex-6-mv810    0/1       ContainerCreating   0          <invalid>
    ruby-ex-6-r3c8v    1/1       Running             0          <invalid>

So, is this the expected behavior for the fix to "the rolling updater wasn't ignoring pods marked for deletion and was counting them as ready"?

You have three pods running, which is the minimum allowed by the deployment, so that is fine. Did you observe fewer than 3 pods running at any point in time for the deployment?

No, there were never fewer than 3 pods running for the deployment.

Can't reproduce this issue with the latest OCP 3.5:

    openshift version
    openshift v3.5.0.17+c55cf2b
    kubernetes v1.5.2+43a9be4
    etcd 3.1.0

    [root@zhouy testjson]# oc get po
    NAME                  READY     STATUS              RESTARTS   AGE
    database-1-7gld8      1/1       Running             0          12m
    frontend-3-2ghjv      1/1       Terminating         0          12s
    frontend-3-7c7j3      1/1       Running             0          16s
    frontend-4-0lnbj      1/1       Running             0          <invalid>
    frontend-4-deploy     1/1       Running             0          <invalid>
    frontend-4-gbjd9      1/1       Running             0          <invalid>
    frontend-4-hook-pre   0/1       Completed           0          <invalid>
    frontend-4-m2jk4      1/1       Running             0          <invalid>
    frontend-4-n47ng      0/1       ContainerCreating   0          <invalid>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884
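For reference, the bounds quoted in the verification comment (4 replicas, maxSurge: 25%, maxUnavailable: 25%, giving at least 3 available and at most 5 pods) can be reproduced with the sketch below. It assumes the surge percentage is rounded up and the unavailable percentage is rounded down; for this configuration both roundings give the same result. The function name `rolling_bounds` is hypothetical.

```python
def rolling_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int):
    """Return (min_available, max_total) for a rolling update.

    Assumption: surge is rounded up and unavailability is rounded down.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return replicas - unavailable, replicas + surge


min_available, max_total = rolling_bounds(4, 25, 25)
print(min_available, max_total)  # 3 5
```

Pods marked for deletion should not count towards `min_available`, which is why the verifier checked that at least 3 non-terminating pods stayed running throughout the rollout.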