Description of problem:
Rolling update does not behave as described in the documentation.

1. The system does not wait for pods to terminate, so an update can finish with fewer active pods than required. When pods take a long time to terminate, the rolling update scenario finished with this:

NAME           READY     STATUS        RESTARTS   AGE
pret-1-build   0/1       Completed     0          19m
pret-1-nwz64   1/1       Terminating   0          18m
pret-1-oeakn   1/1       Terminating   0          17m
pret-1-ov9j0   1/1       Terminating   0          19m
pret-1-wauee   1/1       Terminating   0          18m
pret-2-0hukg   0/1       Running       0          27s
pret-2-2qk9o   0/1       Running       0          31s
pret-2-build   0/1       Completed     0          1m
pret-2-ie41n   1/1       Running       0          46s
pret-2-j6gxm   1/1       Running       0          21s

In this example only two pods are in a running and ready state.

2. Pods that have been asked to terminate are still marked as Ready, whereas [0] says a pod is removed from the endpoints list as soon as it is shown as terminating.

[0] https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/user-guide/pods.md#termination-of-pods
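For illustration, symptom 2 can be observed directly from the API: a pod whose deletionTimestamp is set (shown as Terminating) can still carry a Ready=True condition. A minimal Go sketch, assuming a recent client-go; the "myproject" namespace is hypothetical:

// Symptom 2, observed from the API: a pod that is Terminating
// (deletionTimestamp set) can still report a Ready=True condition.
// Minimal sketch assuming a recent client-go; "myproject" is hypothetical.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// podReady returns the pod's Ready condition as reported by the Kubelet.
func podReady(p *corev1.Pod) bool {
	for _, c := range p.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pods, err := client.CoreV1().Pods("myproject").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for i := range pods.Items {
		p := &pods.Items[i]
		fmt.Printf("%-20s terminating=%-5v ready=%v\n",
			p.Name, p.DeletionTimestamp != nil, podReady(p))
	}
}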
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1281286
Agree with Michail that it's related. The deployer has to trust the ready state of pods reported by the Kubelet, and if Terminating pods are considered "Ready", then the deployer has no real choice but to consider them ready. If Terminating pods shouldn't be ready, the issue is with the Kubelet (which is what sets the ready state for the pods). We need to follow up with Kubernetes to sort out the relationship between readiness and Terminating. I wouldn't say this is a bug with deployments directly, but deployments are certainly affected in a way that seems surprising, so let's leave this bug open for now.
Spoke with Clayton, and decided that the updater should not count terminating pods towards the minimum even though they're "Ready". Ignoring them is also consistent with how RCs handle pods. This will require an upstream fix. Since the issue has existed since the introduction of the rolling updater, I'm marking the issue UpcomingRelease.
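For context, a minimal sketch of the agreed behavior (this is not the actual upstream patch): when tallying pods toward the rolling update's minimum, skip any pod that is marked for deletion, even if its Ready condition is still true. It reuses the corev1 import and podReady helper from the sketch above:

// Sketch of the agreed behavior, not the actual upstream patch:
// a pod that is both Ready and marked for deletion should not count
// toward the rolling update's minimum available pods.
func countReadyPods(pods []corev1.Pod) int {
	ready := 0
	for i := range pods {
		p := &pods[i]
		if p.DeletionTimestamp != nil {
			continue // terminating: do not count toward the minimum
		}
		if podReady(p) {
			ready++
		}
	}
	return ready
}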
Can you post the output of the following command?

oc get dc/pret -o yaml

Also note that there aren't only 2 pods running and in a ready state: you have 4 more pods that may be terminating and no longer behind the service, but that still serve live connections.
Alexander bump.
Bump #2 :-)
Still not fixed. I will get to this once 1.3 is out.
Michal, can you take a look at this and move the discussion upstream? We will need to do it for both the rolling updater and Deployments, if we end up doing it.
I have a fix for this upstream: https://github.com/kubernetes/kubernetes/pull/39150
Needs testing
Hi Michail Kargakis: When I tested with the rolling strategy, replicas was 4 with maxSurge: 25% and maxUnavailable: 25%. During the deployment at least 3 pods stayed available and the total never exceeded 5 pods, but the 5 pods include the ones being deleted, like this:

[root@zhouy ~]# oc get po
NAME               READY     STATUS              RESTARTS   AGE
ruby-ex-5-fb443    1/1       Terminating         0          33s
ruby-ex-5-lkv19    1/1       Terminating         0          16s
ruby-ex-5-sb61z    1/1       Running             0          34s
ruby-ex-6-czdmp    0/1       ContainerCreating   0          <invalid>
ruby-ex-6-deploy   1/1       Running             0          <invalid>
ruby-ex-6-fkjjr    1/1       Running             0          <invalid>
ruby-ex-6-mv810    0/1       ContainerCreating   0          <invalid>
ruby-ex-6-r3c8v    1/1       Running             0          <invalid>

So, is this the expected result for the fix of the issue where the rolling updater wasn't ignoring pods marked for deletion and was counting them as ready?
You have three pods running, which is the minimum allowed by the deployment, so that's fine. Did you observe fewer than 3 pods running at any point in time during the deployment?
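For reference, the arithmetic behind those bounds, sketched in Go (the round-up for maxSurge and round-down for maxUnavailable matches how Kubernetes resolves percentage values):

// Rolling bounds for replicas=4, maxSurge=25%, maxUnavailable=25%.
// Kubernetes rounds maxSurge up and maxUnavailable down when
// resolving percentages against the replica count.
package main

import (
	"fmt"
	"math"
)

func main() {
	replicas := 4.0
	surge := math.Ceil(replicas * 0.25)        // 25% of 4, rounded up   -> 1
	unavailable := math.Floor(replicas * 0.25) // 25% of 4, rounded down -> 1

	fmt.Printf("max total pods:     %v\n", replicas+surge)       // 5
	fmt.Printf("min available pods: %v\n", replicas-unavailable) // 3
}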
No, there were never fewer than 3 pods running for the deployment.
Can't reproduce this issue with the latest OCP 3.5:

openshift version
openshift v3.5.0.17+c55cf2b
kubernetes v1.5.2+43a9be4
etcd 3.1.0

[root@zhouy testjson]# oc get po
NAME                  READY     STATUS              RESTARTS   AGE
database-1-7gld8      1/1       Running             0          12m
frontend-3-2ghjv      1/1       Terminating         0          12s
frontend-3-7c7j3      1/1       Running             0          16s
frontend-4-0lnbj      1/1       Running             0          <invalid>
frontend-4-deploy     1/1       Running             0          <invalid>
frontend-4-gbjd9      1/1       Running             0          <invalid>
frontend-4-hook-pre   0/1       Completed           0          <invalid>
frontend-4-m2jk4      1/1       Running             0          <invalid>
frontend-4-n47ng      0/1       ContainerCreating   0          <invalid>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884