Bug 1307004 - Rolling strategy scaling down pods before new pods pass ready check
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Deployments
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Michail Kargakis
QA Contact: zhou ying
Depends On:
Blocks: 1267746
 
Reported: 2016-02-12 07:27 EST by Alexander Koksharov
Modified: 2018-07-19 03:36 EDT
CC: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The rolling updater wasn't ignoring pods marked for deletion and was counting them as ready.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 15:04:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Red Hat Product Errata RHBA-2017:0884 (Priority: normal, Status: SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.5 RPM Release Advisory. Last Updated: 2017-04-12 18:50:07 EDT

Description Alexander Koksharov 2016-02-12 07:27:02 EST
Description of problem:
Rolling update does not behave as described in the documentation.

1. It was discovered that the system does not wait for pods to terminate; as a result, the update finishes with fewer active pods than required.
When pods take a long time to terminate, the rolling update finishes in this state:

NAME           READY     STATUS        RESTARTS   AGE
pret-1-build   0/1       Completed     0          19m
pret-1-nwz64   1/1       Terminating   0          18m
pret-1-oeakn   1/1       Terminating   0          17m
pret-1-ov9j0   1/1       Terminating   0          19m
pret-1-wauee   1/1       Terminating   0          18m
pret-2-0hukg   0/1       Running       0          27s
pret-2-2qk9o   0/1       Running       0          31s
pret-2-build   0/1       Completed     0          1m
pret-2-ie41n   1/1       Running       0          46s
pret-2-j6gxm   1/1       Running       0          21s

In this example, only two pods are in a running and ready state.

2. Pods that were requested to terminate are still marked as Ready, whereas [0] says that a pod is removed from the endpoints list once it is shown as terminating.
[0] https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/user-guide/pods.md#termination-of-pods
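
To make the effect concrete, here is a minimal Go sketch (an illustration only, not the actual deployer code; the pod struct is a simplified stand-in for the Kubernetes API object) of how an availability count based only on the Ready condition also counts Terminating pods:

// Simplified illustration: the kubelet can still report Ready=True on a pod
// that already has a deletion timestamp, so a count based only on the Ready
// condition includes pods that are on their way out.
package main

import (
	"fmt"
	"time"
)

// pod is a simplified stand-in for the Kubernetes Pod object; only the fields
// relevant to this illustration are modeled.
type pod struct {
	name              string
	ready             bool       // value of the PodReady condition
	deletionTimestamp *time.Time // non-nil once the pod is Terminating
}

// countReadyNaive mirrors the reported behavior: every pod whose Ready
// condition is true is counted, including pods already marked for deletion.
func countReadyNaive(pods []pod) int {
	n := 0
	for _, p := range pods {
		if p.ready {
			n++
		}
	}
	return n
}

func main() {
	now := time.Now()
	pods := []pod{
		{name: "pret-1-ov9j0", ready: true, deletionTimestamp: &now}, // Terminating
		{name: "pret-2-ie41n", ready: true},                          // genuinely available
		{name: "pret-2-0hukg", ready: false},                         // still starting
	}
	// Prints 2, even though only one pod is both ready and not terminating.
	fmt.Println("naive ready count:", countReadyNaive(pods))
}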


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Michail Kargakis 2016-02-15 05:22:25 EST
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1281286
Comment 2 Dan Mace 2016-02-15 11:27:53 EST
Agree with Michail that it's related. The deployer has to trust the ready state of pods reported by the Kubelet, and if Terminating pods are considered "Ready", then the deployer has no real choice but to consider them ready. If Terminating pods shouldn't be ready, the issue is with the Kubelet (which is what sets the ready state for the pods). We need to follow up with Kubernetes to sort out the relationship between readiness and Terminating.

I wouldn't say this is a bug with deployments directly, but deployments are certainly affected in a way that seems surprising, so let's leave this bug open for now.
Comment 3 Dan Mace 2016-02-15 14:52:20 EST
Spoke with Clayton, and decided that the updater should not count terminating pods towards the minimum even though they're "Ready". Ignoring them is also consistent with how RCs handle pods. This will require an upstream fix. Since the issue has existed since the introduction of the rolling updater, I'm marking the issue UpcomingRelease.
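
For illustration, a rough sketch of the check described above (counting a pod toward the minimum only if it is Ready and not marked for deletion). This is not the actual upstream patch, and it uses the current k8s.io/api/core/v1 package path rather than the vendored paths of that era:

// Sketch only: count a pod as available when its Ready condition is True and
// it has no deletion timestamp, mirroring how RCs treat pods being deleted.
package example

import (
	corev1 "k8s.io/api/core/v1"
)

// isPodReady reports whether the PodReady condition is True.
func isPodReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

// countAvailable ignores pods that are marked for deletion even if the kubelet
// still reports them as Ready.
func countAvailable(pods []*corev1.Pod) int32 {
	var n int32
	for _, pod := range pods {
		if pod.DeletionTimestamp == nil && isPodReady(pod) {
			n++
		}
	}
	return n
}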
Comment 7 Michail Kargakis 2016-06-10 05:29:04 EDT
Can you post the output of the following command?

oc get dc/pret -o yaml

Also note that there are not only 2 pods running and ready: there are 4 more pods that may be terminating and no longer behind the service, but they still serve live connections.
Comment 8 Michal Fojtik 2016-07-20 07:06:03 EDT
Alexander bump.
Comment 9 Michal Fojtik 2016-08-01 07:35:29 EDT
Bump #2 :-)
Comment 13 Michail Kargakis 2016-08-24 15:13:48 EDT
Still not fixed. I will get to this once 1.3 is out.
Comment 14 Michail Kargakis 2016-09-16 11:26:35 EDT
Michal can you take a look at this and move the discussion upstream? We will need to do it for both the rolling updater and Deployments, if we end up doing it.
Comment 17 Michail Kargakis 2016-12-22 08:33:20 EST
I have a fix for this upstream: https://github.com/kubernetes/kubernetes/pull/39150
Comment 18 Michail Kargakis 2017-02-01 13:00:29 EST
Needs testing
Comment 19 zhou ying 2017-02-03 22:29:58 EST
Hi Michail Kargakis:
   When I tested with the rolling strategy, replicas was 4 with maxSurge: 25% and maxUnavailable: 25%. During the deployment, 3 pods were kept available and the total never exceeded 5 pods, with those 5 including the pods marked for deletion, like this:

[root@zhouy ~]# oc get po 
NAME               READY     STATUS              RESTARTS   AGE
ruby-ex-5-fb443    1/1       Terminating         0          33s
ruby-ex-5-lkv19    1/1       Terminating         0          16s
ruby-ex-5-sb61z    1/1       Running             0          34s
ruby-ex-6-czdmp    0/1       ContainerCreating   0          <invalid>
ruby-ex-6-deploy   1/1       Running             0          <invalid>
ruby-ex-6-fkjjr    1/1       Running             0          <invalid>
ruby-ex-6-mv810    0/1       ContainerCreating   0          <invalid>
ruby-ex-6-r3c8v    1/1       Running             0          <invalid>

So, is this the expected behavior for the fix where the rolling updater ignores pods marked for deletion instead of counting them as ready?
Comment 20 Michail Kargakis 2017-02-04 07:09:31 EST
You have three pods running, which is the minimum allowed by the deployment, so that is fine. Did you observe fewer than 3 pods running at any point in time during the deployment?
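(Working out the bounds in this test: with 4 replicas, maxUnavailable of 25% rounds down to 1 pod, so at least 4 - 1 = 3 pods must stay available, and maxSurge of 25% rounds up to 1 pod, so the total may not exceed 4 + 1 = 5 pods, which matches the output in comment 19.)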
Comment 21 zhou ying 2017-02-05 20:20:43 EST
No, there were never fewer than 3 pods running during the deployment.
Comment 22 zhou ying 2017-02-07 01:15:58 EST
Can't reproduce this issue with the latest OCP 3.5:
openshift version
openshift v3.5.0.17+c55cf2b
kubernetes v1.5.2+43a9be4
etcd 3.1.0

[root@zhouy testjson]# oc get po 
NAME                  READY     STATUS              RESTARTS   AGE
database-1-7gld8      1/1       Running             0          12m
frontend-3-2ghjv      1/1       Terminating         0          12s
frontend-3-7c7j3      1/1       Running             0          16s
frontend-4-0lnbj      1/1       Running             0          <invalid>
frontend-4-deploy     1/1       Running             0          <invalid>
frontend-4-gbjd9      1/1       Running             0          <invalid>
frontend-4-hook-pre   0/1       Completed           0          <invalid>
frontend-4-m2jk4      1/1       Running             0          <invalid>
frontend-4-n47ng      0/1       ContainerCreating   0          <invalid>
Comment 24 errata-xmlrpc 2017-04-12 15:04:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884
