Bug 1333129

Summary: Cannot scale up a pod while a deployment is not completed
Product: OpenShift Container Platform
Reporter: Cesar Wong <cewong>
Component: openshift-controller-manager
Assignee: Michail Kargakis <mkargaki>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: medium
Version: 3.2.1
CC: aos-bugs, dmace, jforrest, jkaur, jokerman, mfojtik, mifiedle, mkargaki, mmccomas, pweil, spadgett, tdawson, yinzhou
Target Milestone: ---
Target Release: 3.2.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-09-27 09:32:06 UTC
Attachments:
  scale request (flags: none)
  event log (flags: none)
  scaling (flags: none)

Description Cesar Wong 2016-05-04 17:38:25 UTC
Created attachment 1153959 [details]
scale request

Description of problem:

After a deployment has created a new pod and deleted the old pod, clicking the up arrow to scale up results in a 'Scaling up...' message, but nothing happens.

Version-Release number of selected component (if applicable):
3.2

How reproducible:
Always

Steps to Reproduce:
1. Create an app with new-app: 'oc new-app https://github.com/csrwng/simple-ruby.git'
2. After the build completes and an initial deployment has happened, start a new build.
3. On the overview page, wait for the new deployment to create the new pod and delete the previous pod.
4. Immediately after the previous pod disappears, click the up arrow to scale up (a CLI equivalent is sketched below).
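
For reference, the same race can be hit from the CLI; a minimal sketch, assuming the deploymentconfig created by new-app is named simple-ruby (comment 3 below reproduces it with the same command):

# Trigger a new deployment and immediately ask for more replicas while it is in flight.
oc deploy simple-ruby --latest
oc scale dc/simple-ruby --replicas=5

# Watch the pods: the extra replicas do not come up.
oc get pods -w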

Actual results:

The pod says 'Scaling...' but nothing happens.

Expected results:

The pod scales up successfully.

Additional info:

Comment 1 Cesar Wong 2016-05-04 18:42:38 UTC
Created attachment 1153982 [details]
event log

Comment 2 Jessica Forrester 2016-05-04 19:05:37 UTC
This looks like a race condition in the deployment controller; the UI is just updating the scale resource on the DC. Timing seems to be the key: if you scale up right after the old deployment disappears, the new deployment has already scaled up, but it still says it is "in progress".

Comment 3 zhou ying 2016-05-05 06:24:34 UTC
We can reproduce this with the command:
`oc deploy simple-ruby --latest; oc scale dc/simple-ruby --replicas=5`

Comment 4 Michail Kargakis 2016-05-05 10:04:14 UTC
That's because 1) the deployment runs as a separate process with a fixed desired replica size, which means the deployment needs to complete before it can be scaled, and 2) even after the deployment process finishes, it should be scaled up to dc.spec.replicas, but we have hacked the controller to restore dc.spec.replicas back to rc.spec.replicas because we need to support older clients that try to scale a deploymentconfig. For now, you should not try to scale the dc while a deployment is in flight; set the replica count before the deployment starts or after it is complete.
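
A minimal sketch of that workaround, reusing the simple-ruby deploymentconfig from this report: set the replica count before triggering the deployment, or wait for the deployment to finish first.

# Option 1: set the desired replica count first, then trigger the deployment.
oc scale dc/simple-ruby --replicas=5
oc deploy simple-ruby --latest

# Option 2: trigger the deployment, follow it until it completes, then scale.
oc deploy simple-ruby --latest
oc logs -f dc/simple-ruby
oc scale dc/simple-ruby --replicas=5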

Comment 5 Samuel Padgett 2016-05-05 13:58:49 UTC
I'll disable the scaling controls during a deployment.

Comment 6 Michail Kargakis 2016-05-05 14:30:53 UTC
Thanks Sam!

Comment 7 Samuel Padgett 2016-05-05 17:07:28 UTC
https://github.com/openshift/origin/pull/8761

Comment 8 Samuel Padgett 2016-05-05 18:07:22 UTC
Pull request in the origin/master merge queue.

Comment 10 zhou ying 2016-05-31 03:51:19 UTC
Confirmed with AMI devenv-rhel7_4294. While a deployment is in flight, the scale controls are disabled. However, if I scale up immediately after the deployment completes and the controls are re-enabled, the pod says 'Scaling to x...' but even after a long wait the scale does not succeed.
Please see the attachments.
openshift v1.3.0-alpha.1-41-g681170a
kubernetes v1.3.0-alpha.1-331-g0522e63
etcd 2.3.0

Comment 11 zhou ying 2016-05-31 03:52:48 UTC
Created attachment 1163019 [details]
scaling

Comment 12 Samuel Padgett 2016-05-31 12:23:32 UTC
(In reply to zhou ying from comment #10)
> Confirmed with AMI devenv-rhel7_4294. While a deployment is in flight, the
> scale controls are disabled. However, if I scale up immediately after the
> deployment completes and the controls are re-enabled, the pod says 'Scaling
> to x...' but even after a long wait the scale does not succeed.

There are several reasons this could happen, and it might not be a bug. Can you check that you're not at your pod quota, and browse the events page to see if there are any warnings?
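
A few generic oc commands that can help rule those causes out (nothing here is specific to this bug; run them in the project that owns the app):

# Check whether the project's pod quota is already exhausted.
oc get quota
oc describe quota

# Look for warning events that would explain a scale that never takes effect.
oc get events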

Comment 13 Samuel Padgett 2016-07-05 13:08:42 UTC
yinzhou: Any update? Do you still see the problem?

Comment 14 zhou ying 2016-07-06 06:32:16 UTC
Confirmed with AMI devenv-rhel7_4530; I can't reproduce this issue in the browser now.

But via the command line:

[root@ip-172-18-2-106 amd64]# oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          7m
ruby-ex-3-hzkj1   1/1       Running     0          2m
ruby-ex-3-qpgrg   1/1       Running     0          2m
ruby-ex-3-unfuj   1/1       Running     0          2m
[root@ip-172-18-2-106 amd64]# oc deploy ruby-ex --latest ; oc scale dc/ruby-ex --replicas=5
Started deployment #4
Use 'oc logs -f dc/ruby-ex' to track its progress.
deploymentconfig "ruby-ex" scaled

[root@ip-172-18-2-106 amd64]# oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          12m
ruby-ex-4-26fsh   1/1       Running     0          4m
ruby-ex-4-hn8o8   1/1       Running     0          4m
ruby-ex-4-jiigp   1/1       Running     0          4m

Comment 15 Samuel Padgett 2016-07-06 13:01:19 UTC
Michail, Dan (Mace), do we want to guard against this problem when scaling with the CLI?

Comment 16 Samuel Padgett 2016-07-18 14:23:09 UTC
Reassigning since the web console side is fixed. See comment #14.

Comment 17 Michail Kargakis 2016-07-20 08:46:28 UTC
*** Bug 1353834 has been marked as a duplicate of this bug. ***

Comment 18 Michail Kargakis 2016-07-20 08:53:29 UTC
*** Bug 1306720 has been marked as a duplicate of this bug. ***

Comment 19 Michal Fojtik 2016-07-20 11:17:19 UTC
We will probably just display a warning to users in the CLI; after talking with Michalis, we don't want to prevent them from scaling.

Comment 20 Michal Fojtik 2016-08-11 07:39:27 UTC
Cesar, Sam: I don't think we can show a warning in the CLI in a reasonable way, as `oc scale` is upstream. We would have to create a 'smarter' wrapper that checks the state of the DC, and I'm not 100% convinced we want to do that refactor only to gain the warning.

I'm in favor of closing this as the UI portion is now fixed. WDYT?

Comment 21 Cesar Wong 2016-08-11 13:52:33 UTC
Michal, I'm ok with closing it as well.

Comment 22 Michal Fojtik 2016-08-11 13:55:45 UTC
Setting ON_QA so QA can close this.

Comment 23 zhou ying 2016-08-12 03:26:04 UTC
Confirmed with the latest 3.3 environment; the issue is fixed in the browser.
openshift version
openshift v3.3.0.18
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

In the browser, the scale-up arrow is disabled while a deployment is in progress.

Comment 25 errata-xmlrpc 2016-09-27 09:32:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933