Bug 1333129 - Cannot scale up a pod while a deployment is not completed
Summary: Cannot scale up a pod while a deployment is not completed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.2.1
Assignee: Michail Kargakis
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1306720 1353834
Depends On:
Blocks:
 
Reported: 2016-05-04 17:38 UTC by Cesar Wong
Modified: 2016-09-27 09:32 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-27 09:32:06 UTC
Target Upstream Version:


Attachments (Terms of Use)
scale request (314.34 KB, image/png), 2016-05-04 17:38 UTC, Cesar Wong
event log (246.06 KB, image/png), 2016-05-04 18:42 UTC, Cesar Wong
scaling (2.83 MB, image/jpeg), 2016-05-31 03:52 UTC, zhou ying


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1933 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.3 Release Advisory 2016-09-27 13:24:36 UTC

Description Cesar Wong 2016-05-04 17:38:25 UTC
Created attachment 1153959 [details]
scale request

Description of problem:

After a deployment has created a new pod, and deleted the old pod, clicking on the 
up arrow to scale up the pod results in a message saying 'Scaling up...' but nothing
happens.

Version-Release number of selected component (if applicable):
3.2

How reproducible:
Always

Steps to Reproduce:
1. create an app with new-app: 'oc new-app https://github.com/csrwng/simple-ruby.git'
2. after the build completes, and an initial deployment has happened, start a new
   build.
3. on the overview page, wait for the new deployment to create the new pod, and delete the previous pod.
4. immediately after the previous pod disappears, click on the up arrow to scale up.

Actual results:

The pod says 'Scaling...' but nothing happens.

Expected results:

The pod scales up successfully.

Additional info:

Comment 1 Cesar Wong 2016-05-04 18:42:38 UTC
Created attachment 1153982 [details]
event log

Comment 2 Jessica Forrester 2016-05-04 19:05:37 UTC
This looks like a race condition in the deployment controller; the UI is just updating the scale resource on the DC. The key seems to be the timing: if you scale up right after the old deployment's pod disappears, the new deployment has already scaled up, but the deployment config still reports the deployment as "in progress".

Comment 3 zhou ying 2016-05-05 06:24:34 UTC
We can reproduce by command:
`oc deploy simple-ruby --latest; oc scale dc/simple-ruby --replicas=5`

Comment 4 Michail Kargakis 2016-05-05 10:04:14 UTC
That happens for two reasons: 1) the deployment runs in a separate process with the desired replica size fixed, so the deployment needs to complete before it can be scaled, and 2) even after the deployment process finishes and the pods should be scaled up to dc.spec.replicas, we have hacked the controller to restore dc.spec.replicas back to rc.spec.replicas, because we need to support older clients that scale a deploymentconfig directly. For now, do not try to scale the DC while a deployment is in flight; set the replica count before the deployment starts or after it completes.
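The interaction described in (2) can be modeled with a tiny sketch. This is not Origin code; the classes and the reconcile function are hypothetical stand-ins that only illustrate why a scale issued mid-deployment is lost: the controller copies the RC's (fixed) replica count back onto the DC, overwriting the user's change.

```python
# Hypothetical model of the behavior described above (NOT actual
# openshift/origin controller code; names are invented for illustration).

class DeploymentConfig:
    def __init__(self, replicas):
        self.spec_replicas = replicas

class ReplicationController:
    def __init__(self, replicas):
        self.spec_replicas = replicas

def reconcile_in_flight(dc, rc):
    # The "hack" from comment 4: while a deployment is in flight, the
    # controller restores dc.spec.replicas from the RC's replica count.
    dc.spec_replicas = rc.spec_replicas

dc = DeploymentConfig(replicas=1)
rc = ReplicationController(replicas=1)  # new deployment pinned at 1 replica

dc.spec_replicas = 5       # user scales the DC mid-deployment
reconcile_in_flight(dc, rc)  # controller overwrites the change

print(dc.spec_replicas)    # 1 -- the scale request is lost
```

This is why the workaround is to set the replica count before the deployment starts or after it completes: outside the in-flight window, the reconcile step no longer clobbers dc.spec.replicas.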

Comment 5 Samuel Padgett 2016-05-05 13:58:49 UTC
I'll disable the scaling controls during a deployment.

Comment 6 Michail Kargakis 2016-05-05 14:30:53 UTC
Thanks Sam!

Comment 7 Samuel Padgett 2016-05-05 17:07:28 UTC
https://github.com/openshift/origin/pull/8761

Comment 8 Samuel Padgett 2016-05-05 18:07:22 UTC
Pull request in the origin/master merge queue.

Comment 10 zhou ying 2016-05-31 03:51:19 UTC
Confirmed with AMI devenv-rhel7_4294. While the deployment is in flight, the scale controls are disabled. However, if I scale up immediately after the deployment completes and the controls are re-enabled, the pod says "scaling to x...", and even after waiting a long time the scale does not succeed.
Please see the attachments.
openshift v1.3.0-alpha.1-41-g681170a
kubernetes v1.3.0-alpha.1-331-g0522e63
etcd 2.3.0

Comment 11 zhou ying 2016-05-31 03:52:48 UTC
Created attachment 1163019 [details]
scaling

Comment 12 Samuel Padgett 2016-05-31 12:23:32 UTC
(In reply to zhou ying from comment #10)
> Confirmed with ami devenv-rhel7_4294, When the deployment in-flight, the
> scale was disabled, but when the deployment completed, and the scale enable
> immediately scale up, will meet : the pod saying:scaling to x ..., but wait
> for a long time , the scale not succeed.

There are several reasons this could happen, and it might not be a bug. Can you check that you're not at your pod quota, and browse the events page to see if there are any warnings?

Comment 13 Samuel Padgett 2016-07-05 13:08:42 UTC
yinzhou@redhat.com Any update? Do you still see the problem?

Comment 14 zhou ying 2016-07-06 06:32:16 UTC
Confirmed with AMI devenv-rhel7_4530; I can no longer reproduce this issue in the browser.

But by command:

[root@ip-172-18-2-106 amd64]# oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          7m
ruby-ex-3-hzkj1   1/1       Running     0          2m
ruby-ex-3-qpgrg   1/1       Running     0          2m
ruby-ex-3-unfuj   1/1       Running     0          2m
[root@ip-172-18-2-106 amd64]# oc deploy ruby-ex --latest ; oc scale dc/ruby-ex --replicas=5
Started deployment #4
Use 'oc logs -f dc/ruby-ex' to track its progress.
deploymentconfig "ruby-ex" scaled

[root@ip-172-18-2-106 amd64]# oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          12m
ruby-ex-4-26fsh   1/1       Running     0          4m
ruby-ex-4-hn8o8   1/1       Running     0          4m
ruby-ex-4-jiigp   1/1       Running     0          4m

Comment 15 Samuel Padgett 2016-07-06 13:01:19 UTC
Michail, Dan (Mace), do we want to guard against this problem when scaling with the CLI?

Comment 16 Samuel Padgett 2016-07-18 14:23:09 UTC
Reassigning since the web console side is fixed. See comment #14.

Comment 17 Michail Kargakis 2016-07-20 08:46:28 UTC
*** Bug 1353834 has been marked as a duplicate of this bug. ***

Comment 18 Michail Kargakis 2016-07-20 08:53:29 UTC
*** Bug 1306720 has been marked as a duplicate of this bug. ***

Comment 19 Michal Fojtik 2016-07-20 11:17:19 UTC
We will probably just display a warning to users in the CLI; after talking with Michalis, we don't want to prevent them from scaling.

Comment 20 Michal Fojtik 2016-08-11 07:39:27 UTC
Cesar, Sam: I don't think we can show a warning in the CLI in a reasonable way, as `oc scale` comes from upstream. We would have to create a 'smarter' wrapper that checks the state of the DC, and I'm not 100% convinced that refactor is worth doing just to gain the warning.

I'm in favor of closing this as the UI portion is now fixed. WDYT?
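For reference, the 'smarter' wrapper idea could look roughly like the sketch below. This was never implemented; the wrapper functions are invented for illustration. It assumes the deployer-managed RCs carry the `openshift.io/deployment-config.name` label and the `openshift.io/deployment.phase` annotation (New/Pending/Running while in flight), which is how OpenShift 3.x marks them.

```python
# Hypothetical sketch of a warning wrapper around `oc scale` (never
# shipped). Function names are invented; the label/annotation names are
# the ones OpenShift 3.x sets on deployer-managed RCs.
import json
import subprocess

IN_FLIGHT_PHASES = {"New", "Pending", "Running"}

def any_in_flight(phases):
    """Pure check: is any deployment phase still in progress?"""
    return any(p in IN_FLIGHT_PHASES for p in phases)

def dc_in_flight(name):
    """Query `oc` for the DC's RCs and inspect their deployment phase."""
    out = subprocess.check_output(
        ["oc", "get", "rc",
         "-l", "openshift.io/deployment-config.name=" + name,
         "-o", "json"])
    rcs = json.loads(out)["items"]
    phases = [rc["metadata"]["annotations"].get("openshift.io/deployment.phase")
              for rc in rcs]
    return any_in_flight(phases)

def scale_with_warning(name, replicas):
    """Warn, but do not block, if a deployment is in flight (per comment 19)."""
    if dc_in_flight(name):
        print("warning: a deployment is in flight; "
              "the scale may be overwritten")
    subprocess.check_call(
        ["oc", "scale", "dc/" + name, "--replicas=" + str(replicas)])
```

As comment 20 notes, carrying a wrapper like this downstream would mean diverging from the upstream `oc scale`, which is why the idea was dropped in favor of closing the bug with the web console fix.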

Comment 21 Cesar Wong 2016-08-11 13:52:33 UTC
Michal, I'm ok with closing it as well.

Comment 22 Michal Fojtik 2016-08-11 13:55:45 UTC
Setting ON_QA so QA can close this.

Comment 23 zhou ying 2016-08-12 03:26:04 UTC
Confirmed with the latest 3.3 env; the issue is fixed in the browser.
openshift version
openshift v3.3.0.18
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

In the browser, the scale-up arrow is disabled while a deployment is in progress.

Comment 25 errata-xmlrpc 2016-09-27 09:32:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

