Bug 1333129

Summary: Cannot scale up a pod while a deployment is not completed
Product: OpenShift Container Platform
Reporter: Cesar Wong <cewong>
Component: openshift-controller-manager
Assignee: Michail Kargakis <mkargaki>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: medium
Version: 3.2.1
CC: aos-bugs, dmace, jforrest, jkaur, jokerman, mfojtik, mifiedle, mkargaki, mmccomas, pweil, spadgett, tdawson, yinzhou
Target Milestone: ---
Target Release: 3.2.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-09-27 09:32:06 UTC
Attachments:
  scale request (flags: none)
  event log (flags: none)
  scaling (flags: none)

Description Cesar Wong 2016-05-04 17:38:25 UTC
Created attachment 1153959 [details]
scale request

Description of problem:

After a deployment has created a new pod and deleted the old pod, clicking the up arrow to scale up results in a 'Scaling up...' message, but nothing happens.

Version-Release number of selected component (if applicable):
3.2

How reproducible:
Always

Steps to Reproduce:
1. Create an app with new-app: 'oc new-app https://github.com/csrwng/simple-ruby.git'
2. After the build completes and an initial deployment has happened, start a new build.
3. On the overview page, wait for the new deployment to create the new pod and delete the previous pod.
4. Immediately after the previous pod disappears, click the up arrow to scale up (a CLI equivalent is sketched below).
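
For reference, the same race can be hit from the CLI; a minimal sketch, assuming the deploymentconfig created by new-app is named simple-ruby (comment 3 below reproduces it with the same command):

# Trigger a new deployment and immediately ask for more replicas while it is in flight.
oc deploy simple-ruby --latest
oc scale dc/simple-ruby --replicas=5

# Watch the pods: the extra replicas do not come up.
oc get pods -w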

Actual results:

The pod says 'Scaling...' but nothing happens.

Expected results:

The pod scales up successfully.

Additional info:

Comment 1 Cesar Wong 2016-05-04 18:42:38 UTC
Created attachment 1153982 [details]
event log

Comment 2 Jessica Forrester 2016-05-04 19:05:37 UTC
This looks like a race condition in the deployment controller; the UI is just updating the scale resource on the DC. Timing seems to be the key: if you scale up right after the old deployment disappears, the new deployment has already scaled up, but it still says it is "in progress".

Comment 3 zhou ying 2016-05-05 06:24:34 UTC
We can reproduce this with the command:
`oc deploy simple-ruby --latest; oc scale dc/simple-ruby --replicas=5`

Comment 4 Michail Kargakis 2016-05-05 10:04:14 UTC
That's because 1) the deployment runs as a separate process with a fixed desired replica size, which means the deployment needs to complete before it can be scaled, and 2) even after the deployment process finishes, it should be scaled up to dc.spec.replicas, but we have hacked the controller to restore dc.spec.replicas back to rc.spec.replicas because we need to support older clients that try to scale a deploymentconfig. For now, you should not try to scale the dc while a deployment is in flight; set the replica count before the deployment starts or after it is complete.
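
A minimal sketch of that workaround, reusing the simple-ruby deploymentconfig from this report: set the replica count before triggering the deployment, or wait for the deployment to finish first.

# Option 1: set the desired replica count first, then trigger the deployment.
oc scale dc/simple-ruby --replicas=5
oc deploy simple-ruby --latest

# Option 2: trigger the deployment, follow it until it completes, then scale.
oc deploy simple-ruby --latest
oc logs -f dc/simple-ruby
oc scale dc/simple-ruby --replicas=5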

Comment 5 Samuel Padgett 2016-05-05 13:58:49 UTC
I'll disable the scaling controls during a deployment.

Comment 6 Michail Kargakis 2016-05-05 14:30:53 UTC
Thanks Sam!

Comment 7 Samuel Padgett 2016-05-05 17:07:28 UTC
https://github.com/openshift/origin/pull/8761

Comment 8 Samuel Padgett 2016-05-05 18:07:22 UTC
Pull request in the origin/master merge queue.

Comment 10 zhou ying 2016-05-31 03:51:19 UTC
Confirmed with AMI devenv-rhel7_4294. While a deployment is in flight, the scale controls are disabled. However, if I scale up immediately after the deployment completes and the controls are re-enabled, the pod says 'Scaling to x...' but even after a long wait the scale does not succeed.
Please see the attachments.
openshift v1.3.0-alpha.1-41-g681170a
kubernetes v1.3.0-alpha.1-331-g0522e63
etcd 2.3.0

Comment 11 zhou ying 2016-05-31 03:52:48 UTC
Created attachment 1163019 [details]
scaling

Comment 12 Samuel Padgett 2016-05-31 12:23:32 UTC
(In reply to zhou ying from comment #10)
> Confirmed with AMI devenv-rhel7_4294. While a deployment is in flight, the
> scale controls are disabled. However, if I scale up immediately after the
> deployment completes and the controls are re-enabled, the pod says 'Scaling
> to x...' but even after a long wait the scale does not succeed.

There are several reasons this could happen, and it might not be a bug. Can you check that you're not at your pod quota, and browse the events page to see if there are any warnings?
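
A few generic oc commands that can help rule those causes out (nothing here is specific to this bug; run them in the project that owns the app):

# Check whether the project's pod quota is already exhausted.
oc get quota
oc describe quota

# Look for warning events that would explain a scale that never takes effect.
oc get events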

Comment 13 Samuel Padgett 2016-07-05 13:08:42 UTC
yinzhou: Any update? Do you still see the problem?

Comment 14 zhou ying 2016-07-06 06:32:16 UTC
Confirmed with AMI devenv-rhel7_4530; I can't reproduce this issue in the browser now.

But via the command line:

[root@ip-172-18-2-106 amd64]# oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          7m
ruby-ex-3-hzkj1   1/1       Running     0          2m
ruby-ex-3-qpgrg   1/1       Running     0          2m
ruby-ex-3-unfuj   1/1       Running     0          2m
[root@ip-172-18-2-106 amd64]# oc deploy ruby-ex --latest ; oc scale dc/ruby-ex --replicas=5
Started deployment #4
Use 'oc logs -f dc/ruby-ex' to track its progress.
deploymentconfig "ruby-ex" scaled

[root@ip-172-18-2-106 amd64]# oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          12m
ruby-ex-4-26fsh   1/1       Running     0          4m
ruby-ex-4-hn8o8   1/1       Running     0          4m
ruby-ex-4-jiigp   1/1       Running     0          4m

Comment 15 Samuel Padgett 2016-07-06 13:01:19 UTC
Michail, Dan (Mace), do we want to guard against this problem when scaling with the CLI?

Comment 16 Samuel Padgett 2016-07-18 14:23:09 UTC
Reassigning since the web console side is fixed. See comment #14.

Comment 17 Michail Kargakis 2016-07-20 08:46:28 UTC
*** Bug 1353834 has been marked as a duplicate of this bug. ***

Comment 18 Michail Kargakis 2016-07-20 08:53:29 UTC
*** Bug 1306720 has been marked as a duplicate of this bug. ***

Comment 19 Michal Fojtik 2016-07-20 11:17:19 UTC
We will probably just display a warning to users in the CLI; after talking with Michalis, we don't want to prevent them from scaling.

Comment 20 Michal Fojtik 2016-08-11 07:39:27 UTC
Cesar, Sam: I don't think we can show a warning in the CLI in a reasonable way, as `oc scale` is upstream. We would have to create a 'smarter' wrapper that checks the state of the DC, and I'm not 100% convinced we want to do that refactor only to gain the warning.

I'm in favor of closing this as the UI portion is now fixed. WDYT?

Comment 21 Cesar Wong 2016-08-11 13:52:33 UTC
Michal, I'm ok with closing it as well.

Comment 22 Michal Fojtik 2016-08-11 13:55:45 UTC
Setting ON_QA so QA can close this.

Comment 23 zhou ying 2016-08-12 03:26:04 UTC
Confirmed with the latest 3.3 environment; the issue is fixed in the browser.
openshift version
openshift v3.3.0.18
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

In the browser, the scale-up arrow is disabled while a deployment is in progress.

Comment 25 errata-xmlrpc 2016-09-27 09:32:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933