Bug 1369314

Summary: Dev preview scale up button doesn't always work
Product: OpenShift Online
Reporter: bugreport398
Component: Deployments
Assignee: Michail Kargakis <mkargaki>
Status: CLOSED CURRENTRELEASE
QA Contact: zhou ying <yinzhou>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.x
CC: abhgupta, aos-bugs, bugreport398, jokerman, mfojtik, mmccomas, spadgett, zhezli
Target Milestone: ---
Flags: mkargaki: needinfo? (mfojtik)
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-16 22:12:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Attachments:
video of clicking scale up after git push triggered build and deploy, then navigating away and back, then clicking scale up again.

Description bugreport398 2016-08-23 06:34:24 UTC
Created attachment 1193156 [details]
video of clicking scale up after git push triggered build and deploy, then navigating away and back, then clicking scale up again.

Description of problem:

Possible bug in dev preview interface (Firefox 48.0 for Ubuntu).

Scale up button doesn't work in certain circumstances.  

Version-Release number of selected component (if applicable):

Dev Preview

How reproducible:  

I don't know.  

Steps to Reproduce:

1. Start with a Python 2.7 application.

2. Scale down the pod to 0 through the interface.

3. Do a git push.

4. Wait till build has finished and click the Dismiss button on the notification.

5. Wait till deployment finishes.

6. Click the Scale up button (it will animate as if it is scaling up, but it isn't).

7. If you navigate to Home and then click on the Project name, you will see that no scaling progress is displayed and the pod count is still 0.

8. Clicking the Scale up button again makes the pod scale up.

So perhaps the on-click event is not working as desired after the build has finished? (A rough CLI equivalent of the steps above is sketched below.)
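
For reference, a rough CLI equivalent of the steps above. This is only a sketch: the app name (python-app) is a placeholder, and `oc start-build` stands in for the git push that triggers the build.

# Step 2: scale the deployment config down to 0
oc scale dc/python-app --replicas=0

# Step 3 (approximation): trigger a new build; the report does this via git push
oc start-build python-app --follow

# Step 6: after the resulting deployment finishes, scale back up
oc scale dc/python-app --replicas=1

# Steps 7-8: check whether the scale actually took effect
oc get dc/python-app
oc get pods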

Actual results:

The pod does not scale up.  

Expected results:

The pod scales up.  

Additional info:

Comment 1 Samuel Padgett 2016-08-23 14:33:30 UTC
I'm able to reproduce. I see the web console making the scale request, and there are no errors from the API server. I don't believe it's a web console bug. Changing the component to deployments.

PUT /oapi/v1/namespaces/sgp/deploymentconfigs/ldp-js/scale

{  
   "apiVersion":"extensions/v1beta1",
   "kind":"Scale",
   "metadata":{  
      "name":"ldp-js",
      "namespace":"sgp",
      "creationTimestamp":"2016-08-08T13:25:31Z"
   },
   "spec":{  
      "replicas":1
   }
}
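
For reference, the same scale-subresource update can be replayed from the command line; a minimal sketch, assuming an authenticated session (the master URL is a placeholder, not taken from this report):

# PUT a Scale object against the deploymentconfig's scale subresource
curl -k -X PUT \
  -H "Authorization: Bearer $(oc whoami -t)" \
  -H "Content-Type: application/json" \
  -d '{"apiVersion":"extensions/v1beta1","kind":"Scale","metadata":{"name":"ldp-js","namespace":"sgp"},"spec":{"replicas":1}}' \
  https://<master>:8443/oapi/v1/namespaces/sgp/deploymentconfigs/ldp-js/scale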

Comment 2 Michal Fojtik 2016-08-24 16:00:50 UTC
Does the DC have ConfigChange trigger?

Comment 3 Samuel Padgett 2016-08-24 16:25:43 UTC
Michal, the one I tested did.
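
(For completeness, one way to check which triggers a DC has, using the DC name from comment 1; the command is illustrative:)

# Show the trigger configuration on the deployment config
oc get dc ldp-js -o yaml | grep -A 5 'triggers:'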

Comment 4 Michail Kargakis 2016-08-25 09:10:23 UTC
Triggers have nothing to do with scaling. We haven't changed anything in this area since 3.1. I tried to reproduce but had no luck. Sam, can you post the output of `oc get rc -o yaml` when you hit this?

Comment 5 Michal Fojtik 2016-08-25 13:14:45 UTC
I'm also not able to reproduce this on latest master. Moving this off the blocker list; we can consider backporting a potential fix for 3.2.

Comment 6 Samuel Padgett 2016-08-25 13:32:08 UTC
Michail, I saw a conflict updating the DC in the events after reproducing yesterday (even though the scale request succeeded). Have you tried on 3.2? It only happens sometimes. When I looked at the RC, both spec and status replicas were 0.

I'll try to reproduce today.

Comment 7 Samuel Padgett 2016-08-25 13:50:47 UTC
Here is the event:

Cannot update deployment sgp/ldp-js-31 status to Pending: replicationcontrollers "ldp-js-31" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

Comment 9 Michail Kargakis 2016-08-26 12:25:13 UTC
The update conflict is not a problem. Are you using the scale subresource of the deployment config or the replication controller in the UI? If the former, which should be the correct thing to do, then there is a place in the deploymentconfig controller where we mutate replicas back to the replicas of the replication controller for backwards-compatibility reasons. It may be that we use that path for some requests, which is totally wrong.
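
For context, the two scaling paths being contrasted here, as a sketch (object names taken from comment 1; commands are illustrative):

# Scale by way of the deployment config (the console's PUT in comment 1
# targets the DC's /scale subresource)
oc scale dc/ldp-js --replicas=1

# Scale the replication controller backing the latest deployment directly,
# bypassing the deployment config
oc scale rc/ldp-js-31 --replicas=1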

Comment 10 Samuel Padgett 2016-08-26 17:37:45 UTC
Michail, we're using the scale subresource. See the request above in comment #1.

Comment 11 Michail Kargakis 2016-08-29 14:09:10 UTC
I have managed to reproduce this on 3.2 and found the root cause.


I0829 10:26:26.129482   20773 controller.go:226] Synced deploymentConfig "test/ruby-ex" replicas from 1 to 0 based on the latest/active deployment "test/ruby-ex-2" which was scaled directly or has not previously been synced

oc get rc -o yaml | grep deployment.replicas:
openshift.io/deployment.replicas: ""

The deployer pod controller sets an empty openshift.io/deployment.replicas annotation forcing the deployment config controller to reconcile the deployment config back to the replica size of the replication controller (zero). Still not sure why this is happening because I can reproduce it with some deployments (eg. ruby-ex from `oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git`) but not with others (eg. the `database` config from the sample-app example).

I cannot reproduce for 3.3 since we have revamped the deployment controllers and dropped the deployer pod controller. Note that this is not a regression for 3.2 (used to work like that since 3.1). Do we need a fix for 3.2? When will Online move to 3.3?
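
For anyone still on 3.2, a possible way to inspect and work around the empty annotation described above; this is an assumption based on the analysis in this comment, not a verified fix:

# Look for the annotation the deployer pod controller left empty
oc get rc -o yaml | grep 'openshift.io/deployment.replicas'

# If the active RC (ruby-ex-2 in the reproduction above) shows "", re-setting
# it to the intended replica count should stop the deployment config
# controller from reconciling the DC back down to the RC's (zero) replicas
oc annotate rc ruby-ex-2 openshift.io/deployment.replicas=1 --overwrite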

Comment 12 Abhishek Gupta 2016-11-01 20:56:29 UTC
Online is now on 3.3.1 in DevPreview production.

Comment 13 Li Zhe 2016-11-02 05:58:14 UTC
Tested on dev-preview-stg, OpenShift Master: v3.3.1.3, with Python 2.7 and Ruby 2.2; cannot find the bug anymore.
oc get bc -o yaml
...
      openshift.io/deployer-pod.name: ruby-ex-2-deploy
      openshift.io/deployment-config.latest-version: "2"
      openshift.io/deployment-config.name: ruby-ex
      openshift.io/deployment.phase: Complete
      openshift.io/deployment.replicas: "1"
      openshift.io/deployment.status-reason: caused by an image change

...

Can verify now