Bug 1481801 - Issues with deployment on us-east-2 after 3.6 upgrade
Summary: Issues with deployment on us-east-2 after 3.6 upgrade
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Deployments
Version: 3.x
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.x
Assignee: Tomáš Nožička
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1482337
Depends On:
Blocks:
 
Reported: 2017-08-15 18:37 UTC by Paul Bergene
Modified: 2023-09-14 04:02 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-09 18:50:01 UTC
Target Upstream Version:
Embargoed:


Attachments
Sequence of writes observed via watch to the deployment (1.56 KB, application/x-gzip), 2017-08-15 19:55 UTC, Clayton Coleman
yml files and script to setup example with issue (1.32 KB, application/x-gzip), 2017-08-17 04:42 UTC, Angel Misevski

Description Paul Bergene 2017-08-15 18:37:50 UTC
Description of problem:

We are having trouble with eclipse-che deployments on us-east-2 after the 3.6 upgrade. Specifically, pods are being scaled down to 0 immediately after deployment.

We are unsure of the direct cause of the issue: whether it is something in Che that 3.6 treats with more scrutiny, or something more directly related to OpenShift.

Clayton, Justin and Matt were on a call with a few OSIO people and have more background on this issue.

Requesting Clayton and amisevsk to supplement with the deployment config and other relevant details.

Version-Release number of selected component (if applicable):


OpenShift Master:
    v3.6.173.0.5 (online version 3.5.0.20)
Kubernetes Master:
    v1.6.1+5115d708d7 

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Clayton Coleman 2017-08-15 19:52:17 UTC
Recorded the changes when one of these was created:

1. CREATE initial create
2. UPDATE no-op update (no idea who, no status data)
3. UPDATE scaled from 1 -> 0 (very rapidly, probably a controller of some form)
4. UPDATE deployment controller updates status with the result from #2

Afterwards I observe that the RS is set to scale 0. We also validated that a pod is briefly created (from step 2) before being removed (by the controller).

There are two primary problems:

1. Something is scaling the deployment to 0 immediately - is this part of OSIO, or has the deployment controller gone rogue?
2. The deployment controller never updates the status of the deployment to reflect that it is scaled to zero (which seems like a bug).

Comment 2 Clayton Coleman 2017-08-15 19:53:44 UTC
Additional data we need:

1. When creating the same deployment in a completely different context, do we still observe the same scale to zero? (This basically identifies whether this is a cluster problem, a deployment controller problem, or a rogue user-provided controller problem.)

Investigations:

1. Why do we get two events - create, then no-op? Is this the controller checking in but not setting status? Another controller? The client?

Comment 3 Clayton Coleman 2017-08-15 19:55:05 UTC
Created attachment 1313835 [details]
Sequence of writes observed via watch to the deployment

Comment 4 Karanbir Singh 2017-08-15 20:20:49 UTC
As a data point, the same deployment, run in the same way, was tried on OCP 3.5 earlier today and had no problems (the replica count scales to 1 and the service comes up).

Looking at the replica set and the deployment for it, they both specify replicas: 1 at time of creation and remain at 1 through the life of the app.

Comment 5 Clayton Coleman 2017-08-15 20:23:26 UTC
Also, we need a user to investigate: after the system has settled, if the Deployment is scaled back to 1, what is the outcome on both the Deployment status and the underlying RS?

Comment 6 Angel Misevski 2017-08-15 21:57:21 UTC
Adding what we've found from the Che side:

- Eclipse Che uses fabric8 kubernetes-client to create OpenShift resources
- During workspace creation, Che creates routes, a service, and a deployment for the workspace. All resources are created successfully
- Immediately after the deployment is created and its pod starts, for some (as of yet unclear) reason, kubernetes-client attempts to scale this deployment down to 0 replicas.
- The attempt to scale the deployment to 0 fails, and kubernetes-client waits 10 minutes before deleting the deployment and cleaning up after itself.
- Looking at the deployment that is created by Che in the OpenShift console, we can see that it starts off at 1 replica, but is shortly thereafter modified to be zero replicas.
- If the deployment is exported and created in OpenShift manually (with one replica) it starts normally and stays running -- it is not scaled down.
- As far as I can tell, the scale-down call did not occur prior to OS 3.6.

As a result, it seems the scaling down of workspace deployments is being caused by Che. However, a second issue seems to be occurring, where the number of active replicas reported by the deployment does not properly get set to 0 when the deployment is scaled down. I think this is the issue that is causing Che to fail to launch workspaces: kubernetes-client attempts to scale deployments down to zero, and this succeeds, but if you `oc describe` the deployment, the number of replicas is listed as 1/0.

- Once the ten minute timeout elapses, the error 

    2017-08-15 17:41:00,613[aceSharedPool-0]  [ERROR] [c.d.i.DeploymentOperationsImpl 149]  - 1/0 pod(s) ready for Deployment: che-ws-slygsb8hunnr00md in namespace: amisevsk-che  after waiting for 600 seconds so giving up

appears in the Che logs (note that the class logging the error is part of kubernetes-client, not Che).

- After the deployment is scaled down and its pod terminated, checking the deployment with oc yields [1]. Note that the deployment lists 1 replica available despite the replica set showing 0.

- This can be reproduced manually -- attempting to scale a deployment down through the OpenShift console will yield 1/0 replicas on the deployment, and 0/0 on the replica set. [2]

I think what's happening is that kubernetes-client is waiting on the deployment to show that it has successfully scaled down. This never happens, even though the pod is shut down and the replica set is scaled to zero. As a result, kubernetes-client hangs and that causes Che to stop working.
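
To make the 1/0 state concrete, here is a minimal sketch of the divergence kubernetes-client appears to be polling against; the field values are reconstructed from the description above, not copied from the actual cluster (the name is the workspace deployment from the Che log above):

```
# Hypothetical state of the workspace Deployment after the scale-down call.
# Values are illustrative only.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: che-ws-slygsb8hunnr00md
spec:
  replicas: 0            # desired state after the scale-down
status:
  replicas: 1            # never reconciled by the deployment controller
  availableReplicas: 1   # still reported as available, hence the "1/0"
```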



[1] - https://gist.github.com/amisevsk/a55ceb933fb59bf716066a86fba05308
[2] - https://i.snag.gy/thkVuz.jpg

Comment 7 Angel Misevski 2017-08-16 05:21:36 UTC
I've managed to reproduce the issue running locally on minishift with OpenShift version 

openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7

Comment 8 Angel Misevski 2017-08-16 07:03:51 UTC
Correction to my earlier comment: 

- The issue of Che scaling down deployments is reproduced
- The issue of deployments staying at 1/0 is *not*.

So Che works half-okay in this case.

Comment 9 Tomáš Nožička 2017-08-16 08:55:42 UTC
I have tried creating a Deployment in v3.6.173.0.5-1 and scaling it to 0, and availability went to 0 as well. There can be several reasons why it is showing 1/0, most notably that the "replicas" count is the desired state, not the state the cluster is in - meaning that state hasn't been reconciled yet, e.g. there is a terminating pod. Please provide the output of `oc get deployments -o yaml`, `oc get rs -o yaml` and `oc get po -o yaml` if that still seems like an issue.

But looking at [1] I can see that DESIRED is 0 while there is 1 pod under CURRENT, which is why it is still reported as available. (You forgot to list the pods there.)

On the main question of replicas being scaled to 0 when the deployment is created/manipulated by your fabric8-client: I would guess that if you can reproduce it on a Deployment without that client/controller involved, it's a bug in upstream Deployments; if not, chances are that your app/fabric8-client has the bug and not OpenShift. The fact that it works fine when you export it and create it without that client/controller involved would indicate the latter.

I am not really sure what the fabric8-client/controller does, but is it possible that the upgrade from 3.5 -> 3.6 just uncovered some bug there, considering that it works fine when you create it manually?

Comment 10 Clayton Coleman 2017-08-16 17:54:38 UTC
I was never seeing the state reconciled, even with zero pods. I.e., the scaled-to-zero deployment never saw a status update to reflect zero pods.

Comment 11 Angel Misevski 2017-08-17 04:42:21 UTC
Created attachment 1314528 [details]
yml files and script to setup example with issue

Reproduction steps:

1. extract files
2. run setup.sh to create imagestream, imagestreamtags, and deployments; wait until deployments are ready
3. execute `oc get all`, compare against file oc-get-all-before-scale
4. scale ruby-istag and ruby-remote to 0 replicas
5. execute `oc get all`, compare against file oc-get-all-after-scale

Comment 12 Angel Misevski 2017-08-17 04:49:25 UTC
Tomáš, I was unclear in my earlier explanation; I apologize.

The originally reported issue of scaling down deployments immediately was indeed a Che issue (related to a change in OS 3.6), and we have resolved it. However, there is still an underlying issue on OSO that caused a relatively simple Che issue to be very difficult to diagnose -- scaling down a deployment should not be an issue that causes Che to hang.

The scaling issue seems to manifest only when the Deployment is defined from an ImageStreamTag. If the same deployment is defined with a normal docker image, the issue does not occur.

I've created an attachment with a simple example to demonstrate. The two deployments are identical, except that one uses an ImageStreamTag, while the other uses the image that the ImageStreamTag is defined from. 
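
Since the attachment itself is not inlined here, a rough sketch of what the two deployments presumably look like; the image references are assumptions based on the `oc get all` output below, not the actual attachment contents:

```
# Sketch only: image references and labels are illustrative.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ruby-istag
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ruby-istag
    spec:
      containers:
      - name: ruby
        image: ruby:latest   # local name, resolved against the ImageStreamTag
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ruby-remote
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ruby-remote
    spec:
      containers:
      - name: ruby
        image: 172.30.98.11:5000/amisevsk-che/ruby:latest   # the image the tag points at
```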

The output I get from step 3 above is:

NAME      DOCKER REPO                           TAGS         UPDATED
is/ruby   172.30.98.11:5000/amisevsk-che/ruby   latest,2.3   3 minutes ago

NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/ruby-istag    1         0         0            0           3m
deploy/ruby-remote   1         1         1            1           3m

NAME                        DESIRED   CURRENT   READY     AGE
rs/ruby-istag-2604530443    1         1         1         3m
rs/ruby-remote-1464138083   1         1         1         3m

NAME                              READY     STATUS    RESTARTS   AGE
po/ruby-istag-2604530443-qz5x8    1/1       Running   0          3m
po/ruby-remote-1464138083-lpp6k   1/1       Running   0          3m


and the output after step 5 is:

NAME      DOCKER REPO                           TAGS         UPDATED
is/ruby   172.30.98.11:5000/amisevsk-che/ruby   latest,2.3   5 minutes ago
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/ruby-istag    0         1         0            1           5m
deploy/ruby-remote   0         0         0            0           5m
NAME                        DESIRED   CURRENT   READY     AGE
rs/ruby-istag-2604530443    0         0         0         5m
rs/ruby-remote-1464138083   0         0         0         5m

It seems that the CURRENT and AVAILABLE fields are a step behind where they should be.

Also, I will link to [1], since I do not believe in coincidences.

Let me know if any more info is needed.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1482337

Comment 13 Tomáš Nožička 2017-08-17 13:03:56 UTC
Thanks, I was able to reproduce the issue, and I believe that both issues are connected; more precisely, they are caused by some bug in the upstream deployment controller that prevents it from updating the Deployment status. Just one status update succeeds - the one doing the scale. That's why it gets stuck at 1 available replica although there is no pod. There is actually an error in the master logs indicating the deployment has a collision:

E0817 14:52:16.907932   14594 deployment_controller.go:485] replicasets.extensions "ruby-istag-89978933" already exists

I have yet to find out why it is trying to deploy from the same hash, which it shouldn't. I know there were some collisions in upstream Deployments that should be fixed by the collision avoidance algorithm.

Looking into it.

Comment 14 Tomáš Nožička 2017-08-18 16:02:04 UTC
There is a bug in the platform causing this weird behaviour. Essentially, when you reference an ImageStream by tag in an upstream Deployment, a ReplicaSet is created with a hash of template.spec in its name. But some rogue controller in OpenShift (we suspect the new generic image trigger, although that's yet to be confirmed) rewrites the ReplicaSet's image to a SHA. Since all our tools first resolve a tag into a SHA, this doesn't normally manifest. But the Deployment controller compares its template to the ReplicaSet's to find out whether it has the latest version or should deploy, and because the tag is replaced by a SHA only in the RS, the two will ALWAYS differ, so it tries to create a new ReplicaSet. Because it's the same ReplicaSet it was trying to create initially, it has the same hash and the names collide. In the newest versions, like master, we have collision avoidance, and this will manifest as the Deployment creating a new RS over and over again :/
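
To illustrate the mismatch (the image values and digest below are placeholders; the ReplicaSet name is the one from the collision error in comment 13):

```
# Deployment, as created, still referencing the ImageStreamTag:
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: ruby
        image: ruby:latest
---
# ReplicaSet ruby-istag-89978933, after the image was rewritten to a SHA:
kind: ReplicaSet
spec:
  template:
    spec:
      containers:
      - name: ruby
        image: 172.30.98.11:5000/amisevsk-che/ruby@sha256:<digest>
# The templates now always differ, so the controller tries to create a "new"
# ReplicaSet, but its own template (and therefore the hash in the name) is
# unchanged, and the create collides with the existing ReplicaSet.
```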

I have confirmed it worked with 3.5, although you shouldn't use a tag like that - see A).

This is certainly a bug we will have to fix, but to get you unblocked:

A) Use a SHA instead of a tag, which you should be doing anyway to maintain immutability. There is an API endpoint you can call; a sketch of the resulting spec follows after this list.
(If you scale that Deployment and the tag gets rewritten in the meantime, you will end up with pods using different images.)

B) Use DeploymentConfigs that have native support for ImageStreams.

C) Use the new generic image trigger that allows you to have triggers for upstream objects. I am afraid we don't have documentation for it, at least I couldn't find it - Clayton, is there something?

Set an annotation like this and you should be good to go:

```
  apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
      image.alpha.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"ruby:latest"},"fieldPath":"spec.template.spec.containers[0].image"}]'
```

C) seems like the easiest change for you, I guess.
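
For A), a minimal sketch of what the container spec could look like once the tag has been resolved to a digest up front; the digest is a placeholder, and how you resolve it (e.g. via the image API endpoint mentioned above) is up to the client:

```
# Option A sketch: pin the image by digest instead of a tag.
# <digest> is a placeholder, to be filled in after resolving the ImageStreamTag.
spec:
  template:
    spec:
      containers:
      - name: ruby
        image: 172.30.98.11:5000/amisevsk-che/ruby@sha256:<digest>
```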

Comment 15 Clayton Coleman 2017-08-18 21:41:01 UTC
The image policy plugin changed in 3.6 to consider more resources than previously (it only considered pods before) in order to support the "local name" feature, which allows deployments to reference image streams by name. This inadvertently added replica sets to the set of resources that can have their images rewritten (without explicit action by the user).

The fix for 3.6.1 is going to be to require an admin to explicitly request rewrites via the config for certain types, and to exclude resources like RS, RC, and Jobs from getting this behavior automatically.  

An admin in 3.6.0 can disable this behavior in their master-config.yaml in the ImagePolicyConfig object as follows:

    executionRules: 
    ...
    resolutionRules:
    - targetResource:
        resource: pods
      localNames: true
      type: Attempt

(add a new top level resolutionRules with only pods).
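
For context, a hedged sketch of where this could sit in master-config.yaml, assuming the standard 3.6 admission plugin layout; the surrounding admissionConfig/pluginConfig structure is an assumption, not taken from this cluster's config:

```
# Sketch only: assumes the usual openshift.io/ImagePolicy plugin configuration.
admissionConfig:
  pluginConfig:
    openshift.io/ImagePolicy:
      configuration:
        apiVersion: v1
        kind: ImagePolicyConfig
        executionRules:
        # ... existing execution rules unchanged ...
        resolutionRules:
        - targetResource:
            resource: pods
          localNames: true
          type: Attempt
```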

Comment 16 Clayton Coleman 2017-08-18 22:15:39 UTC
3.6.x fix https://github.com/openshift/origin/pull/15867

Comment 17 Clayton Coleman 2017-08-18 22:16:37 UTC
master fix https://github.com/openshift/origin/pull/15868

Comment 18 Tomáš Nožička 2017-08-22 08:02:14 UTC
*** Bug 1482337 has been marked as a duplicate of this bug. ***

Comment 21 Tomáš Nožička 2017-08-28 16:53:55 UTC
The fixes are merged in the master and 3.6 branches now.

Comment 22 zhou ying 2017-09-04 08:11:42 UTC
Will verify once the env is updated.

Comment 23 Boris Kurktchiev 2017-09-21 01:07:13 UTC
I'm unsure if I am seeing this exact behavior, but I figured I'd post and ask since the problem seems similar enough. I currently have the pv-recycler constantly creating itself, failing, and getting deleted - rinse and repeat - and it keeps failing with:
Failed to pull image "openshift3/ose-recycler:v3.6.173.0.21": rpc error: code = 2 desc = unauthorized: authentication required

I just upgraded us to v3.6.173.0.21

Comment 24 zhou ying 2017-09-28 08:28:44 UTC
Verified with the latest version; the issue has been fixed:
openshift v3.6.173.0.37
kubernetes v1.6.1+5115d708d7

Before scale:
[root@zhouy testjson]# oc get all 
NAME                DOCKER REPO                    TAGS         UPDATED
imagestreams/ruby   172.30.98.11:5000/zhouy/ruby   latest,2.3   7 minutes ago

NAME                              READY     STATUS    RESTARTS   AGE
po/ruby-istag-3728341130-t7n8x    1/1       Running   0          6m
po/ruby-remote-1464138083-dbpkw   1/1       Running   0          4m

NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/ruby-istag    1         1         1            1           6m
deploy/ruby-remote   1         1         1            1           4m

NAME                        DESIRED   CURRENT   READY     AGE
rs/ruby-istag-3728341130    1         1         1         6m
rs/ruby-remote-1464138083   1         1         1         4m


After scale:
[root@zhouy testjson]# oc get all 
NAME                DOCKER REPO                    TAGS         UPDATED
imagestreams/ruby   172.30.98.11:5000/zhouy/ruby   2.3,latest   8 seconds ago

NAME                              READY     STATUS        RESTARTS   AGE
po/ruby-remote-1464138083-g5fwm   1/1       Terminating   0          6s

NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/ruby-istag    0         0         0            0           7s
deploy/ruby-remote   0         0         0            0           6s

NAME                        DESIRED   CURRENT   READY     AGE
rs/ruby-istag-3728341130    0         0         0         7s
rs/ruby-remote-1464138083   0         0         0         6s

Comment 25 Red Hat Bugzilla 2023-09-14 04:02:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

