Bug 1314270

Summary: Canceling a deployment doesn't cancel a deployment
Product: OpenShift Container Platform
Reporter: Alexander Koksharov <akokshar>
Component: openshift-controller-manager
Assignee: Michail Kargakis <mkargaki>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: unspecified
Version: 3.1.0
CC: akokshar, aos-bugs, mkargaki, tdawson
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2016-05-12 16:31:18 UTC
Type: Bug
Attachments: cancelled_deployment_still_a_pod.PNG

Description Alexander Koksharov 2016-03-03 10:16:18 UTC
Created attachment 1132718 [details]
cancelled_deployment_still_a_pod.PNG

Description of problem:
I have a build config that pushes to an image stream, and a deployment config that uses that stream as an image change trigger. I started a build and then cancelled it immediately. The image must have been pushed successfully before the cancellation took effect, because the deployment config started a deployment, which was then cancelled automatically moments later. However, it had already deployed a pod, and that pod is still running, despite the fact that the deployment is in a "cancelled" state. It really is a strange situation, so I have attached a screenshot for clarity.


Comment 1 Michail Kargakis 2016-03-03 13:51:38 UTC
Can you provide the YAML for that pod, its replication controller, and the deploymentconfig?

Comment 2 Michail Kargakis 2016-03-04 13:10:28 UTC
Can you provide the YAML for that pod, its replication controller, and the deploymentconfig?

Comment 3 Michail Kargakis 2016-03-08 16:01:03 UTC
I can see cancellations lagging a bit, but eventually the cancelled deployment's pods are scaled down. Have you noticed anything different from this?

Comment 4 Michail Kargakis 2016-03-11 16:26:09 UTC
I haven't noticed any strange behavior other than old pods lagging for a while before being scaled down when a deployment is marked as cancelled (it may also be related to other bugs such as https://bugzilla.redhat.com/show_bug.cgi?id=1281286). I don't consider this a blocker bug, so I am marking it for the upcoming release.

Comment 5 Michail Kargakis 2016-03-18 20:37:54 UTC
This happens due to the resync interval of the deploymentconfig controller (currently 2 minutes). When a deployment is cancelled, its deploymentconfig is not resynced on the spot. There are two possible solutions: 1) reduce the dc controller resync interval, or 2) force a reconcile of the dc from the deployer controller right after the cancelled deployment is marked as failed. The latter option would immediately scale down the cancelled deployment, but it would add another update site for deployment configs (the deployer controller), which slightly increases the update-conflict surface for dcs. Former option implemented in https://github.com/openshift/origin/pull/8147.
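
For illustration, here is a minimal, self-contained Go sketch of the "force a reconcile" approach described above (option 2). The types and names are hypothetical stand-ins for this comment only, not the actual openshift/origin controller code:

```go
// Toy sketch (hypothetical types and names, not the actual openshift/origin
// code) of forcing a deploymentconfig reconcile as soon as its running
// deployment is cancelled, instead of waiting for the periodic cache resync
// (2 minutes at the time of this bug).
package main

import (
	"fmt"
	"time"
)

// dcKey identifies a deploymentconfig that needs to be re-examined.
type dcKey struct{ namespace, name string }

// dcController drains a work queue of deploymentconfig keys; reconciling a
// cancelled deployment scales its replication controller back down (and an
// older complete deployment back up).
type dcController struct{ queue chan dcKey }

func (c *dcController) run() {
	for key := range c.queue {
		fmt.Printf("reconciling %s/%s: scaling down the cancelled deployment\n",
			key.namespace, key.name)
	}
}

// markCancelled models the deployer controller's side of the fix: once the
// deployment is marked as failed/cancelled, it enqueues the owning
// deploymentconfig immediately rather than waiting for the next resync tick.
func markCancelled(c *dcController, key dcKey) {
	fmt.Printf("deployment of %s/%s cancelled; enqueueing dc without waiting for resync\n",
		key.namespace, key.name)
	c.queue <- key
}

func main() {
	c := &dcController{queue: make(chan dcKey, 10)}
	go c.run()

	markCancelled(c, dcKey{namespace: "zhouy", name: "frontend"})

	// Give the worker goroutine a moment to print before the program exits.
	time.Sleep(100 * time.Millisecond)
}
```

Without the forced enqueue, nothing puts the deploymentconfig key on the queue until the next resync tick, so the cancelled deployment's pods can keep running for up to the full resync interval.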

Comment 6 openshift-github-bot 2016-03-25 04:04:50 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/cd5302abc821dd23d5df1e6bf53e9fe576e82886
Bug 1314270: force dc reconcilation on canceled deployments

Force a deploymentconfig reconcilation when its running deployment
is canceled instead of relying on the deploymentconfig cache sync
interval for rolling back.

Comment 7 zhou ying 2016-03-29 07:44:08 UTC
Confirmed on AMI devenv_rhel_3849:
openshift v1.1.4-296-g8e98dcc
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

The steps were as follows:

1. Create the application, which starts a build:
   `oc process -f /data/src/github.com/openshift/origin/examples/sample-app/application-template-stibuild.json | oc create -f -`
2. Follow the build logs:
   `oc build-logs ruby-sample-build-2`
3. As soon as the image has been pushed successfully, cancel the deployment immediately:
   `oc deploy frontend --cancel`

The deployment was cancelled and the issue did not reproduce. Are these steps suitable?

Comment 8 Michail Kargakis 2016-03-29 08:31:05 UTC
> The deployment was cancelled and the issue did not reproduce. Are these steps suitable?

Yes, the steps are fine, but it would also be nice to test with an older complete deployment, so you can see the old deployment being scaled back up as soon as the new one is cancelled.

Comment 9 zhou ying 2016-03-29 10:32:56 UTC
Confirmed on the latest OSE; the issue has been fixed.
[root@openshift-147 ~]# openshift version
openshift v3.2.0.8
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

latest: digest: sha256:d36d4166122ad206d23eb77a2a54db0a3e2e137c9ebdf0b666f468acaf82d6ca size: 89224
I0329 06:30:39.300583       1 sti.go:277] Successfully pushed 172.31.39.241:5000/zhouy/origin-ruby-sample:latest
[root@zhouy testjson]# oc deploy frontend --cancel
Cancelled deployment #2
[root@zhouy testjson]# oc get pods
NAME                        READY     STATUS             RESTARTS   AGE
database-1-izax4            1/1       Running            0          24m
frontend-1-53riz            1/1       Running            0          20m
frontend-1-y74en            1/1       Running            0          20m
frontend-2-deploy           0/1       DeadlineExceeded   0          <invalid>
frontend-2-hook-pre         0/1       DeadlineExceeded   0          <invalid>
ruby-sample-build-1-build   0/1       Completed          0          24m
ruby-sample-build-2-build   0/1       Completed          0          1m
[root@zhouy testjson]# oc get rc
NAME         DESIRED   CURRENT   AGE
database-1   1         1         24m
frontend-1   2         2         20m
frontend-2   0         0         <invalid>

Comment 11 errata-xmlrpc 2016-05-12 16:31:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064