Bug 1318920 - Could not cancel the deployment successfully
Summary: Could not cancel the deployment successfully
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Deployments
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Dan Mace
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-18 06:54 UTC by Wei Sun
Modified: 2016-05-12 17:11 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 17:11:58 UTC
Target Upstream Version:



Description Wei Sun 2016-03-18 06:54:50 UTC
Description of problem:
Started a deployment and then cancelled it. The client returned "Cancelled deployment", but the cancellation did not actually take effect: describing the dc showed the deployment status as Complete.

Version-Release number of selected component (if applicable):
dev-preview-int

How reproducible:
Always

Steps to Reproduce:
1. Start a deployment, then cancel it:
# oc deploy cakephp6 --latest
# oc deploy cakephp6 --cancel
2. Describe the dc.

Actual results:
1.$ oc deploy cakephp6 --latest
Started deployment #3
[wsun@dhcp-8-229 cucushift]$ oc deploy cakephp6 --cancel
Cancelled deployment #3

2.$ oc describe dc cakephp6
Deployment #3 (latest):
	Name:		cakephp6-3
	Created:	27 seconds ago
	Status:		Complete
	Replicas:	1 current / 1 desired
	Selector:	deployment=cakephp6-3,deploymentconfig=cakephp6
	Labels:		app=cakephp6,openshift.io/deployment-config.name=cakephp6
	Pods Status:	1 Running / 0 Waiting / 0 Succeeded / 0 Failed


Expected results:
The status of deployment #3 should be Failed.

Additional info:
Even though the web console shows the status of deployment #3 as Cancelled, the deployment was not actually cancelled, and if I roll back, I can still roll back to deployment #3.

Comment 1 Michail Kargakis 2016-03-21 11:23:53 UTC
Most probably this is a race between the deployer pod controller (the component responsible for transitioning the deployment between phases) and `oc deploy --cancel`. It seems the deployment is marked for cancellation while still running, and at the same time the deployer finishes successfully and transitions the deployment to Complete. Opened https://github.com/openshift/origin/pull/8163, which takes a stab at it.

Comment 2 Michail Kargakis 2016-03-25 19:21:33 UTC
We decided that since --cancel is a best-effort call, emitting an event in case of a failed cancel and cleaning up the cancel annotations is enough.
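The agreed behavior can be sketched as follows. This is a simplified toy model, not actual origin code; the type, field, and annotation names here are illustrative only. The point is that when the controller sees a cancel request on a deployment that already completed, it emits an event and cleans up the stale annotation instead of forcing the phase to Failed:

```go
package main

import "fmt"

// Simplified stand-in for a deployment (replication controller) record.
// Field and annotation names are illustrative, not the real origin API.
type deployment struct {
	phase       string // "Running", "Complete", "Failed", ...
	annotations map[string]string
}

const cancelledAnnotation = "openshift.io/deployment.cancelled" // illustrative key

// reconcileCancel models best-effort cancellation: if the deployer already
// finished, emit an event and remove the stale cancel annotation rather
// than transitioning a Complete deployment to Failed.
func reconcileCancel(d *deployment) string {
	if _, wantsCancel := d.annotations[cancelledAnnotation]; !wantsCancel {
		return "no-op"
	}
	if d.phase == "Complete" {
		delete(d.annotations, cancelledAnnotation)
		return "event: cancellation failed, deployment already complete"
	}
	d.phase = "Failed"
	return "cancelled"
}

func main() {
	running := &deployment{phase: "Running", annotations: map[string]string{cancelledAnnotation: "true"}}
	done := &deployment{phase: "Complete", annotations: map[string]string{cancelledAnnotation: "true"}}
	fmt.Println(reconcileCancel(running)) // cancel wins the race
	fmt.Println(reconcileCancel(done))    // deployer finished first
}
```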

Comment 3 Michail Kargakis 2016-03-28 15:52:44 UTC
Since this is a corner case and already fixed in https://github.com/openshift/origin/pull/8163, I am dropping the priority.

Comment 5 zhou ying 2016-04-01 02:33:55 UTC
The issue reproduces only on Online:

[root@zhouy testjson-for-int]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   0          14m
hooks-2-deploy   0/1       Error     0          13m
hooks-3-deploy   1/1       Running   0          3m
[root@zhouy testjson-for-int]# oc deploy hooks --cancel
No deployments are in progress (latest deployment #3 running 3 minutes ago)
[root@zhouy testjson-for-int]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   0          14m
hooks-2-deploy   0/1       Error     0          13m
hooks-3-deploy   1/1       Running   0          3m

Comment 6 Michail Kargakis 2016-04-04 10:15:07 UTC
Zhou,

what you hit is different from the reported issue.

Comment 7 Michail Kargakis 2016-04-04 10:17:18 UTC
Reassigning to the Online team.

Comment 8 Dan Mace 2016-04-04 13:32:21 UTC
The issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1318920#c5 is different from the issue this bug is tracking. I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1323710 to track the newly discovered behavior.

I'm putting this issue back ON_QA so it can be verified against origin where it was originally reported. Let's keep the scope of testing limited to the reported bug and open new bugs as necessary.

Comment 9 zhou ying 2016-04-05 02:49:33 UTC
Hi Dan:
   The issue reproduces only on Online and is not fixed yet. When the cancel fails and I try to cancel again, I see the behavior from https://bugzilla.redhat.com/show_bug.cgi?id=1318920#c5, so in my opinion https://bugzilla.redhat.com/show_bug.cgi?id=1323710 is the same issue as this bug. Please see:

[root@zhouy roottest]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   0          4d
hooks-2-deploy   0/1       Error     0          4d
hooks-3-deploy   0/1       Error     0          4d
[root@zhouy roottest]# oc deploy hooks --latest
Started deployment #4
[root@zhouy roottest]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   0          4d
hooks-2-deploy   0/1       Error     0          4d
hooks-3-deploy   0/1       Error     0          4d
hooks-4-deploy   1/1       Running   0          <invalid>
[root@zhouy roottest]# oc deploy hooks --cancel
Cancelled deployment #4
[root@zhouy roottest]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   0          4d
hooks-2-deploy   0/1       Error     0          4d
hooks-3-deploy   0/1       Error     0          4d
hooks-4-deploy   1/1       Running   0          <invalid>
[root@zhouy roottest]# oc deploy hooks --cancel
No deployments are in progress (latest deployment #4 running less than a second ago)
[root@zhouy roottest]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   0          4d
hooks-2-deploy   0/1       Error     0          4d
hooks-3-deploy   0/1       Error     0          4d
hooks-4-deploy   1/1       Running   0          <invalid>

Comment 10 Michail Kargakis 2016-04-05 11:27:17 UTC
No, it's not the same issue. `deploy --cancel` marks the deployment (replication controller) as cancelled; then the deployment controller picks it up and terminates the deployer pod. So it is impossible to observe the deployer pod terminating as soon as you --cancel, unless we moved the deployment controller functionality into oc, which was discussed and rejected (and even then we would only minimize the race window).

What you are observing in https://bugzilla.redhat.com/show_bug.cgi?id=1323710 is that the post-cancellation message is wrong because we rely on the deployment phase, which may still be Pending or Running. The component responsible for transitioning the deployment phase is the deployer pod controller, not --cancel.

Also keep in mind that --cancel is a best effort call (noted in the docs at https://docs.openshift.org/latest/dev_guide/deployments.html#canceling-a-deployment). When a user tries to --cancel and the deployer pod just succeeded (what this issue reports), we will just emit an event letting the user know that their --cancel failed.
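The two-actor flow described above can be modeled as a toy Go sketch. This is not origin code; all names and types here are illustrative. It shows why the deployer pod cannot die at the moment of --cancel: the client only marks the record, a separate controller loop acts on the mark later, and the outcome depends entirely on which actor runs first:

```go
package main

import "fmt"

// Toy model of the cancellation race. All names are illustrative.
type state struct {
	cancelRequested bool
	deployerRunning bool
	phase           string
}

// clientCancel is what `oc deploy --cancel` does: it only marks the
// deployment as cancelled; it does not touch the deployer pod.
func clientCancel(s *state) { s.cancelRequested = true }

// controllerSync is the deployment controller's reconcile step: it
// notices the cancel mark and deletes the still-running deployer pod.
func controllerSync(s *state) {
	if s.cancelRequested && s.deployerRunning {
		s.deployerRunning = false // delete deployer pod
		s.phase = "Failed"
	}
}

// deployerFinish models the deployer pod completing on its own, after
// which the deployer pod controller transitions the phase to Complete.
func deployerFinish(s *state) {
	if s.deployerRunning {
		s.deployerRunning = false
		s.phase = "Complete"
	}
}

func main() {
	// The outcome depends purely on event ordering.
	a := &state{deployerRunning: true, phase: "Running"}
	clientCancel(a)
	controllerSync(a) // controller wins: deployment ends up Failed
	fmt.Println("controller first:", a.phase)

	b := &state{deployerRunning: true, phase: "Running"}
	clientCancel(b)
	deployerFinish(b) // deployer wins: Complete despite the cancel mark
	controllerSync(b) // too late, nothing left to terminate
	fmt.Println("deployer first:  ", b.phase)
}
```

In the second ordering the cancel mark is left dangling on a Complete deployment, which is exactly the window the fix handles by emitting an event and cleaning up the annotation.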

Comment 11 zhou ying 2016-04-06 02:22:49 UTC
After waiting a few minutes, the --cancel completed; will verify this bug.
[root@zhouy ~]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   1          4d
hooks-2-deploy   0/1       Error     0          4d
hooks-3-deploy   0/1       Error     0          4d
hooks-4-deploy   0/1       Error     0          23h
hooks-5-deploy   1/1       Running   0          5m
[root@zhouy ~]# oc get pods
NAME             READY     STATUS    RESTARTS   AGE
hooks-1-lenv2    1/1       Running   1          5d
hooks-2-deploy   0/1       Error     0          5d
hooks-3-deploy   0/1       Error     0          4d
hooks-4-deploy   0/1       Error     0          23h
hooks-5-deploy   0/1       Error     0          9m

