Bug 1292021 - [devexp_public_640] Failed to "Cancel deployments in OpenShift" via Jenkins
Summary: [devexp_public_640] Failed to "Cancel deployments in OpenShift" via Jenkins
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Image
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Gabe Montero
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks: 1294940 1308390 1312826
Reported: 2015-12-16 09:02 UTC by wewang
Modified: 2016-05-12 17:12 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1294940 1312826
Environment:
Last Closed: 2016-05-12 17:12:40 UTC
Target Upstream Version:
Embargoed:


Attachments
Cancel deploy config attachment (28.46 KB, image/png)
2015-12-16 09:03 UTC, wewang
cancel deploy output log (61.13 KB, image/png)
2015-12-16 09:04 UTC, wewang

Description wewang 2015-12-16 09:02:28 UTC
Version-Release number of selected component (if applicable):
openshift/jenkins-1-rhel7  27fca1f9ef45

How reproducible:
always

Description of problem:
The "Cancel deployments in OpenShift" step fails when run from Jenkins.

Steps to Reproduce:
1. Install Jenkins in the project test.
2. Create a project and instantiate the sample app template:
$ oc new-project wewang1
$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/sample-app/application-template-stibuild.json

3. $ oc policy add-role-to-user edit system:serviceaccount:test:default -n wewang1

4. In the Jenkins web console, create a new job, job1, and add a "Cancel deployments in OpenShift" post-build action with this configuration:
The name of the project the build is running in: wewang1
The name of the DeploymentConfig to search for active builds: frontend

5. $ oc deploy frontend  --latest

6. Build job1.

7. Check the console output:

Started by user Jenkins Admin
Building in workspace /var/lib/jenkins/jobs/Testplugin/workspace
ERROR: Build step failed with exception
java.lang.ClassCastException: com.openshift.internal.restclient.model.List cannot be cast to com.openshift.restclient.model.IDeploymentConfig
    at com.openshift.jenkins.plugins.pipeline.OpenShiftDeployCanceller.coreLogic(OpenShiftDeployCanceller.java:208)
    at com.openshift.jenkins.plugins.pipeline.OpenShiftDeployCanceller.perform(OpenShiftDeployCanceller.java:256)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721)
    at hudson.model.Build$BuildExecution.cleanUp(Build.java:193)
    at hudson.model.Run.execute(Run.java:1788)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:374)
Build step 'Cancel deployments in OpenShift' marked build as failure
Finished: SUCCESS
   

Actual results:
The deployment is not cancelled; a new pod and replication controller (RC) are created.

Expected results:
The deployment is cancelled.
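The ClassCastException above says that the DeploymentConfig lookup returned a List resource and the plugin cast it straight to IDeploymentConfig. The report does not include the plugin source, so the following is only a minimal Java sketch of that failure mode and of a guarded alternative. Resource, DeploymentConfig, ResourceList, SimpleDeploymentConfig, and findDeploymentConfig are hypothetical stand-ins, not the openshift-restclient-java API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class DcLookupSketch {

    // Hypothetical stand-ins for REST client model types;
    // NOT the real openshift-restclient-java interfaces.
    interface Resource {
        String kind();
    }

    interface DeploymentConfig extends Resource {
        String name();
    }

    static final class ResourceList implements Resource {
        final List<Resource> items = new ArrayList<>();
        public String kind() { return "List"; }
    }

    static final class SimpleDeploymentConfig implements DeploymentConfig {
        private final String name;
        SimpleDeploymentConfig(String name) { this.name = name; }
        public String kind() { return "DeploymentConfig"; }
        public String name() { return name; }
    }

    // Unsafe: mirrors the stack trace, throwing ClassCastException
    // when the lookup hands back a List instead of a single config.
    static DeploymentConfig unsafeCast(Resource r) {
        return (DeploymentConfig) r;
    }

    // Guarded: accept a single DeploymentConfig, or search a List for the
    // named one; otherwise report "not found" instead of throwing.
    static Optional<DeploymentConfig> findDeploymentConfig(Resource r, String name) {
        if (r instanceof DeploymentConfig) {
            DeploymentConfig dc = (DeploymentConfig) r;
            return name.equals(dc.name()) ? Optional.of(dc) : Optional.empty();
        }
        if (r instanceof ResourceList) {
            for (Resource item : ((ResourceList) r).items) {
                if (item instanceof DeploymentConfig
                        && name.equals(((DeploymentConfig) item).name())) {
                    return Optional.of((DeploymentConfig) item);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ResourceList list = new ResourceList();
        list.items.add(new SimpleDeploymentConfig("frontend"));

        try {
            unsafeCast(list);
        } catch (ClassCastException e) {
            System.out.println("unsafe cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println("guarded lookup found: "
                + findDeploymentConfig(list, "frontend").map(DeploymentConfig::name).orElse("none"));
    }
}
```

The guarded version trades an exception for an empty result, so a caller can log "DeploymentConfig not found" and fail the step cleanly rather than dumping a stack trace.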

Comment 1 wewang 2015-12-16 09:03:47 UTC
Created attachment 1106336 [details]
Cancel deploy config attachment

Comment 2 wewang 2015-12-16 09:04:29 UTC
Created attachment 1106337 [details]
cancel deploy output log

Comment 3 Gabe Montero 2015-12-16 16:10:48 UTC
Yep, an external customer reported this just the other day.

I have reproduced using the steps in the description with version 1.0.3 of the plugin that is currently installed in the jenkins images.

I then updated my running copy of the Jenkins image with the current 1.0.4-snapshot, ran through the same steps, and saw the deployment cancellation succeed.

I'll work with wewang in the trello card https://trello.com/c/6gai8wLM/640-8-ci-jenkins-openshift-v3-plugin to sort out the logistics of delivering this fix.

Comment 4 Gabe Montero 2015-12-18 17:28:36 UTC
v1.0.4 with the fixes for this defect is available for both the centos and rhel versions of the jenkins image.

Moving to ON_QA for @wenwang's verification.

Comment 5 wewang 2015-12-21 09:12:02 UTC
Something is wrong with the test environment; once it is working again, I will verify the bug. Thanks.

Comment 7 Gabe Montero 2015-12-23 15:32:30 UTC
The Jenkins RHEL image seems to have been reverted somehow (assuming I was not hallucinating during my verification that the RHEL image had v1.0.4 back on Dec 18), as what I pulled from ci.dev.openshift now shows v1.0.3 of the plugin being installed.  It needs to be v1.0.4 to have the fix for this bug.

I've emailed the people involved in this process (Troy Dawson, Scott Dodson, Ben Parees).

Most likely though the RHEL image on ci.dev.openshift won't get properly updated until after the holiday.

Comment 8 Gabe Montero 2016-01-05 21:17:50 UTC
OK, with help from bparees, sdodson, and tdawson, the Jenkins RHEL image on ci.dev.openshift.redhat.com:5000 is now updated with v1.0.4 of the plugin.

Moving back to QA to attempt to verify the fix.

Comment 10 Gabe Montero 2016-01-07 14:57:50 UTC
Based on some code analysis of where "DeadlineExceeded" gets set, a pod ending up in "DeadlineExceeded" status appears to be independent of OpenShift's cancelling of deployments. In fact, Kubernetes manages the setting of that status, whereas the OpenShift deployment controller manages the cancelling.

It also appears to be independent of the DeploymentCancelledAnnotation that the Jenkins plugin sets on the deployment (as the oc CLI does), and neither can prevent it.

I would therefore argue that, since the Java stack trace (the real problem captured by this defect) no longer occurs, we should mark this defect verified.

If you see this consistently, I would expect the same to occur with the `oc` command's version of cancelling the deployment. Assuming that's true, we should open a new issue with the Deployment team to confirm whether "DeadlineExceeded" is being set erroneously.
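The division of labour described in Comment 10 can be sketched as follows: the client (the Jenkins plugin or `oc deploy --cancel`) only records a cancellation request as an annotation on the deployment, and the deployment controller acts on it, while Kubernetes sets pod statuses such as DeadlineExceeded on its own. This is a minimal Java sketch under stated assumptions: the key openshift.io/deployment.cancelled with value "true" is assumed to be what OpenShift 3.x uses (check the origin source for your version), and the Deployment class and the phase names New, Pending, and Running are hypothetical for illustration. No REST calls are made; a real client would also have to update the object through the API.

```java
import java.util.HashMap;
import java.util.Map;

public class CancelAnnotationSketch {

    // ASSUMED annotation key/value for requesting cancellation in OpenShift 3.x.
    static final String CANCELLED_ANNOTATION = "openshift.io/deployment.cancelled";
    static final String CANCELLED_VALUE = "true";

    // Hypothetical minimal model of a deployment (a replication controller in 3.x).
    static final class Deployment {
        final String name;
        final String phase;
        final Map<String, String> annotations = new HashMap<>();

        Deployment(String name, String phase) {
            this.name = name;
            this.phase = phase;
        }
    }

    // Assumed phase names; only deployments that have not finished are cancellable.
    static boolean isActive(Deployment d) {
        return "New".equals(d.phase) || "Pending".equals(d.phase) || "Running".equals(d.phase);
    }

    // Records the cancellation request and returns whether anything changed.
    // Acting on the request is left to the deployment controller, not the caller.
    static boolean requestCancel(Deployment d) {
        if (!isActive(d) || CANCELLED_VALUE.equals(d.annotations.get(CANCELLED_ANNOTATION))) {
            return false;
        }
        d.annotations.put(CANCELLED_ANNOTATION, CANCELLED_VALUE);
        return true;
    }

    public static void main(String[] args) {
        Deployment running = new Deployment("frontend-3", "Running");
        Deployment done = new Deployment("frontend-2", "Complete");
        System.out.println(requestCancel(running)); // true
        System.out.println(requestCancel(running)); // false: already annotated
        System.out.println(requestCancel(done));    // false: not active
    }
}
```

Because the client only sets an annotation, nothing in this path touches pod status, which is consistent with the argument that "DeadlineExceeded" comes from elsewhere.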

Comment 11 wewang 2016-01-08 03:04:39 UTC
@Gabe Montero, you are right; thanks for the detailed comments. I confirmed with coworkers and tried cancelling the deployment with the oc CLI: the status is also "DeadlineExceeded", so I will verify the bug:
[root@dhcp-128-91 test]# /home/2016/test/oc deploy frontend  --latest
Started deployment #3
[root@dhcp-128-91 test]# /home/2016/test/oc deploy frontend --cancel
Cancelled deployment #3
[root@dhcp-128-91 test]# oc get pods
NAME                READY     STATUS             RESTARTS   AGE
frontend-1-build    0/1       Completed          0          26m
frontend-2-deploy   0/1       DeadlineExceeded   0          3m
frontend-3-deploy   0/1       DeadlineExceeded   0          50s
frontend-3-gvler    1/1       Running            0          47s
jenkins-1-cir23     1/1       Running            0          27m

Comment 12 Gabe Montero 2016-01-08 14:09:30 UTC
@Wen Wang, sounds great, thanks!!

