Bug 1292021

Summary: [devexp_public_640] Failed to "Cancel deployments in Openshift" via jenkins
Product: OKD
Reporter: wewang <wewang>
Component: Image
Assignee: Gabe Montero <gmontero>
Status: CLOSED CURRENTRELEASE
QA Contact: Wang Haoran <haowang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.x
CC: aos-bugs, bparees, mmccomas
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-05-12 17:12:40 UTC
Type: Bug
Bug Blocks: 1294940, 1308390, 1312826    
Attachments:
Description                        Flags
Cancel deploy config attachment    none
cancel deploy output log           none

Description wewang 2015-12-16 09:02:28 UTC
Version-Release number of selected component (if applicable):
openshift/jenkins-1-rhel7  27fca1f9ef45

How reproducible:
always

Description of problem:
The "Cancel deployments in OpenShift" build step fails when run from Jenkins.

Steps to Reproduce:
1. Install Jenkins in project test
2. Create a project and the sample app
$ oc new-project wewang1
$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/sample-app/application-template-stibuild.json

3. $ oc policy add-role-to-user edit system:serviceaccount:test:default -n wewang1

4. Create a new job, job1, in the Jenkins web UI and add the post-build action "Cancel deployments in OpenShift" with this config:
The name of the project the build is running in: wewang1
The name of the DeploymentConfig to search for active builds: frontend

5. $ oc deploy frontend --latest

6. Build job job1

7. Check the console output

Started by user Jenkins Admin
Building in workspace /var/lib/jenkins/jobs/Testplugin/workspace
ERROR: Build step failed with exception
java.lang.ClassCastException: com.openshift.internal.restclient.model.List cannot be cast to com.openshift.restclient.model.IDeploymentConfig
    at com.openshift.jenkins.plugins.pipeline.OpenShiftDeployCanceller.coreLogic(OpenShiftDeployCanceller.java:208)
    at com.openshift.jenkins.plugins.pipeline.OpenShiftDeployCanceller.perform(OpenShiftDeployCanceller.java:256)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721)
    at hudson.model.Build$BuildExecution.cleanUp(Build.java:193)
    at hudson.model.Run.execute(Run.java:1788)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:374)
Build step 'Cancel deployments in OpenShift' marked build as failure
Finished: SUCCESS
   

Actual results:
 The deployment is not cancelled; a new pod and replication controller (rc) are created.

Expected results:
 The deployment is cancelled.
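
For readers unfamiliar with the failure mode in the console output above: the trace shows a list-type resource being cast directly to IDeploymentConfig. The sketch below reproduces that kind of failure and shows one guarded alternative. It is only an illustration under stated assumptions: the interfaces and classes here are hypothetical stand-ins, not the real openshift-restclient-java types, and this is not claimed to be the change that went into the plugin.

```java
import java.util.ArrayList;
import java.util.List;

public class CastSketch {
    // Hypothetical stand-ins for the restclient types named in the stack trace.
    interface IResource { String getKind(); }
    interface IDeploymentConfig extends IResource { String getName(); }

    static final class DeploymentConfig implements IDeploymentConfig {
        private final String name;
        DeploymentConfig(String name) { this.name = name; }
        public String getKind() { return "DeploymentConfig"; }
        public String getName() { return name; }
    }

    // A list-shaped response: a resource, but not a DeploymentConfig.
    static final class ResourceList implements IResource {
        final List<IResource> items = new ArrayList<>();
        public String getKind() { return "List"; }
    }

    // Unchecked cast: throws ClassCastException when handed a ResourceList.
    static String nameUnchecked(IResource r) {
        return ((IDeploymentConfig) r).getName();
    }

    // Guarded lookup: checks the runtime type and searches a list if needed.
    // Returns null when no DeploymentConfig with the wanted name is found.
    static String nameGuarded(IResource r, String wanted) {
        if (r instanceof IDeploymentConfig) {
            String name = ((IDeploymentConfig) r).getName();
            return name.equals(wanted) ? name : null;
        }
        if (r instanceof ResourceList) {
            for (IResource item : ((ResourceList) r).items) {
                if (item instanceof IDeploymentConfig
                        && ((IDeploymentConfig) item).getName().equals(wanted)) {
                    return wanted;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ResourceList list = new ResourceList();
        list.items.add(new DeploymentConfig("frontend"));
        try {
            nameUnchecked(list);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed with ClassCastException");
        }
        System.out.println("guarded lookup found: " + nameGuarded(list, "frontend"));
    }
}
```

The point of the guarded version is that the caller decides what to do with an unexpected response shape instead of letting the build step die with an exception.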

Comment 1 wewang 2015-12-16 09:03:47 UTC
Created attachment 1106336 [details]
Cancel deploy config attachment

Comment 2 wewang 2015-12-16 09:04:29 UTC
Created attachment 1106337 [details]
cancel deploy output log

Comment 3 Gabe Montero 2015-12-16 16:10:48 UTC
Yep, an external customer reported this just the other day.

I have reproduced using the steps in the description with version 1.0.3 of the plugin that is currently installed in the jenkins images.

I then updated my running copy of the jenkins image with the current 1.0.4-snapshot, ran through the same steps, and saw the deployment cancellation happen successfully.

I'll work with wewang in the trello card https://trello.com/c/6gai8wLM/640-8-ci-jenkins-openshift-v3-plugin to sort out the logistics of delivering this fix.

Comment 4 Gabe Montero 2015-12-18 17:28:36 UTC
v1.0.4 with the fixes for this defect is available for both the centos and rhel versions of the jenkins image.

Moving to ON_QA for @wenwang's verification.

Comment 5 wewang 2015-12-21 09:12:02 UTC
The test environment has a problem; once it is working again, I will verify the bug. Thanks.

Comment 7 Gabe Montero 2015-12-23 15:32:30 UTC
The Jenkins RHEL image seems to have been reverted somehow (assuming I was not hallucinating during my verification that the RHEL image had v1.0.4 back on Dec 18), as what I pulled from ci.dev.openshift now shows v1.0.3 of the plugin being installed.  It needs to be v1.0.4 to have the fix for this bug.

I've emailed the people involved in this process (Troy Dawson, Scott Dawson, Ben Parees).

Most likely though the RHEL image on ci.dev.openshift won't get properly updated until after the holiday.

Comment 8 Gabe Montero 2016-01-05 21:17:50 UTC
OK, with assists from bparees, sdodson, and tdawson, we've now got the jenkins rhel image on ci.dev.openshift.redhat.com:5000 updated with v1.0.4 of the plugin.

Moving back to QA to attempt to verify the fix.

Comment 10 Gabe Montero 2016-01-07 14:57:50 UTC
Based on some code analysis of where "DeadlineExceeded" gets set, a pod ending up in "DeadlineExceeded" status appears to be independent of OpenShift's cancelling of deployments.  In fact, k8s manages the setting of that status, whereas the OpenShift deployment controller manages the "cancelling".

And it certainly appears to be independent of the DeploymentCancelledAnnotation the jenkins plugin sets on the deployment, same as the oc cli, and not something either can prevent. 

I would then contend that with the absence of the Java stack trace, which was the true problem captured by this defect, we should mark this defect verified.

If you see this consistently, I would expect the same to occur with the `oc` command's version of cancelling the deployment.  Assuming that's true, we should open up a new issue with the Deployment team to confirm if "DeadlineExceeded" is occurring erroneously or not.
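
The separation described in this comment can be sketched as follows: the client (the jenkins plugin or the oc CLI) only sets a cancellation annotation, a controller reacts to it, and the deployer pod's status is set by neither. This is a toy model, not OpenShift code. The annotation key and value, the phase names, and the Deployment class are all assumptions made for illustration; the comment above only names DeploymentCancelledAnnotation.

```java
import java.util.HashMap;
import java.util.Map;

public class CancelAnnotationSketch {
    // Assumed key and value for DeploymentCancelledAnnotation; verify against
    // the OpenShift source before relying on them.
    static final String CANCELLED_ANNOTATION = "openshift.io/deployment.cancelled";
    static final String CANCELLED_VALUE = "true";

    // Hypothetical minimal model: a deployment phase, its annotations, and the
    // status of its deployer pod (owned by k8s in the real system).
    static final class Deployment {
        final String name;
        String phase;
        final String deployerPodStatus;
        final Map<String, String> annotations = new HashMap<>();

        Deployment(String name, String phase, String deployerPodStatus) {
            this.name = name;
            this.phase = phase;
            this.deployerPodStatus = deployerPodStatus;
        }
    }

    // Client side: request cancellation by setting the annotation, nothing more.
    static void requestCancel(Deployment d) {
        d.annotations.put(CANCELLED_ANNOTATION, CANCELLED_VALUE);
    }

    // Stand-in for the deployment controller: an in-flight deployment carrying
    // the annotation is moved to a terminal phase (assumed here to be "Failed").
    static void controllerSync(Deployment d) {
        boolean inFlight = d.phase.equals("New")
                || d.phase.equals("Pending")
                || d.phase.equals("Running");
        if (inFlight && CANCELLED_VALUE.equals(d.annotations.get(CANCELLED_ANNOTATION))) {
            d.phase = "Failed";
        }
    }

    public static void main(String[] args) {
        Deployment d = new Deployment("frontend-3", "Running", "DeadlineExceeded");
        requestCancel(d);
        System.out.println("after client request: phase=" + d.phase);
        controllerSync(d);
        System.out.println("after controller sync: phase=" + d.phase
                + ", deployer pod status=" + d.deployerPodStatus);
    }
}
```

In this model the deployer pod status is a final field that neither the client nor the controller stand-in touches, which mirrors the argument that "DeadlineExceeded" is not something the plugin or the oc CLI can prevent.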

Comment 11 wewang 2016-01-08 03:04:39 UTC
@Gabe Montero, you are right; thanks for your detailed comments. I confirmed with coworkers and tried cancelling the deployment with the oc CLI; the pod status is also "DeadlineExceeded", so I will verify the bug:
[root@dhcp-128-91 test]# /home/2016/test/oc deploy frontend  --latest
Started deployment #3
[root@dhcp-128-91 test]# /home/2016/test/oc deploy frontend --cancel
Cancelled deployment #3
[root@dhcp-128-91 test]# oc get pods
NAME                READY     STATUS             RESTARTS   AGE
frontend-1-build    0/1       Completed          0          26m
frontend-2-deploy   0/1       DeadlineExceeded   0          3m
frontend-3-deploy   0/1       DeadlineExceeded   0          50s
frontend-3-gvler    1/1       Running            0          47s
jenkins-1-cir23     1/1       Running            0          27m

Comment 12 Gabe Montero 2016-01-08 14:09:30 UTC
@Wen Wang - sounds great - thanks!!