Bug 1357752 - Should be able to cancel stages after jenkinsPipeLine build is canceled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Build
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Gabe Montero
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-19 05:43 UTC by Xingxing Xia
Modified: 2016-12-09 21:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: openshift-pipeline was swallowing interrupted exceptions.
Consequence: Jenkins jobs were not cancelling properly.
Fix: No longer swallow the exceptions; throw them back to Jenkins.
Result: Users should be able to cancel jobs more expediently.
Clone Of:
Environment:
Last Closed: 2016-12-09 21:52:02 UTC
Target Upstream Version:
Embargoed:


Attachments
jenkinspipeline_build_stages_not_cancelled (63.43 KB, image/png)
2016-07-19 05:45 UTC, Xingxing Xia
snapshot1 (56.08 KB, image/png)
2016-07-25 06:59 UTC, Xingxing Xia
snapshot2 (103.56 KB, image/png)
2016-07-25 07:03 UTC, Xingxing Xia
att1 (123.23 KB, image/png)
2016-08-17 09:54 UTC, Xingxing Xia
att2 (152.95 KB, image/png)
2016-08-17 09:54 UTC, Xingxing Xia

Description Xingxing Xia 2016-07-19 05:43:04 UTC
Description of problem:
After a jenkinsPipeLine build is cancelled, its stages keep running and eventually complete.

Version-Release number of selected component (if applicable):
openshift v1.3.0-alpha.2+8d2b7b2
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git


How reproducible:
Always

Steps to Reproduce:
1. Preparation
$ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/image-streams/image-streams-centos7.json -n openshift
$ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/pipeline/jenkinstemplate.json -n openshift
2. Create jenkinsPipeLine bc (and build)
$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/pipeline/samplepipeline.json
3. Start build
$ oc start-build sample-pipeline
4. Cancel the build, e.g. just after the 1st stage starts running
$ oc cancel-build sample-pipeline-1
5. Check build and stages in CLI or web console 
1> $ oc get build sample-pipeline-1
2> Check build and deploy stages
$ oc get pod
3> Check in web console

Actual results:
5. 
1> The build is cancelled
NAME                  TYPE              FROM          STATUS      STARTED         DURATION
sample-pipeline-1     JenkinsPipeline                 Cancelled   6 minutes ago   2m3s
2> The build and deploy stages are still going on and finally complete.
NAME                        READY     STATUS      RESTARTS   AGE
...
frontend-2-358l7            1/1       Running     0          1m
frontend-2-zzkpt            1/1       Running     0          1m
ruby-sample-build-1-build   0/1       Completed   0          2m
...
3> See attachment

Expected results:
5.
2> and 3> The stages should be cancelled in cascade with the jenkinsPipeLine build

Additional info:

Comment 1 Xingxing Xia 2016-07-19 05:45:33 UTC
Created attachment 1181411
jenkinspipeline_build_stages_not_cancelled

Comment 2 Ben Parees 2016-07-19 16:43:51 UTC
I've also opened this as a github issue against the sync plugin:
https://github.com/fabric8io/openshift-jenkins-sync-plugin/issues/96

Comment 3 Xingxing Xia 2016-07-25 06:59:22 UTC
Created attachment 1183595
snapshot1

Comment 4 Xingxing Xia 2016-07-25 07:03:49 UTC
Created attachment 1183598
snapshot2

Comment 5 Xingxing Xia 2016-07-25 07:06:13 UTC
Tested with:
openshift v1.3.0-alpha.2+5c862c0
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

See results in attachments:
snapshot1 shows the pipeline being cancelled; snapshot2 shows the remaining stages still going ahead.

Comment 6 Xingxing Xia 2016-07-25 07:25:10 UTC
BTW, the version in comment 5 is latest AMI devenv-rhel7_4656

Comment 7 Ben Parees 2016-08-16 18:13:42 UTC
Xingxing, is this still an issue? It looks like the associated github issue was closed based on feedback from you:

https://github.com/fabric8io/openshift-jenkins-sync-plugin/issues/96

https://trello.com/c/ntM08mDF/868-8-pipeline-job-synchronizer#comment-5791eecd3aa58fb35923b878

"@jdyson Yes, cancel build in jenkins webconsole will waste a while but will sync to openshift cli successfully."

Comment 8 Xingxing Xia 2016-08-17 09:44:51 UTC
Hi Ben Parees,

(In reply to Ben Parees from comment #7)
> Xingxing is this still an issue? it looks like the associated github issue
> was closed based on feedback from you:

It was not feedback from me, but that doesn't matter :) (it seems to have been from xiuwang).

I tested in devenv-rhel7_4849 today and it still reproduces, but I found some extra info:
Cancelling the pipeline build cannot stop the build stage, as reported above. But if the cancellation is issued during, e.g., a "sleep 30" step, it does stop the build stage (and the whole pipeline, of course).

Following are details:
After `oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/pipeline/samplepipeline.json` and once the jenkins pod is running, edit bc sample-pipeline as follows:
node('maven') {
stage 'build'
sleep 30
openshiftBuild(buildConfig: 'ruby-sample-build', showBuildLogs: 'true')
stage 'deploy'
sleep 30
openshiftDeploy(deploymentConfig: 'frontend')
}

Then try:
A. Cancel the pipeline build while it is at "sleep 30": the whole job is stopped (not just shown as "Cancelled"), see att1.

Its logs on jenkins web console are:

OpenShift Build xxia-proj/sample-pipeline-3
[Pipeline] node
Running on maven-77b8ed73799 in /tmp/workspace/sample-pipeline
[Pipeline] {
[Pipeline] stage (build)
Entering stage build
Proceeding
[Pipeline] sleep
Aborted by Jenkins Admin
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: ABORTED

B. (Same as comment 0 in fact, just with jenkins logs pasted) Cancel the pipeline build when it is building ruby-sample-build, then the whole pipeline build is not stopped (though it shows "Cancelled" in CLI / OpenShift web console), see att2.

Its logs on jenkins web console are:
OpenShift Build xxia-proj/sample-pipeline-6
[Pipeline] node
Running on maven-a236ecd0f04 in /tmp/workspace/sample-pipeline
[Pipeline] {
[Pipeline] stage (build)
Entering stage build
Proceeding
[Pipeline] sleep
[Pipeline] openshiftBuild


Starting the "Trigger OpenShift Build" step with build config "ruby-sample-build" from the project "xxia-proj".
  Started build "ruby-sample-build-4" and waiting for build completion ...
Downloading "https://github.com/openshift/ruby-hello-world.git" ...

---> Installing application source ...
---> Building your Ruby application from source ...
---> Running 'bundle install --deployment' ...

Fetching gem metadata from https://rubygems.org/..........

Installing rake 10.3.2
Installing i18n 0.6.11

Installing json 1.8.3

Installing minitest 5.4.2
Installing thread_safe 0.3.4

Installing tzinfo 1.2.2
Installing activesupport 4.1.7

Installing builder 3.2.2
Installing activemodel 4.1.7
Installing arel 5.0.1.20140414130214

Installing activerecord 4.1.7

Installing mysql2 0.3.16

Installing rack 1.5.2
Installing rack-protection 1.5.3

Installing tilt 1.4.1
Installing sinatra 1.4.5

Installing sinatra-activerecord 2.0.3
Using bundler 1.7.8
Your bundle is complete!
It was installed into ./bundle
---> Cleaning up unused ruby gems ...

Aborted by anonymous
Aborted by anonymous
Running post commit hook ...

/opt/rh/rh-ruby22/root/usr/bin/ruby -I"lib" -I"/opt/app-root/src/bundle/ruby/gems/rake-10.3.2/lib" "/opt/app-root/src/bundle/ruby/gems/rake-10.3.2/lib/rake/rake_test_loader.rb" "test/*_test.rb" 

Run options: --seed 47730
# Running:
.
Finished in 0.002358s, 424.1641 runs/s, 424.1641 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
Pushing image 172.30.56.231:5000/xxia-proj/origin-ruby-sample:latest ...
Pushed 3/10 layers, 30% complete
Pushed 4/10 layers, 40% complete
Pushed 5/10 layers, 50% complete
Pushed 6/10 layers, 60% complete
Pushed 7/10 layers, 70% complete
Pushed 8/10 layers, 80% complete
Pushed 9/10 layers, 90% complete

Pushed 10/10 layers, 100% complete
Push successful



Exiting "Trigger OpenShift Build" successfully; build "ruby-sample-build-4" has completed with status:  [Complete].
[Pipeline] stage (deploy)
Entering stage deploy
Proceeding
[Pipeline] sleep
Click here to forcibly terminate running steps
Click here to forcibly terminate running steps
[Pipeline] openshiftDeploy


Starting "Trigger OpenShift Deployment" with deployment config "frontend" from the project "xxia-proj".


Exiting "Trigger OpenShift Deployment" successfully; deployment "frontend-8" has completed with status:  [Complete].
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: ABORTED

Comment 9 Xingxing Xia 2016-08-17 09:54:09 UTC
Created attachment 1191552
att1

Comment 10 Xingxing Xia 2016-08-17 09:54:28 UTC
Created attachment 1191553
att2

Comment 11 Jimmi Dyson 2016-10-31 13:17:18 UTC
I wonder if this is related to https://issues.jenkins-ci.org/browse/JENKINS-34637. We should try with updated pipeline plugins.

Comment 12 Jimmi Dyson 2016-11-01 15:16:34 UTC
I've tried to fix this, but can't seem to. I've tried to force abort the job via the Jenkins console but that doesn't work either so it's unrelated to the sync plugin. There are two possibilities:

1. It's a jenkins bug that you can't force close build steps.
2. The build steps have to somehow be abortable. I'm not sure if this is possible but it would seem remiss if there was no way to interrupt any generic step.

Comment 13 Ben Parees 2016-11-01 15:18:09 UTC
@jimmi can you open a bug against jenkins/pipeline for this and we'll see what they say?

Comment 14 Jimmi Dyson 2016-11-02 14:26:54 UTC
So I've done some more digging & it looks like it's the build steps swallowing InterruptedExceptions. Ultimately the openshiftBuild step runs in a separate thread. When a build is cancelled that thread is interrupted & in Java it is the responsibility of the running thread to notice that it has been interrupted by regularly checking Thread.currentThread().isInterrupted() or by catching InterruptedException if sleeping. Both of these indicate that the step should be interrupted (cancelled) & cleanup should take place. As you can see in IOpenShiftBuilder.waitOnBuild (https://github.com/openshift/jenkins-plugin/blob/master/src/main/java/com/openshift/jenkins/plugins/pipeline/model/IOpenShiftBuilder.java#L111-L171) InterruptedExceptions are swallowed & not handled properly, plus there are no checks for isInterrupted in the loops. Without this there is no way for the step to be cancelled.

@bparees As this isn't something to do with the sync plugin I'd like this to be reassigned appropriately please. I'm also keeping https://github.com/fabric8io/openshift-jenkins-sync-plugin/issues/96 closed as it is not related to sync plugin.
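
To illustrate the difference described in this comment, here is a minimal, self-contained Java sketch. It is not the plugin's code: the class InterruptAwareWait and its methods are hypothetical names, and a short Thread.sleep stands in for polling the build status. One loop swallows InterruptedException, which is the pattern attributed to waitOnBuild above; the other checks isInterrupted() and lets the exception propagate to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InterruptAwareWait {

    /** Problematic pattern: the interrupt is caught and ignored, so the loop keeps polling. */
    public static int waitSwallowing(int maxPolls, long pollMillis) {
        int polls = 0;
        while (polls < maxPolls) {
            try {
                Thread.sleep(pollMillis); // stand-in for "check build status, then wait"
            } catch (InterruptedException e) {
                // Swallowed: the cancel request is lost and polling continues.
            }
            polls++;
        }
        return polls;
    }

    /** Same loop, but an interrupt ends the wait and is reported to the caller. */
    public static int waitPropagating(int maxPolls, long pollMillis) throws InterruptedException {
        int polls = 0;
        while (polls < maxPolls) {
            // Covers an interrupt that arrived while the thread was not sleeping.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("wait cancelled");
            }
            Thread.sleep(pollMillis); // throws InterruptedException if interrupted while sleeping
            polls++;
        }
        return polls;
    }

    /** Runs one loop on a worker thread, interrupts it, and reports whether it stopped early. */
    public static boolean stopsEarlyWhenInterrupted(boolean propagate) {
        final int maxPolls = 20;
        final AtomicInteger completed = new AtomicInteger(-1);
        Thread worker = new Thread(() -> {
            try {
                completed.set(propagate
                        ? waitPropagating(maxPolls, 20)
                        : waitSwallowing(maxPolls, 20));
            } catch (InterruptedException e) {
                // Cancelled: 'completed' stays at -1.
            }
        });
        worker.start();
        try {
            Thread.sleep(50);   // let the worker get into its loop
            worker.interrupt(); // simulate the build being cancelled
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get() != maxPolls;
    }

    public static void main(String[] args) {
        System.out.println("swallowing loop stops early:  " + stopsEarlyWhenInterrupted(false));
        System.out.println("propagating loop stops early: " + stopsEarlyWhenInterrupted(true));
    }
}
```

In this sketch the swallowing loop should run all of its polls despite the interrupt, while the propagating loop should stop at the first interrupt. That is consistent with comment 8, where a cancel issued during "sleep 30" took effect but a cancel issued during openshiftBuild did not.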

Comment 15 Ben Parees 2016-11-02 14:33:10 UTC
@jimmi cool, thanks for the investigation.

Comment 16 Gabe Montero 2016-11-02 15:46:18 UTC
The question will be whether it is worth driving this down to the openshift-restclient-java level.

@Jeff Cantril - any benefit to the eclipse client in adding an onInterrupted() method to com.openshift.restclient.capability.resources.IPodLogRetrievalAsync.IPodLogListener ?

Comment 17 Gabe Montero 2016-11-02 15:47:33 UTC
Forgot to cc: Jeff beforehand :-)

@Jeff Cantril - see my question in comment 16 - thanks

Comment 18 Gabe Montero 2016-11-02 15:56:03 UTC
Realized I should clarify ... at a minimum, I'll be adding the protection around our calls to sleep ... just curious if we want to address this further down the stack as well.

Comment 19 Gabe Montero 2016-11-02 17:13:46 UTC
Jeff and I talked on IRC ... bottom line, "no" to surfacing this semantic in the rest client.

Long term, as part of pipeline-plugin 2.0, we'll look at leveraging a restclient watch with a future for the max wait condition.
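
As a rough illustration of the "watch with a future" idea, the sketch below uses only java.util.concurrent. Everything in it is an assumption made for illustration: BuildPhaseListener is a hypothetical stand-in for a watch callback (it is not an openshift-restclient-java type), and the phase names are example strings. The point is that Future.get with a timeout covers the max wait condition and is itself interruptible, so no hand-written sleep loop is needed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WatchWithFuture {

    /** Hypothetical callback, standing in for whatever a real watch API would invoke. */
    public interface BuildPhaseListener {
        void onPhase(String phase);
    }

    /** Returns a listener that completes the future once a terminal phase is seen. */
    public static BuildPhaseListener completing(CompletableFuture<String> result) {
        return phase -> {
            if (phase.equals("Complete") || phase.equals("Failed") || phase.equals("Cancelled")) {
                result.complete(phase);
            }
        };
    }

    /**
     * Waits for a terminal phase for at most maxWaitMillis.
     * InterruptedException is deliberately not caught here, so a cancel reaches the caller.
     */
    public static String await(CompletableFuture<String> result, long maxWaitMillis)
            throws InterruptedException {
        try {
            return result.get(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TimedOut";
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Feeds the given phases to the listener from another thread and waits for the outcome. */
    public static String simulate(long maxWaitMillis, String... phases) {
        CompletableFuture<String> result = new CompletableFuture<>();
        BuildPhaseListener listener = completing(result);
        Thread events = new Thread(() -> {
            for (String phase : phases) {
                listener.onPhase(phase);
            }
        });
        events.start();
        try {
            return await(result, maxWaitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag for the caller
            return "Interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(500, "New", "Running", "Complete"));
        System.out.println(simulate(100, "New", "Running"));
    }
}
```

With this shape, the waiting thread blocks in one place only, and both the timeout and a cancel surface as distinct outcomes instead of being folded into loop bookkeeping.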

Comment 20 Gabe Montero 2016-11-02 20:07:22 UTC
The pipeline-plugin has been updated via https://github.com/openshift/jenkins-plugin/pull/106

v1.0.30 is being cut on the jenkins download center as I type.

When ready, we'll merge https://github.com/openshift/jenkins/pull/182

I'll move this bug to on qa when the jenkins centos images on docker hub are updated.

Comment 21 Gabe Montero 2016-11-03 01:32:25 UTC
The jenkins-1-centos7 image is updated with v1.0.30 of the pipeline plugin and is up on docker hub.

There has been an issue pushing the jenkins-2-centos7 image from ci.openshift.

I'll track whether that changes and update this bug accordingly, but QE should be able to verify using the jenkins-1-centos7 image, so I'm moving the bug to their attention.

Comment 22 Dongbo Yan 2016-11-03 10:17:19 UTC
Test with docker.io/openshift/jenkins-1-centos7@sha256:34c35866bb6dc9ddfbe098b35590313d0b3a1774e22ff716f5126b39d97be3da
openshift-pipeline	1.0.30
openshift-sync	0.0.14

openshift v3.4.0.19+346a31d
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Actual result:
After cancelling the pipeline build, the nodejs-mongodb-example-1 build is also cancelled, but only after a delay of several minutes.
$ oc get build
nodejs-mongodb-example-1   Source            Git@69b359b   Cancelled   4 minutes ago   1m30s
nodejs-mongodb-example-2   Source            Git           Cancelled   2 minutes ago
nodejs-mongodb-example-3   Source            Git           Cancelled
sample-pipeline-1          JenkinsPipeline                 Cancelled   2 minutes ago   1s

Comment 23 Jimmi Dyson 2016-11-04 17:01:12 UTC
FYI I've also added some more cancellation stuff into the sync plugin so even if steps don't handle the cancellation properly the Jenkins build will still be cancelled. This does mean that async steps that e.g. start other OpenShift builds will continue but that the Jenkins build itself will be cancelled as will the OpenShift build that caused the Jenkins build to be triggered.
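
The trade-off described in this comment has a loose analogue in plain Java, sketched below. This is not the sync plugin's code; CancelOuterJob is a hypothetical name, and a worker that ignores interrupts stands in for a step that does not handle cancellation. Cancelling the Future marks the outer job as cancelled right away, yet the uncooperative work still runs to its end.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelOuterJob {

    /** Returns { outer job reported as cancelled, uncooperative step still ran to the end }. */
    public static boolean[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch stepFinished = new CountDownLatch(1);

        Future<?> job = pool.submit(() -> {
            started.countDown();
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);
            while (System.nanoTime() < deadline) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // The step ignores the cancel request and carries on.
                }
            }
            stepFinished.countDown();
        });

        try {
            started.await();  // make sure the step is running before cancelling
            job.cancel(true); // interrupts the worker and marks the job as cancelled
            boolean reportedCancelled = job.isCancelled();
            boolean stepStillFinished = stepFinished.await(2, TimeUnit.SECONDS);
            return new boolean[] { reportedCancelled, stepStillFinished };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] outcome = run();
        System.out.println("job reported cancelled: " + outcome[0]);
        System.out.println("step still finished:    " + outcome[1]);
    }
}
```

Both values should come back true here: the caller sees a cancelled job immediately, while work that does not cooperate continues in the background, which is the same kind of behaviour as async steps continuing after the Jenkins build is cancelled.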

Comment 24 Ben Parees 2016-11-04 17:02:41 UTC
That sounds great, thanks Jimmi!

