Bug 1543916 - Pipeline builds do not get pruned correctly
Summary: Pipeline builds do not get pruned correctly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: Ben Parees
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-09 14:57 UTC by Corey Daley
Modified: 2018-12-21 15:16 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-21 15:16:27 UTC
Target Upstream Version:




Links
System                              ID       Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)   3607641  None      None    None     2018-09-10 00:15:52 UTC

Description Corey Daley 2018-02-09 14:57:42 UTC
Description of problem:
Pipeline builds do not get pruned according to the successfulBuildsHistoryLimit/failedBuildsHistoryLimit settings in the BuildConfig

Version-Release number of selected component (if applicable):
oc v3.9.0-alpha.4+65697ed-228
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.50.14:8443
openshift v3.9.0-alpha.4+65697ed-228
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Every time

Steps to Reproduce:
1. Create a Jenkins pipeline in OpenShift
2. Set the successfulBuildsHistoryLimit/failedBuildsHistoryLimit in the BuildConfig
3. Run enough builds to exceed those limits, so that the oldest builds should get pruned

Actual results:
Errors appear in the OpenShift logs stating that the system:serviceaccount:openshift-infra:build-config-change-controller service account is forbidden from deleting the old builds.

Expected results:
The old builds should be pruned according to the settings in the BuildConfig
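For illustration, the expected behavior can be sketched as a small selection function. This is a simplified model, not the OpenShift controller code: the `Build` class, the `builds_to_prune` name, and the grouping of phases into "successful" and "failed" are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed phase grouping for this sketch; not taken from the bug report.
SUCCESS_PHASES = {"Complete"}
FAILED_PHASES = {"Failed", "Error", "Cancelled"}


@dataclass
class Build:
    name: str
    phase: str
    created: int  # creation order or timestamp; larger means newer


def builds_to_prune(builds: List[Build],
                    successful_limit: Optional[int],
                    failed_limit: Optional[int]) -> List[str]:
    """Return names of builds beyond each history limit, oldest first."""
    doomed: List[Build] = []
    for phases, limit in ((SUCCESS_PHASES, successful_limit),
                          (FAILED_PHASES, failed_limit)):
        if limit is None:  # an unset limit means "keep everything"
            continue
        matching = sorted((b for b in builds if b.phase in phases),
                          key=lambda b: b.created, reverse=True)
        doomed.extend(matching[limit:])  # everything past the newest `limit`
    return [b.name for b in sorted(doomed, key=lambda b: b.created)]
```

With three completed builds and a successful limit of 2, only the oldest completed build is selected; builds that are still running are never candidates.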

Additional info:

I0209 07:34:29.438274 31819 util.go:82] Pruning old build: cdaley/nodejs-sample-pipeline-1
I0209 07:34:29.438737 31819 rbac.go:116] RBAC DENY: user "system:serviceaccount:openshift-infra:build-config-change-controller" groups ["system:serviceaccounts" "system:serviceaccounts:openshift-infra" "system:authenticated"] cannot "delete" resource "builds.build.openshift.io" named "nodejs-sample-pipeline-1" in namespace "cdaley"
I0209 07:34:29.438848 31819 authorization.go:59] Forbidden: "/apis/build.openshift.io/v1/namespaces/cdaley/builds/nodejs-sample-pipeline-1", Reason: "User \"system:serviceaccount:openshift-infra:build-config-change-controller\" cannot delete builds.build.openshift.io in project \"cdaley\""
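When scanning master logs for this failure, a small parser can pull out who was denied which verb on which object. The following is a sketch only: the regular expression is written against the single "RBAC DENY" line quoted above, and the `parse_rbac_deny` helper is a hypothetical name, not part of any OpenShift tool.

```python
import re
from typing import Dict, Optional

# Pattern written against the single "RBAC DENY" line quoted above; other
# OpenShift versions may format this message differently.
RBAC_DENY = re.compile(
    r'RBAC DENY: user "(?P<user>[^"]+)" '
    r'groups \[(?P<groups>[^\]]*)\] '
    r'cannot "(?P<verb>[^"]+)" resource "(?P<resource>[^"]+)"'
    r'(?: named "(?P<name>[^"]+)")?'
    r'(?: in namespace "(?P<namespace>[^"]+)")?'
)


def parse_rbac_deny(line: str) -> Optional[Dict[str, object]]:
    """Extract who was denied which verb on which object, or None."""
    match = RBAC_DENY.search(line)
    if match is None:
        return None
    fields: Dict[str, object] = match.groupdict()
    # Turn the bracketed, space-separated group list into a Python list.
    fields["groups"] = re.findall(r'"([^"]+)"', match.group("groups"))
    return fields
```

Applied to the second log line above, this yields the build-config-change-controller service account as the user, "delete" as the verb, and "cdaley" as the namespace.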

Comment 1 Corey Daley 2018-02-10 00:37:29 UTC
It looks like the builds never get pruned due to the following line(s):

https://github.com/openshift/origin/blob/master/pkg/build/controller/build/build_controller.go#L329

https://github.com/openshift/origin/blob/master/pkg/build/controller/build/build_controller.go#L378-L381

Basically, if the build strategy is JenkinsPipelineStrategy, we rely on Jenkins to handle all creation, updating, and deletion of the job.

I believe the fix here is to update the openshift-client plugin to at least 3.0.0, and let Jenkins handle deleting the old builds based on the successfulBuildsHistoryLimit and failedBuildsHistoryLimit options in the BuildConfig, just to keep things consistent.

Comment 2 Ben Parees 2018-02-12 15:05:32 UTC
I don't think the "shouldIgnore" logic should apply to the build pruning logic.  We should prune pipeline builds.  The "shouldIgnore" check was really intended to say "ignore this build, in that we aren't going to create a build pod for it and monitor the pod state."
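The separation described here can be sketched as two independent checks. This is a toy model and not the build controller code or the eventual fix: the `controller_actions` function, the action labels, and the set of terminal phases are all hypothetical names chosen for the example.

```python
from typing import List

PIPELINE = "JenkinsPipeline"  # strategy label used only in this sketch
TERMINAL_PHASES = {"Complete", "Failed", "Error", "Cancelled"}  # assumed


def controller_actions(strategy: str, phase: str) -> List[str]:
    """List the (hypothetical) steps the controller takes for one build."""
    actions: List[str] = []
    # The ignore check only gates pod handling: Jenkins, not a build pod,
    # runs pipeline builds.
    if strategy != PIPELINE:
        actions.append("manage-build-pod")
    # Pruning is decided independently of the strategy.
    if phase in TERMINAL_PHASES:
        actions.append("prune-old-builds")
    return actions
```

The point of the design is that a finished pipeline build still triggers pruning even though the controller never manages a pod for it.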

Comment 3 Ben Parees 2018-04-11 16:50:00 UTC
Notes to self: as I recall, the hard problem here is that if the controller on the OpenShift side deletes the build, the Jenkins sync plugin may say "hey, I have this Jenkins job run and there's no corresponding build object, let me create one".

The solutions to that are either:

1) As Corey suggested, make the sync plugin responsible for pruning pipeline builds
2) Make the sync plugin smart enough not to create a build object for job runs that are already completed (it may already be that smart, but that may not be enough to close all the potential timing windows for getting this right)

Having only a single entity responsible for creating/deleting pipeline builds is definitely "safer", though possibly harder to implement, and it results in us having code in two places.

Corey, if you have any other recollections around this, please add them.

Comment 4 Corey Daley 2018-04-11 17:41:35 UTC
Ben, 
Since we have the sync plugin deleting job runs if the associated openshift build is deleted, I don't think that we have the issue with jenkins recreating the builds, so it seems like it would be safe to have OpenShift prune pipeline builds and then the Sync plugin would clean up the Jenkins jobs.  Of course some tests should/would be created around this scenario.

Comment 5 Ben Parees 2018-04-11 18:14:11 UTC
> Since we have the sync plugin deleting job runs if the associated openshift build is deleted, I don't think that we have the issue with jenkins recreating the builds

my fear is timing between the sync plugin seeing the delete event, and the sync plugin seeing the build is missing.  I can (because i'm a pessimist) envision a case where a build is pruned, then a relist happens, the build is not in the list, the sync plugin starts a build for it, and then we see the delete event for the build and delete the job run.
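The timing window above, and the guard from option 2 in the earlier comment, can be illustrated with a toy model of a relist. This is a sketch under stated assumptions, not sync plugin code: the `relist` function, the run states, and the `skip_completed` flag are hypothetical.

```python
from typing import Dict, List, Set


def relist(job_runs: Dict[str, str], builds: Set[str],
           skip_completed: bool) -> List[str]:
    """Return job runs for which a relist would create a build object.

    job_runs maps a run name to its state ("running" or "completed");
    builds is the set of build names currently present in OpenShift.
    """
    created = []
    for run, state in sorted(job_runs.items()):
        if run in builds:
            continue  # a matching build object already exists
        if skip_completed and state == "completed":
            continue  # option 2: treat a finished run as likely pruned
        created.append(run)
    return created
```

If a completed run's build has been pruned but the delete event has not yet been processed, the unguarded relist recreates the build, while the guarded one leaves it alone. The guard does not cover a run that is still in progress, which is one reason a single owner for creation and deletion may be safer.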

Comment 6 Corey Daley 2018-04-11 22:34:17 UTC
We also need to update to openshift-client 3.x, which is being held up by https://github.com/fabric8io/kubernetes-client/issues/1046

Comment 7 Corey Daley 2018-08-07 13:35:34 UTC
The OpenShift client has now been updated to openshift-client 3.x

Comment 8 Ben Parees 2018-08-07 14:01:06 UTC
This is actually implemented, right, Corey?  Delivered in 3.11?

Comment 9 Corey Daley 2018-08-07 14:04:40 UTC
Ben, 
Yes, I was just tracking down the commit for it to post here.

This bug is fixed by https://github.com/openshift/origin/commit/37de5d244bf82e61e9d4d10bc913dbe44794a855

Comment 10 Ben Parees 2018-08-07 14:07:48 UTC
I'm going to throw this straight into ON_QA since i'm confident it's in a build at this point.

Comment 11 Wenjing Zheng 2018-08-14 06:26:00 UTC
Yes, pipeline builds can be pruned now.
Verified with the version below:
registry.dev.redhat.io/openshift3/jenkins-2-rhel7@sha256:8b9cc096eaa54eafe905c79ad0b7b43a31137c73a51e91e3ee8e10e6f22734a1

Comment 15 Luke Meyer 2018-12-21 15:16:27 UTC
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.

