Description of problem:
Pipeline builds do not get pruned according to the successfulBuildsHistoryLimit/failedBuildsHistoryLimit settings in the BuildConfig.

Version-Release number of selected component (if applicable):
oc v3.9.0-alpha.4+65697ed-228
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.50.14:8443
openshift v3.9.0-alpha.4+65697ed-228
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Every time

Steps to Reproduce:
1. Create a Jenkins pipeline in OpenShift
2. Set the successfulBuildsHistoryLimit/failedBuildsHistoryLimit in the BuildConfig
3. Run enough builds that the old builds should get pruned

Actual results:
Errors occur in the OpenShift logs stating that the system:serviceaccount:openshift-infra:build-config-change-controller service account is forbidden from deleting the old builds.

Expected results:
The old builds should be pruned according to the settings in the BuildConfig.

Additional info:
I0209 07:34:29.438274 31819 util.go:82] Pruning old build: cdaley/nodejs-sample-pipeline-1
I0209 07:34:29.438737 31819 rbac.go:116] RBAC DENY: user "system:serviceaccount:openshift-infra:build-config-change-controller" groups ["system:serviceaccounts" "system:serviceaccounts:openshift-infra" "system:authenticated"] cannot "delete" resource "builds.build.openshift.io" named "nodejs-sample-pipeline-1" in namespace "cdaley"
I0209 07:34:29.438848 31819 authorization.go:59] Forbidden: "/apis/build.openshift.io/v1/namespaces/cdaley/builds/nodejs-sample-pipeline-1", Reason: "User \"system:serviceaccount:openshift-infra:build-config-change-controller\" cannot delete builds.build.openshift.io in project \"cdaley\""
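For reference, the retention behaviour expected here can be sketched roughly as follows. This is only an illustration of what the two limits are meant to do, not the controller's actual code: the Build class, the builds_to_prune helper, and the integer "created" field are made up for the example, and the grouping of Failed, Cancelled, and Error under the failed limit is my understanding of the documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed phase grouping: "Complete" counts against the successful limit,
# the other terminal phases count against the failed limit.
SUCCESS_PHASES = {"Complete"}
FAILURE_PHASES = {"Failed", "Cancelled", "Error"}


@dataclass
class Build:
    name: str
    phase: str
    created: int  # hypothetical stand-in for creationTimestamp (higher = newer)


def builds_to_prune(
    builds: List[Build],
    successful_limit: Optional[int],
    failed_limit: Optional[int],
) -> List[Build]:
    """Return the builds that exceed the history limits, oldest dropped first."""

    def overflow(phases, limit):
        if limit is None:  # limit not set: keep everything in this group
            return []
        matching = sorted(
            (b for b in builds if b.phase in phases),
            key=lambda b: b.created,
            reverse=True,  # newest first, so the slice below keeps the newest
        )
        return matching[limit:]

    return overflow(SUCCESS_PHASES, successful_limit) + overflow(
        FAILURE_PHASES, failed_limit
    )
```

With five completed builds, two failed builds, and limits of 2 and 1, this would select the three oldest completed builds and the older failed build. Builds that are still running are never selected.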
It looks like the builds never get pruned because of the following lines:

https://github.com/openshift/origin/blob/master/pkg/build/controller/build/build_controller.go#L329
https://github.com/openshift/origin/blob/master/pkg/build/controller/build/build_controller.go#L378-L381

Basically, if the build strategy is JenkinsPipelineStrategy, we rely on Jenkins to handle all creation, updating, and deletion of the job. I believe the fix here is to update the openshift-client plugin to at least 3.0.0 and let Jenkins handle deleting the old builds based on the successfulBuildsHistoryLimit and failedBuildsHistoryLimit options in the BuildConfig, just to keep things consistent.
I don't think the "shouldIgnore" logic should apply to the build pruning logic. We should prune pipeline builds. The "shouldIgnore" check was really intended to say "ignore this build, in that we aren't going to create a build pod for it and monitor the pod state."
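To make the distinction concrete, here is a small sketch of the two decisions kept separate. The function names are invented for illustration and are not what build_controller.go uses; the "JenkinsPipeline" strategy type string and the set of terminal phases are my assumptions about the API values.

```python
JENKINS_PIPELINE = "JenkinsPipeline"  # assumed strategy type value
TERMINAL_PHASES = {"Complete", "Failed", "Cancelled", "Error"}


def should_create_build_pod(strategy_type: str) -> bool:
    """Pipeline builds run in Jenkins, so no build pod is created or monitored."""
    return strategy_type != JENKINS_PIPELINE


def is_prune_candidate(phase: str) -> bool:
    """Any finished build is a pruning candidate, regardless of its strategy."""
    return phase in TERMINAL_PHASES
```

The point is that the pruning check never consults the strategy, so a finished pipeline build is handled like any other finished build.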
Notes to self: as I recall, the hard problem here is that if the controller on the OpenShift side deletes the build, then the Jenkins sync plugin may say "hey, I have this Jenkins job run and there's no corresponding build object, let me create one". The solutions to that are either:

1) As Corey suggested, have the sync plugin be responsible for pruning pipeline builds.
2) Have the sync plugin be smart enough not to create a build object for job runs that are already completed (it may already be that smart, but that may also not be enough to close all the potential timing windows for getting this right).

Having only a single entity responsible for creating/deleting pipeline builds is definitely "safer", though possibly harder to implement, and it results in us having code in two places.

Corey, if you have any other recollections around this, please add them.
Ben, Since we have the sync plugin deleting job runs if the associated openshift build is deleted, I don't think that we have the issue with jenkins recreating the builds, so it seems like it would be safe to have OpenShift prune pipeline builds and then the Sync plugin would clean up the Jenkins jobs. Of course some tests should/would be created around this scenario.
> Since we have the sync plugin deleting job runs if the associated openshift build is deleted, I don't think that we have the issue with jenkins recreating the builds

My fear is the timing between the sync plugin seeing the delete event and the sync plugin seeing that the build is missing. I can (because I'm a pessimist) envision a case where a build is pruned, then a relist happens, the build is not in the list, the sync plugin starts a build for it, and then we see the delete event for the build and delete the job run.
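A toy simulation of that ordering, to show why it worries me. The SyncPlugin class here is a made-up stand-in and does not reflect the real plugin's code; the skip_completed flag models the "don't create build objects for completed job runs" guard discussed earlier in this bug.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SyncPlugin:
    """Hypothetical model of the sync plugin's relist and delete handling."""

    job_runs: Dict[str, str]  # job run name -> "running" or "completed"
    skip_completed: bool = False
    created: List[str] = field(default_factory=list)

    def on_relist(self, listed_builds: Set[str]) -> None:
        # A job run with no matching Build object gets a Build created for it,
        # unless the guard is on and the run has already finished.
        for run, state in self.job_runs.items():
            if run in listed_builds:
                continue
            if self.skip_completed and state == "completed":
                continue
            self.created.append(run)

    def on_delete_event(self, build_name: str) -> None:
        # The plugin removes the job run that belonged to the deleted Build.
        self.job_runs.pop(build_name, None)


def simulate(skip_completed: bool) -> Tuple[Set[str], Set[str]]:
    name = "nodejs-sample-pipeline-1"
    builds = {name}
    plugin = SyncPlugin({name: "completed"}, skip_completed=skip_completed)

    builds.discard(name)           # 1. controller prunes the Build
    plugin.on_relist(set(builds))  # 2. relist arrives before the delete event
    builds.update(plugin.created)  #    plugin recreates the "missing" Build
    plugin.on_delete_event(name)   # 3. delete event arrives, job run removed
    return builds, set(plugin.job_runs)
```

Without the guard, simulate(False) ends with the pruned Build back in OpenShift and no job run behind it. With the guard, simulate(True) ends with both gone, but only because the run had already completed when the relist happened.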
We also need to update to the openshift-client 3.x which is being held up by https://github.com/fabric8io/kubernetes-client/issues/1046
The OpenShift client has now been updated to openshift-client 3.x
This is actually implemented, right Corey? Delivered in 3.11?
Ben, Yes, I was just tracking down the commit for it to post here. This bug is fixed by https://github.com/openshift/origin/commit/37de5d244bf82e61e9d4d10bc913dbe44794a855
I'm going to throw this straight into ON_QA since i'm confident it's in a build at this point.
Yes, pipeline builds can be pruned now. Verified with the version below: registry.dev.redhat.io/openshift3/jenkins-2-rhel7@sha256:8b9cc096eaa54eafe905c79ad0b7b43a31137c73a51e91e3ee8e10e6f22734a1
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.