Bug 1537317

Summary: Jenkins build takes a long time to start when freshly deployed
Product: OpenShift Container Platform
Reporter: Øystein Bedin <obedin>
Component: Build
Assignee: Gabe Montero <gmontero>
Status: CLOSED WORKSFORME
QA Contact: Wenjing Zheng <wzheng>
Severity: unspecified
Priority: unspecified
Version: 3.7.1
CC: aos-bugs, bparees, obedin
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2018-01-23 19:23:01 UTC
Type: Bug

Description Øystein Bedin 2018-01-22 22:52:20 UTC
Description of problem:
When a JenkinsPipeline BuildConfig (BC) is included in the project together with another BC, OpenShift successfully deploys the Jenkins image and eventually starts the build. However, it takes a *long* time (5-10+ minutes) after Jenkins is ready before the build of the actual source kicks in.


Version-Release number of selected component (if applicable):
OpenShift v3.7.x with Jenkins (probably also present in older releases)


How reproducible:
100%


Steps to Reproduce:
1. Deploy a template with both a JenkinsPipeline BC and a regular source BC
2. Wait for Jenkins to deploy / become operational
3. Observe that it takes 5-10+ minutes after Jenkins is ready before the actual build of the source starts (visible under "Build >> Pipelines" in OpenShift).


Actual results:
See above


Expected results:
The build of the source BC should start as soon as Jenkins is ready / operational. 


Additional info:
We use this feature as part of a demo environment, and the extra 5-10 minutes are awkward to talk through. The first build can be cancelled and a new one kicked off, but the audience then wonders why that is necessary, since the first build should start automatically as soon as Jenkins is ready.

Comment 1 Ben Parees 2018-01-22 23:00:25 UTC
I think some fixes were done in the sync plugin for this problem, but I would have expected them to be in the 3.7 image already.

Comment 2 Øystein Bedin 2018-01-22 23:05:51 UTC
One of our main systems with this issue is running OpenShift v3.7.14 (with Jenkins 2.46.3 and OpenShift Sync Plugin 0.1.24). Please let us know which version(s) we need to upgrade to, or if there's anything else we can do to check whether we have those fixes.

Comment 3 Øystein Bedin 2018-01-23 03:27:03 UTC
Andy Block found the following hardcoded 5-minute "poll". This matches our experience closely, as most builds start around 5-7 minutes in. Could this be made a configurable parameter, if nothing else?

https://github.com/openshift/jenkins-sync-plugin/blob/master/src/main/java/io/fabric8/jenkins/openshiftsync/BaseWatcher.java#L52-L57

Comment 4 Ben Parees 2018-01-23 17:22:02 UTC
We watch the relevant resources; that interval is not a poll, it's the period at which we do a full resync.

Again, I think you're hitting a bug where we did not process the original watch event properly, so you had to wait for a resync interval. When things are working properly (as they are with the bug fix), that is not the case.
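A minimal, hypothetical model of the watch-plus-resync pattern described above may make the delay easier to picture: watch events are handled as soon as they arrive, and a periodic full relist acts as a safety net. If a watch event is missed or mishandled, the build is only picked up at the next resync, which lines up with the 5-7 minute delays reported. The class, its methods, and the simulated clock are illustrative assumptions, not code from jenkins-sync-plugin.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a watch-plus-resync controller; not the jenkins-sync-plugin code.
class WatchResyncController {
    // Full-resync period: 5 minutes, mirroring the hardcoded interval discussed above.
    static final long RESYNC_INTERVAL_MS = 5 * 60 * 1000L;

    private final Map<String, Long> startedAt = new HashMap<>();
    private long lastResyncMs = 0;

    // Fast path: a watch event for a build is handled as soon as it arrives.
    void onWatchEvent(String buildName, long nowMs) {
        start(buildName, nowMs);
    }

    // Safety net: called periodically; once per interval, relist everything
    // and start any build whose watch event was missed or mishandled.
    void tick(long nowMs, Collection<String> allBuilds) {
        if (nowMs - lastResyncMs >= RESYNC_INTERVAL_MS) {
            lastResyncMs = nowMs;
            for (String name : allBuilds) {
                start(name, nowMs);
            }
        }
    }

    // Returns the (simulated) time the build was started, or null if not started yet.
    Long startedAt(String buildName) {
        return startedAt.get(buildName);
    }

    private void start(String buildName, long nowMs) {
        // putIfAbsent keeps the first start time, so a resync never restarts a build.
        startedAt.putIfAbsent(buildName, nowMs);
    }
}
```

With the watch event for a build missed, a `tick` at 60 seconds leaves it unstarted and the `tick` at 300 seconds starts it; with the event delivered, `onWatchEvent` starts it immediately and later resyncs leave it alone. This is why the resync interval only shows up as latency when the event path fails.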

Comment 5 Gabe Montero 2018-01-23 17:53:38 UTC
Yeah, there was a similar problem reported in October that was fixed with commit aef662b9b7869cc8f82c737b0bda31fa567abbbf in https://github.com/openshift/jenkins-sync-plugin

Typically it stems from the BC events for the watch arriving before the build events.  The above commit initiates an immediate fetch of the builds when the build config watch event arrives, to expedite / bypass the default relist interval.

But yes, that should have gotten into v3.7; the corresponding sync plugin version was v0.1.32.

Øystein - could you at least confirm you are at that version of the sync plugin with your 3.7 image?  I just want to make sure there are not some earlier 3.7 images out there.

Next, as this is a timing bug, I am not able to reproduce it with my cluster.  The timing of the watch events and when the build is getting scheduled is simply different for me.  

So could you also reproduce again and provide:

1) the jenkins master logs
2) the output of `oc get events` from the namespace in question after this occurs
3) the output from `oc get build <build name in question> -o yaml` and `oc get bc <build config name in question> -o yaml`

At the moment, the fetch after the BC event is a one-time fetch. I'm wondering if it needs some retry logic in case no builds are found initially.
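The retry idea could look roughly like the following minimal sketch. It assumes the BuildConfig watch handler can call some function that lists the builds for that BC, modelled here as a hypothetical `Supplier<List<String>>`; the class name, method name, attempt count, and delay are illustrative and not taken from jenkins-sync-plugin.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a bounded retry for the post-BC-event fetch; not the plugin's code.
final class BuildFetchRetry {
    private BuildFetchRetry() {}

    /**
     * Lists the builds for a BuildConfig right after its watch event arrives,
     * retrying if the list is empty because the build events have not caught up yet.
     *
     * @param listBuilds  hypothetical call that lists builds for one BuildConfig
     * @param maxAttempts total number of list calls to make
     * @param delayMs     pause between attempts, in milliseconds
     * @return the first non-empty list, or an empty list if every attempt was empty
     */
    static List<String> fetchBuildsWithRetry(Supplier<List<String>> listBuilds,
                                             int maxAttempts, long delayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> builds = listBuilds.get();
            if (!builds.isEmpty()) {
                return builds;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and stop retrying; the periodic resync remains the fallback.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Keeping the retry bounded is deliberate: the periodic full resync still exists as the fallback, so the retry only has to cover the short window in which the BuildConfig event arrives before the build events.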

Comment 6 Øystein Bedin 2018-01-23 18:04:50 UTC
Thanks for the detailed info, Gabe. As mentioned above, the sync plugin is on version 0.1.24, so it sounds like it is too old. Our v3.7 cluster was installed a couple of weeks ago (7th or 8th of January), but maybe this was made available after that? I'll investigate and see what we can do to get it updated and report back if this is still a problem. 

Thanks for your support.

Comment 7 Øystein Bedin 2018-01-23 18:08:08 UTC
I checked another cluster that was installed a week or so later, and it has version 0.1.32. We'll get the first cluster updated and report back.

Comment 8 Gabe Montero 2018-01-23 18:36:30 UTC
Ah - sorry - I missed the version being noted in https://bugzilla.redhat.com/show_bug.cgi?id=1537317#c2

Yeah let's see what happens when you bump the version with the cluster in question.  If there is still an issue, let's then get the three items I noted.

I'll hold off on the bugzilla for now to give you all some time to give it a go and report back.

Comment 9 Øystein Bedin 2018-01-23 19:07:52 UTC
@Gabe - we have validated that the later version of the plugin works much better - thank you!!

BTW: The v3.7 install done 16 days ago had the 'latest' tag in the ImageStream, so we had to update it to 'v3.7' and re-import the images to make it work.

This issue can be closed. Thanks for your support and work on this!

Comment 10 Gabe Montero 2018-01-23 19:23:01 UTC
Thanks for the quick update, Øystein.

I'll go ahead and close this out as already fixed.

On your image stream tag note: it may be a question of installer versions and existing image stream definitions, but moving forward, the image stream's latest tag will reference a version-qualified tag of the Docker image.

For example, for our upcoming 3.9 release, we've got:

https://github.com/openshift/origin/blob/master/examples/image-streams/image-streams-centos7.json#L1083-L1123

and 

https://github.com/openshift/origin/blob/master/examples/image-streams/image-streams-rhel7.json#L983-L1025