Bug 1519034

Summary: Jenkins-2 with persistent storage breaks when rolling out update using image 67a1ed17c726
Product: OpenShift Container Platform
Reporter: Ryan Howe <rhowe>
Component: ImageStreams
Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA
QA Contact: Dongbo Yan <dyan>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.7.0
CC: aos-bugs, bparees, gmontero, jminter, jokerman, jupierce, mmccomas, sreber, xiuwang
Target Milestone: ---   
Target Release: 3.7.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The install destination of the plugin RPMs included in the OpenShift Jenkins RHEL 7 image changed between 3.6 and 3.7.
Consequence: After upgrading the OpenShift Jenkins RHEL 7 image from 3.6 to 3.7 with persistent volumes retained, Jenkins would not start, because it could not read the required plugin jpi/hpi files.
Fix: The 3.7 OpenShift Jenkins RHEL 7 image now establishes additional links so that the real jpi/hpi files can be found.
Result: Jenkins in an OpenShift pod with PVs starts successfully after a 3.6 to 3.7 upgrade.
Last Closed: 2017-12-18 13:24:28 UTC
Type: Bug

Description Ryan Howe 2017-11-29 23:34:50 UTC
Description of problem:

Rolling out a deployment that updates a previously pulled Jenkins image to the latest image, 67a1ed17c726, fails when persistent storage is mounted at /var/lib/jenkins.


Version-Release number of selected component (if applicable):
3.6  

How reproducible:
100%

Steps to Reproduce:

1. Deployed a Jenkins image with persistent storage from the template, using Docker image ID 189862ac15ca:
   https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/jenkins-2-rhel7/images/v3.6.173.0.5-5
2. Created some builds and deployments within the jenkins console.

3. Then updated the image to the latest v3.6 image we shipped, Docker image ID 67a1ed17c726:
   https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/jenkins-2-rhel7/images/v3.7.9-21


Actual results:
SEVERE: Failed Inspecting plugin /var/lib/jenkins/plugins/*
Caused by: src '/var/lib/jenkins/plugins/* doesn't exist 

Expected results:
The deployment works with the existing Jenkins persistent storage.

Additional info:

Attaching logs

Comment 2 Ben Parees 2017-11-29 23:36:56 UTC
Pretty sure Jim just said he tested this and did *not* see an issue.

Comment 3 Jim Minter 2017-11-29 23:42:48 UTC
I was wrong (again).  I think this is pretty serious - could be that all upgrades where /var/lib/jenkins is persistent are broken.

I think the cause is that the old image sets up symlinks from /var/lib/jenkins/plugins/*.jpi to /usr/lib64/jenkins/*.hpi; in the new image, these files are under /usr/lib.
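A minimal Python sketch of the failure mode described above, assuming only what this comment states: the PV keeps a .jpi symlink that points into /usr/lib64/jenkins, and the new image puts the same file under /usr/lib/jenkins instead. Everything is built in a throwaway temp directory; the plugin filename and the helper names (find_dangling, demo) are hypothetical and are not taken from the image.

```python
import os
import tempfile


def find_dangling(plugins_dir):
    """Return names of entries in plugins_dir that are symlinks whose target is missing."""
    return sorted(
        name
        for name in os.listdir(plugins_dir)
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(os.path.join(plugins_dir, name))
        and not os.path.exists(os.path.join(plugins_dir, name))
    )


def demo():
    with tempfile.TemporaryDirectory() as root:
        lib64 = os.path.join(root, "usr", "lib64", "jenkins")
        lib = os.path.join(root, "usr", "lib", "jenkins")
        pv = os.path.join(root, "var", "lib", "jenkins", "plugins")
        for d in (lib64, lib, pv):
            os.makedirs(d)
        # 3.6-style layout (assumed): plugin under lib64, PV holds a .jpi symlink to it.
        hpi = os.path.join(lib64, "openshift-pipeline.hpi")
        with open(hpi, "w") as f:
            f.write("plugin bytes")
        os.symlink(hpi, os.path.join(pv, "openshift-pipeline.jpi"))
        before = find_dangling(pv)
        # 3.7-style layout (assumed): the same file now lives under /usr/lib instead.
        os.replace(hpi, os.path.join(lib, "openshift-pipeline.hpi"))
        after = find_dangling(pv)
        return before, after
```

Calling demo() returns ([], ['openshift-pipeline.jpi']): the link on the PV is intact, but it no longer resolves once the target directory moves. A check like find_dangling() could be pointed at a real JENKINS_HOME/plugins directory as a quick diagnostic.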

Comment 4 Jim Minter 2017-11-29 23:52:16 UTC
In registry.access.redhat.com/openshift3/jenkins-2-rhel7:v3.6*, there are individual RPMs per plugin, each placing the plugin in /usr/lib64.

In registry.access.redhat.com/openshift3/jenkins-2-rhel7:v3.7*, there's a single RPM, jenkins-2-plugins-3.7.1510081324-1.el7, which places all the plugins in /usr/lib.

Comment 5 Ryan Howe 2017-11-29 23:54:08 UTC
I tested with jenkins-2-rhel images using persistent storage. 

v3.6.173.0.49-5   d9f35e8f2b4d 
v3.6.173.0.5-5    189862ac15ca

Same result after creating content and then updating the image tag in the DC to v3.6, which pulls image 67a1ed17c726.

Comment 6 Ben Parees 2017-11-30 05:09:06 UTC
For this problem, I think we need to fix the jenkins-plugin RPM to put the plugins in /usr/lib64 and rebuild the image. That will at least solve the migration problem.

We'll need Justin's team to make that change, I believe.

Comment 7 Ben Parees 2017-11-30 13:43:36 UTC
We should also revisit why we are symlinking the plugins to the PV instead of copying them, because it would seem to set us up for this problem in the future if a plugin is removed from the image (since the symlink would now be broken again).

Gabe can you provide some background on why we're doing it this way/whether we can simply switch to copying them instead?  Was it just to save storage?

Comment 8 Jim Minter 2017-11-30 15:16:55 UTC
AFAICS, in the Jenkins CentOS image, unlike in the RHEL image, we copy the plugins.

Comment 9 Gabe Montero 2017-11-30 15:20:59 UTC
Ben - it looks like the notion of linking the plugins from /usr/lib64 to /opt/openshift/plugins was first introduced with https://github.com/openshift/jenkins/pull/59 back in December 2015.

When I did it, it was part of adding our very first plugin, openshift-pipeline, and getting it to work on RHEL: our Jenkins S2I run script won't pick up the RPM-installed plugin unless it is in /opt/openshift/plugins (from where the script copies it to JENKINS_HOME/plugins).

There is no commentary in the pull request as to why we agreed on ln rather than cp.

Space savings is certainly one effect of the choice. But also consider: if a customer installed a plugin RPM as a fix vehicle, rather than pulling a new image, the RPM update would presumably stop at putting the plugin in /usr/lib or /usr/lib64, and something else would need to copy it to /opt/openshift/plugins. I don't remember precisely, but I suspect that influenced the decision as well.

If we switch to copy, then the run script needs to change to look in both places and copy from the appropriate one to JENKINS_HOME/plugins.
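A rough Python sketch of the "look in both places and copy" idea from this comment. The real run script is a shell script whose contents are not shown here, so sync_plugins, its first-match-wins ordering, and the .hpi-to-.jpi renaming are all assumptions made for illustration. The one behavior worth noting is that a stale symlink already sitting on the PV is removed before copying, so the result is a real file rather than a write through a dangling link.

```python
import os
import shutil
import tempfile


def sync_plugins(candidate_dirs, jenkins_home):
    """Hypothetical helper: copy plugin archives into JENKINS_HOME/plugins.

    candidate_dirs are searched in order; the first directory that holds a
    given plugin name wins. Missing directories are skipped.
    """
    dest = os.path.join(jenkins_home, "plugins")
    os.makedirs(dest, exist_ok=True)
    copied = {}
    for src_dir in candidate_dirs:
        if not os.path.isdir(src_dir):
            continue  # e.g. /usr/lib64/jenkins does not exist in a 3.7-style image
        for name in sorted(os.listdir(src_dir)):
            stem, ext = os.path.splitext(name)
            if ext not in (".hpi", ".jpi") or stem in copied:
                continue
            target = os.path.join(dest, stem + ".jpi")
            # Drop an old symlink first; otherwise the copy would write through it.
            if os.path.islink(target):
                os.unlink(target)
            shutil.copy2(os.path.join(src_dir, name), target)
            copied[stem] = src_dir
    return copied


def demo():
    with tempfile.TemporaryDirectory() as root:
        lib = os.path.join(root, "usr", "lib", "jenkins")
        lib64 = os.path.join(root, "usr", "lib64", "jenkins")  # left absent on purpose
        home = os.path.join(root, "var", "lib", "jenkins")
        os.makedirs(lib)
        os.makedirs(os.path.join(home, "plugins"))
        with open(os.path.join(lib, "openshift-pipeline.hpi"), "w") as f:
            f.write("plugin bytes")
        stale = os.path.join(home, "plugins", "openshift-pipeline.jpi")
        os.symlink(os.path.join(lib64, "openshift-pipeline.hpi"), stale)  # dangling old link
        copied = sync_plugins([lib64, lib], home)
        is_real_file = os.path.isfile(stale) and not os.path.islink(stale)
        return sorted(copied), is_real_file
```

Here demo() returns (['openshift-pipeline'], True). Copying trades disk space for independence from the image layout, which is the trade-off Comment 7 raises.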

Comment 10 Gabe Montero 2017-11-30 16:24:53 UTC
Ben, Jim, and I had a pow-wow... we are going to switch to copy.

Completing investigation, coding, and testing now.

Comment 14 Justin Pierce 2017-11-30 19:49:26 UTC
Wouldn't it be effective to symlink /usr/lib64/jenkins -> /usr/lib/jenkins in the Jenkins Dockerfile? If I'm understanding correctly, this would fix the old migration issues without introducing a new one (i.e. for someone using the current 3.7 GA images). It would also not require a new plugin RPM build and RPM advisory to ship.
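A small Python sketch of the effect of the symlink suggested in this comment, simulated inside a temp directory rather than an image. It is not the content of the actual PR; add_compat_link and demo are hypothetical names. It shows that a 3.6-era PV link pointing through /usr/lib64/jenkins resolves again once that directory is a link to /usr/lib/jenkins, without touching the PV.

```python
import os
import tempfile


def add_compat_link(root):
    """Hypothetical: make <root>/usr/lib64/jenkins a symlink to <root>/usr/lib/jenkins."""
    old = os.path.join(root, "usr", "lib64", "jenkins")
    new = os.path.join(root, "usr", "lib", "jenkins")
    os.makedirs(os.path.dirname(old), exist_ok=True)
    if not os.path.lexists(old):
        os.symlink(new, old)
    return old


def demo():
    with tempfile.TemporaryDirectory() as root:
        new = os.path.join(root, "usr", "lib", "jenkins")
        os.makedirs(new)
        with open(os.path.join(new, "openshift-pipeline.hpi"), "w") as f:
            f.write("plugin bytes")
        pv = os.path.join(root, "var", "lib", "jenkins", "plugins")
        os.makedirs(pv)
        link = os.path.join(pv, "openshift-pipeline.jpi")
        # Link left on the PV by the older image, pointing into lib64.
        os.symlink(os.path.join(root, "usr", "lib64", "jenkins", "openshift-pipeline.hpi"), link)
        before = os.path.exists(link)
        add_compat_link(root)
        # The path now resolves through the lib64 -> lib directory link.
        after = os.path.exists(link)
        return before, after
```

demo() returns (False, True). The design appeal is that only the image changes: existing PVs and the plugin RPM stay as they are.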

Comment 15 Gabe Montero 2017-11-30 19:55:06 UTC
Hey Justin - we are doing something effectively the same in https://github.com/openshift/jenkins/pull/425, though per direction from Ben we are getting out of the symlink business and going with cp instead.

And yes, with either approach we shouldn't need a new RPM build, etc. Just a regen of the RHEL image.

Comment 16 Gabe Montero 2017-11-30 22:11:11 UTC
After some more investigation, got agreement to go with Justin's approach per https://bugzilla.redhat.com/show_bug.cgi?id=1519034#c14 in https://github.com/openshift/jenkins/pull/425

Comment 17 Gabe Montero 2017-11-30 22:36:49 UTC
also have https://github.com/openshift/jenkins/pull/427 for the openshift/jenkins 3.7 branch

Comment 18 Gabe Montero 2017-12-01 00:44:54 UTC
and actually, we've employed both symlinks and copies ... see the PR for the gory details

Comment 19 Gabe Montero 2017-12-05 17:50:57 UTC
The requisite PRs have merged.

Next step will be to link up with Justin to cut the new 3.7 image, but Ben still has some PRs cooking for another bug related to the 3.7 image.  When those merge we'll go from there.

Comment 20 Gabe Montero 2017-12-05 20:19:59 UTC
PR https://github.com/openshift/jenkins/pull/431 is Ben's PR that I referred to in Comment #19.

Comment 22 XiuJuan Wang 2017-12-11 02:01:28 UTC
The Jenkins pod runs successfully with a persistent volume after migrating to the new version (brew-pulp.../jenkins-2-rhel7:v3.7.13-1).

Comment 25 errata-xmlrpc 2017-12-18 13:24:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3464