Bug 2066019

Summary: Latest ose-jenkins-agent-base:v4.9.0 image fails to start on OpenShift due to FIPS error
Product: OpenShift Developer Tools and Services
Reporter: jamieclinton
Component: Jenkins
Assignee: Akram Ben Aissi <abenaiss>
Status: CLOSED EOL
QA Contact: Jitendar Singh <jitsingh>
Severity: high
Docs Contact: Rolfe Dlugy-Hegwer <rdlugyhe>
Priority: high
Version: 4.8
CC: aos-bugs, cdaley, dkarde, gmontero, jdelft, jitsingh, spandura
Target Release: 4.11
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Doc Text:
* This enhancement adds a new Jenkins environment variable, `JAVA_FIPS_OPTIONS`, that controls how the JVM operates when running on a FIPS node. For more information, see link:https://access.redhat.com/documentation/en-us/openjdk/11/html-single/configuring_openjdk_11_on_rhel_with_fips/index#config-fips-in-openjdk[OpenJDK support article] (link:https://bugzilla.redhat.com/show_bug.cgi?id=2066019[BZ#2066019])
Last Closed: 2023-02-07 15:34:40 UTC
Type: Bug
Bug Blocks: 2075135

Description jamieclinton 2022-03-20 02:13:10 UTC
Description of problem:
I tried to deploy the latest ose-jenkins-agent-base:v4.9.0 image from the Red Hat catalog on our OpenShift 4.8 cluster: https://catalog.redhat.com/software/containers/openshift4/ose-jenkins-agent-base/5cdd8e2fbed8bd5717d66e77?tag=v4.9.0-202203081819.p0.gaf84740.assembly.stream&push_date=1647430675000

A previous build of the ose-jenkins-agent-base:v4.9.0 image (from several weeks ago) works just fine, but with this latest image, the pod fails to come up and terminates with the following error in the logs:

2022/03/20 01:23:16 [go-init] No pre-start command defined, skip
2022/03/20 01:23:16 [go-init] Main command launched : /usr/local/bin/run-jnlp-client
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
alternatives version 1.13 - Copyright (C) 2001 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.
usage: alternatives --install <link> <name> <path> <priority>
                    [--initscript <service>]
                    [--family <family>]
                    [--slave <slave_link> <slave_name> <slave_path>]*
       alternatives --remove <name> <path>
       alternatives --auto <name>
       alternatives --config <name>
       alternatives --display <name>
       alternatives --set <name> <path>
       alternatives --list
       alternatives --remove-all <name>
       alternatives --add-slave <name> <path> <slave_link> <slave_name> <slave_path>
       alternatives --remove-slave <name> <path> <slave_name>
common options: --verbose --test --help --usage --version --keep-missing --keep-foreign
                --altdir <directory> --admindir <directory>
OPENSHIFT_JENKINS_JVM_ARCH='', CONTAINER_MEMORY_IN_MB='8796093022207', using /usr/lib/jvm/java-11-openjdk-11.0.14.1.1-1.el8_4.x86_64/bin/java
Downloading http://172.30.132.115:80//jnlpJars/remoting.jar ...
+ cd
+ exec java -Duser.home=/home/jenkins -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://172.30.132.115:80/ -tunnel 172.30.229.180:50000 ba5e5d3bdf36505e05545d12ffb2df14cd15f450dabde9235759451e2eabcd2a devops-pipeline-sghgd
Mar 20, 2022 1:23:16 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: devops-pipeline-sghgd
Mar 20, 2022 1:23:16 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 20, 2022 1:23:16 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.10.1
Mar 20, 2022 1:23:16 AM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Mar 20, 2022 1:23:16 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: FIPS mode: only SunJSSE TrustManagers may be used
java.security.KeyManagementException: FIPS mode: only SunJSSE TrustManagers may be used
    at java.base/sun.security.ssl.SSLContextImpl.chooseTrustManager(SSLContextImpl.java:133)
    at java.base/sun.security.ssl.SSLContextImpl.engineInit(SSLContextImpl.java:95)
    at java.base/javax.net.ssl.SSLContext.init(SSLContext.java:297)
    at hudson.remoting.Engine.run(Engine.java:535)
2022/03/20 01:23:17 [go-init] Main command failed
2022/03/20 01:23:17 [go-init] exit status 255
2022/03/20 01:23:17 [go-init] No post-stop command defined, skip


Version-Release number of selected component (if applicable):

ose-jenkins-agent-base:v4.9.0
OpenShift 4.8.22


How reproducible:
Always

Steps to Reproduce:
1. Process the template to configure the build config and image stream. This builds an image from a Dockerfile based on ose-jenkins-agent-base:v4.9.0.
oc process -f "$DIR/template.yaml" \
  -p GIT_REF="$GIT_REF" \
  -p IMAGE_REGISTRY="$IMAGE_REGISTRY" \
  -p REGISTRY_SECRET="$REGISTRY_SECRET" \
  -p IMAGE_TAG="$IMAGE_TAG" | oc apply -f -

2. Build the image.
oc start-build $DEVOPS_IMAGE_BC_NAME

3. Update the image stream.
oc tag ${IMAGE_REGISTRY}/jenkins-devops-pipeline-base:${IMAGE_TAG} jenkins-devops-pipeline-base:${IMAGE_TAG}

4. Try to launch the agent from a Jenkins pipeline.

Actual results:
Agent fails to launch, terminating with SEVERE FIPS error.

Expected results:
Agent should work, just like it does with a previous build of ose-jenkins-agent-base:v4.9.0


Additional info:

Comment 1 jamieclinton 2022-03-23 02:28:57 UTC
Why is it that nobody has responded to this bug yet? It appears as though the latest image of ose-jenkins-agent-base:v4.8.0 does not run in a FIPS-enabled OpenShift cluster. This is a big problem for us. Please try to recreate this bug on your end.

Thanks,
  Jamie

Comment 2 jamieclinton 2022-03-23 02:31:06 UTC
Correction, v4.9.0.

Comment 3 Gabe Montero 2022-03-24 19:03:56 UTC
So I have confirmed, by bringing up "more recent versions of OCP" (I've tried 4.9 and the 4.11 currently under development), that with FIPS on for a cluster you need to employ some form of https://access.redhat.com/documentation/en-us/openjdk/11/html-single/configuring_openjdk_11_on_rhel_with_fips/index#config-fips-in-openjdk in order to get things to come up, both to start Jenkins in general and to launch agent-based pods.

Employing the simplest of the options, setting the system property -Dcom.redhat.fips=false on both the Jenkins pod/container's JVM args, as well as the same for the PodTemplate I used for the agent based build, I could get things to work.
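
Before applying the property, it can help to confirm that the pod really is running on a FIPS-enabled node. A minimal sketch, assuming a Linux container where the kernel exposes its FIPS flag at /proc/sys/crypto/fips_enabled; the function name fips_enabled is made up for illustration and is not part of the Jenkins images:

```shell
# Return success (0) when the kernel reports FIPS mode.
# The optional argument overrides the flag file; it exists only to make the
# function easy to test and defaults to the standard Linux location.
fips_enabled() {
  flag_file="${1:-/proc/sys/crypto/fips_enabled}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if fips_enabled; then
  echo "FIPS mode is on"
else
  echo "FIPS mode is off or could not be determined"
fi
```

Run inside the Jenkins or agent container, this only reports the kernel setting; it says nothing about how the JVM itself is configured.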

Presumably someone could also set up the Java trust store with a valid FIPS cert config, as is indicated in the OpenJDK article I noted above, and get things to work as well, and avoid the exception the customer is seeing.

I have had trouble getting 4.8 clusters up today, so as of yet, have not been able to clearly discover the reason why "it works" on 4.8, but I would be shocked if it is not a JDK version difference of some sort.

I will report back when I have been able to test on 4.8.

Comment 4 Gabe Montero 2022-03-24 19:18:57 UTC
The other possible difference I can think of, prior to actual attempts on 4.8, is differences in the PodTemplates used with the 4.8 image vs. the 4.9 image.  But we can dive into that if my 4.8 testing results drive us in that direction.

Comment 6 Gabe Montero 2022-03-24 21:21:58 UTC
OK, using the most recent 4.8 level, 4.8.35, I still see the same issues I saw on 4.9 and 4.11, where I need to set the JVM arg -Dcom.redhat.fips=false on both the Jenkins pod and any agent pods launched from PodTemplates in order to avoid the FIPS-related exception reported in this BZ's description.

I also found the customer logs attached to the support case (they were probably not attached here as well because support did not open the BZ), and I did see a JVM version difference between the customer's Jenkins and what was present in my 4.8.35-based Jenkins pod.

4.8.35 had /usr/lib/jvm/java-11-openjdk-11.0.14.1.1-1.el8_4.x86_64/bin/java and /usr/lib/jvm/java-11-openjdk-11.0.14.1.1-1.el8_4.x86_64/bin/javac

The customer's had /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/java and /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/javac

I have no sense at this time of how disparate those OpenJDK versions are, but it is "something".

I see in the description the customer is at 4.8.22.  I will now deploy that, and see what happens.

Comment 7 Gabe Montero 2022-03-24 22:30:05 UTC
And sure enough, at 4.8.22, my jenkins and agent pods were at /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/java and /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/javac
and I did not have to set -Dcom.redhat.fips=false in order for either of them to work properly on my FIPS cluster.
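
The comparison above came down to reading the JDK build out of the java path in each pod. A small sketch of that step; the helper name jdk_build_from_path is hypothetical, and it assumes the RHEL path layout /usr/lib/jvm/java-11-openjdk-<build>/bin/java seen in the logs:

```shell
# Print the OpenJDK build string embedded in a RHEL-style JVM path.
# Prints an empty line if the path does not contain "/java-11-openjdk-".
jdk_build_from_path() {
  # Drop everything up to and including "/java-11-openjdk-" ...
  build="${1#*/java-11-openjdk-}"
  # ... then drop the trailing "/bin/java" (or any other sub-path).
  printf '%s\n' "${build%%/*}"
}

# The two paths reported in this bug.
working="$(jdk_build_from_path /usr/lib/jvm/java-11-openjdk-11.0.13.0.8-1.el8_4.x86_64/bin/java)"
failing="$(jdk_build_from_path /usr/lib/jvm/java-11-openjdk-11.0.14.1.1-1.el8_4.x86_64/bin/java)"

echo "starts without override: $working"
echo "needs override:          $failing"
```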

OK, so here is how we are moving forward:

- Dipak Karde, I have added a needinfo for you.  In your role as support, I want you to open an item associated with the customer case here on the OpenJDK team's queue, and see if they can explain why JVMs at 11.0.13.0.8-1 can run on FIPS clusters without disabling FIPS or setting up a JVM-friendly FIPS trust store in some way per https://access.redhat.com/documentation/en-us/openjdk/11/html-single/configuring_openjdk_11_on_rhel_with_fips/index#config-fips-in-openjdk but we need such settings at 11.0.14.1.1-1. It would be good to know what exactly is going on here: is it expected, is there a bug in 11.0.14.1.1-1, whatever.

- In the interim, I will be updating our images, templates, pod templates to default to -Dcom.redhat.fips=false but allow for override via environment variables.

- I may change the default to something else if we get an answer from openjdk that steers us in such a direction.

- Jamie Clinton:  if you want to set -Dcom.redhat.fips=false on your own in the interim in your PodTemplate, it is fairly straightforward to do that from the Jenkins UI (for example, add it to the front of the args, before the computer.jnlp related stuff).  But if you need pointers, let us know.
Also, if you upgrade your jenkins pod to one of these newer levels, you'll need to add the JENKINS_JAVA_OPTIONS env var on your jenkins container and set it to -Dcom.redhat.fips=false
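
One way to script the interim workaround is to append the flag to whatever JVM options are already set, rather than overwriting them. This is a sketch only: add_fips_flag is a hypothetical helper, and the commented-out oc set env line assumes the Jenkins server runs as a DeploymentConfig named jenkins, which may not match a given cluster:

```shell
# Print the given JVM options with -Dcom.redhat.fips=false appended,
# unless some com.redhat.fips setting is already present.
add_fips_flag() {
  case "$1" in
    *-Dcom.redhat.fips=*) printf '%s\n' "$1" ;;                      # already set: leave untouched
    "")                   printf '%s\n' "-Dcom.redhat.fips=false" ;; # nothing set yet
    *)                    printf '%s\n' "$1 -Dcom.redhat.fips=false" ;;
  esac
}

# Start from the current value, if any, in this shell's environment.
current_opts="${JENKINS_JAVA_OPTIONS:-}"
new_opts="$(add_fips_flag "$current_opts")"
echo "JENKINS_JAVA_OPTIONS=$new_opts"

# Assumed resource name; adjust before running against a real cluster:
# oc set env dc/jenkins JENKINS_JAVA_OPTIONS="$new_opts"
```

Leaving an existing com.redhat.fips value alone is deliberate, so that someone who has configured a FIPS-compliant trust store and set the property to true is not silently overridden.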

Comment 8 jamieclinton 2022-03-25 21:32:57 UTC
Hi Gabe,

I was able to get the new ose-jenkins-agent-base:v4.9.0 image to run successfully by adding the JAVA_TOOL_OPTIONS env var to the PodTemplate yaml file and setting the value to -Dcom.redhat.fips=false.

                <org.csanchez.jenkins.plugins.kubernetes.model.KeyValueEnvVar>
                  <key>JAVA_TOOL_OPTIONS</key>
                  <value>-Dcom.redhat.fips=false</value>
                </org.csanchez.jenkins.plugins.kubernetes.model.KeyValueEnvVar>

Thanks for the workaround!

Jamie

Comment 16 Gabe Montero 2022-04-12 14:49:31 UTC
*** Bug 2065519 has been marked as a duplicate of this bug. ***

Comment 17 Jitendar Singh 2022-04-13 10:13:12 UTC
verified
=============
provisioned 4.11 cluster with private-templates/functionality-testing/aos-4_11/ipi-on-aws/versioned-installer-fips flexy template

created jenkins using template

created and triggered builds using sidecar agent images

started 20 builds and all builds completed successfully

checked jenkins master pod log and found "-Dcom.redhat.fips=false"  in the below section which is desired behavior as per the PR

+ exec java -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xmx512m -Dfile.encoding=UTF8 -Djavamelody.displayed-counters=log,error -Djava.util.logging.config.file=/var/lib/jenkins/logging.properties -Djavax.net.ssl.trustStore=/var/lib/jenkins/ca-anchors-keystore "-Dcom.redhat.fips=false" -Djdk.http.auth.tunneling.disabledSchemes= -Djdk.http.auth.proxying.disabledSchemes= -Duser.home=/var/lib/jenkins -Djavamelody.application-name=jenkins -Dhudson.security.csrf.GlobalCrumbIssuerConfiguration.DISABLE_CSRF_PROTECTION=true -Djenkins.install.runSetupWizard=false -jar /usr/lib/jenkins/jenkins.war
Picked up JAVA_TOOL_OPTIONS: -XX:+UnlockExperimentalVMOptions -Dsun.zip.disableMemoryMapping=true
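
The log check in this verification can be scripted. A sketch under assumptions: has_fips_override is a made-up helper that reads a log on stdin, and the pod name in the commented usage line is a placeholder:

```shell
# Succeed if the log on stdin contains the FIPS override flag.
# -F treats the pattern as a fixed string; -- stops grep from reading it as an option.
has_fips_override() {
  grep -qF -- '-Dcom.redhat.fips=false'
}

# Self-check against an abbreviated form of the exec line quoted above.
sample='+ exec java -XX:+UseParallelGC "-Dcom.redhat.fips=false" -jar /usr/lib/jenkins/jenkins.war'
if printf '%s\n' "$sample" | has_fips_override; then
  echo "override present"
else
  echo "override missing"
fi

# Against a live pod (placeholder name):
# oc logs <jenkins-pod> | has_fips_override && echo "override present"
```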

Comment 18 Gabe Montero 2022-04-13 17:13:23 UTC
The OpenJDK team explained why the behavior changed in https://issues.redhat.com/browse/OPENJDK-629?focusedCommentId=20095046&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-20095046 ... basically, they had to address a bug with running OpenJDK on RHEL 8.6 and later with FIPS enabled.