Bug 1477139

Summary: Jenkins master/slaves have bad permissions on /etc/passwd and jvm directories
Product: OpenShift Online
Reporter: xipang
Component: Image
Assignee: Samuel Munilla <smunilla>
Status: CLOSED CURRENTRELEASE
QA Contact: Dongbo Yan <dyan>
Severity: high
Docs Contact:
Priority: high
Version: 3.x
CC: aos-bugs, bparees, dakini, dyan, gmontero, jokerman, mmccomas, xipang, xtian
Target Milestone: ---
Keywords: OnlineStarter, Regression
Target Release: 3.x
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-09 18:47:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1477847    
Bug Blocks:    
Attachments:
Description    Flags
pods_error     none

Description xipang 2017-08-01 10:16:30 UTC
Created attachment 1307482 [details]
pods_error

Description of problem:
Jenkins slave pods constantly end in Error status when a pipeline build is started, and the Events do not give any error message.

Version-Release number of selected component (if applicable):
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
Image: openshift3/jenkins-slave-nodejs-rhel7

How reproducible:
Always

Steps to Reproduce:
1. oc login, create a project
2. Create a jenkinsPipelineStrategy bc

$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/pipeline/samplepipeline.yaml

3. Start a new build, which should end up "Complete"
$ oc start-build sample-pipeline

The started pipeline build launches a Jenkins slave pod:
$ oc get pod

Actual results:
3. Multiple pods are created, and all of them have status Error.
NAME                   READY     STATUS    RESTARTS   AGE
jenkins-1-b5dv8        1/1       Running   0          2h
mongodb-1-793mm        1/1       Running   0          2h
nodejs-1e71a4e626708   0/1       Error     0          6m
nodejs-1e71ca278ffae   0/1       Error     0          6m
nodejs-1e7239e835116   0/1       Error     0          5m
nodejs-1e72a9ac3e26b   0/1       Error     0          5m
nodejs-1e73196d66d03   0/1       Error     0          4m
nodejs-1e73892f842b6   0/1       Error     0          4m
nodejs-1e73f8f24baf9   0/1       Error     0          3m
nodejs-1e741e350ce3d   0/1       Error     0          3m
nodejs-1e7443754a061   0/1       Error     0          3m
nodejs-1e7468b720cc9   0/1       Error     0          3m

Expected results:
3. The nodejs-xxx pod status is Running

Additional info:
$ oc logs -f jenkins-1-b5dv8

SEVERE: Error in provisioning; slave=KubernetesSlave name: nodejs-1e84b5057c455, template=org.csanchez.jenkins.plugins.kubernetes.PodTemplate@7abfde80
java.lang.IllegalStateException: Containers are terminated with exit codes: {jnlp=2}
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:600)
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:532)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Aug 01, 2017 9:55:24 AM hudson.slaves.NodeProvisioner$2 run
WARNING: Provisioned agent Kubernetes Pod Template failed to launch
java.lang.IllegalStateException: Containers are terminated with exit codes: {jnlp=2}
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:600)
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:532)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Comment 1 Ben Parees 2017-08-01 14:08:12 UTC
Can we get logs from the pod ("oc logs nodejs-1e7468b720cc9") and also the output of "oc get pod nodejs-1e7468b720cc9 -o yaml"?
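
Since many slave pods fail in a row, it may be easier to collect this output for all of them at once. The following is only a sketch: the helper name failed_pods is hypothetical, and it assumes the default "oc get pod" column layout shown in the description (STATUS in the third column).

```shell
# Print the names of pods whose STATUS column is "Error",
# reading `oc get pod` output on stdin (the header line is skipped).
failed_pods() {
    awk 'NR > 1 && $3 == "Error" { print $1 }'
}

# Usage against a live cluster (not run here):
# oc get pod | failed_pods | while read -r pod; do
#     oc logs "$pod"
#     oc get pod "$pod" -o yaml
# done
```

The filter is kept separate from the oc calls so it can be tried on saved output before pointing it at a cluster.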

Comment 2 Gabe Montero 2017-08-01 15:34:14 UTC
Also, the assumption is that the 3.5 OpenShift Jenkins images are being used, but if you provide the precise docker image ID in the image stream we can confirm that.

thanks

Comment 3 xipang 2017-08-02 02:21:15 UTC
Image ID: docker-pullable://registry.access.redhat.com/openshift3/jenkins-2-rhel7@sha256:57e9295813aefbf3f604f1389e2f43c4157fad359611c2b4ff3530c3d52df267


$ oc logs nodejs-xxxx
/usr/local/bin/generate_container_user: line 7: /etc/passwd: Permission denied
Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
failed to create /var/lib/alternatives/java.new: Permission denied
Downloading http://172.30.222.229:80/jnlpJars/remoting.jar ...
max heap in MB is 256 and 64 bit was not explicitly set so using 32 bit Java
alternatives version 1.7.2 - Copyright (C) 2001 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.

usage: alternatives --install <link> <name> <path> <priority>
                    [--initscript <service>]
                    [--family <family>]
                    [--slave <link> <name> <path>]*
       alternatives --remove <name> <path>
       alternatives --auto <name>
       alternatives --config <name>
       alternatives --display <name>
       alternatives --set <name> <path>
       alternatives --list

common options: --verbose --test --help --usage --version --keep-missing
                --altdir <directory> --admindir <directory>



$ oc get pod nodejs-xxx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu, memory request for container
      jnlp; cpu, memory limit for container jnlp'
    openshift.io/scc: restricted
  creationTimestamp: 2017-08-02T01:17:57Z
  labels:
    jenkins: slave
    jenkins/nodejs: "true"
  name: nodejs-21aaa0e67e94c
  namespace: xp-t01
  resourceVersion: "11636403"
  selfLink: /api/v1/namespaces/xp-t01/pods/nodejs-21aaa0e67e94c
  uid: 6ae77d1b-7720-11e7-be9a-06e7d92b1aa4
spec:
  activeDeadlineSeconds: 3600
  containers:
  - args:
    - 9e8f52cb56c9f07c69779756a05a060acc46497be996c3c434db8aee63291a63
    - nodejs-21aaa0e67e94c
    env:
    - name: JENKINS_SECRET
      value: 9e8f52cb56c9f07c69779756a05a060acc46497be996c3c434db8aee63291a63
    - name: JENKINS_NAME
      value: nodejs-21aaa0e67e94c
    - name: JENKINS_LOCATION_URL
    - name: JENKINS_URL
      value: http://172.30.222.229:80
    - name: JENKINS_TUNNEL
      value: 172.30.192.59:50000
    - name: JENKINS_JNLP_URL
      value: http://172.30.222.229:80/computer/nodejs-21aaa0e67e94c/slave-agent.jnlp
    - name: HOME
      value: /tmp
    image: registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7
    imagePullPolicy: Always
    name: jnlp
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 60m
        memory: 307Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - NET_RAW
        - SETGID
        - SETUID
        - SYS_CHROOT
      privileged: false
      runAsUser: 1003100000
      seLinuxOptions:
        level: s0:c56,c10
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /tmp
      name: workspace-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: jenkins-token-f7qhs
      readOnly: true
    workingDir: /tmp
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: jenkins-dockercfg-3xmtw
  nodeName: ip-172-31-22-20.us-west-1.compute.internal
  nodeSelector:
    type: compute
  restartPolicy: Never
  securityContext:
    fsGroup: 1003100000
    seLinuxOptions:
      level: s0:c56,c10
  serviceAccount: jenkins
  serviceAccountName: jenkins
  terminationGracePeriodSeconds: 30
  volumes:
  - emptyDir: {}
    name: workspace-volume
  - name: jenkins-token-f7qhs
    secret:
      defaultMode: 420
      secretName: jenkins-token-f7qhs
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-08-02T01:17:57Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-08-02T01:17:57Z
    message: 'containers with unready status: [jnlp]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-08-02T01:17:57Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://19ac0e9e98fbfa47dabeb725723ff10f71e52db2a1d5d30f0bfec03839557b5c
    image: registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7
    imageID: docker-pullable://registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7@sha256:42da1f399677fbf3013ae267b3b357e0e0631cdcae4456dc170bf942d1e7a1d9
    lastState: {}
    name: jnlp
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: docker://19ac0e9e98fbfa47dabeb725723ff10f71e52db2a1d5d30f0bfec03839557b5c
        exitCode: 2
        finishedAt: 2017-08-02T01:18:00Z
        reason: Error
        startedAt: 2017-08-02T01:17:59Z
  hostIP: 172.31.22.20
  phase: Failed
  startTime: 2017-08-02T01:17:57Z

Comment 4 Ben Parees 2017-08-02 02:57:22 UTC
Yeah, this is the same issue that was discussed on the SME list. The recently built+published slave images are broken. Assigning to Sam; we've discussed the problem on IRC. We either need to roll back to the previous image, or build+publish new ones.

Comment 5 Dongbo Yan 2017-08-02 03:05:26 UTC
Hi Ben,
I tested with the same Jenkins slave image on OCP, but cannot reproduce the issue.
What do you mean by "recently built+published slave images are broken"?

Comment 6 Ben Parees 2017-08-02 03:08:25 UTC
These errors: 

/usr/local/bin/generate_container_user: line 7: /etc/passwd: Permission denied
Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
failed to create /var/lib/alternatives/java.new: Permission denied

indicate issues with the file permissions in the slave image that is on registry.access.redhat.com and in your case, it prevented the JDK from being configured properly.


Depending on how you tested the image on OCP you might not run into the problem.  Did you run the same pipeline?  Or were you using the image from brew, which does not have this issue?
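
One way to compare images is to check, from inside a container running as a non-root UID, whether the paths named in the errors are writable. This is a minimal sketch, not something taken from the image: the function name check_writable is hypothetical, and the paths are the ones from the error messages above.

```shell
# Report whether each given path is writable by the current user.
# Prints "ok <path>" or "DENIED <path>" for each argument and
# returns non-zero if at least one path is not writable.
check_writable() {
    rc=0
    for p in "$@"; do
        if [ -w "$p" ]; then
            echo "ok $p"
        else
            echo "DENIED $p"
            rc=1
        fi
    done
    return $rc
}

# Inside the container (not run here):
# check_writable /etc/passwd /var/lib/alternatives
```

Run as root the check always passes for existing files, so it only says something when the container runs with an arbitrary UID, as it does under the restricted SCC shown in the pod YAML.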

Comment 7 Dongbo Yan 2017-08-02 03:34:52 UTC
I tested with
docker-pullable://registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7@sha256:42da1f399677fbf3013ae267b3b357e0e0631cdcae4456dc170bf942d1e7a1d9

Yes, it displays the same errors, but the slave pod keeps running and does not go to Error.

# oc logs -f nodejs-16d2063e557e
/usr/local/bin/generate_container_user: line 7: /etc/passwd: Permission denied
Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
failed to create /var/lib/alternatives/java.new: Permission denied
Downloading http://172.30.211.171:80/jnlpJars/remoting.jar ...
Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://172.30.211.171:80 -tunnel 172.30.166.73:50000 b3da9d0e63965143370024b84fb46a7b4c58d41964fb8df7b23c0fbaa4d75cd3 nodejs-16d2063e557e
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=100m; support was removed in 8.0
Aug 02, 2017 3:30:54 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: nodejs-16d2063e557e
Aug 02, 2017 3:30:54 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 02, 2017 3:30:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://172.30.211.171:80]
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: 172.30.166.73
  Agent port:    50000
  Identity:      08:e4:34:d6:99:1d:f9:dd:a1:d3:5c:f5:0f:2d:ae:43
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.30.166.73:50000
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP4-connect not supported, skipping
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP4-plaintext not supported, skipping
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP3-connect not supported, skipping
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Aug 02, 2017 3:31:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated

Comment 8 Ben Parees 2017-08-02 03:39:02 UTC
This is the difference: in Online, the container memory limit is set to 512 MB, which leads to a calculated heap of 256 MB, which tells the Jenkins image to use a 32-bit JVM:

Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
failed to create /var/lib/alternatives/java.new: Permission denied
Downloading http://172.30.222.229:80/jnlpJars/remoting.jar ...
max heap in MB is 256 and 64 bit was not explicitly set so using 32 bit Java

And because the permissions are wrong, we are unable to configure the 32bit jvm, resulting in this output instead:

alternatives version 1.7.2 - Copyright (C) 2001 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.

usage: alternatives --install <link> <name> <path> <priority>
                    [--initscript <service>]
                    [--family <family>]
                    [--slave <link> <name> <path>]*
       alternatives --remove <name> <path>
       alternatives --auto <name>
       alternatives --config <name>
       alternatives --display <name>
       alternatives --set <name> <path>
       alternatives --list

common options: --verbose --test --help --usage --version --keep-missing
                --altdir <directory> --admindir <directory>




In the working case, your container has more memory, so the 64bit JVM is used.  Because the 64bit JVM is the default, we do not have to configure the JVM, so the permission error does not prevent things from working.
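
The decision described here can be sketched roughly as follows. This is not the image's actual startup script. It assumes the heap is half of the container memory (which matches 512 -> 256 in this bug), and THRESHOLD_MB is a hypothetical cutoff chosen only for illustration; the function name select_jvm_arch and its output values are also made up for this sketch.

```shell
# Hypothetical cutoff: heaps below this size select the 32-bit JVM.
THRESHOLD_MB=${THRESHOLD_MB:-2048}

# select_jvm_arch <container_memory_mb> [explicit_arch]
# explicit_arch stands in for OPENSHIFT_JENKINS_JVM_ARCH and may be empty.
# Prints the explicit value if given, otherwise "32" or "64".
select_jvm_arch() {
    mem_mb=$1
    explicit_arch=$2
    heap_mb=$((mem_mb / 2))   # assumption: heap is half of container memory
    if [ -n "$explicit_arch" ]; then
        echo "$explicit_arch"
    elif [ "$heap_mb" -lt "$THRESHOLD_MB" ]; then
        echo "32"
    else
        echo "64"
    fi
}
```

Under these assumptions, "select_jvm_arch 512" picks the 32-bit JVM, which is the path that needs the alternatives reconfiguration and so hits the permission error, while a larger container stays on the default 64-bit JVM.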

If you set your default resource limit to 512megs for your project in OCP, you will see the same issue as is seen in online.
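
A default limit like that can be set with a LimitRange object. The following is a sketch under assumptions: the object name mem-512 and the file name limitrange-512.yaml are hypothetical, and creating the object requires sufficient rights in the project.

```shell
# Write a LimitRange that gives every container in the project
# a default memory limit of 512Mi unless the pod sets its own.
cat > limitrange-512.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-512
spec:
  limits:
  - type: Container
    default:
      memory: 512Mi
EOF

# Then apply it to the current project (not run here):
# oc create -f limitrange-512.yaml
```

With this in place, newly created slave pods should get the same 512Mi limit that the LimitRanger annotation shows in the failing pod from Online.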

Comment 15 Ben Parees 2017-08-10 14:20:28 UTC
The published jenkins image has valid permissions now.

Comment 16 Dongbo Yan 2017-08-11 06:21:01 UTC
The published Jenkins image has valid permissions, but the Jenkins slave image is still missing the 32-bit JVM, so the slave pod still ends in Error. Moving to MODIFIED.

Comment 17 Ben Parees 2017-08-11 13:34:21 UTC
As of this morning I see the 32-bit JVM in the brew slave images, so this should be locally verifiable, but online-starter won't be fixed until the images are published with the 3.6.1 errata, since that cluster only uses published images today.

Comment 18 Dongbo Yan 2017-08-18 02:33:06 UTC
Tested on starter-us-west-1.
Since the Jenkins slave images have been published, I triggered a new pipeline build using these slave images as the pipeline job node. The slave pod runs without error.

# oc get pod -w
NAME                             READY     STATUS              RESTARTS   AGE
jenkins-1-2kcvw                  1/1       Running             0          7m
mongodb-1-1b434                  1/1       Running             0          7m
nodejs-7125f62bb51ec             1/1       Running             0          51s
nodejs-mongodb-example-1-build   0/1       ContainerCreating   0          2s
nodejs-mongodb-example-1-build   1/1       Running             0          5s

# oc logs -f nodejs-7125f62bb51ec
Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
Downloading http://172.30.107.25:80/jnlpJars/remoting.jar ...
max heap in MB is 256 and 64 bit was not explicitly set so using 32 bit Java
Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -Xms256m -Xmx256m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://172.30.107.25:80 -tunnel 172.30.111.92:50000 bcd82cebe8749a567e63851768023388808b9c6b47351fd5652b67c736f07a6d nodejs-7125f62bb51ec
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: nodejs-7125f62bb51ec
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://172.30.107.25:80]
Aug 18, 2017 2:27:43 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: 172.30.111.92
  Agent port:    50000
  Identity:      cf:75:d2:45:7c:7e:20:64:32:1c:96:c9:f9:12:74:45
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.30.111.92:50000
Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Aug 18, 2017 2:27:44 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: cf:75:d2:45:7c:7e:20:64:32:1c:96:c9:f9:12:74:45
Aug 18, 2017 2:27:44 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

This bug can be moved to VERIFIED.