Bug 1477139
| Field | Value |
|---|---|
| Summary | Jenkins master/slaves have bad permissions on /etc/passwd and jvm directories |
| Product | OpenShift Online |
| Component | Image |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | high |
| Version | 3.x |
| Target Release | 3.x |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | xipang |
| Assignee | Samuel Munilla <smunilla> |
| QA Contact | Dongbo Yan <dyan> |
| CC | aos-bugs, bparees, dakini, dyan, gmontero, jokerman, mmccomas, xipang, xtian |
| Keywords | OnlineStarter, Regression |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2017-11-09 18:47:18 UTC |
| Bug Depends On | 1477847 |

Description
xipang
2017-08-01 10:16:30 UTC
Can we get logs from the pod ("oc logs nodejs-1e7468b720cc9") and also the output of "oc get pod nodejs-1e7468b720cc9 -o yaml"? Also, the assumption is that the 3.5 openshift jenkins images are being used, but if you provide the precise docker image ID in the image stream we can confirm that. Thanks.

Image ID: docker-pullable://registry.access.redhat.com/openshift3/jenkins-2-rhel7@sha256:57e9295813aefbf3f604f1389e2f43c4157fad359611c2b4ff3530c3d52df267

    $ oc logs nodejs-xxxx
    /usr/local/bin/generate_container_user: line 7: /etc/passwd: Permission denied
    Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
    failed to create /var/lib/alternatives/java.new: Permission denied
    Downloading http://172.30.222.229:80/jnlpJars/remoting.jar ...
    max heap in MB is 256 and 64 bit was not explicitly set so using 32 bit Java
    alternatives version 1.7.2 - Copyright (C) 2001 Red Hat, Inc.
    This may be freely redistributed under the terms of the GNU Public License.

    usage: alternatives --install <link> <name> <path> <priority>
                        [--initscript <service>]
                        [--family <family>]
                        [--slave <link> <name> <path>]*
           alternatives --remove <name> <path>
           alternatives --auto <name>
           alternatives --config <name>
           alternatives --display <name>
           alternatives --set <name> <path>
           alternatives --list

    common options: --verbose --test --help --usage --version --keep-missing
                    --altdir <directory> --admindir <directory>

    $ oc get pod nodejs-xxx -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu, memory request for container jnlp; cpu, memory limit for container jnlp'
        openshift.io/scc: restricted
      creationTimestamp: 2017-08-02T01:17:57Z
      labels:
        jenkins: slave
        jenkins/nodejs: "true"
      name: nodejs-21aaa0e67e94c
      namespace: xp-t01
      resourceVersion: "11636403"
      selfLink: /api/v1/namespaces/xp-t01/pods/nodejs-21aaa0e67e94c
      uid: 6ae77d1b-7720-11e7-be9a-06e7d92b1aa4
    spec:
      activeDeadlineSeconds: 3600
      containers:
      - args:
        - 9e8f52cb56c9f07c69779756a05a060acc46497be996c3c434db8aee63291a63
        - nodejs-21aaa0e67e94c
        env:
        - name: JENKINS_SECRET
          value: 9e8f52cb56c9f07c69779756a05a060acc46497be996c3c434db8aee63291a63
        - name: JENKINS_NAME
          value: nodejs-21aaa0e67e94c
        - name: JENKINS_LOCATION_URL
        - name: JENKINS_URL
          value: http://172.30.222.229:80
        - name: JENKINS_TUNNEL
          value: 172.30.192.59:50000
        - name: JENKINS_JNLP_URL
          value: http://172.30.222.229:80/computer/nodejs-21aaa0e67e94c/slave-agent.jnlp
        - name: HOME
          value: /tmp
        image: registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7
        imagePullPolicy: Always
        name: jnlp
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 60m
            memory: 307Mi
        securityContext:
          capabilities:
            drop:
            - KILL
            - MKNOD
            - NET_RAW
            - SETGID
            - SETUID
            - SYS_CHROOT
          privileged: false
          runAsUser: 1003100000
          seLinuxOptions:
            level: s0:c56,c10
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /tmp
          name: workspace-volume
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: jenkins-token-f7qhs
          readOnly: true
        workingDir: /tmp
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: jenkins-dockercfg-3xmtw
      nodeName: ip-172-31-22-20.us-west-1.compute.internal
      nodeSelector:
        type: compute
      restartPolicy: Never
      securityContext:
        fsGroup: 1003100000
        seLinuxOptions:
          level: s0:c56,c10
      serviceAccount: jenkins
      serviceAccountName: jenkins
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: workspace-volume
      - name: jenkins-token-f7qhs
        secret:
          defaultMode: 420
          secretName: jenkins-token-f7qhs
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: 2017-08-02T01:17:57Z
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: 2017-08-02T01:17:57Z
        message: 'containers with unready status: [jnlp]'
        reason: ContainersNotReady
        status: "False"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: 2017-08-02T01:17:57Z
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: docker://19ac0e9e98fbfa47dabeb725723ff10f71e52db2a1d5d30f0bfec03839557b5c
        image: registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7
        imageID: docker-pullable://registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7@sha256:42da1f399677fbf3013ae267b3b357e0e0631cdcae4456dc170bf942d1e7a1d9
        lastState: {}
        name: jnlp
        ready: false
        restartCount: 0
        state:
          terminated:
            containerID: docker://19ac0e9e98fbfa47dabeb725723ff10f71e52db2a1d5d30f0bfec03839557b5c
            exitCode: 2
            finishedAt: 2017-08-02T01:18:00Z
            reason: Error
            startedAt: 2017-08-02T01:17:59Z
      hostIP: 172.31.22.20
      phase: Failed
      startTime: 2017-08-02T01:17:57Z

Yeah, this is the same issue that was discussed on the SME list. The recently built+published slave images are broken. Assigning to Sam; we've discussed the problem on IRC. We either need to roll back to the previous image, or build+publish new ones.

Hi Ben, I tested with the same jenkins slave image on OCP, but cannot reproduce the issue. What do you mean by "recently built+published slave images are broken"?

These errors:

    /usr/local/bin/generate_container_user: line 7: /etc/passwd: Permission denied
    Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
    failed to create /var/lib/alternatives/java.new: Permission denied

indicate issues with the file permissions in the slave image that is on registry.access.redhat.com and, in your case, they prevented the JDK from being configured properly. Depending on how you tested the image on OCP, you might not run into the problem. Did you run the same pipeline? Or were you using the image from brew, which does not have this issue?

I tested with docker-pullable://registry.access.redhat.com/openshift3/jenkins-slave-nodejs-rhel7@sha256:42da1f399677fbf3013ae267b3b357e0e0631cdcae4456dc170bf942d1e7a1d9. Yes, it displays the same error, but the slave pod is still running and does not turn to error.

    # oc logs -f nodejs-16d2063e557e
    /usr/local/bin/generate_container_user: line 7: /etc/passwd: Permission denied
    Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
    failed to create /var/lib/alternatives/java.new: Permission denied
    Downloading http://172.30.211.171:80/jnlpJars/remoting.jar ...
    Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://172.30.211.171:80 -tunnel 172.30.166.73:50000 b3da9d0e63965143370024b84fb46a7b4c58d41964fb8df7b23c0fbaa4d75cd3 nodejs-16d2063e557e
    OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=100m; support was removed in 8.0
    Aug 02, 2017 3:30:54 AM hudson.remoting.jnlp.Main createEngine
    INFO: Setting up slave: nodejs-16d2063e557e
    Aug 02, 2017 3:30:54 AM hudson.remoting.jnlp.Main$CuiListener <init>
    INFO: Jenkins agent is running in headless mode.
    Aug 02, 2017 3:30:54 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Locating server among [http://172.30.211.171:80]
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Agent discovery successful
      Agent address: 172.30.166.73
      Agent port: 50000
      Identity: 08:e4:34:d6:99:1d:f9:dd:a1:d3:5c:f5:0f:2d:ae:43
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Handshaking
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Connecting to 172.30.166.73:50000
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Server reports protocol JNLP4-connect not supported, skipping
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Server reports protocol JNLP4-plaintext not supported, skipping
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Server reports protocol JNLP3-connect not supported, skipping
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Trying protocol: JNLP2-connect
    Aug 02, 2017 3:30:55 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Connected
    Aug 02, 2017 3:31:54 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Terminated

This is the difference: in online, the container memory is being set to 512meg, which leads to a calculated heap of 256meg, which tells the jenkins image to use a 32bit JVM:

    Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
    failed to create /var/lib/alternatives/java.new: Permission denied
    Downloading http://172.30.222.229:80/jnlpJars/remoting.jar ...
    max heap in MB is 256 and 64 bit was not explicitly set so using 32 bit Java

And because the permissions are wrong, we are unable to configure the 32bit jvm, resulting in this output instead:

    alternatives version 1.7.2 - Copyright (C) 2001 Red Hat, Inc.
    This may be freely redistributed under the terms of the GNU Public License.

    usage: alternatives --install <link> <name> <path> <priority>
                        [--initscript <service>]
                        [--family <family>]
                        [--slave <link> <name> <path>]*
           alternatives --remove <name> <path>
           alternatives --auto <name>
           alternatives --config <name>
           alternatives --display <name>
           alternatives --set <name> <path>
           alternatives --list

    common options: --verbose --test --help --usage --version --keep-missing
                    --altdir <directory> --admindir <directory>

In the working case, your container has more memory, so the 64bit JVM is used. Because the 64bit JVM is the default, we do not have to configure the JVM, so the permission error does not prevent things from working. If you set your default resource limit to 512megs for your project in OCP, you will see the same issue as is seen in online.

The published jenkins image has valid permissions now.

The published jenkins image has valid permissions, but the jenkins slave image is still missing the 32 bit jvm, so the slave pod still ends in error. Moving to modified.

As of this morning I see the 32bit jvm in the brew slave images, so this should be locally verifiable, but online-starter won't be fixed until the images are published with the 3.6.1 errata, since that cluster only uses published images today.

Tested on starter-us-west-1. Since the jenkins slave images have been published, I triggered a new pipeline build using these slave images as the pipeline job node. The slave pod is running without error.

    # oc get pod -w
    NAME                             READY   STATUS              RESTARTS   AGE
    jenkins-1-2kcvw                  1/1     Running             0          7m
    mongodb-1-1b434                  1/1     Running             0          7m
    nodejs-7125f62bb51ec             1/1     Running             0          51s
    nodejs-mongodb-example-1-build   0/1     ContainerCreating   0          2s
    nodejs-mongodb-example-1-build   1/1     Running             0          5s

    # oc logs -f nodejs-7125f62bb51ec
    Using 64 bit Java since OPENSHIFT_JENKINS_JVM_ARCH is not set
    Downloading http://172.30.107.25:80/jnlpJars/remoting.jar ...
    max heap in MB is 256 and 64 bit was not explicitly set so using 32 bit Java
    Running java -XX:+UseParallelGC -XX:MaxPermSize=100m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -Xms256m -Xmx256m -cp /home/jenkins/remoting.jar hudson.remoting.jnlp.Main -headless -url http://172.30.107.25:80 -tunnel 172.30.111.92:50000 bcd82cebe8749a567e63851768023388808b9c6b47351fd5652b67c736f07a6d nodejs-7125f62bb51ec
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main createEngine
    INFO: Setting up slave: nodejs-7125f62bb51ec
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener <init>
    INFO: Jenkins agent is running in headless mode.
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Locating server among [http://172.30.107.25:80]
    Aug 18, 2017 2:27:43 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
    INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Agent discovery successful
      Agent address: 172.30.111.92
      Agent port: 50000
      Identity: cf:75:d2:45:7c:7e:20:64:32:1c:96:c9:f9:12:74:45
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Handshaking
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Connecting to 172.30.111.92:50000
    Aug 18, 2017 2:27:43 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Trying protocol: JNLP4-connect
    Aug 18, 2017 2:27:44 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Remote identity confirmed: cf:75:d2:45:7c:7e:20:64:32:1c:96:c9:f9:12:74:45
    Aug 18, 2017 2:27:44 AM hudson.remoting.jnlp.Main$CuiListener status
    INFO: Connected

This bug can be moved to verified.
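
For anyone reading this thread later, the JVM selection discussed in the comments (a 512 MiB container limit gives a calculated heap of 256 MB, which selects the 32 bit JVM unless the architecture is set explicitly) can be sketched roughly as below. This is not the image's startup script. The 50% heap fraction is inferred only from the 512 to 256 numbers in the logs, and `ASSUMED_LIMIT_THRESHOLD_MB`, `max_heap_mb`, and `select_jvm_bits` are hypothetical names and values chosen for illustration; the thread itself only shows that a 512 MiB limit ends up on the 32 bit path.

```python
from typing import Optional

# Inferred from the thread: a 512 MiB container limit produced
# "max heap in MB is 256", i.e. half of the limit.
HEAP_FRACTION = 0.5

# ASSUMPTION (not stated in this thread): container limits below this
# value take the 32 bit path. Treat the number as a placeholder.
ASSUMED_LIMIT_THRESHOLD_MB = 2048


def max_heap_mb(container_limit_mb: int) -> int:
    """Heap size derived from the container memory limit."""
    return int(container_limit_mb * HEAP_FRACTION)


def select_jvm_bits(container_limit_mb: int,
                    explicit_bits: Optional[int] = None) -> int:
    """Return 32 or 64.

    explicit_bits stands in for OPENSHIFT_JENKINS_JVM_ARCH being set;
    when it is given, it always wins over the memory-based choice.
    """
    if explicit_bits is not None:
        return explicit_bits
    if container_limit_mb < ASSUMED_LIMIT_THRESHOLD_MB:
        return 32
    return 64


if __name__ == "__main__":
    # Online Starter case from this bug: 512 MiB limit.
    print(max_heap_mb(512), select_jvm_bits(512))
    # A container with more memory stays on the default 64 bit JVM.
    print(max_heap_mb(4096), select_jvm_bits(4096))
```

The sketch also shows why the failure was only seen with small limits: the 64 bit JVM is the default, so the `alternatives` call (and the permission error behind it) only matters when the 32 bit branch is taken.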