+++ This bug was initially created as a clone of Bug #1318510 +++

Online/Image: registry.access.redhat.com/openshift3/jenkins-1-rhel7 (908b6dd3dafb)

Version-Release number of selected component (if applicable):
kubernetes v1.2.0-alpha.7-703-gbc4550d
docker 1.8.2-el7, build a01dc02/1.8.2
kernel 3.10.0-327.10.1.el7.x86_64

How reproducible:
Always

Description of problem:
"Error syncing pod" when using jenkins-ephemeral-template to create Jenkins.

Steps to Reproduce:
1. $ oc new-project test
2. $ oc policy add-role-to-user admin system:serviceaccount:test:default -n test
3. $ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json
4. Check the pod:

# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
jenkins-1-deploy   0/1       ContainerCreating   0          1h

[root@dhcp-128-91 backup]# oc describe pod jenkins-1-deploy
Name:        jenkins-1-deploy
Namespace:   test
Image(s):    openshift3/ose-deployer:v3.1.1.910
Node:        ip-172-31-15-139.ec2.internal/172.31.15.139
Start Time:  Thu, 17 Mar 2016 11:46:47 +0800
Labels:      openshift.io/deployer-pod-for.name=jenkins-1
Status:      Pending
Reason:
Message:
IP:
Controllers: <none>
Containers:
  deployment:
    Container ID:
    Image:    openshift3/ose-deployer:v3.1.1.910
    Image ID:
    Port:
    QoS Tier:
      memory: BestEffort
      cpu:    BestEffort
    State:    Waiting
      Reason: ContainerCreating
    Ready:    False
    Restart Count: 0
    Environment Variables:
      KUBERNETES_MASTER: https://ip-172-31-4-121.ec2.internal
      OPENSHIFT_MASTER:  https://ip-172-31-4-121.ec2.internal
      BEARER_TOKEN_FILE: /var/run/secrets/kubernetes.io/serviceaccount/token
      OPENSHIFT_CA_DATA: -----BEGIN CERTIFICATE-----
MIIC5jCCAdCgAwIBAgIBATALBgkqhkiG9w0BAQswJjEkMCIGA1UEAwwbb3BlbnNo
aWZ0LXNpZ25lckAxNDU3MzkxMjUxMB4XDTE2MDMwNzIyNTQxMVoXDTIxMDMwNjIy
NTQxMlowJjEkMCIGA1UEAwwbb3BlbnNoaWZ0LXNpZ25lckAxNDU3MzkxMjUxMIIB
IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAswpnGk9V5a1/BMPhRPkkY/bz
iad06kV8tzM7KXCga11S74x7D1cJ6TEpRx7PDHrzkfFc6EuGr8h6IgAD9twsxM2I
sUkGfeRbWlBx+UmB7mdpfbdbyWsP0JV/h1pgHXVZbHn3P42KfkpdCOoxfbdXmwBd
fEdlBX6+DVQoNE0leHnoQ1B52uXhYeUJF9yjGh47CHNSnJH1HbdOS3UPb46WYC0e
dp/U7Ho0zFpmsHAJMnSMN0EJU8EZiF8LmSS/S9y27lfK8Wji8W0f6B2bAALHGvwC
J9vCYsci86eOlEcFsLmevEwOxLYaNCe9xFM7ujMk/Ic5fmv+PFgHVPS9WGoFiQID
AQABoyMwITAOBgNVHQ8BAf8EBAMCAKQwDwYDVR0TAQH/BAUwAwEB/zALBgkqhkiG
9w0BAQsDggEBAF6wzswIVTXRHW+26AmIbq6ZQWoJY3Nsw0fYl3wKDOFsBzxKe/Wf
iI0yikZl07m2gY/oBvTzuKiuuiiD7WjMxjbJUKTnLNOzQ2HJ8893SWf0vIFeXyVs
fkTjZFV9yMJyl4pso69zsRurh+7whb7tnxpyCNQ5Dx9S9wQ1tRnSl1p0rrUxh4cc
JyNE6SCHW9rXDlUwqD/9DIqgE3Org8EewMVCH65YwXV2Xny0+wQGIBeThJN9TI7T
HvfhrPXMa8J7yhv2MqjqFAYLbcJh/8fRRNITDuVG5PDlWEY7bGieKo8ElVyShYlH
HNJ2fm3e8L9ZRRzWy4TtC1e++DLXSsz8G04=
-----END CERTIFICATE-----
      OPENSHIFT_DEPLOYMENT_NAME:      jenkins-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE: test
Conditions:
  Type   Status
  Ready  False
Volumes:
  deployer-token-a7woy:
    Type:       Secret (a secret that should populate this volume)
    SecretName: deployer-token-a7woy
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  1h  1h  1    {default-scheduler }  Normal  Scheduled  Successfully assigned jenkins-1-deploy to ip-172-31-15-139.ec2.internal
  1h  1h  1    {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: No such container: 8ace102fd4462e67bdbd4eea4f32d0c5257e0f914dbe1f2aac92786acdce1753
  1h  1h  1    {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: No such container: 1085c0d8ef8cf554dce5048e9e4193f7aef50fe54374f0c229fa4d3bfdec9663
  1h  1h  1    {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Error running DeviceResume dm_task_run failed\n"
  1h  1h  122  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "jenkins-1-deploy_test" with SetupNetworkError: "Failed to setup network for pod \"jenkins-1-deploy_test(dfeb8369-ebf2-11e5-90a5-0aadb0f8cf89)\" using network plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"
  1h  1h  132  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: API error (500): Unknown device acc7e11a7e98340f0efcfefefc267e43a07eec266b4960508d3bd005b449a3b5
  1h  21s 296  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: API error (500): Unknown device acc7e11a7e98340f0efcfefefc267e43a07eec266b4960508d3bd005b449a3b5

Actual results:
The pod is not running.

Expected results:
The pod should reach Running status.

--- Additional comment from Ben Parees on 2016-03-17 17:59:58 EDT ---

Passing to the networking team based on:

  1h  1h  122  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "jenkins-1-deploy_test" with SetupNetworkError: "Failed to setup network for pod \"jenkins-1-deploy_test(dfeb8369-ebf2-11e5-90a5-0aadb0f8cf89)\" using network plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

--- Additional comment from wewang on 2016-03-17 23:10:00 EDT ---

Tested again; the pod ends up in CrashLoopBackOff:

# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          38s
jenkins-1-yfahj    0/1       Running   0          34s
# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-yfahj    0/1       CrashLoopBackOff   1          1m
# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          1m
jenkins-1-yfahj    0/1       Running   2          1m
# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          1m
jenkins-1-yfahj    0/1       Running   2          1m
# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-yfahj    0/1       CrashLoopBackOff   2          1m

# oc describe jenkins-1-yfahj
the server doesn't have a resource type "jenkins-1-yfahj"
# oc describe pod jenkins-1-yfahj
Name:        jenkins-1-yfahj
Namespace:   wewang7
Image(s):
registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
Node:        ip-172-31-15-139.ec2.internal/172.31.15.139
Start Time:  Fri, 18 Mar 2016 10:43:26 +0800
Labels:      deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:      Running
Reason:
Message:
IP:          10.1.0.237
Controllers: ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID: docker://68050e3ce2b12550e08d3fe62e433c1bf110679500698de950fc60bec201fd83
    Image:        registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
    Image ID:     docker://908b6dd3dafbabbb1cf38b60bb8a281988c04f4953854df19e0ed804fe9d4dfa
    Port:
    QoS Tier:
      cpu:    Burstable
      memory: Guaranteed
    Limits:
      cpu:    1
      memory: 512Mi
    Requests:
      cpu:    60m
      memory: 512Mi
    State:      Waiting
      Reason:   CrashLoopBackOff
    Last State: Terminated
      Reason:    OOMKilled
      Exit Code: 137
      Started:   Fri, 18 Mar 2016 10:44:39 +0800
      Finished:  Fri, 18 Mar 2016 10:44:54 +0800
    Ready:         False
    Restart Count: 2
    Liveness:  http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness: http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD: password
Conditions:
  Type   Status
  Ready  False
Volumes:
  jenkins-data:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-m8i52:
    Type:       Secret (a secret that should populate this volume)
    SecretName: default-token-m8i52
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  2m  2m   1  {default-scheduler }  Normal  Scheduled  Successfully assigned jenkins-1-yfahj to ip-172-31-15-139.ec2.internal
  2m  2m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Created  Created container with docker id 5a465869c4db
  2m  2m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Started  Started container with docker id 5a465869c4db
  1m  1m   2  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.1.0.237:8080/login: read tcp 10.1.0.237:8080: use of closed network connection
  1m  1m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Created  Created container with docker id 86e52801394e
  1m  1m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Started  Started container with docker id 86e52801394e
  1m  1m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  1m  1m   2  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 10s restarting failed container=jenkins pod=jenkins-1-yfahj_wewang7(306dc2f6-ecb3-11e5-9d8d-0aadb0f8cf89)"
  2m  51s  3  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Pulled  Container image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest" already present on machine
  50s 50s  1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Created  Created container with docker id 68050e3ce2b1
  50s 50s  1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Started  Started container with docker id 68050e3ce2b1
  1m  43s  2  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.1.0.237:8080/login: dial tcp 10.1.0.237:8080: connection refused
  1m  19s  5  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  BackOff  Back-off restarting failed docker container
  34s 19s  3  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 20s restarting failed container=jenkins pod=jenkins-1-yfahj_wewang7(306dc2f6-ecb3-11e5-9d8d-0aadb0f8cf89)"

--- Additional comment from Ben Bennett on 2016-03-18 08:29:39 EDT ---

Can you please get us the output from the openshift node log, and run the troubleshooting script at https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh

--- Additional comment from wewang on 2016-03-21 02:07:05 EDT ---

Node logs are below. I have no permission to run the script; can you track the problem with the logs?

[root@dev-preview-int-node-compute-d5623 ~]# docker ps -a | grep jenkins-1-deploy
ab8870ea68fb  openshift3/ose-deployer:v3.1.1.910  "/usr/bin/openshift-d"  7 minutes ago  Exited (255) 5 minutes ago  k8s_deployment.86d070f9_jenkins-1-deploy_test_28d451a5-ef29-11e5-9d8d-0aadb0f8cf89_1eecc842
0229e3387213  openshift3/ose-pod:v3.1.1.910  "/pod"  7 minutes ago  Exited (0) 5 minutes ago  k8s_POD.e5c1dc5a_jenkins-1-deploy_test_28d451a5-ef29-11e5-9d8d-0aadb0f8cf89_64850473
[root@dev-preview-int-node-compute-d5623 ~]# docker logs ab8870ea68fbdec7bab4b6046ef9e49a7aafb78c428d2f03590896efe4367f2d
I0321 01:52:59.062514  1 deployer.go:199] Deploying test/jenkins-1 for the first time (replicas: 1)
I0321 01:52:59.066739  1 recreate.go:126] Scaling test/jenkins-1 to 1 before performing acceptance check
F0321 01:55:00.099291  1 deployer.go:69] couldn't scale test/jenkins-1 to 1: timed out waiting for the condition

--- Additional comment from XiuJuan Wang on 2016-03-21 06:05:20 EDT ---

Met this issue in an OSE env (3.2/2016-03-18.4):

  0s  0s  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Started  Started container with docker id 1e806e9afc6b
  <invalid>  <invalid>  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Killing  Killing container with docker id 1e806e9afc6b: pod "jenkins-1-a1vio_jenkins(17142722-ef4a-11e5-bcc1-fa163efe3ad5)" container "jenkins" is unhealthy, it will be killed and re-created.
  5m  <invalid>  6   {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Pulled  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest" already present on machine
  <invalid>  <invalid>  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Created  Created container with docker id 24a815714336
  <invalid>  <invalid>  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Started  Started container with docker id 24a815714336
  5m  <invalid>  23  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.1.6:8080/login: dial tcp 10.2.1.6:8080: connection refused
  4m  <invalid>  8   {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Liveness probe failed: Get http://10.2.1.6:8080/login: dial tcp 10.2.1.6:8080: connection refused

--- Additional comment from wewang on 2016-03-21 06:10:18 EDT ---

The OSE env of comment 5 is:
openshift v3.2.0.5
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 (d84481f80ece)

--- Additional comment from Ben Bennett on 2016-03-21 13:18:15 EDT ---

I need the output from:
  journalctl -flu atomic-openshift-node
(Or perhaps openshift-node... run `systemctl status | grep openshift` to get the unit name if those fail.)

--- Additional comment from Chris DiGiovanni on 2016-03-21 16:46:29 EDT ---

Ben, here are the debug logs you requested in IRC:

Mar 21 15:32:38 node1_vm origin-node[17123]: I0321 15:32:38.315960 17177 plugin.go:138] SetUpPod network plugin output: + lock_file=/var/lock/openshift-sdn.lock
Mar 21 15:32:38 node1_vm origin-node[17123]: + action=setup
Mar 21 15:32:38 node1_vm origin-node[17123]: + net_container=41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + tenant_id=0
Mar 21 15:32:38 node1_vm origin-node[17123]: + lockwrap run
Mar 21 15:32:38 node1_vm origin-node[17123]: + flock 200
Mar 21 15:32:38 node1_vm origin-node[17123]: + run
Mar 21 15:32:38 node1_vm origin-node[17123]: + get_ipaddr_pid_veth
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.HostConfig.NetworkMode}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + network_mode=default
Mar 21 15:32:38 node1_vm origin-node[17123]: + '[' default == host ']'
Mar 21 15:32:38 node1_vm origin-node[17123]: + [[ default =~ container:.* ]]
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.NetworkSettings.IPAddress}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + ipaddr=172.17.0.2
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.State.Pid}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ get_veth_host 28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: +++ nsenter -n -t 28810 -- ethtool -S eth0
Mar 21 15:32:38 node1_vm origin-node[17123]: +++ sed -n -e 's/.*peer_ifindex: //p'
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local veth_ifindex=1358
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ ip link show
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ sed -ne 's/^1358: \([^:@]*\).*/\1/p'
Mar 21 15:32:38 node1_vm origin-node[17123]: + veth_host=veth14e6fda
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ get_container_mac 28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ nsenter -n -t 28810 -- ip link show dev eth0
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ sed -n -e 's/.*link.ether \([^ ]*\).*/\1/p'
Mar 21 15:32:38 node1_vm origin-node[17123]: + macaddr=02:42:ac:11:00:02
Mar 21 15:32:38 node1_vm origin-node[17123]: + source /run/openshift-sdn/config.env
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ export OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16
Mar 21 15:32:38 node1_vm origin-node[17123]: + case "$action" in
Mar 21 15:32:38 node1_vm origin-node[17123]: + add_ovs_port
Mar 21 15:32:38 node1_vm origin-node[17123]: + brctl delif lbr0 veth14e6fda
Mar 21 15:32:38 node1_vm origin-node[17123]: device veth14e6fda is not a slave of lbr0
Mar 21 15:32:38 node1_vm origin-node[17123]: , exit status 1
Mar 21 15:32:38 node1_vm origin-node[17123]: E0321 15:32:38.316051 17177 manager.go:1791] Failed to setup network for pod "docker-registry-1-deploy_default(4006a8ee-ef9b-11e5-bbd7-0050568848d8)" using network plugins "redhat/openshift-ovs-subnet" : exit status 1; Skipping pod

Thanks, digi691

--- Additional comment from Dan Williams on 2016-03-22 14:30:17 EDT ---

What RPM version of atomic-openshift is installed on this cluster? `rpm -q atomic-openshift` will tell you... Sorry if I missed it above.

--- Additional comment from Dan Williams on 2016-03-22 14:32:15 EDT ---

It's looking like docker's network setup isn't correct.
Can you also grab:
1) the contents of /run/openshift-sdn/docker-network
2) the output of 'ps ax | grep docker'

--- Additional comment from wewang on 2016-03-30 02:44:49 EDT ---

@Ben, here is the info you need:

[root@dev-preview-int-master-167b1 ~]# cat /run/openshift-sdn/docker-network
# This file has been modified by openshift-sdn.
DOCKER_NETWORK_OPTIONS='-b=lbr0 --mtu=8951'
[root@dev-preview-int-master-167b1 ~]# ps ax | grep docker
29489 ?      Ss   0:00  /bin/sh -c /usr/bin/docker daemon $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY 2>&1 | /usr/bin/forward-journald -tag docker
29490 ?      Sl   64:01 /usr/bin/docker daemon --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true -b=lbr0 --mtu=8951 --add-registry registry.qe.openshift.com --add-registry registry.access.redhat.com
29491 ?      Sl   0:21  /usr/bin/forward-journald -tag docker
46201 pts/1  S+   0:00  grep --color=auto docker

--- Additional comment from XiuJuan Wang on 2016-04-08 03:25 EDT ---

--- Additional comment from XiuJuan Wang on 2016-04-08 03:36 EDT ---

--- Additional comment from XiuJuan Wang on 2016-04-08 03:41:57 EDT ---

I created a jenkins app using jenkins-ephemeral-template in an ose-3.2.0.11 env. The image is brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 (a414317f519d). The jenkins pod reaches Running and I can log in to the Jenkins web console, but there are still some network warnings in the jenkins pod describe info:

  Successfully assigned jenkins-1-6zk6t to openshift-120.lab.sjc.redhat.com
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Pulled  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest" already present on machine
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Created  Created container with docker id 137af75fa572
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Started  Started container with docker id 137af75fa572
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.1.5:8080/login: dial tcp 10.2.1.5:8080: connection refused
  9m   9m   1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  9m   9m   1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.1.5:8080/login: read tcp 10.2.1.5:8080: use of closed network connection

I added the attachments devel requested; hope they are useful:
1) the atomic-openshift versions
2) the result of this script: https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh
3) the node log
4) the contents of /run/openshift-sdn/docker-network, and 'ps ax | grep docker'
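[Editor's note: a quick diagnostic sketch for the SDN symptom above. Assumptions: the node runs the redhat/openshift-ovs-subnet plugin and openshift-sdn manages the lbr0 bridge; run as root on the affected node. These are read-only checks, not a fix.]

```shell
# Hedged diagnostic sketch for "Failed to setup network ... exit status 1"
cat /run/openshift-sdn/docker-network   # docker should be started with -b=lbr0
ip addr show lbr0                       # the bridge must exist and hold the node subnet gateway
brctl show lbr0                         # veth ports docker attached to the bridge
ovs-vsctl show                          # br0 and its ports (vxlan/tun) should be present
```

If lbr0 is missing or docker is not using `-b=lbr0`, the setup script's `brctl delif lbr0 <veth>` step will fail just as in the journal output above.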
Since bug 1318510 in OSE has been mostly fixed, I am cloning a new bug for Online. The registry.access.redhat.com/openshift3/jenkins-1-rhel7 image (908b6dd3dafb) in Online is too old: 7 weeks.
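[Editor's note: one way to compare image ages directly (hedged sketch; assumes docker CLI access on a host that has pulled both tags) is to read the image creation timestamps:]

```shell
# Check how stale the Online jenkins image is (host with both images pulled assumed)
docker inspect --format '{{.Created}}' registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
docker inspect --format '{{.Created}}' brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest
```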
Are you actually hitting this issue as described in Online? @bparees indicates that changes were made between those versions that should have addressed this.
Yes, I am hitting this issue in Online. Online is using registry.access.redhat.com/openshift3/jenkins-1-rhel7 (908b6dd3dafb), which is 7 weeks old. I know bug #1318510 has been partly fixed with brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7; I opened this new bug just to track the issue in Online.
Today, testing OSE-3.2 (RPM install, RHEL-7.2, multitenant, GCE), I met the same problem with registry.access.redhat.com/openshift3/jenkins-1-rhel7 (908b6dd3dafb).

Steps:
$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json

# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-ijh0t    0/1       CrashLoopBackOff   1          1m

[root@dhcp-128-91 backup]# oc describe pod jenkins-1-ijh0t
Name:        jenkins-1-ijh0t
Namespace:   wewang
Node:        qe-shared-master-registry-etcd-1/10.240.0.11
Start Time:  Wed, 27 Apr 2016 10:04:59 +0800
Labels:      deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:      Running
IP:          10.2.3.11
Controllers: ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID: docker://55b2c5688a615d0566f0b2183c52e81b8f70d1dbf8cf1914c57049cb31ab3846
    Image:        registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
    Image ID:     docker://908b6dd3dafbabbb1cf38b60bb8a281988c04f4953854df19e0ed804fe9d4dfa
    Port:
    QoS Tier:
      cpu:    BestEffort
      memory: Guaranteed
    Limits:
      memory: 512Mi
    Requests:
      memory: 512Mi
    State:      Running
      Started:  Wed, 27 Apr 2016 10:07:13 +0800
    Last State: Terminated
      Reason:    Error
      Exit Code: 137
      Started:   Wed, 27 Apr 2016 10:06:24 +0800
      Finished:  Wed, 27 Apr 2016 10:06:50 +0800
    Ready:         False
    Restart Count: 3
    Liveness:  http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness: http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD: password
Conditions:
  Type   Status
  Ready  False
Volumes:
  jenkins-data:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-ll79s:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-ll79s
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  2m  2m   1  {default-scheduler }  Normal  Scheduled  Successfully assigned jenkins-1-ijh0t to qe-shared-master-registry-etcd-1
  2m  2m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Pulling  pulling image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest"
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Pulled  Successfully pulled image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest"
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id 236c100c585e
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id 236c100c585e
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id 973805102a71
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id 973805102a71
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.3.11:8080/login: dial tcp 10.2.3.11:8080: connection refused
  1m  1m   2  {kubelet qe-shared-master-registry-etcd-1}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 10s restarting failed container=jenkins pod=jenkins-1-ijh0t_wewang(71f745e1-0c1c-11e6-a122-42010af0000e)"
  59s 59s  1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id e8579d9573a1
  58s 58s  1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id e8579d9573a1
  1m  52s  2  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  1m  40s  2  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.3.11:8080/login: read tcp 10.2.3.11:8080: use of closed network connection
  32s 32s  1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.3.11:8080/login: read tcp 10.2.3.11:8080: connection reset by peer
  1m  23s  4  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  BackOff  Back-off restarting failed docker container
  31s 23s  2  {kubelet qe-shared-master-registry-etcd-1}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 20s restarting failed container=jenkins pod=jenkins-1-ijh0t_wewang(71f745e1-0c1c-11e6-a122-42010af0000e)"
  1m  9s   3  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Pulled  Container image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest" already present on machine
  9s  9s   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id 55b2c5688a61
  9s  9s   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id 55b2c5688a61
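[Editor's note: Exit Code 137 above decodes to SIGKILL (137 = 128 + 9); together with the OOMKilled termination reported in an earlier comment, this suggests the 512Mi memory limit is being hit while Jenkins starts. A small sketch of the decoding follows; the `oc patch` at the end is a hypothetical mitigation, not a confirmed fix.]

```shell
# Decode a container exit code into its signal name:
# exit codes >= 128 mean the process died from signal (code - 128).
exit_code=137
sig=$((exit_code - 128))
kill -l "$sig"    # signal 9 is SIGKILL

# Hypothetical mitigation sketch: raise the memory limit on the jenkins DC, e.g.
#   oc patch dc/jenkins -p '{"spec":{"template":{"spec":{"containers":
#     [{"name":"jenkins","resources":{"limits":{"memory":"1Gi"}}}]}}}}'
```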
I confirmed that brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 does not crashloop against the readiness probe with the current template, so this should be good to go once we publish the new image. Assigning to Troy.
*** Bug 1331617 has been marked as a duplicate of this bug. ***
@Ben, registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest has been updated to 37d7c8d851b9 (13 days ago). The Jenkins pod with a persistent volume can now run in the Online env; there are just some Unhealthy messages in the events. Please help set the bug to ON_QA. Thanks!

  7m   7m   1  {kubelet ip-172-31-14-21.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.1.11.10:8080/login: dial tcp 10.1.11.10:8080: connection refused
  7m   6m   3  {kubelet ip-172-31-14-21.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  6m   6m   1  {kubelet ip-172-31-14-21.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 503
  14m  14m  1  jenkins DeploymentConfig  Warning  FailedUpdate  {deployment-controller } Cannot update deployment xiuwang/jenkins-1 status to Pending: replicationcontrollers "jenkins-1" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
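[Editor's note: the remaining Unhealthy events are expected while Jenkins boots — the template's readiness probe starts probing 3s after container start (delay=3s above), well before Jenkins can serve /login. If the noise matters, one hedged option (illustrative values, not part of the shipped template) is to lengthen the initial delay on the deployment config:]

```shell
# Hypothetical sketch: give Jenkins more time before the first readiness probe.
# The shipped template uses delay=3s; 30s here is an illustrative value.
oc patch dc/jenkins -p '{"spec":{"template":{"spec":{"containers":[{"name":"jenkins","readinessProbe":{"initialDelaySeconds":30}}]}}}}'
```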
According to comment #7, moving this bug to VERIFIED.