Bug 1318510 - pod is ContainerCreating when using jenkins-ephemeral-template to create jenkins
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Dan Williams
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks: 1328727
 
Reported: 2016-03-17 06:16 UTC by wewang
Modified: 2017-03-08 18:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1328727 (view as bug list)
Environment:
Last Closed: 2016-11-22 23:21:20 UTC
Target Upstream Version:
Embargoed:


Attachments
jenkins-restart.log (7.95 KB, text/plain)
2016-04-19 07:17 UTC, XiuJuan Wang

Description wewang 2016-03-17 06:16:15 UTC
Online/Image
registry.access.redhat.com/openshift3/jenkins-1-rhel7  908b6dd3dafb

Version-Release number of selected component (if applicable):
kubernetes v1.2.0-alpha.7-703-gbc4550d
Docker 1.8.2-el7, build a01dc02/1.8.2
kernel 3.10.0-327.10.1.el7.x86_64

How reproducible:
always

Description of problem:
 Error syncing pod when using jenkins-ephemeral-template to create jenkins

Steps to Reproduce:
1. $ oc new-project test
2. $ oc policy add-role-to-user admin system:serviceaccount:test:default -n test
3. $ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json
4. Check the pod:
   # oc get pods
NAME               READY     STATUS              RESTARTS   AGE
jenkins-1-deploy   0/1       ContainerCreating   0          1h
[root@dhcp-128-91 backup]# oc describe pod jenkins-1-deploy
Name:        jenkins-1-deploy
Namespace:    test
Image(s):    openshift3/ose-deployer:v3.1.1.910
Node:        ip-172-31-15-139.ec2.internal/172.31.15.139
Start Time:    Thu, 17 Mar 2016 11:46:47 +0800
Labels:        openshift.io/deployer-pod-for.name=jenkins-1
Status:        Pending
Reason:        
Message:    
IP:        
Controllers:    <none>
Containers:
  deployment:
    Container ID:    
    Image:        openshift3/ose-deployer:v3.1.1.910
    Image ID:        
    Port:        
    QoS Tier:
      memory:        BestEffort
      cpu:        BestEffort
    State:        Waiting
      Reason:        ContainerCreating
    Ready:        False
    Restart Count:    0
    Environment Variables:
      KUBERNETES_MASTER:    https://ip-172-31-4-121.ec2.internal
      OPENSHIFT_MASTER:        https://ip-172-31-4-121.ec2.internal
      BEARER_TOKEN_FILE:    /var/run/secrets/kubernetes.io/serviceaccount/token
      OPENSHIFT_CA_DATA:    -----BEGIN CERTIFICATE-----
MIIC5jCCAdCgAwIBAgIBATALBgkqhkiG9w0BAQswJjEkMCIGA1UEAwwbb3BlbnNo
aWZ0LXNpZ25lckAxNDU3MzkxMjUxMB4XDTE2MDMwNzIyNTQxMVoXDTIxMDMwNjIy
NTQxMlowJjEkMCIGA1UEAwwbb3BlbnNoaWZ0LXNpZ25lckAxNDU3MzkxMjUxMIIB
IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAswpnGk9V5a1/BMPhRPkkY/bz
iad06kV8tzM7KXCga11S74x7D1cJ6TEpRx7PDHrzkfFc6EuGr8h6IgAD9twsxM2I
sUkGfeRbWlBx+UmB7mdpfbdbyWsP0JV/h1pgHXVZbHn3P42KfkpdCOoxfbdXmwBd
fEdlBX6+DVQoNE0leHnoQ1B52uXhYeUJF9yjGh47CHNSnJH1HbdOS3UPb46WYC0e
dp/U7Ho0zFpmsHAJMnSMN0EJU8EZiF8LmSS/S9y27lfK8Wji8W0f6B2bAALHGvwC
J9vCYsci86eOlEcFsLmevEwOxLYaNCe9xFM7ujMk/Ic5fmv+PFgHVPS9WGoFiQID
AQABoyMwITAOBgNVHQ8BAf8EBAMCAKQwDwYDVR0TAQH/BAUwAwEB/zALBgkqhkiG
9w0BAQsDggEBAF6wzswIVTXRHW+26AmIbq6ZQWoJY3Nsw0fYl3wKDOFsBzxKe/Wf
iI0yikZl07m2gY/oBvTzuKiuuiiD7WjMxjbJUKTnLNOzQ2HJ8893SWf0vIFeXyVs
fkTjZFV9yMJyl4pso69zsRurh+7whb7tnxpyCNQ5Dx9S9wQ1tRnSl1p0rrUxh4cc
JyNE6SCHW9rXDlUwqD/9DIqgE3Org8EewMVCH65YwXV2Xny0+wQGIBeThJN9TI7T
HvfhrPXMa8J7yhv2MqjqFAYLbcJh/8fRRNITDuVG5PDlWEY7bGieKo8ElVyShYlH
HNJ2fm3e8L9ZRRzWy4TtC1e++DLXSsz8G04=
-----END CERTIFICATE-----

      OPENSHIFT_DEPLOYMENT_NAME:    jenkins-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:    test
Conditions:
  Type        Status
  Ready     False 
Volumes:
  deployer-token-a7woy:
    Type:    Secret (a secret that should populate this volume)
    SecretName:    deployer-token-a7woy
Events:
  FirstSeen    LastSeen    Count    From                    SubobjectPath    Type        Reason        Message
  ---------    --------    -----    ----                    -------------    --------    ------        -------
  1h        1h        1    {default-scheduler }                    Normal        Scheduled    Successfully assigned jenkins-1-deploy to ip-172-31-15-139.ec2.internal
  1h        1h        1    {kubelet ip-172-31-15-139.ec2.internal}            Warning        FailedSync    Error syncing pod, skipping: No such container: 8ace102fd4462e67bdbd4eea4f32d0c5257e0f914dbe1f2aac92786acdce1753
  1h        1h        1    {kubelet ip-172-31-15-139.ec2.internal}            Warning        FailedSync    Error syncing pod, skipping: No such container: 1085c0d8ef8cf554dce5048e9e4193f7aef50fe54374f0c229fa4d3bfdec9663
  1h        1h        1    {kubelet ip-172-31-15-139.ec2.internal}            Warning        FailedSync    Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Error running DeviceResume dm_task_run failed\n"

  1h    1h    122    {kubelet ip-172-31-15-139.ec2.internal}        Warning    FailedSync    Error syncing pod, skipping: failed to "SetupNetwork" for "jenkins-1-deploy_test" with SetupNetworkError: "Failed to setup network for pod \"jenkins-1-deploy_test(dfeb8369-ebf2-11e5-90a5-0aadb0f8cf89)\" using network plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

  1h    1h    132    {kubelet ip-172-31-15-139.ec2.internal}        Warning    FailedSync    Error syncing pod, skipping: API error (500): Unknown device acc7e11a7e98340f0efcfefefc267e43a07eec266b4960508d3bd005b449a3b5

  1h    21s    296    {kubelet ip-172-31-15-139.ec2.internal}        Warning    FailedSync    Error syncing pod, skipping: API error (500): Unknown device acc7e11a7e98340f0efcfefefc267e43a07eec266b4960508d3bd005b449a3b5

Actual results:
The pod is not running.

Expected results:

The pod should be in Running status.

Comment 1 Ben Parees 2016-03-17 21:59:58 UTC
Passing to the networking team based on:
 1h    1h    122    {kubelet ip-172-31-15-139.ec2.internal}        Warning    FailedSync    Error syncing pod, skipping: failed to "SetupNetwork" for "jenkins-1-deploy_test" with SetupNetworkError: "Failed to setup network for pod \"jenkins-1-deploy_test(dfeb8369-ebf2-11e5-90a5-0aadb0f8cf89)\" using network plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

Comment 2 wewang 2016-03-18 03:10:00 UTC
Tested it again; the pod is now in CrashLoopBackOff:

# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          38s
jenkins-1-yfahj    0/1       Running   0          34s
# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-yfahj    0/1       CrashLoopBackOff   1          1m
# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          1m
jenkins-1-yfahj    0/1       Running   2          1m
# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          1m
jenkins-1-yfahj    0/1       Running   2          1m
# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-yfahj    0/1       CrashLoopBackOff   2          1m
# oc describe jenkins-1-yfahj
the server doesn't have a resource type "jenkins-1-yfahj"
# oc describe pod jenkins-1-yfahj
Name:		jenkins-1-yfahj
Namespace:	wewang7
Image(s):	registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
Node:		ip-172-31-15-139.ec2.internal/172.31.15.139
Start Time:	Fri, 18 Mar 2016 10:43:26 +0800
Labels:		deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:		Running
Reason:		
Message:	
IP:		10.1.0.237
Controllers:	ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID:	docker://68050e3ce2b12550e08d3fe62e433c1bf110679500698de950fc60bec201fd83
    Image:		registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
    Image ID:		docker://908b6dd3dafbabbb1cf38b60bb8a281988c04f4953854df19e0ed804fe9d4dfa
    Port:		
    QoS Tier:
      cpu:	Burstable
      memory:	Guaranteed
    Limits:
      cpu:	1
      memory:	512Mi
    Requests:
      cpu:		60m
      memory:		512Mi
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Fri, 18 Mar 2016 10:44:39 +0800
      Finished:		Fri, 18 Mar 2016 10:44:54 +0800
    Ready:		False
    Restart Count:	2
    Liveness:		http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:		http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD:	password
Conditions:
  Type		Status
  Ready 	False 
Volumes:
  jenkins-data:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-m8i52:
    Type:	Secret (a secret that should populate this volume)
    SecretName:	default-token-m8i52
Events:
  FirstSeen	LastSeen	Count	From					SubobjectPath			Type		Reason		Message
  ---------	--------	-----	----					-------------			--------	------		-------
  2m		2m		1	{default-scheduler }							Normal		Scheduled	Successfully assigned jenkins-1-yfahj to ip-172-31-15-139.ec2.internal
  2m		2m		1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id 5a465869c4db
  2m		2m		1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id 5a465869c4db
  1m		1m		2	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: Get http://10.1.0.237:8080/login: read tcp 10.1.0.237:8080: use of closed network connection
  1m		1m		1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id 86e52801394e
  1m		1m		1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id 86e52801394e
  1m		1m		1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: HTTP probe failed with statuscode: 503
  1m		1m		2	{kubelet ip-172-31-15-139.ec2.internal}					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 10s restarting failed container=jenkins pod=jenkins-1-yfahj_wewang7(306dc2f6-ecb3-11e5-9d8d-0aadb0f8cf89)"

  2m	51s	3	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal	Pulled		Container image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest" already present on machine
  50s	50s	1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal	Created		Created container with docker id 68050e3ce2b1
  50s	50s	1	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Normal	Started		Started container with docker id 68050e3ce2b1
  1m	43s	2	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Warning	Unhealthy	Readiness probe failed: Get http://10.1.0.237:8080/login: dial tcp 10.1.0.237:8080: connection refused
  1m	19s	5	{kubelet ip-172-31-15-139.ec2.internal}	spec.containers{jenkins}	Warning	BackOff		Back-off restarting failed docker container
  34s	19s	3	{kubelet ip-172-31-15-139.ec2.internal}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 20s restarting failed container=jenkins pod=jenkins-1-yfahj_wewang7(306dc2f6-ecb3-11e5-9d8d-0aadb0f8cf89)"
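
The "Exit Code: 137" shown next to "OOMKilled" above follows the usual convention that a code above 128 is 128 plus the number of the signal that ended the process, so 137 is signal 9 (SIGKILL). A minimal sketch of that arithmetic follows; the function name decode_exit_code is hypothetical (it is not part of oc or docker) and only two common signals are named.

```shell
#!/bin/sh
# decode_exit_code is a hypothetical helper, not part of oc or docker.
# Codes above 128 conventionally mean "terminated by signal (code - 128)".
decode_exit_code() {
    code=$1
    if [ "$code" -le 128 ]; then
        echo "exit status $code"
        return
    fi
    sig=$((code - 128))
    case "$sig" in
        9)  name=SIGKILL ;;
        15) name=SIGTERM ;;
        *)  name=unknown ;;
    esac
    echo "signal $sig ($name)"
}

decode_exit_code 137   # prints: signal 9 (SIGKILL)
```

A SIGKILL here is consistent with the kernel OOM killer acting on a container that has reached its 512Mi memory limit.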

Comment 3 Ben Bennett 2016-03-18 12:29:39 UTC
Can you please get us the output from the openshift node log, and run the troubleshooting script at https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh?

Comment 4 wewang 2016-03-21 06:07:05 UTC
The node logs are below. I don't have permission to run the script; can you track down the problem from the logs?
[root@dev-preview-int-node-compute-d5623 ~]# docker ps -a | grep jenkins-1-deploy
ab8870ea68fb        openshift3/ose-deployer:v3.1.1.910                                                                                          "/usr/bin/openshift-d"   7 minutes ago       Exited (255) 5 minutes ago                            k8s_deployment.86d070f9_jenkins-1-deploy_test_28d451a5-ef29-11e5-9d8d-0aadb0f8cf89_1eecc842
0229e3387213        openshift3/ose-pod:v3.1.1.910                                                                                               "/pod"                   7 minutes ago       Exited (0) 5 minutes ago                              k8s_POD.e5c1dc5a_jenkins-1-deploy_test_28d451a5-ef29-11e5-9d8d-0aadb0f8cf89_64850473
[root@dev-preview-int-node-compute-d5623 ~]# docker logs ab8870ea68fbdec7bab4b6046ef9e49a7aafb78c428d2f03590896efe4367f2d 
I0321 01:52:59.062514       1 deployer.go:199] Deploying test/jenkins-1 for the first time (replicas: 1)
I0321 01:52:59.066739       1 recreate.go:126] Scaling test/jenkins-1 to 1 before performing acceptance check
F0321 01:55:00.099291       1 deployer.go:69] couldn't scale test/jenkins-1 to 1: timed out waiting for the condition

Comment 5 XiuJuan Wang 2016-03-21 10:05:20 UTC
Hit this issue in an OSE environment (3.2/2016-03-18.4):

  0s		0s		1	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id 1e806e9afc6b
  <invalid>	<invalid>	1	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 1e806e9afc6b: pod "jenkins-1-a1vio_jenkins(17142722-ef4a-11e5-bcc1-fa163efe3ad5)" container "jenkins" is unhealthy, it will be killed and re-created.
  5m		<invalid>	6	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Pulled		Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest" already present on machine
  <invalid>	<invalid>	1	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id 24a815714336
  <invalid>	<invalid>	1	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id 24a815714336
  5m		<invalid>	23	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: Get http://10.2.1.6:8080/login: dial tcp 10.2.1.6:8080: connection refused
  4m		<invalid>	8	{kubelet openshift-133.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning		Unhealthy	Liveness probe failed: Get http://10.2.1.6:8080/login: dial tcp 10.2.1.6:8080: connection refused

Comment 7 Ben Bennett 2016-03-21 17:18:15 UTC
I need the output from:
  journalctl -flu atomic-openshift-node

(Or perhaps openshift-node... do systemctl status | grep openshift to get the unit name if those fail)
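
A sketch of the unit lookup described above, assuming the node service is named atomic-openshift-node, openshift-node, or origin-node (the last name is taken from journal output elsewhere in this report). The helper name pick_node_unit is hypothetical; it reads unit listings on stdin so it can be fed from systemctl.

```shell
#!/bin/sh
# pick_node_unit (hypothetical helper): print the first known node unit
# name found in the text on stdin, or return 1 if none is present.
# The longest name is checked first because "openshift-node.service" is
# also a substring of "atomic-openshift-node.service".
pick_node_unit() {
    input=$(cat)
    for unit in atomic-openshift-node openshift-node origin-node; do
        if printf '%s\n' "$input" | grep -q -F -- "$unit.service"; then
            echo "$unit"
            return 0
        fi
    done
    return 1
}

# Typical use on a node (not run here):
#   unit=$(systemctl list-units --type=service --no-legend | pick_node_unit)
#   journalctl -flu "$unit"
```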

Comment 8 Chris DiGiovanni 2016-03-21 20:46:29 UTC
Ben,

Here are the debug logs that you requested on IRC:


Mar 21 15:32:38 node1_vm origin-node[17123]: I0321 15:32:38.315960   17177 plugin.go:138] SetUpPod network plugin output: + lock_file=/var/lock/openshift-sdn.lock
Mar 21 15:32:38 node1_vm origin-node[17123]: + action=setup
Mar 21 15:32:38 node1_vm origin-node[17123]: + net_container=41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + tenant_id=0
Mar 21 15:32:38 node1_vm origin-node[17123]: + lockwrap run
Mar 21 15:32:38 node1_vm origin-node[17123]: + flock 200
Mar 21 15:32:38 node1_vm origin-node[17123]: + run
Mar 21 15:32:38 node1_vm origin-node[17123]: + get_ipaddr_pid_veth
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.HostConfig.NetworkMode}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + network_mode=default
Mar 21 15:32:38 node1_vm origin-node[17123]: + '[' default == host ']'
Mar 21 15:32:38 node1_vm origin-node[17123]: + [[ default =~ container:.* ]]
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.NetworkSettings.IPAddress}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + ipaddr=172.17.0.2
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.State.Pid}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ get_veth_host 28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: +++ nsenter -n -t 28810 -- ethtool -S eth0
Mar 21 15:32:38 node1_vm origin-node[17123]: +++ sed -n -e 's/.*peer_ifindex: //p'
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local veth_ifindex=1358
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ ip link show
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ sed -ne 's/^1358: \([^:@]*\).*/\1/p'
Mar 21 15:32:38 node1_vm origin-node[17123]: + veth_host=veth14e6fda
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ get_container_mac 28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ nsenter -n -t 28810 -- ip link show dev eth0
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ sed -n -e 's/.*link.ether \([^ ]*\).*/\1/p'
Mar 21 15:32:38 node1_vm origin-node[17123]: + macaddr=02:42:ac:11:00:02
Mar 21 15:32:38 node1_vm origin-node[17123]: + source /run/openshift-sdn/config.env
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ export OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16
Mar 21 15:32:38 node1_vm origin-node[17123]: + case "$action" in
Mar 21 15:32:38 node1_vm origin-node[17123]: + add_ovs_port
Mar 21 15:32:38 node1_vm origin-node[17123]: + brctl delif lbr0 veth14e6fda
Mar 21 15:32:38 node1_vm origin-node[17123]: device veth14e6fda is not a slave of lbr0
Mar 21 15:32:38 node1_vm origin-node[17123]: , exit status 1
Mar 21 15:32:38 node1_vm origin-node[17123]: E0321 15:32:38.316051   17177 manager.go:1791] Failed to setup network for pod "docker-registry-1-deploy_default(4006a8ee-ef9b-11e5-bbd7-0050568848d8)" using network plugins "redhat/openshift-ovs-subnet"
: exit status 1; Skipping pod

Thanks,

digi691
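
The plugin output above looks like a shell trace (lines prefixed with "+"), and the last traced command before ", exit status 1" is the brctl delif call. A small sketch for pulling that last traced command out of journal text follows; the helper name last_traced_command is hypothetical, and it assumes each journal line carries a "unit[pid]: " prefix as in the log above.

```shell
#!/bin/sh
# last_traced_command (hypothetical helper): read journal lines on stdin,
# strip everything up to the first "]: " (the "unit[pid]: " prefix), and
# print the last remaining line that begins with "+" (a traced command).
last_traced_command() {
    sed -e 's/^[^]]*]: //' | grep '^+' | tail -n 1
}

# Typical use on a node (not run here):
#   journalctl -u origin-node | last_traced_command
```

For the log in this comment it reports "+ brctl delif lbr0 veth14e6fda", the command that printed "device veth14e6fda is not a slave of lbr0".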

Comment 9 Dan Williams 2016-03-22 18:30:17 UTC
What RPM version of atomic-openshift is installed on this cluster?  "rpm -q atomic-openshift" will tell you...  Sorry if I missed it above.

Comment 10 Dan Williams 2016-03-22 18:32:15 UTC
It's looking like docker's network setup isn't correct.  Can you also grab:

1) the contents of /run/openshift-sdn/docker-network
2) 'ps ax | grep docker'

Comment 11 wewang 2016-03-30 06:44:49 UTC
@Ben, here is the info you need:
[root@dev-preview-int-master-167b1 ~]# cat /run/openshift-sdn/docker-network
# This file has been modified by openshift-sdn.
 
DOCKER_NETWORK_OPTIONS='-b=lbr0 --mtu=8951'

[root@dev-preview-int-master-167b1 ~]# ps ax | grep docker
 29489 ?        Ss     0:00 /bin/sh -c /usr/bin/docker daemon $OPTIONS            $DOCKER_STORAGE_OPTIONS            $DOCKER_NETWORK_OPTIONS            $ADD_REGISTRY            $BLOCK_REGISTRY            $INSECURE_REGISTRY            2>&1 | /usr/bin/forward-journald -tag docker
 29490 ?        Sl    64:01 /usr/bin/docker daemon --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true -b=lbr0 --mtu=8951 --add-registry registry.qe.openshift.com --add-registry registry.access.redhat.com
 29491 ?        Sl     0:21 /usr/bin/forward-journald -tag docker
 46201 pts/1    S+     0:00 grep --color=auto docker
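
One way to compare the two outputs above is to check that every option in DOCKER_NETWORK_OPTIONS also appears on the running daemon's command line. The sketch below assumes the file is a plain shell assignment as shown; the helper name daemon_has_network_options is hypothetical.

```shell
#!/bin/sh
# daemon_has_network_options (hypothetical helper): succeed if every option
# in DOCKER_NETWORK_OPTIONS from the given file appears as a separate word
# in the given docker daemon command line. Prints the first missing option.
daemon_has_network_options() {
    env_file=$1
    cmdline=$2
    # Source the file inside the command substitution (a subshell) so the
    # caller's environment is left untouched.
    opts=$(. "$env_file" && printf '%s' "$DOCKER_NETWORK_OPTIONS") || return 2
    for opt in $opts; do
        case " $cmdline " in
            *" $opt "*) ;;
            *) echo "missing: $opt"; return 1 ;;
        esac
    done
    return 0
}
```

The first argument would be /run/openshift-sdn/docker-network and the second the daemon line from the ps output; for the values pasted in this comment, both -b=lbr0 and --mtu=8951 are present.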

Comment 15 wewang 2016-04-18 09:18:17 UTC
Using jenkins-ephemeral-template in an OSE 3.2.0.16 environment.
The image is brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 (a414317f519d).
Error:
[root@dhcp-128-91 backup]# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          8m
jenkins-1-o0gh5    0/1       CrashLoopBackOff   6          8m
[root@dhcp-128-91 backup]# oc describe pod jenkins-1-o0gh5
Name:		jenkins-1-o0gh5
Namespace:	wewang
Image(s):	brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest
Node:		openshift-210.lab.sjc.redhat.com/10.14.6.210
Start Time:	Mon, 18 Apr 2016 17:08:46 +0800
Labels:		deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:		Running
Reason:		
Message:	
IP:		10.2.2.3
Controllers:	ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID:	docker://1dd4a1aa921eff1a0c7200839a0f05302d98226c4f9ffa5a05fdb32f70df265e
    Image:		brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest
    Image ID:		docker://a414317f519d1c75b400d848bb658931eb71a4ba1cd7e6075783cd99c9ca8759
    Port:		
    QoS Tier:
      memory:	Guaranteed
      cpu:	BestEffort
    Limits:
      memory:	512Mi
    Requests:
      memory:		512Mi
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	143
      Started:		Mon, 18 Apr 2016 17:16:04 +0800
      Finished:		Mon, 18 Apr 2016 17:16:36 +0800
    Ready:		False
    Restart Count:	6
    Liveness:		http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:		http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD:	password
Conditions:
  Type		Status
  Ready 	False 
Volumes:
  jenkins-data:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-pimf1:
    Type:	Secret (a secret that should populate this volume)
    SecretName:	default-token-pimf1
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath			Type		Reason		Message
  ---------	--------	-----	----						-------------			--------	------		-------
  8m		8m		1	{default-scheduler }								Normal		Scheduled	Successfully assigned jenkins-1-o0gh5 to openshift-210.lab.sjc.redhat.com
  8m		8m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Pulling		pulling image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest"
  7m		7m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Pulled		Successfully pulled image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest"
  7m		7m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id ce3582dcba64
  7m		7m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id ce3582dcba64
  6m		6m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id ce3582dcba64: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  6m		6m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id ff36f822940d
  6m		6m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id ff36f822940d
  5m		5m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id ff36f822940d: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  5m		5m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id 00f46abdf54d
  5m		5m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id 00f46abdf54d
  5m		5m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 00f46abdf54d: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  5m		5m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id 1965693838bf
  5m		5m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id 1965693838bf
  4m		4m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 1965693838bf: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  4m		4m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id 51e2f2b0ffae
  4m		4m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id 51e2f2b0ffae
  3m		3m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 51e2f2b0ffae: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  3m		3m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Started		Started container with docker id e9aeb42ec2b6
  3m		3m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Created		Created container with docker id e9aeb42ec2b6
  6m		3m		2	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning		Unhealthy	Liveness probe failed: HTTP probe failed with statuscode: 503
  6m		3m		2	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: HTTP probe failed with statuscode: 503
  3m		3m		1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id e9aeb42ec2b6: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  3m		1m		8	{kubelet openshift-210.lab.sjc.redhat.com}					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=jenkins pod=jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)"

  6m	1m	6	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal	Pulled		Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest" already present on machine
  1m	1m	1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal	Created		Created container with docker id 1dd4a1aa921e
  1m	1m	1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal	Started		Started container with docker id 1dd4a1aa921e
  7m	1m	23	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning	Unhealthy	Readiness probe failed: Get http://10.2.2.3:8080/login: dial tcp 10.2.2.3:8080: connection refused
  6m	1m	7	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning	Unhealthy	Liveness probe failed: Get http://10.2.2.3:8080/login: dial tcp 10.2.2.3:8080: connection refused
  1m	1m	1	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Normal	Killing		Killing container with docker id 1dd4a1aa921e: pod "jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)" container "jenkins" is unhealthy, it will be killed and re-created.
  3m	14s	14	{kubelet openshift-210.lab.sjc.redhat.com}	spec.containers{jenkins}	Warning	BackOff		Back-off restarting failed docker container
  1m	14s	6	{kubelet openshift-210.lab.sjc.redhat.com}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=jenkins pod=jenkins-1-o0gh5_wewang(27bc1236-0545-11e6-8ce7-fa163e1af809)"

Comment 16 XiuJuan Wang 2016-04-19 07:12:45 UTC
openshift3/jenkins-1-rhel7 (a414317f519d)
As I said in comment #14, my Jenkins pods can run; I can log in from the web console and run Jenkins jobs.
However, the Jenkins pod sometimes restarts before it finally reaches Running (maybe the network between Beijing and the US is not stable). I will paste logs later.

Comment 17 XiuJuan Wang 2016-04-19 07:17:39 UTC
Created attachment 1148362 [details]
jenkins-restart.log

Comment 18 wewang 2016-04-20 03:27:39 UTC
It seems the pod is running, with just a warning when describing the pod, so I am changing the severity to low.

Comment 19 Dan Williams 2016-04-28 20:19:56 UTC
wewang, your later log messages don't show the SetupNetwork errors that the original report and Chris DiGiovanni had.  Do you still see those SetupNetwork errors?  I think the 'unhealthy container' messages are a different issue.
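
A quick way to answer the question above is to search the pod's events for the SetupNetwork failure text quoted in the original report. A minimal sketch, with has_setup_network_error as a hypothetical helper name:

```shell
#!/bin/sh
# has_setup_network_error (hypothetical helper): succeed if the text on
# stdin contains the SetupNetwork failure message from the original report.
has_setup_network_error() {
    grep -q 'failed to "SetupNetwork"'
}

# Typical use (not run here; the pod name is a placeholder):
#   oc describe pod <pod-name> | has_setup_network_error && echo "still failing"
```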

