+++ This bug was initially created as a clone of Bug #1318510 +++

Online/Image: registry.access.redhat.com/openshift3/jenkins-1-rhel7 (908b6dd3dafb)

Version-Release number of selected component (if applicable):
kubernetes v1.2.0-alpha.7-703-gbc4550d
docker 1.8.2-el7, build a01dc02/1.8.2
kernel 3.10.0-327.10.1.el7.x86_64

How reproducible:
Always

Description of problem:
"Error syncing pod" when using jenkins-ephemeral-template to create Jenkins.

Steps to Reproduce:
1. $ oc new-project test
2. $ oc policy add-role-to-user admin system:serviceaccount:test:default -n test
3. $ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json
4. Check the pod:

# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
jenkins-1-deploy   0/1       ContainerCreating   0          1h

[root@dhcp-128-91 backup]# oc describe pod jenkins-1-deploy
Name:        jenkins-1-deploy
Namespace:   test
Image(s):    openshift3/ose-deployer:v3.1.1.910
Node:        ip-172-31-15-139.ec2.internal/172.31.15.139
Start Time:  Thu, 17 Mar 2016 11:46:47 +0800
Labels:      openshift.io/deployer-pod-for.name=jenkins-1
Status:      Pending
Reason:
Message:
IP:
Controllers: <none>
Containers:
  deployment:
    Container ID:
    Image:    openshift3/ose-deployer:v3.1.1.910
    Image ID:
    Port:
    QoS Tier:
      memory: BestEffort
      cpu:    BestEffort
    State:    Waiting
      Reason: ContainerCreating
    Ready:    False
    Restart Count: 0
    Environment Variables:
      KUBERNETES_MASTER: https://ip-172-31-4-121.ec2.internal
      OPENSHIFT_MASTER:  https://ip-172-31-4-121.ec2.internal
      BEARER_TOKEN_FILE: /var/run/secrets/kubernetes.io/serviceaccount/token
      OPENSHIFT_CA_DATA: -----BEGIN CERTIFICATE-----
MIIC5jCCAdCgAwIBAgIBATALBgkqhkiG9w0BAQswJjEkMCIGA1UEAwwbb3BlbnNo
aWZ0LXNpZ25lckAxNDU3MzkxMjUxMB4XDTE2MDMwNzIyNTQxMVoXDTIxMDMwNjIy
NTQxMlowJjEkMCIGA1UEAwwbb3BlbnNoaWZ0LXNpZ25lckAxNDU3MzkxMjUxMIIB
IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAswpnGk9V5a1/BMPhRPkkY/bz
iad06kV8tzM7KXCga11S74x7D1cJ6TEpRx7PDHrzkfFc6EuGr8h6IgAD9twsxM2I
sUkGfeRbWlBx+UmB7mdpfbdbyWsP0JV/h1pgHXVZbHn3P42KfkpdCOoxfbdXmwBd
fEdlBX6+DVQoNE0leHnoQ1B52uXhYeUJF9yjGh47CHNSnJH1HbdOS3UPb46WYC0e
dp/U7Ho0zFpmsHAJMnSMN0EJU8EZiF8LmSS/S9y27lfK8Wji8W0f6B2bAALHGvwC
J9vCYsci86eOlEcFsLmevEwOxLYaNCe9xFM7ujMk/Ic5fmv+PFgHVPS9WGoFiQID
AQABoyMwITAOBgNVHQ8BAf8EBAMCAKQwDwYDVR0TAQH/BAUwAwEB/zALBgkqhkiG
9w0BAQsDggEBAF6wzswIVTXRHW+26AmIbq6ZQWoJY3Nsw0fYl3wKDOFsBzxKe/Wf
iI0yikZl07m2gY/oBvTzuKiuuiiD7WjMxjbJUKTnLNOzQ2HJ8893SWf0vIFeXyVs
fkTjZFV9yMJyl4pso69zsRurh+7whb7tnxpyCNQ5Dx9S9wQ1tRnSl1p0rrUxh4cc
JyNE6SCHW9rXDlUwqD/9DIqgE3Org8EewMVCH65YwXV2Xny0+wQGIBeThJN9TI7T
HvfhrPXMa8J7yhv2MqjqFAYLbcJh/8fRRNITDuVG5PDlWEY7bGieKo8ElVyShYlH
HNJ2fm3e8L9ZRRzWy4TtC1e++DLXSsz8G04=
-----END CERTIFICATE-----
      OPENSHIFT_DEPLOYMENT_NAME:      jenkins-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE: test
Conditions:
  Type   Status
  Ready  False
Volumes:
  deployer-token-a7woy:
    Type:       Secret (a secret that should populate this volume)
    SecretName: deployer-token-a7woy
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  1h  1h  1    {default-scheduler }  Normal  Scheduled  Successfully assigned jenkins-1-deploy to ip-172-31-15-139.ec2.internal
  1h  1h  1    {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: No such container: 8ace102fd4462e67bdbd4eea4f32d0c5257e0f914dbe1f2aac92786acdce1753
  1h  1h  1    {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: No such container: 1085c0d8ef8cf554dce5048e9e4193f7aef50fe54374f0c229fa4d3bfdec9663
  1h  1h  1    {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Error running DeviceResume dm_task_run failed\n"
  1h  1h  122  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "jenkins-1-deploy_test" with SetupNetworkError: "Failed to setup network for pod \"jenkins-1-deploy_test(dfeb8369-ebf2-11e5-90a5-0aadb0f8cf89)\" using network plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"
  1h  1h  132  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: API error (500): Unknown device acc7e11a7e98340f0efcfefefc267e43a07eec266b4960508d3bd005b449a3b5
  1h  21s 296  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: API error (500): Unknown device acc7e11a7e98340f0efcfefefc267e43a07eec266b4960508d3bd005b449a3b5

Actual results:
The pod is not running.

Expected results:
The pod should reach Running status.

--- Additional comment from Ben Parees on 2016-03-17 17:59:58 EDT ---

Passing to the networking team based on:

  1h  1h  122  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "jenkins-1-deploy_test" with SetupNetworkError: "Failed to setup network for pod \"jenkins-1-deploy_test(dfeb8369-ebf2-11e5-90a5-0aadb0f8cf89)\" using network plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

--- Additional comment from wewang on 2016-03-17 23:10:00 EDT ---

Tested again; the pod ends up in CrashLoopBackOff:

# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          38s
jenkins-1-yfahj    0/1       Running   0          34s
# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-yfahj    0/1       CrashLoopBackOff   1          1m
# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          1m
jenkins-1-yfahj    0/1       Running   2          1m
# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   1/1       Running   0          1m
jenkins-1-yfahj    0/1       Running   2          1m
# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-yfahj    0/1       CrashLoopBackOff   2          1m

# oc describe jenkins-1-yfahj
the server doesn't have a resource type "jenkins-1-yfahj"
# oc describe pod jenkins-1-yfahj
Name:        jenkins-1-yfahj
Namespace:   wewang7
Image(s):
registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
Node:        ip-172-31-15-139.ec2.internal/172.31.15.139
Start Time:  Fri, 18 Mar 2016 10:43:26 +0800
Labels:      deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:      Running
Reason:
Message:
IP:          10.1.0.237
Controllers: ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID: docker://68050e3ce2b12550e08d3fe62e433c1bf110679500698de950fc60bec201fd83
    Image:        registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
    Image ID:     docker://908b6dd3dafbabbb1cf38b60bb8a281988c04f4953854df19e0ed804fe9d4dfa
    Port:
    QoS Tier:
      cpu:    Burstable
      memory: Guaranteed
    Limits:
      cpu:    1
      memory: 512Mi
    Requests:
      cpu:    60m
      memory: 512Mi
    State:      Waiting
      Reason:   CrashLoopBackOff
    Last State: Terminated
      Reason:    OOMKilled
      Exit Code: 137
      Started:   Fri, 18 Mar 2016 10:44:39 +0800
      Finished:  Fri, 18 Mar 2016 10:44:54 +0800
    Ready:         False
    Restart Count: 2
    Liveness:  http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness: http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD: password
Conditions:
  Type   Status
  Ready  False
Volumes:
  jenkins-data:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-m8i52:
    Type:       Secret (a secret that should populate this volume)
    SecretName: default-token-m8i52
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  2m  2m   1  {default-scheduler }  Normal  Scheduled  Successfully assigned jenkins-1-yfahj to ip-172-31-15-139.ec2.internal
  2m  2m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Created  Created container with docker id 5a465869c4db
  2m  2m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Started  Started container with docker id 5a465869c4db
  1m  1m   2  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.1.0.237:8080/login: read tcp 10.1.0.237:8080: use of closed network connection
  1m  1m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Created  Created container with docker id 86e52801394e
  1m  1m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Started  Started container with docker id 86e52801394e
  1m  1m   1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  1m  1m   2  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 10s restarting failed container=jenkins pod=jenkins-1-yfahj_wewang7(306dc2f6-ecb3-11e5-9d8d-0aadb0f8cf89)"
  2m  51s  3  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Pulled  Container image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest" already present on machine
  50s 50s  1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Created  Created container with docker id 68050e3ce2b1
  50s 50s  1  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Normal  Started  Started container with docker id 68050e3ce2b1
  1m  43s  2  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.1.0.237:8080/login: dial tcp 10.1.0.237:8080: connection refused
  1m  19s  5  {kubelet ip-172-31-15-139.ec2.internal}  spec.containers{jenkins}  Warning  BackOff  Back-off restarting failed docker container
  34s 19s  3  {kubelet ip-172-31-15-139.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 20s restarting failed container=jenkins pod=jenkins-1-yfahj_wewang7(306dc2f6-ecb3-11e5-9d8d-0aadb0f8cf89)"

--- Additional comment from Ben Bennett on 2016-03-18 08:29:39 EDT ---

Can you please get us the output from the openshift node log, and run the troubleshooting script at https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh

--- Additional comment from wewang on 2016-03-21 02:07:05 EDT ---

Node logs are below. I have no permission to run the script; can you track the problem with the logs?

[root@dev-preview-int-node-compute-d5623 ~]# docker ps -a | grep jenkins-1-deploy
ab8870ea68fb  openshift3/ose-deployer:v3.1.1.910  "/usr/bin/openshift-d"  7 minutes ago  Exited (255) 5 minutes ago  k8s_deployment.86d070f9_jenkins-1-deploy_test_28d451a5-ef29-11e5-9d8d-0aadb0f8cf89_1eecc842
0229e3387213  openshift3/ose-pod:v3.1.1.910  "/pod"  7 minutes ago  Exited (0) 5 minutes ago  k8s_POD.e5c1dc5a_jenkins-1-deploy_test_28d451a5-ef29-11e5-9d8d-0aadb0f8cf89_64850473
[root@dev-preview-int-node-compute-d5623 ~]# docker logs ab8870ea68fbdec7bab4b6046ef9e49a7aafb78c428d2f03590896efe4367f2d
I0321 01:52:59.062514  1 deployer.go:199] Deploying test/jenkins-1 for the first time (replicas: 1)
I0321 01:52:59.066739  1 recreate.go:126] Scaling test/jenkins-1 to 1 before performing acceptance check
F0321 01:55:00.099291  1 deployer.go:69] couldn't scale test/jenkins-1 to 1: timed out waiting for the condition

--- Additional comment from XiuJuan Wang on 2016-03-21 06:05:20 EDT ---

Met this issue in an OSE env (3.2/2016-03-18.4):

  0s  0s  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Started  Started container with docker id 1e806e9afc6b
  <invalid>  <invalid>  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Killing  Killing container with docker id 1e806e9afc6b: pod "jenkins-1-a1vio_jenkins(17142722-ef4a-11e5-bcc1-fa163efe3ad5)" container "jenkins" is unhealthy, it will be killed and re-created.
  5m  <invalid>  6   {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Pulled  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest" already present on machine
  <invalid>  <invalid>  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Created  Created container with docker id 24a815714336
  <invalid>  <invalid>  1  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Started  Started container with docker id 24a815714336
  5m  <invalid>  23  {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.1.6:8080/login: dial tcp 10.2.1.6:8080: connection refused
  4m  <invalid>  8   {kubelet openshift-133.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Liveness probe failed: Get http://10.2.1.6:8080/login: dial tcp 10.2.1.6:8080: connection refused

--- Additional comment from wewang on 2016-03-21 06:10:18 EDT ---

The OSE env of comment 5 is:
openshift v3.2.0.5
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 (d84481f80ece)

--- Additional comment from Ben Bennett on 2016-03-21 13:18:15 EDT ---

I need the output from:
  journalctl -flu atomic-openshift-node
(Or perhaps openshift-node... run `systemctl status | grep openshift` to get the unit name if those fail.)

--- Additional comment from Chris DiGiovanni on 2016-03-21 16:46:29 EDT ---

Ben, here are the debug logs you requested in IRC:

Mar 21 15:32:38 node1_vm origin-node[17123]: I0321 15:32:38.315960 17177 plugin.go:138] SetUpPod network plugin output: + lock_file=/var/lock/openshift-sdn.lock
Mar 21 15:32:38 node1_vm origin-node[17123]: + action=setup
Mar 21 15:32:38 node1_vm origin-node[17123]: + net_container=41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + tenant_id=0
Mar 21 15:32:38 node1_vm origin-node[17123]: + lockwrap run
Mar 21 15:32:38 node1_vm origin-node[17123]: + flock 200
Mar 21 15:32:38 node1_vm origin-node[17123]: + run
Mar 21 15:32:38 node1_vm origin-node[17123]: + get_ipaddr_pid_veth
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.HostConfig.NetworkMode}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + network_mode=default
Mar 21 15:32:38 node1_vm origin-node[17123]: + '[' default == host ']'
Mar 21 15:32:38 node1_vm origin-node[17123]: + [[ default =~ container:.* ]]
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.NetworkSettings.IPAddress}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + ipaddr=172.17.0.2
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ docker inspect --format '{{.State.Pid}}' 41c971017f78f95334cd8ded26239c1edd6f6f5eb219f9554dd4b7eb2058452f
Mar 21 15:32:38 node1_vm origin-node[17123]: + pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ get_veth_host 28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: +++ nsenter -n -t 28810 -- ethtool -S eth0
Mar 21 15:32:38 node1_vm origin-node[17123]: +++ sed -n -e 's/.*peer_ifindex: //p'
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local veth_ifindex=1358
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ ip link show
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ sed -ne 's/^1358: \([^:@]*\).*/\1/p'
Mar 21 15:32:38 node1_vm origin-node[17123]: + veth_host=veth14e6fda
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ get_container_mac 28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ local pid=28810
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ nsenter -n -t 28810 -- ip link show dev eth0
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ sed -n -e 's/.*link.ether \([^ ]*\).*/\1/p'
Mar 21 15:32:38 node1_vm origin-node[17123]: + macaddr=02:42:ac:11:00:02
Mar 21 15:32:38 node1_vm origin-node[17123]: + source /run/openshift-sdn/config.env
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ export OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16
Mar 21 15:32:38 node1_vm origin-node[17123]: ++ OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16
Mar 21 15:32:38 node1_vm origin-node[17123]: + case "$action" in
Mar 21 15:32:38 node1_vm origin-node[17123]: + add_ovs_port
Mar 21 15:32:38 node1_vm origin-node[17123]: + brctl delif lbr0 veth14e6fda
Mar 21 15:32:38 node1_vm origin-node[17123]: device veth14e6fda is not a slave of lbr0
Mar 21 15:32:38 node1_vm origin-node[17123]: , exit status 1
Mar 21 15:32:38 node1_vm origin-node[17123]: E0321 15:32:38.316051 17177 manager.go:1791] Failed to setup network for pod "docker-registry-1-deploy_default(4006a8ee-ef9b-11e5-bbd7-0050568848d8)" using network plugins "redhat/openshift-ovs-subnet" : exit status 1; Skipping pod

Thanks, digi691

--- Additional comment from Dan Williams on 2016-03-22 14:30:17 EDT ---

What RPM version of atomic-openshift is installed on this cluster? `rpm -q atomic-openshift` will tell you... Sorry if I missed it above.

--- Additional comment from Dan Williams on 2016-03-22 14:32:15 EDT ---

It's looking like docker's network setup isn't correct.
Can you also grab:
1) the contents of /run/openshift-sdn/docker-network
2) the output of 'ps ax | grep docker'

--- Additional comment from wewang on 2016-03-30 02:44:49 EDT ---

@Ben, here is the info you need:

[root@dev-preview-int-master-167b1 ~]# cat /run/openshift-sdn/docker-network
# This file has been modified by openshift-sdn.
DOCKER_NETWORK_OPTIONS='-b=lbr0 --mtu=8951'
[root@dev-preview-int-master-167b1 ~]# ps ax | grep docker
29489 ?      Ss   0:00  /bin/sh -c /usr/bin/docker daemon $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY 2>&1 | /usr/bin/forward-journald -tag docker
29490 ?      Sl   64:01 /usr/bin/docker daemon --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true -b=lbr0 --mtu=8951 --add-registry registry.qe.openshift.com --add-registry registry.access.redhat.com
29491 ?      Sl   0:21  /usr/bin/forward-journald -tag docker
46201 pts/1  S+   0:00  grep --color=auto docker

--- Additional comment from XiuJuan Wang on 2016-04-08 03:25 EDT ---

--- Additional comment from XiuJuan Wang on 2016-04-08 03:36 EDT ---

--- Additional comment from XiuJuan Wang on 2016-04-08 03:41:57 EDT ---

I created a jenkins app using jenkins-ephemeral-template in an ose-3.2.0.11 env. The image is brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 (a414317f519d). The jenkins pod reaches Running and I can log in to the Jenkins web console, but there are still some network warnings in the jenkins pod describe info:

  Successfully assigned jenkins-1-6zk6t to openshift-120.lab.sjc.redhat.com
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Pulled  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest" already present on machine
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Created  Created container with docker id 137af75fa572
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Normal  Started  Started container with docker id 137af75fa572
  10m  10m  1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.1.5:8080/login: dial tcp 10.2.1.5:8080: connection refused
  9m   9m   1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  9m   9m   1  {kubelet openshift-120.lab.sjc.redhat.com}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.1.5:8080/login: read tcp 10.2.1.5:8080: use of closed network connection

I added the attachments devel requested; hope they are useful:
1) the atomic-openshift versions
2) the result of this script: https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh
3) the node log
4) the contents of /run/openshift-sdn/docker-network, and 'ps ax | grep docker'
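[Editor's note: a quick diagnostic sketch for the SDN symptom above. Assumptions: the node runs the redhat/openshift-ovs-subnet plugin and openshift-sdn manages the lbr0 bridge; run as root on the affected node. These are read-only checks, not a fix.]

```shell
# Hedged diagnostic sketch for "Failed to setup network ... exit status 1"
cat /run/openshift-sdn/docker-network   # docker should be started with -b=lbr0
ip addr show lbr0                       # the bridge must exist and hold the node subnet gateway
brctl show lbr0                         # veth ports docker attached to the bridge
ovs-vsctl show                          # br0 and its ports (vxlan/tun) should be present
```

If lbr0 is missing or docker is not using `-b=lbr0`, the setup script's `brctl delif lbr0 <veth>` step will fail just as in the journal output above.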
Since bug 1318510 in OSE has been mostly fixed, I am cloning a new bug for Online. The registry.access.redhat.com/openshift3/jenkins-1-rhel7 image (908b6dd3dafb) in Online is too old: 7 weeks.
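[Editor's note: one way to compare image ages directly (hedged sketch; assumes docker CLI access on a host that has pulled both tags) is to read the image creation timestamps:]

```shell
# Check how stale the Online jenkins image is (host with both images pulled assumed)
docker inspect --format '{{.Created}}' registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
docker inspect --format '{{.Created}}' brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7:latest
```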
Are you actually hitting this issue as described in Online? @bparees indicates that changes were made between those versions that should have addressed this.
Yes, I am hitting this issue in Online. Online is using registry.access.redhat.com/openshift3/jenkins-1-rhel7 (908b6dd3dafb), which is 7 weeks old. I know bug #1318510 has been partly fixed with brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7; I opened this new bug just to track the issue in Online.
Today, testing OSE-3.2 (RPM install, RHEL-7.2, multitenant, GCE), I met the same problem with registry.access.redhat.com/openshift3/jenkins-1-rhel7 (908b6dd3dafb).

Steps:
$ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json

# oc get pods
NAME               READY     STATUS             RESTARTS   AGE
jenkins-1-deploy   1/1       Running            0          1m
jenkins-1-ijh0t    0/1       CrashLoopBackOff   1          1m

[root@dhcp-128-91 backup]# oc describe pod jenkins-1-ijh0t
Name:        jenkins-1-ijh0t
Namespace:   wewang
Node:        qe-shared-master-registry-etcd-1/10.240.0.11
Start Time:  Wed, 27 Apr 2016 10:04:59 +0800
Labels:      deployment=jenkins-1,deploymentconfig=jenkins,name=jenkins
Status:      Running
IP:          10.2.3.11
Controllers: ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID: docker://55b2c5688a615d0566f0b2183c52e81b8f70d1dbf8cf1914c57049cb31ab3846
    Image:        registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest
    Image ID:     docker://908b6dd3dafbabbb1cf38b60bb8a281988c04f4953854df19e0ed804fe9d4dfa
    Port:
    QoS Tier:
      cpu:    BestEffort
      memory: Guaranteed
    Limits:
      memory: 512Mi
    Requests:
      memory: 512Mi
    State:      Running
      Started:  Wed, 27 Apr 2016 10:07:13 +0800
    Last State: Terminated
      Reason:    Error
      Exit Code: 137
      Started:   Wed, 27 Apr 2016 10:06:24 +0800
      Finished:  Wed, 27 Apr 2016 10:06:50 +0800
    Ready:         False
    Restart Count: 3
    Liveness:  http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness: http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment Variables:
      JENKINS_PASSWORD: password
Conditions:
  Type   Status
  Ready  False
Volumes:
  jenkins-data:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-ll79s:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-ll79s
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  2m  2m   1  {default-scheduler }  Normal  Scheduled  Successfully assigned jenkins-1-ijh0t to qe-shared-master-registry-etcd-1
  2m  2m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Pulling  pulling image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest"
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Pulled  Successfully pulled image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest"
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id 236c100c585e
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id 236c100c585e
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id 973805102a71
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id 973805102a71
  1m  1m   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.3.11:8080/login: dial tcp 10.2.3.11:8080: connection refused
  1m  1m   2  {kubelet qe-shared-master-registry-etcd-1}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 10s restarting failed container=jenkins pod=jenkins-1-ijh0t_wewang(71f745e1-0c1c-11e6-a122-42010af0000e)"
  59s 59s  1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id e8579d9573a1
  58s 58s  1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id e8579d9573a1
  1m  52s  2  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  1m  40s  2  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.3.11:8080/login: read tcp 10.2.3.11:8080: use of closed network connection
  32s 32s  1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.2.3.11:8080/login: read tcp 10.2.3.11:8080: connection reset by peer
  1m  23s  4  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Warning  BackOff  Back-off restarting failed docker container
  31s 23s  2  {kubelet qe-shared-master-registry-etcd-1}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 20s restarting failed container=jenkins pod=jenkins-1-ijh0t_wewang(71f745e1-0c1c-11e6-a122-42010af0000e)"
  1m  9s   3  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Pulled  Container image "registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest" already present on machine
  9s  9s   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Created  Created container with docker id 55b2c5688a61
  9s  9s   1  {kubelet qe-shared-master-registry-etcd-1}  spec.containers{jenkins}  Normal  Started  Started container with docker id 55b2c5688a61
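[Editor's note: Exit Code 137 above decodes to SIGKILL (137 = 128 + 9); together with the OOMKilled termination reported in an earlier comment, this suggests the 512Mi memory limit is being hit while Jenkins starts. A small sketch of the decoding follows; the `oc patch` at the end is a hypothetical mitigation, not a confirmed fix.]

```shell
# Decode a container exit code into its signal name:
# exit codes >= 128 mean the process died from signal (code - 128).
exit_code=137
sig=$((exit_code - 128))
kill -l "$sig"    # signal 9 is SIGKILL

# Hypothetical mitigation sketch: raise the memory limit on the jenkins DC, e.g.
#   oc patch dc/jenkins -p '{"spec":{"template":{"spec":{"containers":
#     [{"name":"jenkins","resources":{"limits":{"memory":"1Gi"}}}]}}}}'
```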
I confirmed that brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/jenkins-1-rhel7 does not crashloop against the readiness probe with the current template, so this should be good to go once we publish the new image. Assigning to Troy.
*** Bug 1331617 has been marked as a duplicate of this bug. ***
@Ben, registry.access.redhat.com/openshift3/jenkins-1-rhel7:latest has been updated to 37d7c8d851b9 (13 days ago). The Jenkins pod with a persistent volume can now run in the Online env; there are just some Unhealthy messages in the events. Please help set the bug to ON_QA. Thanks!

  7m   7m   1  {kubelet ip-172-31-14-21.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: Get http://10.1.11.10:8080/login: dial tcp 10.1.11.10:8080: connection refused
  7m   6m   3  {kubelet ip-172-31-14-21.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Readiness probe failed: HTTP probe failed with statuscode: 503
  6m   6m   1  {kubelet ip-172-31-14-21.ec2.internal}  spec.containers{jenkins}  Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 503
  14m  14m  1  jenkins DeploymentConfig  Warning  FailedUpdate  {deployment-controller } Cannot update deployment xiuwang/jenkins-1 status to Pending: replicationcontrollers "jenkins-1" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
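[Editor's note: the remaining Unhealthy events are expected while Jenkins boots — the template's readiness probe starts probing 3s after container start (delay=3s above), well before Jenkins can serve /login. If the noise matters, one hedged option (illustrative values, not part of the shipped template) is to lengthen the initial delay on the deployment config:]

```shell
# Hypothetical sketch: give Jenkins more time before the first readiness probe.
# The shipped template uses delay=3s; 30s here is an illustrative value.
oc patch dc/jenkins -p '{"spec":{"template":{"spec":{"containers":[{"name":"jenkins","readinessProbe":{"initialDelaySeconds":30}}]}}}}'
```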
According to comment #7, moving this bug to VERIFIED.