1368967 – Failed to deploy jenkins pod after jenkins master build completed

Summary: Failed to deploy jenkins pod after jenkins master build completed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	ImageStreams
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Ben Parees
QA Contact:	Wang Haoran
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-08-22 09:13 UTC by Dongbo Yan
Modified:	2017-03-08 18:26 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Timeout for liveness probe for Jenkins readiness check was too short. Consequence: Jenkins pod would fail to report as ready and get restarted. Fix: Increased the timeout for the readiness probe. Result: Jenkins pod now has sufficient time to start before the readiness probe fails.
Clone Of:
Environment:
Last Closed:	2016-09-27 09:45:26 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1933	0	normal	SHIPPED_LIVE	Red Hat OpenShift Container Platform 3.3 Release Advisory	2016-09-27 13:24:36 UTC

Description Dongbo Yan 2016-08-22 09:13:26 UTC

Description of problem:
Failed to deploy jenkins pod after jenkins master build completed. Readiness probe failed: Get http://10.1.2.8:8080/login: dial tcp 10.1.2.8:8080: getsockopt: connection refused.

Version-Release number of selected component (if applicable):
openshift3/jenkins-1-rhel7 (19bc08e9d803)

How reproducible:
Always

Steps to Reproduce:
1. Create a jenkins master app.
 $ oc new-app https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/master-slave/jenkins-master-template.json
2. After the jenkins master build completes, check the jenkins pod.
 $ oc describe pod jenkins-5-iej5i

Actual results:
In step 2, the jenkins pod fails to deploy:
 $ oc describe pod jenkins-5-iej5i
Name:			jenkins-5-iej5i
Namespace:		dyan1
Security Policy:	restricted
Node:			ip-172-18-11-249.ec2.internal/172.18.11.249
Start Time:		Mon, 22 Aug 2016 16:54:43 +0800
Labels:			app=jenkins-master
			deployment=jenkins-5
			deploymentconfig=jenkins
			name=jenkins
Status:			Running
IP:			10.1.2.7
Controllers:		ReplicationController/jenkins-5
Containers:
  jenkins:
    Container ID:	docker://9ce9502020e8813e1db0eedb49da542b371288d63dfa1e1766e3e2b2062e3c27
    Image:		172.30.164.193:5000/dyan1/jenkins-master@sha256:b4c7ad3aa165800b7f309697499de7833a79e9112b31b0ce62cc62bbb0e61fbb
    Image ID:		docker://sha256:1b2ac44aa657e8e669f6231beaa8e7fdc82f85d077a9822ec3229c0a26436ee7
    Port:		
    State:		Running
      Started:		Mon, 22 Aug 2016 16:59:31 +0800
    Last State:		Terminated
      Reason:		Error
      Exit Code:	143
      Started:		Mon, 22 Aug 2016 16:57:59 +0800
      Finished:		Mon, 22 Aug 2016 16:58:37 +0800
    Ready:		False
    Restart Count:	5
    Liveness:		http-get http://:8080/login delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:		http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/lib/jenkins from jenkins-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lu3kt (ro)
    Environment Variables:
      JENKINS_PASSWORD:	password
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  jenkins-data:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-lu3kt:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-lu3kt
QoS Tier:	BestEffort
Events:
  FirstSeen	LastSeen	Count	From					SubobjectPath			Type		Reason		Message
  ---------	--------	-----	----					-------------			--------	------		-------
  4m		4m		1	{default-scheduler }							Normal		Scheduled	Successfully assigned jenkins-5-iej5i to ip-172-18-11-249.ec2.internal
  4m		4m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Pulling		pulling image "172.30.164.193:5000/dyan1/jenkins-master@sha256:b4c7ad3aa165800b7f309697499de7833a79e9112b31b0ce62cc62bbb0e61fbb"
  4m		4m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Pulled		Successfully pulled image "172.30.164.193:5000/dyan1/jenkins-master@sha256:b4c7ad3aa165800b7f309697499de7833a79e9112b31b0ce62cc62bbb0e61fbb"
  4m		4m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id 110bbdde5f5b
  4m		4m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id 110bbdde5f5b
  3m		3m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id 5069745e1bb6
  3m		3m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 110bbdde5f5b: pod "jenkins-5-iej5i_dyan1(11d05c47-6846-11e6-a365-0e3b33b501d3)" container "jenkins" is unhealthy, it will be killed and re-created.
  3m		3m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id 5069745e1bb6
  3m		3m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id 0de04f0a114c
  3m		3m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id 0de04f0a114c
  3m		3m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 5069745e1bb6: pod "jenkins-5-iej5i_dyan1(11d05c47-6846-11e6-a365-0e3b33b501d3)" container "jenkins" is unhealthy, it will be killed and re-created.
  2m		2m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id fcc8878bb01e
  2m		2m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id fcc8878bb01e
  2m		2m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id 0de04f0a114c: pod "jenkins-5-iej5i_dyan1(11d05c47-6846-11e6-a365-0e3b33b501d3)" container "jenkins" is unhealthy, it will be killed and re-created.
  1m		1m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id fcc8878bb01e: pod "jenkins-5-iej5i_dyan1(11d05c47-6846-11e6-a365-0e3b33b501d3)" container "jenkins" is unhealthy, it will be killed and re-created.
  1m		1m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id cad292bf5f25
  1m		1m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id cad292bf5f25
  4m		1m		8	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: Get http://10.1.2.7:8080/login: dial tcp 10.1.2.7:8080: getsockopt: connection refused
  4m		1m		5	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: Get http://10.1.2.7:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  4m		1m		7	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Warning		Unhealthy	Liveness probe failed: HTTP probe failed with statuscode: 503
  4m		1m		10	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Warning		Unhealthy	Readiness probe failed: HTTP probe failed with statuscode: 503
  1m		1m		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Killing		Killing container with docker id cad292bf5f25: pod "jenkins-5-iej5i_dyan1(11d05c47-6846-11e6-a365-0e3b33b501d3)" container "jenkins" is unhealthy, it will be killed and re-created.
  1m		23s		6	{kubelet ip-172-18-11-249.ec2.internal}					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "jenkins" with CrashLoopBackOff: "Back-off 40s restarting failed container=jenkins pod=jenkins-5-iej5i_dyan1(11d05c47-6846-11e6-a365-0e3b33b501d3)"
  1m		23s		6	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Warning		BackOff		Back-off restarting failed docker container
  3m		8s		5	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Pulled		Container image "172.30.164.193:5000/dyan1/jenkins-master@sha256:b4c7ad3aa165800b7f309697499de7833a79e9112b31b0ce62cc62bbb0e61fbb" already present on machine
  8s		8s		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Created		Created container with docker id 9ce9502020e8
  8s		8s		1	{kubelet ip-172-18-11-249.ec2.internal}	spec.containers{jenkins}	Normal		Started		Started container with docker id 9ce9502020e8
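
The probe settings shown in the Liveness and Readiness lines of the output above are the key data for this bug. As a rough aid for comparing them, the following sketch parses such a line into a dictionary. The function name `parse_probe` is an assumption made for illustration; it is not part of `oc` or any OpenShift library, and it only handles the line layout seen in this report.

```python
import re


def parse_probe(line):
    """Parse one probe line from `oc describe pod` output into a dict."""
    # Split the label ("Liveness" / "Readiness") from the rest of the line.
    name, _, rest = line.strip().partition(":")
    settings = {"probe": name.strip().lower()}
    # Durations look like delay=30s; thresholds look like #failure=3.
    for key, value in re.findall(r"#?(\w+)=(\d+)s?", rest):
        settings[key] = int(value)
    return settings


liveness = parse_probe(
    "Liveness:\t\thttp-get http://:8080/login "
    "delay=30s timeout=3s period=10s #success=1 #failure=3"
)
readiness = parse_probe(
    "Readiness:\t\thttp-get http://:8080/login "
    "delay=3s timeout=3s period=10s #success=1 #failure=3"
)
print(liveness)
print(readiness)
```

In this report the two probes differ only in their initial delay (30s for liveness, 3s for readiness).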


Expected results:
The jenkins pod deploys successfully.

Additional info:

Comment 1 Ben Parees 2016-08-22 17:35:54 UTC

Seems to work OK with the jenkins image from pulp. The jenkins image on registry.access.redhat.com is very old and will be updated with 3.3.

QE, do you want this as ON_QA now, or when the image is published to the public registry?

Comment 3 Ben Parees 2016-08-23 02:09:32 UTC

Please share the logs from your jenkins pod. I was not able to reproduce your issue.

Comment 4 Dongbo Yan 2016-08-23 02:56:11 UTC

Here is the pod log, please check:
http://pastebin.test.redhat.com/404979

Comment 5 Ben Parees 2016-08-23 15:18:35 UTC

I think the liveness probe's initial delay was too short, causing the container to be killed and restarted if it did not start fast enough. The fix is here:
https://github.com/openshift/origin/pull/10593
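
To make the reasoning above concrete, here is a simplified model of how long a container has to start answering its liveness probe before it is killed. It is only a sketch under stated assumptions: the first probe fires exactly at the initial delay, one probe follows every period, every probe fails, and the container is killed on the failure that reaches the threshold. It ignores the probe timeout and any kubelet scheduling jitter, so real timings (such as those in the describe output) will differ. The function name is hypothetical.

```python
def earliest_kill_seconds(initial_delay, period, failure_threshold):
    """Seconds after container start at which the threshold failure occurs.

    Simplified model: first probe at initial_delay, one probe per period,
    all probes failing. Probe timeout and kubelet jitter are ignored.
    """
    return initial_delay + (failure_threshold - 1) * period


# delay=30s is the liveness setting from the describe output in this report;
# 120s is the value used during verification later in this report.
for delay in (30, 120):
    budget = earliest_kill_seconds(delay, period=10, failure_threshold=3)
    print(f"initial delay {delay}s -> killed after about {budget}s")
```

Under this model a 30s initial delay leaves about 50s for startup, while 120s leaves about 140s.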

Comment 6 Dongbo Yan 2016-08-24 09:44:53 UTC

Verified
After changing "initialDelaySeconds" to 120 in the liveness probe, the pod deploys successfully.
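
For anyone who wants to try the same change on a local copy of a template, the sketch below sets the liveness probe's `initialDelaySeconds` on a named container of every DeploymentConfig in a template loaded as a Python dictionary. The helper name `set_liveness_delay` and the trimmed-down example template are assumptions made for illustration; the real jenkins-master-template.json contains more objects and fields, and this is not necessarily how the linked pull request made the change.

```python
import copy


def set_liveness_delay(template, container_name, seconds):
    """Return a copy of the template with the liveness initial delay changed."""
    patched = copy.deepcopy(template)
    for obj in patched.get("objects", []):
        if obj.get("kind") != "DeploymentConfig":
            continue
        for container in obj["spec"]["template"]["spec"]["containers"]:
            if container["name"] == container_name and "livenessProbe" in container:
                container["livenessProbe"]["initialDelaySeconds"] = seconds
    return patched


# Hypothetical, trimmed-down template using the probe values from this report.
example = {
    "kind": "Template",
    "objects": [{
        "kind": "DeploymentConfig",
        "spec": {"template": {"spec": {"containers": [{
            "name": "jenkins",
            "livenessProbe": {
                "httpGet": {"path": "/login", "port": 8080},
                "initialDelaySeconds": 30,
                "timeoutSeconds": 3,
            },
        }]}}},
    }],
}

patched = set_liveness_delay(example, "jenkins", 120)
container = patched["objects"][0]["spec"]["template"]["spec"]["containers"][0]
print(container["livenessProbe"]["initialDelaySeconds"])
```

The input dictionary is left unchanged, so the original and patched settings can be compared side by side.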

Comment 8 errata-xmlrpc 2016-09-27 09:45:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
