Bug 1274242 - Build pod was deleted automatically roughly 1 hour after creation
Summary: Build pod was deleted automatically roughly 1 hour after creation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ben Parees
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-22 10:56 UTC by XiuJuan Wang
Modified: 2015-11-23 14:43 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 14:43:17 UTC
Target Upstream Version:
Embargoed:



Description XiuJuan Wang 2015-10-22 10:56:41 UTC
Description of problem:
Build pod was deleted automatically roughly 1 hour after creation, so the build log can no longer be retrieved.

$ oc build-logs ruby-sample-build-4 -n gits
API error (404): no such id: 7ef69dfb99d623cad1f0351b16b2bfa6c77f36336c4fec3063083e2750ba1155


Version-Release number of selected component (if applicable):
# openshift version
openshift v3.0.2.902
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2


How reproducible:
always

Steps to Reproduce:
1. Trigger a build
2. Check the build log after 1 hour

Actual results:
$ oc  get  builds  -n gits
NAME                   TYPE      FROM      STATUS     STARTED
ruby-sample-build-1    Source    Git       Failed     2 hours ago
ruby-sample-build-2    Source    Git       Failed     About an hour ago
ruby-sample-build-3    Source    Git       Failed     About an hour ago
ruby-sample-build-4    Source    Git       Complete   About an hour ago

$ oc build-logs ruby-sample-build-4 -n gits
API error (404): no such id: 7ef69dfb99d623cad1f0351b16b2bfa6c77f36336c4fec3063083e2750ba1155

Expected results:
The build pod should still exist, and the build logs should be retrievable.

Additional info:

Comment 2 XiuJuan Wang 2015-11-16 09:15:05 UTC
This bug can still be reproduced in an OSE 3.1 environment:

oc v3.1.0.4-9-g72d3991
kubernetes v1.1.0-origin-1107-g4c8e6f4

Comment 3 Ben Parees 2015-11-18 18:00:15 UTC
Can you enable loglevel 5 on the OpenShift master, recreate the issue, and provide those logs? We need to determine whether the pod is being deleted by our sync logic, which deletes pods when the corresponding build is deleted.

Specifically, we'll be looking for a trace line indicating:
"Handling deletion of build <buildname>"

Comment 4 Andy Goldstein 2015-11-18 18:09:15 UTC
This looks to me like the container has been deleted, but I assume the pod is still there; otherwise, you would get a different error message.
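
One way to confirm that distinction, using the pod name from this report (a sketch only; run oc against the master and docker on the node that ran the build):

  $ oc get pod ruby-sample-build-4-build -n gits    # the pod object should still be listed
  $ docker ps -a | grep ruby-sample-build-4-build   # empty output if the containers were garbage-collected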

Comment 5 Ben Parees 2015-11-18 20:27:43 UTC
Good point, Andy. It sounds like this is probably working as designed, though the logging API should report a better error message when the container doesn't exist. That's probably an upstream issue.

XiuJuan, can you check on your container garbage collection settings?

https://docs.openshift.org/latest/admin_guide/garbage_collection.html#container-garbage-collection

See also this clarification to the docs:
https://github.com/openshift/openshift-docs/pull/1219/files
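
For example, that doc tunes container garbage collection through kubeletArguments in the node's node-config.yaml; the values below are illustrative and should be adjusted as needed:

kubeletArguments:
  minimum-container-ttl-duration:
    - "10s"
  maximum-dead-containers-per-container:
    - "2"
  maximum-dead-containers:
    - "240"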

Comment 6 XiuJuan Wang 2015-11-19 08:59:43 UTC
Ben,
I checked the /etc/origin/node/node-config.yaml file in the OSE 3.1 env; the following three arguments are not set:
kubeletArguments:
  minimum-container-ttl-duration:
    - 10s
  maximum-dead-containers-per-container:
    - 2
  maximum-dead-containers:


And in today's OSE env, build logs can still be streamed back for builds created 3 hours ago, so I will move this bug to VERIFIED.
oc v3.1.0.4-5-gebe80f5
kubernetes v1.1.0-origin-1107-g4c8e6f4

$ oc get builds
NAME                 TYPE      FROM      STATUS     STARTED       DURATION
ruby-hello-world-1   Docker    Git       Complete   3 hours ago   2m17s
ruby-hello-world-2   Docker    Git       Complete   3 hours ago   2m2s
ruby-hello-world-3   Docker    Git       Complete   3 hours ago   2m6s


The OpenShift master log after setting loglevel=5:
111949:Nov 19 16:52:38 openshift-146 atomic-openshift-master: I1119 16:52:38.076031   17677 controller.go:81] Handling build xiuwang/ruby-hello-world-3
113431:Nov 19 16:54:39 openshift-146 atomic-openshift-master: I1119 16:54:39.075729   17677 controller.go:81] Handling build xiuwang/ruby-hello-world-3
113823:Nov 19 16:55:13 openshift-146 atomic-openshift-master: I1119 16:55:13.953638   17677 factory.go:448] Found build pod xiuwang/ruby-hello-world-3-build
113824:Nov 19 16:55:13 openshift-146 atomic-openshift-master: I1119 16:55:13.959613   17677 factory.go:472] Found build xiuwang/ruby-hello-world-3 for pod ruby-hello-world-3-build
113832:Nov 19 16:55:13 openshift-146 atomic-openshift-master: I1119 16:55:13.997732   17677 factory.go:528] Found build xiuwang/ruby-hello-world-3
113833:Nov 19 16:55:13 openshift-146 atomic-openshift-master: I1119 16:55:13.997738   17677 factory.go:530] Ignoring build xiuwang/ruby-hello-world-3 because it is complete
114950:Nov 19 16:56:40 openshift-146 atomic-openshift-master: I1119 16:56:40.074691   17677 controller.go:81] Handling build xiuwang/ruby-hello-world-3

Comment 7 Ben Parees 2015-11-19 15:17:56 UTC
I've opened an upstream issue (https://github.com/kubernetes/kubernetes/issues/17501) for the bad error message that occurs when this happens.

