Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1335941

Summary:	Openshift should clean up containers from jobs no longer in the system
Product:	OpenShift Container Platform	Reporter:	Robert Rati <rrati>
Component:	Node	Assignee:	Andy Goldstein <agoldste>
Status:	CLOSED INSUFFICIENT_DATA	QA Contact:	DeShuai Ma <dma>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.2.0	CC:	agoldste, aos-bugs, jokerman, jvyas, mmccomas, rmeggins, tstclair
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-06-03 19:52:36 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1335939
Bug Blocks:

Description Robert Rati 2016-05-13 15:09:54 UTC

Description of problem:
Container data stays on disk indefinitely, even after a job/pod has been removed from OpenShift. This causes two problems:

1) It uses up disk space
2) It can slow down docker operations

At a minimum, containers should be removed once a job/pod has been removed or has completed in OpenShift.

Version-Release number of selected component (if applicable):
3.2.0

How reproducible:
100%

Steps to Reproduce:
1. Submit a job to OpenShift
2. Use oc describe to find the node running the job, etc.
3. Remove the job
4. Look on the node that ran the job. In /var/lib/docker/containers, all the data for that container will still exist
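
The check in step 4 could be scripted roughly as follows. This is only a sketch: the function name lingering_containers is hypothetical, and it assumes the job's full container IDs were recorded on the node before the job was removed (for example from `docker ps -a --no-trunc`). It relies on Docker keeping each container's data in a directory named after the full container ID.

```python
from pathlib import Path


def lingering_containers(containers_root, job_container_ids):
    """Return the job's container IDs that still have a data directory.

    containers_root: e.g. "/var/lib/docker/containers" (assumed default path).
    job_container_ids: full container IDs recorded before the job was removed.
    """
    root = Path(containers_root)
    # A container counts as lingering if its directory is still present.
    return sorted(
        cid for cid in set(job_container_ids) if (root / cid).is_dir()
    )


if __name__ == "__main__":
    import sys

    # Usage: python check.py <container-id> [<container-id> ...]
    for cid in lingering_containers("/var/lib/docker/containers", sys.argv[1:]):
        print("still present:", cid)
```

An empty result after the job is removed would mean the container data was cleaned up as expected.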

Actual results:


Expected results:


Additional info:

Comment 1 Andy Goldstein 2016-05-24 17:34:40 UTC

Is this specific to jobs, pods, or both? Kubernetes will remove containers when their pod is deleted. If a pod is still around but its containers have died and been restarted, the old dead containers are preserved until the container GC thresholds are hit. I believe the defaults are 100 total dead containers, and up to 2 per pod. What this means is that until you have 101 dead containers, nothing will get GC'd (or maybe the 2 per pod cap is applied, I can't remember offhand).
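
To make the threshold behavior described above concrete, here is a simplified sketch of a two-cap policy. It is not the kubelet's actual algorithm: the function name select_for_gc and the tuple layout are hypothetical, the defaults of 2 and 100 are the values recalled above, and it assumes the per-pod cap is applied first and the total cap second, which is exactly the ordering the comment is unsure about.

```python
from collections import defaultdict


def select_for_gc(dead, max_per_pod=2, max_total=100):
    """Pick dead containers to remove, oldest first.

    dead: iterable of (pod_name, container_id, finished_at) tuples.
    Assumption: per-pod cap first, then the overall cap on what is left.
    """
    by_pod = defaultdict(list)
    for pod, cid, finished_at in dead:
        by_pod[pod].append((finished_at, cid))

    evict, survivors = [], []
    for items in by_pod.values():
        items.sort()  # oldest first within each pod
        excess = max(len(items) - max_per_pod, 0)
        evict.extend(items[:excess])
        survivors.extend(items[excess:])

    # Apply the overall cap to whatever survived the per-pod cap.
    survivors.sort()
    overflow = max(len(survivors) - max_total, 0)
    evict.extend(survivors[:overflow])

    return [cid for _, cid in sorted(evict)]
```

Under these assumptions, a pod with three dead containers loses its oldest one even when the node is far below the total limit, while a node whose pods each have at most two dead containers sees nothing collected until the total exceeds 100.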

It would be useful to have a specific reproducer if possible.

Comment 2 Andy Goldstein 2016-05-27 15:01:21 UTC

I am unable to reproduce based on the Steps to Reproduce listed above.

Comment 3 Jay Vyas 2016-05-27 15:13:33 UTC

OpenShift also needs to clean up logs. See https://bugzilla.redhat.com/show_bug.cgi?id=1335951 and https://github.com/kubernetes/kubernetes/compare/master...jayunit100:LoggingSoak to systematically reproduce/test logging strain at scale.

Comment 4 Andy Goldstein 2016-05-27 15:15:23 UTC

Ok... but this bz is about data remaining in /var/lib/docker/containers after the job has been deleted. I can't reproduce. Can you?

Comment 5 Jay Vyas 2016-05-27 16:27:40 UTC

Updated https://bugzilla.redhat.com/show_bug.cgi?id=1335951 with details regarding the logging soak portion. That ticket also has details regarding OOM exceptions when logging has no breaks.

Comment 6 Andy Goldstein 2016-05-27 16:41:44 UTC

I'm sorry for being a stickler here, but please provide details on whether or not you can reproduce this. Under normal conditions, the container data is removed from /var/lib/docker/containers when the job (and its underlying pod) is deleted. Otherwise I'm going to close this.

Comment 7 Jay Vyas 2016-05-31 13:27:03 UTC

I have not run any jobs in OpenShift; I've only reproduced similar errors using raw pod spin-ups with highly verbose logging.

Comment 8 Andy Goldstein 2016-05-31 14:46:33 UTC

What happened when you had a pod with verbose logging and you tried to delete it? Was the pod itself successfully deleted? Were the containers for the pod deleted?

Comment 9 Timothy St. Clair 2016-06-03 19:52:36 UTC

I did witness the issue; however, I'm going to temporarily close this until we can find a valid reproducer.

I've been unable to reproduce it.

Comment 10 Jay Vyas 2016-11-07 13:51:20 UTC

+1 to close. I don't think it's reproducible anymore on new OpenShift/Docker versions.