Bug 1515358

Summary: [RFE] ensure also non-K8s managed containers get garbage collected
Product: OpenShift Container Platform
Reporter: Carsten Lichy-Bittendorf <clichybi>
Component: Build
Assignee: Cesar Wong <cewong>
Status: CLOSED DUPLICATE
QA Contact: Wenjing Zheng <wzheng>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.6.1
CC: aos-bugs, bparees, jokerman, mmccomas
Target Milestone: ---
Target Release: 3.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-06 20:30:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Carsten Lichy-Bittendorf 2017-11-20 16:34:47 UTC
1. Proposed title of this feature request

ensure also non-K8s managed containers get garbage collected

2. Who is the customer behind the request?

Account: Allianz Deutschland - #5793039 
- TAM customer: no
- SRM customer: yes
- Strategic: no

3. What is the nature and description of the request?

Provide garbage collection for containers that are not managed by the kubelet and can therefore hold on to resources when they do not terminate cleanly.

4. Why does the customer need this? (List the business requirements here)

When using OpenShift builds with the Docker build strategy, failed builds leave exited containers behind. These containers are not directly managed by OpenShift: they are created by the Docker build process and have names like "clever_hawking" instead of the usual "k8s_...". Because they are not managed by the kubelet, they are not subject to container garbage collection. As a consequence, exited build containers are never removed and slowly fill the disks. Over time, nodes that execute builds accumulate dead containers and their images, and these images cannot be removed while stopped containers still reference them. These containers need to be garbage collected to ensure the cluster's overall health.


5. How would the customer like to achieve this? (List the functional requirements here)

If pruning were available for those containers, a clean-up via a job or DaemonSet would be feasible, similar to the pruning already defined for images and other objects.
Such a feature is available in Docker 1.13, see [1], while OCP 3.7 (and RHEL 7.4) still ships Docker 1.12. So one option would be to upgrade to Docker 1.13, another to implement the capability in some other way.
The feature should be able to restrict itself to containers that are not managed by K8s, so that containers managed by the kubelet continue to be handled by the kubelet only.

[1] https://docs.docker.com/engine/reference/commandline/container_prune/
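
The clean-up requested above could be sketched roughly as follows. This is only an illustration of the idea, not the fix that was delivered: it assumes that kubelet-managed containers can be recognised by the "k8s_" name prefix mentioned in section 4, and the function names select_prunable and prune_unmanaged are hypothetical. It uses only `docker ps` and `docker rm`, which are available in Docker 1.12.

```python
import subprocess

# Assumption: containers created by the kubelet have names starting with this prefix.
K8S_PREFIX = "k8s_"


def select_prunable(listing):
    """Return the IDs of listed containers whose name lacks the k8s_ prefix.

    `listing` is expected to hold one "<id><TAB><name>" pair per line.
    """
    ids = []
    for line in listing.splitlines():
        container_id, _, name = line.strip().partition("\t")
        # Skip blank or malformed lines and anything the kubelet owns.
        if container_id and name and not name.startswith(K8S_PREFIX):
            ids.append(container_id)
    return ids


def prune_unmanaged(dry_run=True):
    """List exited containers and remove those not managed by the kubelet."""
    listing = subprocess.run(
        ["docker", "ps", "-a", "--filter", "status=exited",
         "--format", "{{.ID}}\t{{.Names}}"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    ids = select_prunable(listing)
    if ids and not dry_run:
        # No -f flag: docker rm refuses to remove running containers.
        subprocess.run(["docker", "rm"] + ids, check=True)
    return ids
```

Calling prune_unmanaged() with the default dry_run=True only reports the candidate IDs. Note that the name check is a heuristic: any exited container started outside the kubelet, for example by an administrator, would be selected as well.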

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.

Create a number of such containers, which today are not garbage collected, and check that they get removed, and that only the intended containers get removed.
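
One possible way to express this check, given a snapshot of the node's containers before and after a clean-up run. This is a sketch under assumptions: the snapshots are plain dictionaries mapping container name to status, the "k8s_" prefix marks kubelet-managed containers, and the helper name check_prune is hypothetical.

```python
def check_prune(before, after):
    """Compare container snapshots taken before and after a clean-up run.

    Both arguments map container name -> status ("exited", "running", ...).
    Returns (left_behind, wrongly_removed) as sorted lists of names.
    """
    removed = set(before) - set(after)
    # Only exited containers without the k8s_ prefix should have been removed.
    expected = {
        name for name, status in before.items()
        if status == "exited" and not name.startswith("k8s_")
    }
    left_behind = sorted(expected - removed)
    wrongly_removed = sorted(removed - expected)
    return left_behind, wrongly_removed
```

The requirement is met when both returned lists are empty.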


7. Is there already an existing RFE upstream or in Red Hat Bugzilla?

None found

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?

As soon as possible, as it impacts production environments.

9. Is the sales team involved in this request and do they have any additional input?

Sales has no additional input.

10. List any affected packages or components.

Depends on the solution. Most probably Docker.

11. Would the customer be able to assist in testing this functionality if implemented?

Sure, the customer has a sandbox environment in which they can run validations.

Comment 1 Ben Parees 2017-11-20 18:44:46 UTC
We've updated the code to delete these containers when a Docker build fails, so they will no longer be left around. Cesar, can you point this bug to the PR you delivered for this?

Comment 2 Cesar Wong 2017-11-20 21:10:22 UTC
BZ is https://bugzilla.redhat.com/show_bug.cgi?id=1512679

Comment 3 Ben Parees 2017-12-06 20:30:19 UTC

*** This bug has been marked as a duplicate of bug 1512679 ***