Bug 1320327

Summary: Garbage Collector not removing dead/exited OpenShift Containers
Product: OpenShift Container Platform
Reporter: Eric Jones <erjones>
Component: Node
Assignee: Andy Goldstein <agoldste>
Status: CLOSED NOTABUG
QA Contact: DeShuai Ma <dma>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.1.0
CC: aos-bugs, erjones, jokerman, mmccomas
Target Milestone: ---
Keywords: UpcomingRelease
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment: OpenShift Enterprise 3.1.1.6
Last Closed: 2016-03-28 16:27:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Eric Jones 2016-03-22 20:39:58 UTC
Description of problem:
Customer has noticed that there are over 100 containers listed when they run `docker ps -a`, even after lowering maximum-dead-containers and maximum-dead-containers-per-container. The containers have been confirmed to be OpenShift containers (names start with k8s_).

Version-Release number of selected component (if applicable):
OpenShift 3.1.1.6

How reproducible:
I was unable to reproduce it

Actual results:
Containers are not being deleted

Expected results:
Containers deleted

Additional info:
Customer lowered maximum-dead-containers to 25 and maximum-dead-containers-per-container to 1

Comment 1 Andy Goldstein 2016-03-22 21:19:56 UTC
I tested this with a modified version of 3.1.1.6 to add some debugging to the container GC logic so I could see what it was doing. It correctly deleted containers to get me down to the maximum number I had specified.

Could we please get:

- docker ps -a
- oc get pod -o yaml
- for a container you expect to be deleted, docker inspect <container>

Comment 2 Andy Goldstein 2016-03-23 13:14:48 UTC
Eric, please see comment #1

Comment 4 Andy Goldstein 2016-03-24 17:24:30 UTC
Customer reported 104 total containers on the node, 4 of them running, which means 100 are dead. It sounds like the node has been configured with max dead containers = 100, and the GC is working properly. Could we confirm what's in node-config.yaml under kubeletArguments? Could they provide a copy of that section of the config file?
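One quick way to confirm that stanza would be a grep for the key. This is only a sketch: the sample config is written to a temp file here so the snippet is self-contained, and the real path on an OpenShift 3.x node (/etc/origin/node/node-config.yaml) is an assumption about their layout.

```shell
# Illustrative only: write a sample config to a temp file, then count
# correctly spelled "kubeletArguments:" keys (0 would indicate a typo).
cat > /tmp/node-config-sample.yaml <<'EOF'
kubeletArguments:
  maximum-dead-containers-per-container:
    - "1"
  maximum-dead-containers:
    - "25"
EOF
grep -c '^kubeletArguments:' /tmp/node-config-sample.yaml
```

On a real node, the same grep against node-config.yaml would show whether the key is present and spelled correctly.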

Comment 6 Eric Jones 2016-03-28 13:13:17 UTC
Hi Andy,

Customer was unable to provide the full file but they did provide the following:

kubletArguments:
  maximum-dead-containers-per-container:
    - "1"
  maximum-dead-containers:
    - "25"

Comment 7 Eric Jones 2016-03-28 16:27:45 UTC
As shown in comment #6, customer had a typo in the config file.

kubletArguments    should be    kubeletArguments

Rather than emitting an error or warning message, the node simply skipped over the misspelled section and used the default settings (1m, 2, 100 for minimum-container-ttl-duration, maximum-dead-containers-per-container, and maximum-dead-containers respectively). I am closing this bug and have filed bug #1321622 as an RFE to provide an error/warning if something like this occurs.
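For reference, the corrected stanza would read as follows (same values the customer intended; the rest of node-config.yaml is omitted):

kubeletArguments:
  maximum-dead-containers-per-container:
    - "1"
  maximum-dead-containers:
    - "25"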