Bug 1408309

Summary: Conflicting defaults prevent Image Garbage Collection
Product: OpenShift Container Platform Reporter: Jean Abraham <jeabraha>
Component: Node Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA QA Contact: Zhang Cheng <chezhang>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.4.0 CC: aos-bugs, decarr, eparis, jokerman, mmccomas, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Fixes an issue where docker refuses to start new containers because dm.min_free_space (default 10%) has been reached, while devicemapper thin pool usage does not exceed image-gc-high-threshold (default 90%), so image reclaim never occurs and the node is stuck. This is fixed by changing the default image-gc-high-threshold to 85%, which causes image reclaim to occur before the default dm.min_free_space is reached.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-12 19:08:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jean Abraham 2016-12-22 20:46:48 UTC
Description of problem:
The dm.min_free_space defaults to 10%, which "specifies the min free space percent in a thin pool require for new device creation to succeed....Whenever a new a thin pool device is created (during docker pull or during container creation), the Engine checks if the minimum free space is available. If sufficient space is unavailable, then device creation fails and any relevant docker operation fails." [1]

This setting prevents storage usage from crossing the 90% limit. However, image GC is expected to kick in only once usage goes beyond image-gc-high-threshold. The image-gc-high-threshold has a default value of 90%, so GC never triggers. If image-gc-high-threshold is set to a value lower than (100 - dm.min_free_space)%, GC triggers.

[1] https://github.com/docker/docker/blob/master/docs/reference/commandline/dockerd.md#storage-driver-options
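
The conflict described above can be checked with simple arithmetic. The following is a minimal sketch, not code from Kubernetes or Docker; the function name gc_can_trigger is hypothetical, and it assumes both values are whole percentages of the same thin pool.

```python
def gc_can_trigger(image_gc_high_threshold: int, dm_min_free_space: int) -> bool:
    """Return True if pool usage can pass the GC high threshold
    before Docker starts refusing new thin pool devices.

    Both arguments are percentages (0-100). This is a hypothetical helper
    for illustration, following the rule stated in the report: the
    threshold must be lower than (100 - dm.min_free_space).
    """
    # Docker refuses new devices once free space drops below
    # dm_min_free_space, so usage effectively tops out at this ceiling.
    usage_ceiling = 100 - dm_min_free_space
    return image_gc_high_threshold < usage_ceiling


# Defaults described in this report: threshold 90%, min free space 10%.
print(gc_can_trigger(90, 10))
# A threshold of 85% leaves room for GC to run first.
print(gc_can_trigger(85, 10))
```

With the defaults reported here (90 and 10) the check is false, which matches the stuck behaviour; with a threshold of 85 it is true.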

How reproducible:
Every time.

Steps to Reproduce:
With Kubernetes and Docker defaults, fill the thin pool with images until disk usage approaches image-gc-high-threshold.

Actual results:
GC does not trigger due to conflicting defaults between Kubernetes (image-gc-high-threshold) and Docker (dm.min_free_space)

Expected results:
Image GC triggers and reclaims space before dm.min_free_space blocks new device creation.

Suggestion: set the out-of-the-box default of image-gc-high-threshold to a value lower than 90% to prevent the conflicting defaults.

Comment 3 Seth Jennings 2017-01-25 17:45:21 UTC
Upstream PR:
https://github.com/kubernetes/kubernetes/pull/40432

Comment 4 Derek Carr 2017-02-03 15:59:15 UTC
Origin PR:
https://github.com/openshift/origin/pull/12762

Comment 5 Troy Dawson 2017-02-06 19:24:45 UTC
This has been merged into ocp and is in OCP v3.5.0.17 or newer.

Comment 7 Zhang Cheng 2017-02-10 06:31:38 UTC
Verified.
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4

1. Check the default value in node-config.yaml
  image-gc-high-threshold:
  - '85'

2. Pull images with docker on the node until disk usage on "rhel-docker--pool" is more than 85% and less than 90%.

3. Check images count and node log
[root@qe-chezhang-node-registry-router-2 ~]# docker images | wc -l
15

[root@qe-chezhang-node-registry-router-2 ~]# journalctl -u atomic-openshift-node | grep "over the high threshold"
Feb 10 01:10:12 qe-chezhang-node-registry-router-2 atomic-openshift-node[26506]: I0210 01:10:12.321159   26570 image_gc_manager.go:270] [imageGCManager]: Disk usage on "rhel-docker--pool" () is at 88% which is over the high threshold (85%). Trying to free 1523580928 bytes

4. Check the image count again; some (the oldest) images were removed.
[root@qe-chezhang-node-registry-router-2 ~]# docker images | wc -l
12
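
The log line in step 3 reports a usage percentage and a number of bytes to free. The sketch below shows one way such figures can be derived, assuming the kubelet compares integer usage against the high threshold and then frees enough to get back to a low threshold. The low threshold of 80% and the capacity and available values are assumptions for illustration; they do not appear in this report, and the function names are hypothetical.

```python
def usage_percent(capacity: int, available: int) -> int:
    """Integer percentage of the pool in use (capacity and available in bytes)."""
    return 100 - (available * 100 // capacity)


def amount_to_free(capacity: int, available: int, low_threshold: int) -> int:
    """Bytes to reclaim so that usage drops back to low_threshold percent."""
    # Free space wanted at the low threshold, minus what is free now.
    return capacity * (100 - low_threshold) // 100 - available


# Hypothetical pool: 10 GB capacity with 1.2 GB available.
capacity = 10_000_000_000
available = 1_200_000_000
high_threshold = 85   # new default from this fix
low_threshold = 80    # assumed low threshold, not stated in this report

usage = usage_percent(capacity, available)
if usage >= high_threshold:
    to_free = amount_to_free(capacity, available, low_threshold)
    print(f"usage {usage}% is over {high_threshold}%, free {to_free} bytes")
```

The numbers here are not meant to reproduce the 1523580928 bytes in the log, since the pool capacity on the test node is not given.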

Comment 9 errata-xmlrpc 2017-04-12 19:08:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884