Bug 1408309 - Conflicting defaults prevents Image Garbage Collection
Summary: Conflicting defaults prevents Image Garbage Collection
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Seth Jennings
QA Contact: Zhang Cheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-22 20:46 UTC by Jean Abraham
Modified: 2020-09-20 13:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fixes an issue where Docker refuses to start new containers because dm.min_free_space (default 10%) has been reached, but devicemapper thin pool usage does not exceed image-gc-high-threshold (default 90%), so image reclaim never occurs and the node is stuck. The fix changes the default image-gc-high-threshold to 85%, which causes image reclaim to occur before the default dm.min_free_space is reached.
Clone Of:
Environment:
Last Closed: 2017-04-12 19:08:08 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Origin (Github) 12762 0 None None None 2017-02-03 18:41:42 UTC
Red Hat Bugzilla 1372674 0 medium CLOSED Garbage collection fails with " Image garbage collection failed: unable to find data for container /"" 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2017:0884 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.5 RPM Release Advisory 2017-04-12 22:50:07 UTC

Internal Links: 1372674

Description Jean Abraham 2016-12-22 20:46:48 UTC
Description of problem:
The dm.min_free_space defaults to 10%, which "specifies the min free space percent in a thin pool require for new device creation to succeed....Whenever a new a thin pool device is created (during docker pull or during container creation), the Engine checks if the minimum free space is available. If sufficient space is unavailable, then device creation fails and any relevant docker operation fails." [1]

This setting prevents storage usage from crossing the 90% limit. However, image GC is expected to kick in only once usage goes beyond image-gc-high-threshold. The image-gc-high-threshold has a default value of 90%, and hence GC never triggers. If image-gc-high-threshold is set to a value lower than (100 - dm.min_free_space)%, GC triggers.

[1] https://github.com/docker/docker/blob/master/docs/reference/commandline/dockerd.md#storage-driver-options
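
The conflict described above comes down to one comparison. The following is a minimal sketch of that arithmetic only; the function name `gc_can_trigger` and its parameters are hypothetical names chosen for illustration and are not part of Docker or Kubernetes.

```python
def gc_can_trigger(image_gc_high_threshold: int = 90,
                   dm_min_free_space: int = 10) -> bool:
    """Return True if image GC can start before Docker blocks device creation.

    Both arguments are percentages. The defaults mirror the values quoted
    in this report (90% and 10%); this is an illustration, not kubelet code.
    """
    # Highest thin pool usage Docker allows before new devices are refused.
    max_reachable_usage = 100 - dm_min_free_space
    # Per the report, GC triggers only if the high threshold is set
    # lower than that ceiling.
    return image_gc_high_threshold < max_reachable_usage


if __name__ == "__main__":
    print(gc_can_trigger())        # original defaults: 90 vs 10
    print(gc_can_trigger(85, 10))  # proposed default: 85 vs 10
```

With the original defaults the check fails, because usage is capped at the same 90% that GC waits for; with a high threshold of 85% it passes.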

How reproducible:
Every time.

Steps to Reproduce:
With Kubernetes and Docker defaults in place, let image disk consumption (%) grow toward image-gc-high-threshold.

Actual results:
GC does not trigger due to conflicting defaults between Kubernetes (image-gc-high-threshold) and Docker (dm.min_free_space)

Expected results:
Image GC triggers and reclaims space before Docker refuses to create new devices.

The suggestion is to set the out-of-the-box image-gc-high-threshold default to a value lower than 90%, to prevent conflicting defaults.

Comment 3 Seth Jennings 2017-01-25 17:45:21 UTC
Upstream PR:
https://github.com/kubernetes/kubernetes/pull/40432

Comment 4 Derek Carr 2017-02-03 15:59:15 UTC
Origin PR:
https://github.com/openshift/origin/pull/12762

Comment 5 Troy Dawson 2017-02-06 19:24:45 UTC
This has been merged into ocp and is in OCP v3.5.0.17 or newer.

Comment 7 Zhang Cheng 2017-02-10 06:31:38 UTC
Verified.
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4

1. Check the default value in node-config.yaml
  image-gc-high-threshold:
  - '85'

2. Pull images on the node with docker pull until disk usage on "rhel-docker--pool" is more than 85% and less than 90%.

3. Check images count and node log
[root@qe-chezhang-node-registry-router-2 ~]# docker images | wc -l
15

[root@qe-chezhang-node-registry-router-2 ~]# journalctl -u atomic-openshift-node | grep "over the high threshold"
Feb 10 01:10:12 qe-chezhang-node-registry-router-2 atomic-openshift-node[26506]: I0210 01:10:12.321159   26570 image_gc_manager.go:270] [imageGCManager]: Disk usage on "rhel-docker--pool" () is at 88% which is over the high threshold (85%). Trying to free 1523580928 bytes

4. Check the image count again; some of the (oldest) images were removed.
[root@qe-chezhang-node-registry-router-2 ~]# docker images | wc -l
12
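
The "Trying to free ... bytes" figure in the step 3 log line can be understood with a rough sketch of the threshold arithmetic. It assumes the kubelet frees down to image-gc-low-threshold (assumed here to be 80%, which does not appear in this report) and approximates the calculation rather than reproducing the kubelet source. The function name `bytes_to_free` and the sample capacity figures are hypothetical.

```python
def bytes_to_free(capacity: int, available: int,
                  high_pct: int = 85, low_pct: int = 80) -> int:
    """Approximate how many bytes image GC would try to reclaim.

    capacity and available are in bytes. high_pct matches the new default
    from this bug; low_pct is an assumed low threshold. Returns 0 when
    usage is below the high threshold (GC would not run).
    """
    # Usage as an integer percentage of the pool.
    usage_pct = 100 - (available * 100) // capacity
    if usage_pct < high_pct:
        return 0
    # Free enough so that usage drops back to the low threshold.
    return capacity * (100 - low_pct) // 100 - available


if __name__ == "__main__":
    # Hypothetical pool: 1000 bytes total, 120 bytes available (88% used).
    print(bytes_to_free(capacity=1000, available=120))
```

In this hypothetical pool, usage is 88%, which is over the 85% threshold, so the sketch asks for enough space to bring usage back to 80%.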

Comment 9 errata-xmlrpc 2017-04-12 19:08:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

