Bug 1300570 - Image garbage collection setting should be more specific for different disk configurations.
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Kubernetes
Version: 3.1.0
Hardware: Unspecified  OS: Unspecified
Priority: medium  Severity: medium
Assigned To: Derek Carr
QA Contact: DeShuai Ma
Depends On:
Blocks: 1267746
Reported: 2016-01-21 02:44 EST by Johnny Liu
Modified: 2016-05-12 12:27 EDT
CC: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 12:27:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Johnny Liu 2016-01-21 02:44:09 EST
Description of problem:
Followed https://docs.openshift.org/latest/install_config/install/prerequisites.html#configuring-docker-storage to set up a docker-pool volume as the Docker back-end storage.
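
For reference, the thin pool used here is the kind of setup docker-storage-setup produces from a small config file along these lines (a minimal sketch only; the VG name matches this environment, and exact option names depend on the docker-storage-setup version):

# /etc/sysconfig/docker-storage-setup
VG=rhel72        # carve the docker-pool thin pool out of the existing rhel72 volume group
# then run docker-storage-setup (or restart the docker service) to apply it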

Here is my env info:
# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true

# ps -ef|grep docker
root      15538      1  0 Jan20 ?        00:08:01 /usr/bin/docker daemon --insecure-registry=172.31.0.0/16 --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true -b=lbr0 --mtu=1450 --add-registry rcm-img-docker01.build.eng.bos.redhat.com:5001 --add-registry registry.access.redhat.com --insecure-registry 0.0.0.0/0

# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel72 twi-aot--- 17.44g             73.35  9.57                            
  root        rhel72 -wi-ao---- 10.00g                                                    
  swap        rhel72 -wi-ao----  2.00g  

# df -h
Filesystem                                               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root                                   10G  2.7G  7.4G  27% /
devtmpfs                                                 1.9G     0  1.9G   0% /dev
tmpfs                                                    1.9G     0  1.9G   0% /dev/shm
tmpfs                                                    1.9G  191M  1.7G  11% /run
tmpfs                                                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                                                497M  214M  284M  43% /boot

If the docker-registry is deployed with external storage (e.g. NFS), Docker images will be pulled into the docker-pool volume while S2I builds push images to the external storage, so the root partition will be used less and less.

After setting up the environment, follow https://docs.openshift.org/latest/admin_guide/garbage_collection.html to configure image garbage collection by adding the following to the node configuration file, then restart the node service:
kubeletArguments: 
  image-gc-high-threshold:
  - '20'
  image-gc-low-threshold:
  - '10'
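
For clarity, these thresholds mean: once usage on the filesystem being monitored reaches image-gc-high-threshold percent, the kubelet removes images until usage falls back to image-gc-low-threshold percent. A rough sketch of the arithmetic, with illustrative numbers (not this environment's exact figures):

# e.g. a 10 GiB filesystem at 26% usage, thresholds 20/10
capacity=$((10 * 1024 * 1024 * 1024))
used=$((capacity * 26 / 100))
high=20
low=10
if [ $((used * 100 / capacity)) -ge $high ]; then
    to_free=$((used - capacity * low / 100))
    echo "image GC would try to free $to_free bytes"
fi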


As seen in the node log, the ImageManager is monitoring root disk usage by default.
<--snip-->
Jan 20 18:49:20 openshift-149 atomic-openshift-node: I0120 18:49:20.682843   32634 image_manager.go:202] [ImageManager]: Disk usage on "/dev/mapper/rhel72-root" (/) is at 26% which is over the high threshold (20%). Trying to free 672878592 bytes
Jan 20 18:49:20 openshift-149 docker: time="2016-01-20T18:49:20.683168834+08:00" level=info msg="GET /images/json"
Jan 20 18:49:20 openshift-149 docker: time="2016-01-20T18:49:20.768170253+08:00" level=info msg="GET /containers/json?all=1"
Jan 20 18:49:20 openshift-149 atomic-openshift-node: I0120 18:49:20.770445   32634 image_manager.go:254] [ImageManager]: Removing image "05f86996004c05346d746261b53a406a43e9016753f0ab3bd3a62756828db551" to free 235319119 bytes
<--snip-->

Doing image garbage collection on the root disk is NOT always what the user wants; in the scenario above, I want image garbage collection to run against the docker-pool volume.
The "Image garbage collection" configuration should allow the user to specify which disk or partition it is done against.


Version-Release number of selected component (if applicable):
# openshift version
openshift v3.1.1.5
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Andy Goldstein 2016-01-21 10:17:12 EST
Note: the registry has no impact on Kubelet image garbage collection. Kubelet image garbage collection only targets images stored in the Docker daemon's graph storage. You don't need to have a registry deployed to test image garbage collection, and if you do have a registry deployed, its storage configuration (emptyDir vs hostPath vs NFS vs PV) does not affect node image storage or garbage collection.
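
Put differently, the candidate set for kubelet image garbage collection is simply the node-local image list, for example:

# image IDs in the local Docker graph -- the only thing kubelet image GC can remove
docker images --no-trunc -q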
Comment 2 Andy Goldstein 2016-01-21 13:30:31 EST
Possible fix here: https://github.com/google/cadvisor/issues/944#issuecomment-173665149. Waiting on more discussion with upstream before I proceed further.
Comment 3 Andy Goldstein 2016-01-21 16:56:36 EST
cadvisor PR: https://github.com/google/cadvisor/pull/1070

Once this is merged, we'll need PRs for Kubernetes and Origin to pull in the updated cadvisor. Or we may cherry-pick the fix into Origin in the short term if we aren't comfortable bumping all of cadvisor for this fix.
Comment 5 Derek Carr 2016-02-03 11:33:39 EST
https://github.com/kubernetes/kubernetes/pull/19354 - merged 1/29
https://github.com/kubernetes/kubernetes/pull/20395 - not yet merged, but tagged lgtm
Comment 6 Derek Carr 2016-02-03 11:34:58 EST
Taking bug in Andy's absence and will look to cherry-pick.
Comment 7 Derek Carr 2016-02-09 10:13:47 EST
https://github.com/kubernetes/kubernetes/pull/20395 just merged upstream.

This will be picked up in the next rebase into Origin.
Comment 8 Andy Goldstein 2016-02-19 11:35:19 EST
Will be in next puddle
Comment 9 DeShuai Ma 2016-02-22 02:12:48 EST
Verified on openshift v3.1.1.904

[root@openshift-135 ~]# openshift version
openshift v3.1.1.904
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5

[root@openshift-129 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true
[root@openshift-129 ~]# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel72 twi-aot--- 17.44g             20.99  5.33                            
  root        rhel72 -wi-ao---- 10.00g                                                    
  swap        rhel72 -wi-ao----  2.00g   


atomic-openshift-node logs:

I0222 15:01:45.411470   27751 image_manager.go:230] [ImageManager]: Disk usage on "rhel72-docker--pool" () is at 70% which is over the high threshold (20%). Trying to free 11319902208 bytes
I0222 15:01:45.508647   27751 docker.go:357] Docker Container: /atomic-openshift-node is not managed by kubelet.
I0222 15:01:45.508679   27751 docker.go:357] Docker Container: /openvswitch is not managed by kubelet.
I0222 15:01:45.508687   27751 docker.go:357] Docker Container: /small_wozniak is not managed by kubelet.
I0222 15:01:45.508914   27751 image_manager.go:287] [ImageManager]: Removing image "0192cfcebeb04ff778cf44aa7f6d336e43ee9fdc6cfb02de091364248a700cfc" to free 490397531 bytes
I0222 15:01:45.651197   27751 docker.go:357] Docker Container: /atomic-openshift-node is not managed by kubelet.
I0222 15:01:45.651227   27751 docker.go:357] Docker Container: /openvswitch is not managed by kubelet.
I0222 15:01:45.651235   27751 docker.go:357] Docker Container: /small_wozniak is not managed by kubelet.
I0222 15:01:45.655037   27751 kubelet.go:2409] SyncLoop (housekeeping)
I0222 15:01:45.668928   27751 image_manager.go:287] [ImageManager]: Removing image "0bbc57b809f12a1a21c8105fa428f714bee4c588e9d47beb4b053bccffb68416" to free 603628030 bytes
Comment 11 errata-xmlrpc 2016-05-12 12:27:08 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
