Bug 1300570 - Image garbage collection setting should be more specific for different disk configuration.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks: 1267746
 
Reported: 2016-01-21 07:44 UTC by Johnny Liu
Modified: 2019-12-16 05:18 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:27:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1064 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update 2016-05-12 20:19:17 UTC

Description Johnny Liu 2016-01-21 07:44:09 UTC
Description of problem:
Following https://docs.openshift.org/latest/install_config/install/prerequisites.html#configuring-docker-storage, I set up a docker-pool volume as the Docker back-end storage.
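(For reference, a thin pool like this is normally created with docker-storage-setup rather than by hand; a minimal sketch, assuming the rhel72 volume group on this host and that docker-storage-setup still accepts the VG variable:)

# cat /etc/sysconfig/docker-storage-setup
VG=rhel72
# docker-storage-setup
# systemctl restart docker

docker-storage-setup writes the resulting DOCKER_STORAGE_OPTIONS into /etc/sysconfig/docker-storage, which is the file shown in the environment info below.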

Here is my env info:
# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true

# ps -ef|grep docker
root      15538      1  0 Jan20 ?        00:08:01 /usr/bin/docker daemon --insecure-registry=172.31.0.0/16 --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true -b=lbr0 --mtu=1450 --add-registry rcm-img-docker01.build.eng.bos.redhat.com:5001 --add-registry registry.access.redhat.com --insecure-registry 0.0.0.0/0

# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel72 twi-aot--- 17.44g             73.35  9.57                            
  root        rhel72 -wi-ao---- 10.00g                                                    
  swap        rhel72 -wi-ao----  2.00g  

# df -h
Filesystem                                               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root                                   10G  2.7G  7.4G  27% /
devtmpfs                                                 1.9G     0  1.9G   0% /dev
tmpfs                                                    1.9G     0  1.9G   0% /dev/shm
tmpfs                                                    1.9G  191M  1.7G  11% /run
tmpfs                                                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                                                497M  214M  284M  43% /boot

If the docker-registry is deployed with external storage (e.g. NFS), docker images are pulled into the docker-pool volume while STI builds push images to the external storage, so the root partition sees less and less use.

After setting up the environment, I followed https://docs.openshift.org/latest/admin_guide/garbage_collection.html to configure "Image Garbage Collection", then restarted the node service:
kubeletArguments: 
  image-gc-high-threshold:
  - '20'
  image-gc-low-threshold:
  - '10'
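(The kubeletArguments stanza above lives in the node configuration file; a minimal sketch of the edit and restart, assuming the default node config path for this release, /etc/origin/node/node-config.yaml, which may differ on other installs:)

# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
  image-gc-high-threshold:
  - '20'
  image-gc-low-threshold:
  - '10'
...
# systemctl restart atomic-openshift-node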


As seen in the node log, ImageManager monitors root disk usage by default.
<--snip-->
Jan 20 18:49:20 openshift-149 atomic-openshift-node: I0120 18:49:20.682843   32634 image_manager.go:202] [ImageManager]: Disk usage on "/dev/mapper/rhel72-root" (/) is at 26% which is over the high threshold (20%). Trying to free 672878592 bytes
Jan 20 18:49:20 openshift-149 docker: time="2016-01-20T18:49:20.683168834+08:00" level=info msg="GET /images/json"
Jan 20 18:49:20 openshift-149 docker: time="2016-01-20T18:49:20.768170253+08:00" level=info msg="GET /containers/json?all=1"
Jan 20 18:49:20 openshift-149 atomic-openshift-node: I0120 18:49:20.770445   32634 image_manager.go:254] [ImageManager]: Removing image "05f86996004c05346d746261b53a406a43e9016753f0ab3bd3a62756828db551" to free 235319119 bytes
<--snip-->
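(As I understand it, and this is an assumption about the kubelet policy rather than something confirmed in this log, once usage passes the high threshold the kubelet tries to free enough to bring usage back down to the low threshold, i.e. roughly usage_bytes minus low-threshold% of capacity. A quick illustration with the df numbers above; cAdvisor reports its own usage figures, so this will not match the log's byte count exactly:)

# awk -v cap=10737418240 -v used=2899102924 -v low=10 'BEGIN { printf "bytes to free: %d\n", used - cap*low/100 }'
bytes to free: 1825361100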

Sometimes, garbage-collecting images based on root disk usage is NOT what the user wants. In the scenario above, I want image garbage collection to act on the docker-pool volume.
The "Image garbage collection" configuration should allow the user to specify which disk or partition it is run against.


Version-Release number of selected component (if applicable):
# openshift version
openshift v3.1.1.5
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Andy Goldstein 2016-01-21 15:17:12 UTC
Note: the registry has no impact on Kubelet image garbage collection. Kubelet image garbage collection only targets images stored in the Docker daemon's graph storage. You don't need to have a registry deployed to test image garbage collection, and if you do have a registry deployed, its storage configuration (emptyDir vs hostPath vs NFS vs PV) does not affect node image storage or garbage collection.
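(To see which storage the Docker daemon actually keeps images on, and therefore what image garbage collection will clean up, docker info is enough; illustrative output for the devicemapper setup described above, exact fields vary by Docker version:)

# docker info | grep -iE 'storage driver|pool name'
Storage Driver: devicemapper
 Pool Name: rhel72-docker--pool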

Comment 2 Andy Goldstein 2016-01-21 18:30:31 UTC
Possible fix here: https://github.com/google/cadvisor/issues/944#issuecomment-173665149. Waiting on more discussion with upstream before I proceed further.

Comment 3 Andy Goldstein 2016-01-21 21:56:36 UTC
cadvisor PR: https://github.com/google/cadvisor/pull/1070

Once this is merged, we'll need PRs for Kubernetes and Origin to pull in the updated cadvisor. Or we may cherry-pick the fix into Origin in the short term if we aren't comfortable bumping all of cadvisor for this fix.

Comment 5 Derek Carr 2016-02-03 16:33:39 UTC
https://github.com/kubernetes/kubernetes/pull/19354 - merged 1/29
https://github.com/kubernetes/kubernetes/pull/20395 - not yet merged, but tagged lgtm

Comment 6 Derek Carr 2016-02-03 16:34:58 UTC
Taking bug in Andy's absence and will look to cherry-pick.

Comment 7 Derek Carr 2016-02-09 15:13:47 UTC
https://github.com/kubernetes/kubernetes/pull/20395 just merged upstream.

This will be picked up in the next rebase into Origin.

Comment 8 Andy Goldstein 2016-02-19 16:35:19 UTC
Will be in next puddle

Comment 9 DeShuai Ma 2016-02-22 07:12:48 UTC
Verified on openshift v3.1.1.904.

[root@openshift-135 ~]# openshift version
openshift v3.1.1.904
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5

[root@openshift-129 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true
[root@openshift-129 ~]# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel72 twi-aot--- 17.44g             20.99  5.33                            
  root        rhel72 -wi-ao---- 10.00g                                                    
  swap        rhel72 -wi-ao----  2.00g   


atomic-openshift-node logs:

I0222 15:01:45.411470   27751 image_manager.go:230] [ImageManager]: Disk usage on "rhel72-docker--pool" () is at 70% which is over the high threshold (20%). Trying to free 11319902208 bytes
I0222 15:01:45.508647   27751 docker.go:357] Docker Container: /atomic-openshift-node is not managed by kubelet.
I0222 15:01:45.508679   27751 docker.go:357] Docker Container: /openvswitch is not managed by kubelet.
I0222 15:01:45.508687   27751 docker.go:357] Docker Container: /small_wozniak is not managed by kubelet.
I0222 15:01:45.508914   27751 image_manager.go:287] [ImageManager]: Removing image "0192cfcebeb04ff778cf44aa7f6d336e43ee9fdc6cfb02de091364248a700cfc" to free 490397531 bytes
I0222 15:01:45.651197   27751 docker.go:357] Docker Container: /atomic-openshift-node is not managed by kubelet.
I0222 15:01:45.651227   27751 docker.go:357] Docker Container: /openvswitch is not managed by kubelet.
I0222 15:01:45.651235   27751 docker.go:357] Docker Container: /small_wozniak is not managed by kubelet.
I0222 15:01:45.655037   27751 kubelet.go:2409] SyncLoop (housekeeping)
I0222 15:01:45.668928   27751 image_manager.go:287] [ImageManager]: Removing image "0bbc57b809f12a1a21c8105fa428f714bee4c588e9d47beb4b053bccffb68416" to free 603628030 bytes
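(For anyone re-running this verification, watching the node unit log and the thin pool side by side is a simple way to see garbage collection kick in; a sketch, assuming the same systemd unit and LV names as above:)

# journalctl -u atomic-openshift-node -f | grep ImageManager
# watch lvs rhel72/docker-pool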

Comment 11 errata-xmlrpc 2016-05-12 16:27:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

