Bug 1300570 - Image garbage collection setting should be more specific for different disk configuration.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks: 1267746
 
Reported: 2016-01-21 07:44 UTC by Johnny Liu
Modified: 2019-12-16 05:18 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:27:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1064 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update 2016-05-12 20:19:17 UTC

Description Johnny Liu 2016-01-21 07:44:09 UTC
Description of problem:
Following https://docs.openshift.org/latest/install_config/install/prerequisites.html#configuring-docker-storage, I set up a docker-pool volume as the Docker back-end storage.
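(For reference, a thin pool like this is normally created with docker-storage-setup rather than by hand; a minimal sketch, assuming the rhel72 volume group on this host and that docker-storage-setup still accepts the VG variable:)

# cat /etc/sysconfig/docker-storage-setup
VG=rhel72
# docker-storage-setup
# systemctl restart docker

docker-storage-setup writes the resulting DOCKER_STORAGE_OPTIONS into /etc/sysconfig/docker-storage, which is the file shown in the environment info below.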

Here is my env info:
# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true

# ps -ef|grep docker
root      15538      1  0 Jan20 ?        00:08:01 /usr/bin/docker daemon --insecure-registry=172.31.0.0/16 --selinux-enabled --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true -b=lbr0 --mtu=1450 --add-registry rcm-img-docker01.build.eng.bos.redhat.com:5001 --add-registry registry.access.redhat.com --insecure-registry 0.0.0.0/0

# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel72 twi-aot--- 17.44g             73.35  9.57                            
  root        rhel72 -wi-ao---- 10.00g                                                    
  swap        rhel72 -wi-ao----  2.00g  

# df -h
Filesystem                                               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root                                   10G  2.7G  7.4G  27% /
devtmpfs                                                 1.9G     0  1.9G   0% /dev
tmpfs                                                    1.9G     0  1.9G   0% /dev/shm
tmpfs                                                    1.9G  191M  1.7G  11% /run
tmpfs                                                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                                                497M  214M  284M  43% /boot

If the docker-registry is deployed with external storage (e.g. NFS), docker images are pulled into the docker-pool volume while STI builds push images to the external storage, so the root partition sees less and less use.

After setting up the environment, I followed https://docs.openshift.org/latest/admin_guide/garbage_collection.html to configure "Image Garbage Collection", then restarted the node service:
kubeletArguments: 
  image-gc-high-threshold:
  - '20'
  image-gc-low-threshold:
  - '10'
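(The kubeletArguments stanza above lives in the node configuration file; a minimal sketch of the edit and restart, assuming the default node config path for this release, /etc/origin/node/node-config.yaml, which may differ on other installs:)

# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
  image-gc-high-threshold:
  - '20'
  image-gc-low-threshold:
  - '10'
...
# systemctl restart atomic-openshift-node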


As seen in the node log, ImageManager monitors root disk usage by default.
<--snip-->
Jan 20 18:49:20 openshift-149 atomic-openshift-node: I0120 18:49:20.682843   32634 image_manager.go:202] [ImageManager]: Disk usage on "/dev/mapper/rhel72-root" (/) is at 26% which is over the high threshold (20%). Trying to free 672878592 bytes
Jan 20 18:49:20 openshift-149 docker: time="2016-01-20T18:49:20.683168834+08:00" level=info msg="GET /images/json"
Jan 20 18:49:20 openshift-149 docker: time="2016-01-20T18:49:20.768170253+08:00" level=info msg="GET /containers/json?all=1"
Jan 20 18:49:20 openshift-149 atomic-openshift-node: I0120 18:49:20.770445   32634 image_manager.go:254] [ImageManager]: Removing image "05f86996004c05346d746261b53a406a43e9016753f0ab3bd3a62756828db551" to free 235319119 bytes
<--snip-->
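(As I understand it, and this is an assumption about the kubelet policy rather than something confirmed in this log, once usage passes the high threshold the kubelet tries to free enough to bring usage back down to the low threshold, i.e. roughly usage_bytes minus low-threshold% of capacity. A quick illustration with the df numbers above; cAdvisor reports its own usage figures, so this will not match the log's byte count exactly:)

# awk -v cap=10737418240 -v used=2899102924 -v low=10 'BEGIN { printf "bytes to free: %d\n", used - cap*low/100 }'
bytes to free: 1825361100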

Sometimes, garbage-collecting images based on root disk usage is NOT what the user wants. In the scenario above, I want image garbage collection to act on the docker-pool volume.
The "Image garbage collection" configuration should allow the user to specify which disk or partition it is run against.


Version-Release number of selected component (if applicable):
# openshift version
openshift v3.1.1.5
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Andy Goldstein 2016-01-21 15:17:12 UTC
Note: the registry has no impact on Kubelet image garbage collection. Kubelet image garbage collection only targets images stored in the Docker daemon's graph storage. You don't need to have a registry deployed to test image garbage collection, and if you do have a registry deployed, its storage configuration (emptyDir vs hostPath vs NFS vs PV) does not affect node image storage or garbage collection.
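(To see which storage the Docker daemon actually keeps images on, and therefore what image garbage collection will clean up, docker info is enough; illustrative output for the devicemapper setup described above, exact fields vary by Docker version:)

# docker info | grep -iE 'storage driver|pool name'
Storage Driver: devicemapper
 Pool Name: rhel72-docker--pool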

Comment 2 Andy Goldstein 2016-01-21 18:30:31 UTC
Possible fix here: https://github.com/google/cadvisor/issues/944#issuecomment-173665149. Waiting on more discussion with upstream before I proceed further.

Comment 3 Andy Goldstein 2016-01-21 21:56:36 UTC
cadvisor PR: https://github.com/google/cadvisor/pull/1070

Once this is merged, we'll need PRs for Kubernetes and Origin to pull in the updated cadvisor. Or we may cherry-pick the fix into Origin in the short term if we aren't comfortable bumping all of cadvisor for this fix.

Comment 5 Derek Carr 2016-02-03 16:33:39 UTC
https://github.com/kubernetes/kubernetes/pull/19354 - merged 1/29
https://github.com/kubernetes/kubernetes/pull/20395 - not yet merged, but tagged lgtm

Comment 6 Derek Carr 2016-02-03 16:34:58 UTC
Taking bug in Andy's absence and will look to cherry-pick.

Comment 7 Derek Carr 2016-02-09 15:13:47 UTC
https://github.com/kubernetes/kubernetes/pull/20395 just merged upstream.

This will be picked up in the next rebase into Origin.

Comment 8 Andy Goldstein 2016-02-19 16:35:19 UTC
Will be in next puddle

Comment 9 DeShuai Ma 2016-02-22 07:12:48 UTC
Verified on openshift v3.1.1.904.

[root@openshift-135 ~]# openshift version
openshift v3.1.1.904
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5

[root@openshift-129 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel72-docker--pool --storage-opt dm.use_deferred_removal=true
[root@openshift-129 ~]# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool rhel72 twi-aot--- 17.44g             20.99  5.33                            
  root        rhel72 -wi-ao---- 10.00g                                                    
  swap        rhel72 -wi-ao----  2.00g   


atomic-openshift-node logs:

I0222 15:01:45.411470   27751 image_manager.go:230] [ImageManager]: Disk usage on "rhel72-docker--pool" () is at 70% which is over the high threshold (20%). Trying to free 11319902208 bytes
I0222 15:01:45.508647   27751 docker.go:357] Docker Container: /atomic-openshift-node is not managed by kubelet.
I0222 15:01:45.508679   27751 docker.go:357] Docker Container: /openvswitch is not managed by kubelet.
I0222 15:01:45.508687   27751 docker.go:357] Docker Container: /small_wozniak is not managed by kubelet.
I0222 15:01:45.508914   27751 image_manager.go:287] [ImageManager]: Removing image "0192cfcebeb04ff778cf44aa7f6d336e43ee9fdc6cfb02de091364248a700cfc" to free 490397531 bytes
I0222 15:01:45.651197   27751 docker.go:357] Docker Container: /atomic-openshift-node is not managed by kubelet.
I0222 15:01:45.651227   27751 docker.go:357] Docker Container: /openvswitch is not managed by kubelet.
I0222 15:01:45.651235   27751 docker.go:357] Docker Container: /small_wozniak is not managed by kubelet.
I0222 15:01:45.655037   27751 kubelet.go:2409] SyncLoop (housekeeping)
I0222 15:01:45.668928   27751 image_manager.go:287] [ImageManager]: Removing image "0bbc57b809f12a1a21c8105fa428f714bee4c588e9d47beb4b053bccffb68416" to free 603628030 bytes
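(For anyone re-running this verification, watching the node unit log and the thin pool side by side is a simple way to see garbage collection kick in; a sketch, assuming the same systemd unit and LV names as above:)

# journalctl -u atomic-openshift-node -f | grep ImageManager
# watch lvs rhel72/docker-pool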

Comment 11 errata-xmlrpc 2016-05-12 16:27:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

