Bug 1292845 - Cleaning up docker space
Status: ASSIGNED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Release: 3.7.0
Assigned To: Michal Fojtik
QA Contact: DeShuai Ma
Keywords: Performance
Depends On: 1292964
Blocks:
 
Reported: 2015-12-18 08:44 EST by Miheer Salunke
Modified: 2017-08-14 14:06 EDT (History)
CC: 20 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Miheer Salunke 2015-12-18 08:44:48 EST
Description of problem:

On busy nodes, docker storage usage (the docker-vg thin pool) seems to build up constantly, and even the cleanup playbook ( https://github.com/openshift/openshift-ansible/blob/master/playbooks/adhoc/docker_storage_cleanup/docker_storage_cleanup.yml ) does not fix it.

[root@ose-node2 ~]# lvs
  LV          VG             Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool docker-vg      twi-aot--- 99.43g             92.35  6.18                            
  root        rhel_ose-node2 -wi-ao---- 17.47g                                                    
  swap        rhel_ose-node2 -wi-a-----  2.00g                                 
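A quick way to catch this condition before the pool fills is to parse the Data% column of `lvs` and alert past a threshold. This is a minimal sketch; the canned one-line sample below (matching the lvs layout above) stands in for real `lvs --noheadings` output so the awk logic is demonstrable anywhere:

```shell
# Sketch: flag thin pools whose Data% exceeds a threshold.
# In production you would pipe real `lvs --noheadings` output;
# this canned sample mirrors the docker-pool row shown above.
lvs_output='docker-pool docker-vg twi-aot--- 99.43g 92.35 6.18'
echo "$lvs_output" | awk -v limit=80 \
  '$5+0 > limit {print $1 " data usage " $5 "% exceeds " limit "%"}'
```

A check like this in cron would have flagged the 92.35% usage on ose-node2 well before the playbook runs stopped helping.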

We tried evicting the node:

oadm manage-node --schedulable=false --selector='env=ewu,kubernetes.io/hostname=ose-node2.example.com,router=ewu'

oadm manage-node --evacuate --selector='env=ewu,kubernetes.io/hostname=ose-node2.example.com,router=ewu'

We tried the playbook's cleanup commands, which reclaim minimal space, a few GB at most:

"docker ps -a | awk '/Exited|Dead/ {print $1}' | xargs --no-run-if-empty docker rm"
"docker images -q -f dangling=true | xargs --no-run-if-empty docker rmi"
"docker images | grep -v -e registry.access.redhat.com -e docker-registry.usersys.redhat.com -e docker-registry.ops.rhcloud.com | awk '{print $3}' | xargs --no-run-if-empty docker rmi 2>/dev/null"
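To make the first command's filtering concrete, here is its awk selection run against canned `docker ps -a`-style lines (the container IDs and statuses are invented samples), so the behavior can be seen without a docker daemon:

```shell
# Sketch: the awk filter from the first cleanup command, applied to
# made-up sample lines in `docker ps -a` style. Only rows whose status
# contains Exited or Dead have their first field (the ID) printed.
sample='abc123 image1 Up 2 hours
def456 image2 Exited (0) 3 days ago
ghi789 image3 Dead'
echo "$sample" | awk '/Exited|Dead/ {print $1}'
# the IDs would then be piped to: xargs --no-run-if-empty docker rm
```

Note the filter matches the whole line, so a running container whose image name happened to contain "Dead" would also be selected; that is a known rough edge of this style of cleanup.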

We tried rebooting afterwards.

Docker images show about 4GB of images:

[root@ose-node2 ~]# docker images -a
REPOSITORY                                                        TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
registry.access.redhat.com/openshift3/ose-docker-builder          v3.1.0.4            90857752fab7        4 weeks ago         395.3 MB
registry.access.redhat.com/openshift3/ose-sti-builder             v3.1.0.4            27fb9b206f9c        4 weeks ago         395.3 MB
registry.access.redhat.com/openshift3/ose-haproxy-router          v3.1.0.4            81c50204da05        4 weeks ago         410.2 MB
registry.access.redhat.com/openshift3/ose-deployer                v3.1.0.4            f8f3ccba9dd4        4 weeks ago         395.3 MB
<none>                                                            <none>              9380de59a89c        4 weeks ago         284.4 MB
registry.access.redhat.com/openshift3/ose-keepalived-ipfailover   v3.1.0.4            bb53dc265861        4 weeks ago         291.7 MB
<none>                                                            <none>              2eb816e9a7d0        4 weeks ago         395.3 MB
registry.access.redhat.com/openshift3/ose-pod                     v3.1.0.4            092ca40663d5        4 weeks ago         327.4 MB
<none>                                                            <none>              f38f7d43a5da        4 weeks ago         270.4 MB
<none>                                                            <none>              6883d5422f4e        4 weeks ago         201.7 MB

Still about 98GB of space is used:

[root@ose-node2 ~]# docker info
Containers: 9
Images: 10
Storage Driver: devicemapper
 Pool Name: docker--vg-docker--pool
 Pool Blocksize: 524.3 kB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 98.6 GB
 Data Space Total: 106.8 GB
 Data Space Available: 8.168 GB
 Metadata Space Used: 6.742 MB
 Metadata Space Total: 109.1 MB
 Metadata Space Available: 102.3 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.107-RHEL7 (2015-10-14)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
CPUs: 2
Total Memory: 7.64 GiB
Name: ose-node2.ewu.tb.noris.de
ID: FTHB:UJOW:SUYJ:GXFC:4INH:M5AR:27HB:6HJJ:GKF4:XPR3:HI5Q:LSBM
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled


Version-Release number of selected component (if applicable):
3.1

How reproducible:
Reproducible in the customer's environment

Steps to Reproduce:
1. See the description.

Actual results:
Deleting images does not free up the corresponding space in the thin pool

Expected results:
Deleting images should free up the corresponding space in the thin pool

Additional info:
Comment 1 Vivek Goyal 2015-12-21 14:43:23 EST
What about containers? Are there any containers in the system. (docker ps -a)
Comment 2 Alexander Koksharov 2015-12-22 02:53:17 EST
Containers were deleted. The customer ran "docker rm $(docker ps -a -q)" to delete them all.
Comment 3 Vivek Goyal 2015-12-22 07:59:58 EST
The docker info output says there are 9 containers:

[root@ose-node2 ~]# docker info
Containers: 9
Images: 10

It would be a good idea to check the output of 'docker ps' with the customer again and make sure all containers have been deleted.
Comment 7 Wang Haoran 2016-01-26 20:59:23 EST
Maybe this will be helpful: the garbage collection documentation for OSE
https://docs.openshift.com/enterprise/3.1/admin_guide/garbage_collection.html
Comment 8 Michal Minar 2016-01-27 04:59:04 EST
Can anybody explain how deleting volumes on the rootfs could reclaim space on the thin pool? Sure, we can add periodic removal of orphaned volumes to the garbage collector and remove containers with `-v`, but I don't think that solves the original issue.
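The orphaned-volume scan mentioned above could be sketched as a comparison of on-disk volume directories against the set of IDs still referenced by containers. The directory layout and file names below (a `volumes/` dir and an `in_use` list) are a mock-up, not the real /var/lib/docker structure; they only demonstrate the comparison logic:

```shell
# Sketch: list volume dirs not referenced by any container.
# Mock layout: $root/volumes/<id> for each volume, $root/in_use
# holding IDs still referenced. Anything not in the list is orphaned.
root=$(mktemp -d)
mkdir -p "$root/volumes/vol-a" "$root/volumes/vol-b" "$root/volumes/vol-c"
printf 'vol-a\n' > "$root/in_use"   # only vol-a is still referenced
for v in "$root"/volumes/*; do
  id=$(basename "$v")
  grep -qx "$id" "$root/in_use" || echo "orphan: $id"
done
rm -rf "$root"
```

As the comment notes, though, such volumes live on the rootfs, so pruning them frees rootfs space only, never space in the thin pool.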

I'll try to reproduce on rhel.
Comment 9 Michal Minar 2016-01-27 08:57:50 EST
With docker-1.9.1-10.el7 on thinpool storage I created around 400 containers with volumes. Deleting the containers freed some of the space, leaving 165 orphaned volumes. Deleting those freed space on the rootfs without any impact on the thin pool. After deleting all images, the free space on the thin pool was back to its original value.

Shortened Docker info:

    $ docker info
    Containers: 0
    Images: 0
    Server Version: 1.9.1-el7
    Storage Driver: devicemapper
     Pool Name: vgdocker-tp
     Pool Blocksize: 65.54 kB
     Base Device Size: 107.4 GB
     Backing Filesystem: xfs
     Data file: 
     Metadata file: 
     Data Space Used: 53.74 MB
     Data Space Total: 32.14 GB
     Data Space Available: 32.09 GB
     Metadata Space Used: 188.4 kB
     Metadata Space Total: 33.55 MB
     Metadata Space Available: 33.37 MB
     Udev Sync Supported: true
     Deferred Removal Enabled: false
     Deferred Deletion Enabled: false
     Deferred Deleted Device Count: 0
     Library Version: 1.02.107-RHEL7 (2015-10-14)
    Execution Driver: native-0.2
    Logging Driver: json-file
    Kernel Version: 3.10.0-327.el7.x86_64
    Operating System: Red Hat Enterprise Linux Server 7.3 Beta (Maipo)

I'm not sure what needs to be fixed here. Shall we investigate further into devicemapper not freeing space after images are removed (which I failed to reproduce on the first try with just Docker), or shall we make sure that all orphaned volumes are deleted?

Miheer or Alexander, any thoughts?
Comment 10 Vivek Goyal 2016-01-27 11:11:42 EST
Michal, agreed that volumes are on the rootfs and removing them does not free space in the thin pool.

I think in the example above there were still some containers in the system, which implies there must have been some images. I suspect those containers might have written a lot of data of their own and might be consuming significant space in the thin pool.
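One way to test that suspicion is to look at per-container writable-layer sizes (`docker ps -s` reports a SIZE column) and rank them. A minimal sketch of the ranking step, run on invented ID/size-in-MB pairs rather than live `docker ps -s` output:

```shell
# Sketch: rank containers by writable-layer size to spot heavy writers.
# Real data would come from `docker ps -as`; these ID/size pairs
# (size in MB) are made-up samples to demonstrate the sort.
sizes='c1 12
c2 4096
c3 850'
echo "$sizes" | sort -k2 -n -r | head -1
```

A container topping this list with tens of GB written would explain a thin pool that stays full even after all images are pruned.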
Comment 18 Michal Fojtik 2016-10-27 03:54:31 EDT
Setting to the upcoming release, as there were no significant improvements to pruning in 3.4; however, you can now use scheduled jobs (alpha) to automatically prune images.
Comment 21 Michal Fojtik 2017-04-04 06:36:25 EDT
Documentation PR: https://github.com/openshift/origin/pull/11317
Comment 22 ge liu 2017-04-17 03:47:44 EDT
@mfojtik, this bug's original target is the docker volume space reclaim issue, but the last fix concerns cron jobs to auto-prune images. Could you give more clues about this issue and how to verify it? Many thanks in advance!
