Bug 1870050 - Image garbage collection is not cleaning up dangling images
Summary: Image garbage collection is not cleaning up dangling images
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.11.0
Hardware: All
OS: All
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Joel Smith
QA Contact: MinLi
URL:
Whiteboard:
Depends On: 1899717 1902067
Blocks:
 
Reported: 2020-08-19 09:33 UTC by Pamela Escorza
Modified: 2024-03-25 16:19 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-20 16:52:47 UTC
Target Upstream Version:
Embargoed:


Attachments
/proc/self/mountinfo (35.86 KB, text/plain)
2021-01-11 10:09 UTC, MinLi


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 25683 0 None closed Bug 1870050: Mount ImageFS in node container 2021-01-27 03:31:40 UTC
Red Hat Product Errata RHSA-2021:0079 0 None None None 2021-01-20 16:53:09 UTC

Description Pamela Escorza 2020-08-19 09:33:40 UTC
Description of problem:

Image garbage collection is not cleaning up dangling images

Version-Release number of selected component (if applicable):
3.11.216

How reproducible:
always

Steps to Reproduce:
1. Verify the list of dangling images after image garbage collection execution finished on the node.


Actual results:
It is not clear whether the image garbage collector should delete the dangling images or not.
  

Expected results:
No dangling image on the node

Additional info:
This is causing excessive disk usage.

Comment 2 Pamela Escorza 2020-08-24 11:36:49 UTC
Hi, 
could you please provide further information?
rgds,

Comment 10 Pamela Escorza 2020-10-01 14:57:40 UTC
Hi Joel,

The customer has tested setting the variable "minimum-container-ttl-duration" to 0, but the issue persists.

During today's shared session, I collected information about an image that should have been deleted by image GC but was not.
The dangling image is present on the affected node:

docker images -a | grep vass
docker-registry.default.svc:5000/default/vass-netutils                                           <none>              74327aedbb2c        22 months ago       342 MB

the image RepoDigest information is:
docker inspect 74327aedbb2c
        "Id": "sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5",
        "RepoTags": [],
        "RepoDigests": [
            "docker-registry.default.svc:5000/default/vass-netutils@sha256:495c416a7fd930d1ad244b077b3bc81f1824ef1afe0d8746ad026425f73721dd"
        ]
....
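
For reference, the property that makes this image look dangling is the empty `RepoTags` list. The following is a minimal Python sketch, assuming the JSON array printed by `docker inspect` is available as a string; the helper name `find_dangling` is hypothetical, and only the fields quoted above are reproduced in the sample.

```python
import json
from typing import List


def find_dangling(inspect_json: str) -> List[str]:
    """Return the IDs of images whose RepoTags list is empty or missing."""
    images = json.loads(inspect_json)
    return [img["Id"] for img in images if not img.get("RepoTags")]


# Fields copied from the inspect output above; all other fields are omitted.
sample = json.dumps([
    {
        "Id": "sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5",
        "RepoTags": [],
        "RepoDigests": [
            "docker-registry.default.svc:5000/default/vass-netutils@sha256:495c416a7fd930d1ad244b077b3bc81f1824ef1afe0d8746ad026425f73721dd"
        ],
    }
])

print(find_dangling(sample))
```

An image that still carries a digest reference, as this one does, is untagged but not necessarily unreferenced, so the result is only a list of candidates for a closer look.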


In the default project, no container is using the image in question:
$ oc get pods -n default -o=jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' | sort
docker-registry-6-4jm5l:        registry.redhat.io/openshift3/ose-docker-registry:v3.11.216, 
docker-registry-6-tbdfz:        registry.redhat.io/openshift3/ose-docker-registry:v3.11.216, 
docker-registry-6-zs45z:        registry.redhat.io/openshift3/ose-docker-registry:v3.11.216, 
registry-console-5-z8l62:       registry.redhat.io/openshift3/registry-console:v3.11.216, 
router-1-f7nwx: registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216, 
router-1-grw2r: registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216, 
router-1-qr5hp: registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216, 
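
The same check can be scripted against the JSON form of the pod list. This is a minimal sketch, assuming the output of `oc get pods -n default -o json` is available as a string; the function name `pods_using_image` is hypothetical, and including `initContainers` goes beyond what the jsonpath query above inspects.

```python
import json
from typing import List


def pods_using_image(pods_json: str, needle: str) -> List[str]:
    """Return names of pods with a container whose image contains `needle`."""
    hits = []
    for pod in json.loads(pods_json).get("items", []):
        spec = pod.get("spec", {})
        # Assumption: init containers are checked as well as regular ones.
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        if any(needle in c.get("image", "") for c in containers):
            hits.append(pod["metadata"]["name"])
    return hits


# Two of the pods listed above, reduced to the fields the function reads.
pods = json.dumps({"items": [
    {"metadata": {"name": "docker-registry-6-4jm5l"},
     "spec": {"containers": [
         {"image": "registry.redhat.io/openshift3/ose-docker-registry:v3.11.216"}]}},
    {"metadata": {"name": "router-1-f7nwx"},
     "spec": {"containers": [
         {"image": "registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216"}]}},
]})

print(pods_using_image(pods, "vass-netutils"))
```

This only covers pod specs in a single namespace; it says nothing about containers that exist on the node outside of those pods.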

After enabling debug logging and checking the log, image_gc_manager.go adds the image to the currentImages list but does not delete it:
journalctl -u atomic-openshift-node.service --since "1 hour ago" -f | grep -i "image_gc_manager.go" | grep 74327aedbb2c
Oct 01 15:03:54 ********** atomic-openshift-node[31041]: I1001 15:03:53.887799   31056 image_gc_manager.go:242] Image ID sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5 is new
Oct 01 15:03:54 ********** atomic-openshift-node[31041]: I1001 15:03:53.887806   31056 image_gc_manager.go:254] Image ID sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5 has size 341959589
Oct 01 15:08:54 ********** atomic-openshift-node[31041]: I1001 15:08:54.126915   31056 image_gc_manager.go:237] Adding image ID sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5 to currentImages
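
To follow a single image through the node log, the messages can be reduced to the source line number and text emitted by image_gc_manager.go. This is a minimal sketch that assumes the log line layout shown above; the function name `gc_events` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "image_gc_manager.go:<line>] <message>" tail of a log line.
GC_LINE = re.compile(r"image_gc_manager\.go:(\d+)\] (.*)$")


def gc_events(log_lines: Iterable[str], image_id: str) -> List[Tuple[int, str]]:
    """Return (source line number, message) pairs that mention image_id."""
    events = []
    for line in log_lines:
        if image_id not in line:
            continue
        match = GC_LINE.search(line)
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events


IMAGE = "sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5"
# The three log lines quoted above, with the host name masked as in the report.
log = [
    "Oct 01 15:03:54 ********** atomic-openshift-node[31041]: I1001 15:03:53.887799   31056 image_gc_manager.go:242] Image ID " + IMAGE + " is new",
    "Oct 01 15:03:54 ********** atomic-openshift-node[31041]: I1001 15:03:53.887806   31056 image_gc_manager.go:254] Image ID " + IMAGE + " has size 341959589",
    "Oct 01 15:08:54 ********** atomic-openshift-node[31041]: I1001 15:08:54.126915   31056 image_gc_manager.go:237] Adding image ID " + IMAGE + " to currentImages",
]

for number, message in gc_events(log, IMAGE):
    print(number, message)
```

The reported size of 341959589 bytes corresponds to the 342 MB shown by `docker images`, which confirms that the log entries and the listed image are the same object.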


Why was this image not collected by the image GC process? Is there any additional verification that needs to be performed?

Looking forward to your reply.

Regards,

Comment 17 Pamela Escorza 2020-10-15 08:23:54 UTC
Hi!
Could you please provide feedback on this issue? Have you been able to reproduce it? Sorry to push so hard, but the customer is quite worried about the disk usage caused by the dangling images.
Please don't hesitate to contact me if further information is needed.
Many thanks in advance
rgds,

Comment 21 Pamela Escorza 2020-10-23 08:33:28 UTC
Hi Joel, 
The test was performed as requested in the TEST environment and it worked, but on PROD no image has been cleaned up.

All logs are now available in the drive.

Please don't hesitate to contact me if further information is needed.

Regards,

Comment 30 Pamela Escorza 2020-11-11 12:22:35 UTC
Hi Joel, 
Information requested is attached. Cheers

Comment 35 MinLi 2020-12-23 10:11:02 UTC
Hi Joel Smith, must I verify this fix on RHEL Atomic Host? If my 3.11 cluster is running on a RHEL 7.7 node, can I verify it there?

Comment 37 MinLi 2020-12-24 09:31:36 UTC
Hi, Joel Smith
I created a 3.11 cluster on openstack-upshift platform. 
Flexy job: https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/128942/artifact/host.spec/*view*/ 
openstack console: https://rhos-d.infra.prod.upshift.rdu2.redhat.com/dashboard/project/instances/ (you can filter by name min1224-311)
I would appreciate it if you could add an Atomic Host node or tell me how to add one.

Comment 38 MinLi 2021-01-11 10:05:06 UTC
Hi, Joel Smith 
I created an Atomic Host cluster, but I can't find any filesystem mounted on /var/lib/docker. How should I verify this bug?

FYI:
[root@minmli-0111311node-1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Atomic Host release 7.7

[root@minmli-0111311node-1 ~]# df -h 
Filesystem                 Size  Used Avail Use% Mounted on
devtmpfs                   3.8G     0  3.8G    0% /dev
tmpfs                      3.9G     0  3.9G    0% /dev/shm
tmpfs                      3.9G  2.5M  3.9G    1% /run
tmpfs                      3.9G     0  3.9G    0% /sys/fs/cgroup
/dev/mapper/atomicos-root   60G  7.2G   53G   12% /sysroot
/dev/vda1                  297M  115M  183M   39% /boot
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/e745e6cb-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sync-token-cdqq5
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/e748d8df-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/e74d5bfd-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/43fc458381b097bcd04478eb26586a9a14762dddcf1df4c0368029bf0cdd06c7/merged
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/2b67f299295400066fc20808d846362ae6db9da805d18bc7df13e80ed8f0ba12/merged
shm                         64M     0   64M    0% /var/lib/docker/containers/9ab5201235a39c4118a2ce44e7948a722f67249566c15c53e8d95f6e89d58868/shm
shm                         64M     0   64M    0% /var/lib/docker/containers/8e1a757cbafaa24fe0fcf8f26cc25508cc1a2f92adf8943c624a63987d8a464b/shm
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/b513d4618b1bab852c9eaa996b831102ebda57ecffb9331c5ce8c11160eb50c7/merged
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/697c6be477d5d7e42168e0a87075657596001f47c6738f4543338c1b2be1de30/merged
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/3da666ecdbe36c4f5102162c4ddae7b7fe2fb0507454422db2bdd5b19b53a3c4/merged
shm                         64M     0   64M    0% /var/lib/docker/containers/785d4971aa1694236840870a5d42ec52893563f9fd95d245a36d67dfa1719a53/shm
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/bfc91a1c4b7677d984e5709b72be93e1cee15e79a6f001a57d94c4c6405a0929/merged
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-token-b6c5z
tmpfs                      3.9G  8.0K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-tls
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/c575d0329009f8f92c260ba433b254461aa839010cf42cac48acb5be5e256123/merged
shm                         64M     0   64M    0% /var/lib/docker/containers/d179c88ba0bf13125c592e685da83a08dd918a7aee52905e6696f3376cad616c/shm
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/d97104d9488de82a22b16bc94ac1c065efce6e569aa54faf5c37295d4d6d7b46/merged
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/9cc307e33dd151e2f9753d94c83ba44a906f69bcd895d11a769da0e14b1a4afa/merged
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/36462dbe-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/44f0ef3c3dccc77e299f2898751f522d0ef2a24873a00a119c98273882c746ee/merged
shm                         64M     0   64M    0% /var/lib/docker/containers/05d479b5f1f4a46de48e092b24ea2e1f0ccd4ffa2da4d471960c4dcb1ca9b517/shm
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/5b79a0dacbd438b99cb2e3c3da0279eda4c5eeeceda49f3b62a4a9745679cf17/merged
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/49f462e9-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/4b1ad42610f3c0331508581d1f82462ab206fc71e765dcc8dfa289e56a6dc868/merged
shm                         64M     0   64M    0% /var/lib/docker/containers/4e3e14b054cb7e183eb1eeff5c15256afcc80f353746e41cc766ec0bea191731/shm
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/a6da61a301aef0a1ef21ee9bdeeed89c6ec5a6670c18a3f365e79ca1d4456dea/merged
tmpfs                      3.9G   32K  3.9G    1% /var/lib/origin/openshift.local.volumes/pods/9f798941-53e7-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-m56p7
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/1e00a077a53c069e70bcd206440b62bacd5f28f217735580877992aa094bf42b/merged
shm                         64M     0   64M    0% /var/lib/docker/containers/22210c5f01cd9f7bf0aa0e224dc9ee5bcecaf009c4f6384aa9b38af4c4f67b35/shm
overlay                     60G  7.2G   53G   12% /var/lib/docker/overlay2/2c804cf068a2a1fe5ef0480b1a9a04df599afd911cc0c82618a5215b57185259/merged
tmpfs                      783M     0  783M    0% /run/user/0

[root@minmli-0111311node-1 ~]# lsblk -fs
NAME          FSTYPE      LABEL UUID                                   MOUNTPOINT
vda1          xfs               e6db7aea-4e85-4d5d-b9d4-f262eef3baab   /boot
└─vda                                                                  
atomicos-root xfs               b10b2665-e2a5-4dd6-a1b7-57a708d11399   /sysroot
└─vda2        LVM2_member       waOc6G-LE2o-HORr-qBHh-gIlP-eANU-vVdfxI 
  └─vda                            

[root@minmli-0111311node-1 ~]# docker info|grep Root.Dir
  WARNING: You're not using the default seccomp profile
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /var/lib/docker

# oc get --raw /api/v1/nodes/minmli-0111311node-1/proxy/metrics/cadvisor | grep 'container_fs_\(usage\|limit\)_bytes.*,name=""'
container_fs_limit_bytes{container_name="",device="/dev/mapper/atomicos-root",id="/",image="",name="",namespace="",pod_name=""} 6.4095256576e+10
container_fs_limit_bytes{container_name="",device="/dev/vda1",id="/",image="",name="",namespace="",pod_name=""} 3.1107072e+08
container_fs_limit_bytes{container_name="",device="shm",id="/",image="",name="",namespace="",pod_name=""} 6.7108864e+07
container_fs_limit_bytes{container_name="",device="tmpfs",id="/",image="",name="",namespace="",pod_name=""} 6.7108864e+07
container_fs_usage_bytes{container_name="",device="/dev/mapper/atomicos-root",id="/",image="",name="",namespace="",pod_name=""} 7.647141888e+09
container_fs_usage_bytes{container_name="",device="/dev/vda1",id="/",image="",name="",namespace="",pod_name=""} 1.1993088e+08
container_fs_usage_bytes{container_name="",device="shm",id="/",image="",name="",namespace="",pod_name=""} 0
container_fs_usage_bytes{container_name="",device="tmpfs",id="/",image="",name="",namespace="",pod_name=""} 0

# oc get --raw /api/v1/nodes/minmli-0111311node-1/proxy/metrics/cadvisor | grep 'container_fs_usage_bytes.*,name="[^"]' | head -1
container_fs_usage_bytes{container_name="POD",device="/dev/mapper/atomicos-root",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9f798941_53e7_11eb_9b15_fa163ee44674.slice/docker-22210c5f01cd9f7bf0aa0e224dc9ee5bcecaf009c4f6384aa9b38af4c4f67b35.scope",image="registry.access.stage.redhat.com/openshift3/ose-pod:v3.11.346",name="k8s_POD_httpd-1-sgqcp_httpd_9f798941-53e7-11eb-9b15-fa163ee44674_0",namespace="httpd",pod_name="httpd-1-sgqcp"} 61440
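
The root-level cAdvisor figures above can be turned into a usage percentage per device and compared with the image GC high threshold. This is a minimal sketch, assuming the metric text layout shown above; the function name `usage_percent` is hypothetical, and the value of 85 percent is the upstream kubelet default for `image-gc-high-threshold`, used here as an assumption rather than a setting read from this cluster.

```python
import re
from typing import Dict

METRIC = re.compile(r'^(container_fs_(?:usage|limit)_bytes)\{(.*)\}\s+(\S+)$')
DEVICE = re.compile(r'device="([^"]*)"')

# Assumption: upstream kubelet default, not a value taken from this cluster.
HIGH_THRESHOLD_PERCENT = 85.0


def usage_percent(metrics_text: str) -> Dict[str, float]:
    """Map each device to usage/limit in percent, from cAdvisor metric lines."""
    usage: Dict[str, float] = {}
    limit: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        metric = METRIC.match(line.strip())
        if not metric:
            continue
        device = DEVICE.search(metric.group(2))
        if not device:
            continue
        target = usage if "usage" in metric.group(1) else limit
        target[device.group(1)] = float(metric.group(3))
    # Skip devices with a missing or zero limit to avoid division by zero.
    return {d: 100.0 * usage[d] / limit[d] for d in usage if limit.get(d)}


# Four of the metric lines quoted above.
metrics = '''
container_fs_limit_bytes{container_name="",device="/dev/mapper/atomicos-root",id="/",image="",name="",namespace="",pod_name=""} 6.4095256576e+10
container_fs_limit_bytes{container_name="",device="/dev/vda1",id="/",image="",name="",namespace="",pod_name=""} 3.1107072e+08
container_fs_usage_bytes{container_name="",device="/dev/mapper/atomicos-root",id="/",image="",name="",namespace="",pod_name=""} 7.647141888e+09
container_fs_usage_bytes{container_name="",device="/dev/vda1",id="/",image="",name="",namespace="",pod_name=""} 1.1993088e+08
'''

for device, percent in usage_percent(metrics).items():
    state = "above" if percent >= HIGH_THRESHOLD_PERCENT else "below"
    print(f"{device}: {percent:.1f}% ({state} the assumed threshold)")
```

The computed values agree with the `df -h` output (about 12% for the root device and 39% for /boot), so on this test node usage is far below any threshold that would trigger image GC.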


I have also attached the output of the command "cat /proc/$(systemctl show --property MainPID atomic-openshift-node.service | sed 's/.*=//')/mountinfo"

Comment 39 MinLi 2021-01-11 10:09:02 UTC
Created attachment 1746225
/proc/self/mountinfo

Comment 40 MinLi 2021-01-11 10:10:50 UTC
[root@minmli-0111311node-1 ~]# runc exec atomic-openshift-node df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   60G  7.2G   53G  12% /
devtmpfs                   3.8G     0  3.8G   0% /dev
shm                         64M     0   64M   0% /dev/shm
tmpfs                      3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                      3.9G  2.5M  3.9G   1% /run
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/e745e6cb-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sync-token-cdqq5
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/e748d8df-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/e74d5bfd-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
tmpfs                      3.9G     0  3.9G   0% /rootfs/dev/shm
shm                         64M     0   64M   0% /rootfs/var/lib/docker/containers/9ab5201235a39c4118a2ce44e7948a722f67249566c15c53e8d95f6e89d58868/shm
shm                         64M     0   64M   0% /rootfs/var/lib/docker/containers/8e1a757cbafaa24fe0fcf8f26cc25508cc1a2f92adf8943c624a63987d8a464b/shm
overlay                     60G  7.2G   53G  12% /rootfs/var/lib/docker/overlay2/43fc458381b097bcd04478eb26586a9a14762dddcf1df4c0368029bf0cdd06c7/merged
overlay                     60G  7.2G   53G  12% /rootfs/var/lib/docker/overlay2/2b67f299295400066fc20808d846362ae6db9da805d18bc7df13e80ed8f0ba12/merged
/dev/vda1                  297M  115M  183M  39% /rootfs/boot
tmpfs                       64M     0   64M   0% /tmp
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-token-b6c5z
tmpfs                      3.9G  8.0K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-tls
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/36462dbe-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/49f462e9-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
tmpfs                      3.9G   32K  3.9G   1% /var/lib/origin/openshift.local.volumes/pods/9f798941-53e7-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-m56p7
tmpfs                      783M     0  783M   0% /run/user/0

Comment 42 MinLi 2021-01-15 03:35:04 UTC
On the Atomic Host, I checked /var/lib/containers/atomic/atomic-openshift-node.0/config.json:
        {
            "type": "bind",
            "source": "/var/lib/docker",
            "destination": "/var/lib/docker",
            "options": [
                "bind",
                "slave",
                "rw",
                "mode=755"
            ]
        },

I also verified, following Comment 14, that when image volume usage reaches gc-high-threshold, the dangling images are deleted.
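
The presence of the bind mount quoted above can also be checked programmatically. This is a minimal sketch, assuming config.json follows the OCI runtime layout with a top-level `mounts` array; the function name `has_bind_mount` is hypothetical, and the sample contains only the single entry quoted in this comment.

```python
def has_bind_mount(config: dict, path: str) -> bool:
    """Return True if `path` is bind-mounted to the same path in the container."""
    for mount in config.get("mounts", []):
        if (mount.get("type") == "bind"
                and mount.get("source") == path
                and mount.get("destination") == path):
            return True
    return False


# The mount entry quoted above, wrapped in an assumed top-level "mounts" array.
config = {
    "mounts": [
        {
            "type": "bind",
            "source": "/var/lib/docker",
            "destination": "/var/lib/docker",
            "options": ["bind", "slave", "rw", "mode=755"],
        }
    ]
}

print(has_bind_mount(config, "/var/lib/docker"))
```

On a real node the dictionary would come from `json.load` on the config.json file named above.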

Comment 44 errata-xmlrpc 2021-01-20 16:52:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 3.11.374 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0079

