Bug 1870050
| Summary: | Image garbage collection is not cleaning up dangling images |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | Pamela Escorza <pescorza> |
| Component: | Node |
| Node sub component: | Kubelet |
| Assignee: | Joel Smith <joelsmith> |
| QA Contact: | MinLi <minmli> |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | medium |
| CC: | aos-bugs, dwalsh, joelsmith, jokerman, rphillips |
| Version: | 3.11.0 |
| Target Release: | 3.11.z |
| Hardware: | All |
| OS: | All |
| Type: | Bug |
| Last Closed: | 2021-01-20 16:52:47 UTC |
| Bug Depends On: | 1899717, 1902067 |
Description (Pamela Escorza, 2020-08-19 09:33:40 UTC)
Hi, could you please provide further information? Regards,

Hi Joel,
The customer has tested setting the "minimum-container-ttl-duration" variable to 0, but the issue persists.
From the shared session today, I have collected information about an image that should be deleted by the image GC process but is not:
On the affected node, there is the dangling image:
docker images -a | grep vass
docker-registry.default.svc:5000/default/vass-netutils <none> 74327aedbb2c 22 months ago 342 MB
The image's RepoDigests information is:
docker inspect 74327aedbb2c
"Id": "sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5",
"RepoTags": [],
"RepoDigests": [
"docker-registry.default.svc:5000/default/vass-netutils@sha256:495c416a7fd930d1ad244b077b3bc81f1824ef1afe0d8746ad026425f73721dd"
]
....
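As a side note, dangling images can be spotted without inspecting each one: in `docker images -a` output they carry a `<none>` tag. A minimal sketch using sample data modeled on the listing above (the second image line is hypothetical, added only to show the filter discriminating):

```shell
# Sample `docker images -a`-style output; the second line is a hypothetical
# tagged image, included so the filter has something to exclude.
images='docker-registry.default.svc:5000/default/vass-netutils <none> 74327aedbb2c 22 months ago 342 MB
registry.redhat.io/openshift3/ose-haproxy-router v3.11.216 0123456789ab 3 months ago 400 MB'

# Dangling images have "<none>" in the tag column (field 2); print their IDs.
printf '%s\n' "$images" | awk '$2 == "<none>" { print $3 }'
# prints: 74327aedbb2c
```

On a live node the same idea is usually expressed as `docker images --filter dangling=true`.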
Checking the default project, there is no container using the image in question:
$ oc get pods -n default -o=jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' | sort
docker-registry-6-4jm5l: registry.redhat.io/openshift3/ose-docker-registry:v3.11.216,
docker-registry-6-tbdfz: registry.redhat.io/openshift3/ose-docker-registry:v3.11.216,
docker-registry-6-zs45z: registry.redhat.io/openshift3/ose-docker-registry:v3.11.216,
registry-console-5-z8l62: registry.redhat.io/openshift3/registry-console:v3.11.216,
router-1-f7nwx: registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216,
router-1-grw2r: registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216,
router-1-qr5hp: registry.redhat.io/openshift3/ose-haproxy-router:v3.11.216,
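The cross-check above can be sketched as a plain list comparison: take the images present on the node, subtract the images referenced by pods, and whatever remains is an image GC candidate. Sample data stands in for the live `docker images` and `oc get pods` output:

```shell
# Stand-in data: images present on the node vs. images referenced by pods.
tmp=$(mktemp -d)
printf '%s\n' \
  'docker-registry.default.svc:5000/default/vass-netutils' \
  'registry.redhat.io/openshift3/ose-haproxy-router' | sort > "$tmp/node_images.txt"
printf '%s\n' \
  'registry.redhat.io/openshift3/ose-haproxy-router' | sort > "$tmp/pod_images.txt"

# comm -23 prints lines only in the first (sorted) file: images no pod uses.
comm -23 "$tmp/node_images.txt" "$tmp/pod_images.txt"
# prints: docker-registry.default.svc:5000/default/vass-netutils
```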
After enabling debug logging and checking the logs, image_gc_manager.go adds the image to the currentImages list but never deletes it:
journalctl -u atomic-openshift-node.service --since "1 hour ago" -f | grep -i "image_gc_manager.go" | grep 74327aedbb2c
Oct 01 15:03:54 ********** atomic-openshift-node[31041]: I1001 15:03:53.887799 31056 image_gc_manager.go:242] Image ID sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5 is new
Oct 01 15:03:54 ********** atomic-openshift-node[31041]: I1001 15:03:53.887806 31056 image_gc_manager.go:254] Image ID sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5 has size 341959589
Oct 01 15:08:54 ********** atomic-openshift-node[31041]: I1001 15:08:54.126915 31056 image_gc_manager.go:237] Adding image ID sha256:74327aedbb2cd18a8a73b47d565929af154d88db47608701dd2abd0538805ab5 to currentImages
Why was this image not collected by the image GC process? Is there any additional verification that needs to be performed?
Looking forward to your reply.
Regards,
Hi! Could you please provide feedback on this issue? Have you been able to reproduce it? Sorry to push so hard, but the customer is quite worried about the disk usage caused by the dangling images. Don't hesitate to contact me if further information is needed. Many thanks in advance. Regards,

Hi Joel, The test has been performed as requested in the TEST environment and it worked, but on PROD no images have been cleaned up. All logs are now available on the drive. Please don't hesitate to contact me if further information is needed. Regards,

Hi Joel, The requested information is attached. Cheers

Hi Joel Smith, must I verify this fix on RHEL Atomic Host? If my 3.11 cluster is running on RHEL 7.7 nodes, can I verify it?

Hi Joel Smith, I created a 3.11 cluster on the openstack-upshift platform. Flexy job: https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/128942/artifact/host.spec/*view*/ openstack console: https://rhos-d.infra.prod.upshift.rdu2.redhat.com/dashboard/project/instances/ (you can filter by the name min1224-311). Thanks if you can add an Atomic Host node or tell me how to add one.

Hi Joel Smith,
I created an Atomic Host cluster, but I can't find any filesystem mounted on /var/lib/docker. So how should I verify this bug?
FYI:
[root@minmli-0111311node-1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Atomic Host release 7.7
[root@minmli-0111311node-1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 2.5M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/atomicos-root 60G 7.2G 53G 12% /sysroot
/dev/vda1 297M 115M 183M 39% /boot
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/e745e6cb-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sync-token-cdqq5
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/e748d8df-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/e74d5bfd-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/43fc458381b097bcd04478eb26586a9a14762dddcf1df4c0368029bf0cdd06c7/merged
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/2b67f299295400066fc20808d846362ae6db9da805d18bc7df13e80ed8f0ba12/merged
shm 64M 0 64M 0% /var/lib/docker/containers/9ab5201235a39c4118a2ce44e7948a722f67249566c15c53e8d95f6e89d58868/shm
shm 64M 0 64M 0% /var/lib/docker/containers/8e1a757cbafaa24fe0fcf8f26cc25508cc1a2f92adf8943c624a63987d8a464b/shm
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/b513d4618b1bab852c9eaa996b831102ebda57ecffb9331c5ce8c11160eb50c7/merged
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/697c6be477d5d7e42168e0a87075657596001f47c6738f4543338c1b2be1de30/merged
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/3da666ecdbe36c4f5102162c4ddae7b7fe2fb0507454422db2bdd5b19b53a3c4/merged
shm 64M 0 64M 0% /var/lib/docker/containers/785d4971aa1694236840870a5d42ec52893563f9fd95d245a36d67dfa1719a53/shm
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/bfc91a1c4b7677d984e5709b72be93e1cee15e79a6f001a57d94c4c6405a0929/merged
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-token-b6c5z
tmpfs 3.9G 8.0K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-tls
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/c575d0329009f8f92c260ba433b254461aa839010cf42cac48acb5be5e256123/merged
shm 64M 0 64M 0% /var/lib/docker/containers/d179c88ba0bf13125c592e685da83a08dd918a7aee52905e6696f3376cad616c/shm
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/d97104d9488de82a22b16bc94ac1c065efce6e569aa54faf5c37295d4d6d7b46/merged
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/9cc307e33dd151e2f9753d94c83ba44a906f69bcd895d11a769da0e14b1a4afa/merged
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/36462dbe-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/44f0ef3c3dccc77e299f2898751f522d0ef2a24873a00a119c98273882c746ee/merged
shm 64M 0 64M 0% /var/lib/docker/containers/05d479b5f1f4a46de48e092b24ea2e1f0ccd4ffa2da4d471960c4dcb1ca9b517/shm
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/5b79a0dacbd438b99cb2e3c3da0279eda4c5eeeceda49f3b62a4a9745679cf17/merged
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/49f462e9-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/4b1ad42610f3c0331508581d1f82462ab206fc71e765dcc8dfa289e56a6dc868/merged
shm 64M 0 64M 0% /var/lib/docker/containers/4e3e14b054cb7e183eb1eeff5c15256afcc80f353746e41cc766ec0bea191731/shm
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/a6da61a301aef0a1ef21ee9bdeeed89c6ec5a6670c18a3f365e79ca1d4456dea/merged
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/9f798941-53e7-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-m56p7
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/1e00a077a53c069e70bcd206440b62bacd5f28f217735580877992aa094bf42b/merged
shm 64M 0 64M 0% /var/lib/docker/containers/22210c5f01cd9f7bf0aa0e224dc9ee5bcecaf009c4f6384aa9b38af4c4f67b35/shm
overlay 60G 7.2G 53G 12% /var/lib/docker/overlay2/2c804cf068a2a1fe5ef0480b1a9a04df599afd911cc0c82618a5215b57185259/merged
tmpfs 783M 0 783M 0% /run/user/0
[root@minmli-0111311node-1 ~]# lsblk -fs
NAME FSTYPE LABEL UUID MOUNTPOINT
vda1 xfs e6db7aea-4e85-4d5d-b9d4-f262eef3baab /boot
└─vda
atomicos-root xfs b10b2665-e2a5-4dd6-a1b7-57a708d11399 /sysroot
└─vda2 LVM2_member waOc6G-LE2o-HORr-qBHh-gIlP-eANU-vVdfxI
└─vda
[root@minmli-0111311node-1 ~]# docker info|grep Root.Dir
WARNING: You're not using the default seccomp profile
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /var/lib/docker
# oc get --raw /api/v1/nodes/minmli-0111311node-1/proxy/metrics/cadvisor | grep 'container_fs_\(usage\|limit\)_bytes.*,name=""'
container_fs_limit_bytes{container_name="",device="/dev/mapper/atomicos-root",id="/",image="",name="",namespace="",pod_name=""} 6.4095256576e+10
container_fs_limit_bytes{container_name="",device="/dev/vda1",id="/",image="",name="",namespace="",pod_name=""} 3.1107072e+08
container_fs_limit_bytes{container_name="",device="shm",id="/",image="",name="",namespace="",pod_name=""} 6.7108864e+07
container_fs_limit_bytes{container_name="",device="tmpfs",id="/",image="",name="",namespace="",pod_name=""} 6.7108864e+07
container_fs_usage_bytes{container_name="",device="/dev/mapper/atomicos-root",id="/",image="",name="",namespace="",pod_name=""} 7.647141888e+09
container_fs_usage_bytes{container_name="",device="/dev/vda1",id="/",image="",name="",namespace="",pod_name=""} 1.1993088e+08
container_fs_usage_bytes{container_name="",device="shm",id="/",image="",name="",namespace="",pod_name=""} 0
container_fs_usage_bytes{container_name="",device="tmpfs",id="/",image="",name="",namespace="",pod_name=""} 0
# oc get --raw /api/v1/nodes/minmli-0111311node-1/proxy/metrics/cadvisor | grep 'container_fs_usage_bytes.*,name="[^"]' | head -1
container_fs_usage_bytes{container_name="POD",device="/dev/mapper/atomicos-root",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9f798941_53e7_11eb_9b15_fa163ee44674.slice/docker-22210c5f01cd9f7bf0aa0e224dc9ee5bcecaf009c4f6384aa9b38af4c4f67b35.scope",image="registry.access.stage.redhat.com/openshift3/ose-pod:v3.11.346",name="k8s_POD_httpd-1-sgqcp_httpd_9f798941-53e7-11eb-9b15-fa163ee44674_0",namespace="httpd",pod_name="httpd-1-sgqcp"} 61440
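From the two metric values above one can derive the usage percentage that the kubelet's image GC compares against its thresholds (image GC starts when usage crosses `--image-gc-high-threshold`, 85% by default, and frees space down to `--image-gc-low-threshold`). A minimal sketch:

```shell
# Values taken from the cadvisor metrics above for /dev/mapper/atomicos-root.
usage=7.647141888e+09   # container_fs_usage_bytes
limit=6.4095256576e+10  # container_fs_limit_bytes

# Usage percentage of the image filesystem; well below the default 85%
# high threshold, so image GC has no reason to run yet.
awk -v u="$usage" -v l="$limit" 'BEGIN { printf "%.1f%%\n", 100 * u / l }'
# prints: 11.9%
```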
I have also attached the output of the command `cat /proc/$(systemctl show --property MainPID atomic-openshift-node.service | sed 's/.*=//')/mountinfo`.
Created attachment 1746225: /proc/self/mountinfo
[root@minmli-0111311node-1 ~]# runc exec atomic-openshift-node df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/atomicos-root 60G 7.2G 53G 12% /
devtmpfs 3.8G 0 3.8G 0% /dev
shm 64M 0 64M 0% /dev/shm
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 3.9G 2.5M 3.9G 1% /run
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/e745e6cb-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sync-token-cdqq5
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/e748d8df-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/e74d5bfd-53b8-11eb-9b1a-fa163ee44674/volumes/kubernetes.io~secret/sdn-token-xcswj
tmpfs 3.9G 0 3.9G 0% /rootfs/dev/shm
shm 64M 0 64M 0% /rootfs/var/lib/docker/containers/9ab5201235a39c4118a2ce44e7948a722f67249566c15c53e8d95f6e89d58868/shm
shm 64M 0 64M 0% /rootfs/var/lib/docker/containers/8e1a757cbafaa24fe0fcf8f26cc25508cc1a2f92adf8943c624a63987d8a464b/shm
overlay 60G 7.2G 53G 12% /rootfs/var/lib/docker/overlay2/43fc458381b097bcd04478eb26586a9a14762dddcf1df4c0368029bf0cdd06c7/merged
overlay 60G 7.2G 53G 12% /rootfs/var/lib/docker/overlay2/2b67f299295400066fc20808d846362ae6db9da805d18bc7df13e80ed8f0ba12/merged
/dev/vda1 297M 115M 183M 39% /rootfs/boot
tmpfs 64M 0 64M 0% /tmp
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-token-b6c5z
tmpfs 3.9G 8.0K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/b0a50d3d-53b9-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/node-exporter-tls
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/36462dbe-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/49f462e9-53ba-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-2bwb9
tmpfs 3.9G 32K 3.9G 1% /var/lib/origin/openshift.local.volumes/pods/9f798941-53e7-11eb-9b15-fa163ee44674/volumes/kubernetes.io~secret/default-token-m56p7
tmpfs 783M 0 783M 0% /run/user/0

On the Atomic Host, I checked /var/lib/containers/atomic/atomic-openshift-node.0/config.json:
{
    "type": "bind",
    "source": "/var/lib/docker",
    "destination": "/var/lib/docker",
    "options": [
        "bind",
        "slave",
        "rw",
        "mode=755"
    ]
},
This verifies, as described in Comment 14, that when image volume usage reaches the gc-high-threshold, the dangling images are deleted.
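For reference, in OpenShift 3.11 the GC settings discussed in this bug are typically passed to the kubelet through `kubeletArguments` in the node configuration. A sketch, assuming the standard `/etc/origin/node/node-config.yaml` location; the values are illustrative, not recommendations:

```yaml
# /etc/origin/node/node-config.yaml (fragment); illustrative values only
kubeletArguments:
  image-gc-high-threshold:
  - "85"
  image-gc-low-threshold:
  - "80"
  minimum-container-ttl-duration:
  - "10s"
```

Image GC starts when the image filesystem's usage crosses the high threshold and frees space until it falls below the low threshold.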
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 3.11.374 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0079