Bug 1580552
| Summary: | [3.7] Image Garbage collection trying to delete images in use by stopped containers | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Seth Jennings <sjenning> |
| Component: | Node | Assignee: | Seth Jennings <sjenning> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | DeShuai Ma <dma> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | CC: | aos-bugs, bleanhar, dma, dmoessne, jokerman, ktadimar, mmccomas, pdwyer, sjenning, vlaad |
| Version: | 3.6.0 | Target Milestone: | --- |
| Target Release: | 3.7.z | Hardware: | Unspecified |
| OS: | Unspecified | Whiteboard: | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Prevents image garbage collection from attempting to remove images in use by containers | Story Points: | --- |
| Clone Of: | 1577739 | Cloned To: | 1619477 (view as bug list) |
| Environment: | | Last Closed: | 2018-08-28 14:10:54 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | 1577739, 1580554, 1580555 | Bug Blocks: | 1580551, 1619477 |
| Attachments: | 1443058 (node.log), 1447677 (node.log with remove openshift3/node image) | | |
Comment 3
DeShuai Ma
2018-05-28 09:20:15 UTC
Created attachment 1443058 [details]
node.log
//Image 1e5b9c3d90fe is not in the used list
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482677 12185 image_gc_manager.go:215] Adding image ID sha256:1e5b9c3d90fe478e4dde23a8bc25609e6ee71821049e15443577c7d6b0533814 to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482684 12185 image_gc_manager.go:232] Image ID sha256:1e5b9c3d90fe478e4dde23a8bc25609e6ee71821049e15443577c7d6b0533814 has size 213938481
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482691 12185 image_gc_manager.go:215] Adding image ID sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482699 12185 image_gc_manager.go:228] Setting Image ID sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a lastUsed to 2018-05-28 04:45:06.482498101 -0400 EDT
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482707 12185 image_gc_manager.go:232] Image ID sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a has size 1146369
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482714 12185 image_gc_manager.go:215] Adding image ID sha256:cb5a29f4d401d3c35db3dbdf175f7550c4df806a3bef8a2ca829cf7d8048e93c to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482720 12185 image_gc_manager.go:232] Image ID sha256:cb5a29f4d401d3c35db3dbdf175f7550c4df806a3bef8a2ca829cf7d8048e93c has size 502531642
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482727 12185 image_gc_manager.go:215] Adding image ID sha256:a813b03690b5b20bbaaed50aae05f775d92f183af0a5a1b092f741274d24b4f8 to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482733 12185 image_gc_manager.go:228] Setting Image ID sha256:a813b03690b5b20bbaaed50aae05f775d92f183af0a5a1b092f741274d24b4f8 lastUsed to 2018-05-28 04:45:06.482498101 -0400 EDT
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482744 12185 image_gc_manager.go:232] Image ID sha256:a813b03690b5b20bbaaed50aae05f775d92f183af0a5a1b092f741274d24b4f8 has size 445778042
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482751 12185 image_gc_manager.go:215] Adding image ID sha256:9244126592721800cd15a89fd33805e61594dbde79978d4dd5266246af1fe541 to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482758 12185 image_gc_manager.go:228] Setting Image ID sha256:9244126592721800cd15a89fd33805e61594dbde79978d4dd5266246af1fe541 lastUsed to 2018-05-28 04:45:06.482498101 -0400 EDT
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482766 12185 image_gc_manager.go:232] Image ID sha256:9244126592721800cd15a89fd33805e61594dbde79978d4dd5266246af1fe541 has size 255974054
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482772 12185 image_gc_manager.go:215] Adding image ID sha256:6b369f7eed80958b1b23fe4db51851acd39c2adf39b0768fd8062596454dddc5 to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482779 12185 image_gc_manager.go:232] Image ID sha256:6b369f7eed80958b1b23fe4db51851acd39c2adf39b0768fd8062596454dddc5 has size 793796739
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482786 12185 image_gc_manager.go:215] Adding image ID sha256:29d2f3226daf297b27f0240244f4e8d614fb63eeab8cd09d816e8b7b04d1c011 to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482795 12185 image_gc_manager.go:232] Image ID sha256:29d2f3226daf297b27f0240244f4e8d614fb63eeab8cd09d816e8b7b04d1c011 has size 911486786
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482801 12185 image_gc_manager.go:215] Adding image ID sha256:d23bdf5b1b1b1afce5f1d0fd33e7ed8afbc084b594b9ccf742a5b27080d8a4a8 to currentImages
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482812 12185 image_gc_manager.go:232] Image ID sha256:d23bdf5b1b1b1afce5f1d0fd33e7ed8afbc084b594b9ccf742a5b27080d8a4a8 has size 643119603
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482823 12185 image_gc_manager.go:310] Image ID sha256:ed923b3e89b9a110616a1058b0ff24cdb1023f0e6300a35e846181446d6d1409 is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482830 12185 image_gc_manager.go:310] Image ID sha256:513d2ad66e7dbeca8c8bd7cf29a4388e8f0c90633fce0990bdc18516acb38248 is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482837 12185 image_gc_manager.go:310] Image ID sha256:8c811b4aec35f259572d0f79207bc0678df4c736eeec50bc9fec37ed936a472a is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482844 12185 image_gc_manager.go:310] Image ID sha256:9244126592721800cd15a89fd33805e61594dbde79978d4dd5266246af1fe541 is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482850 12185 image_gc_manager.go:310] Image ID sha256:907d249a376aed409c13071b4d44fbd16ce8d12478de97a475470547fd2d4ced is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482856 12185 image_gc_manager.go:310] Image ID sha256:ca0d85b521501ded1f4fe5bba005333e340dfd51fce3abc46db8bbcb5aff2a66 is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482864 12185 image_gc_manager.go:310] Image ID sha256:b88d359e4a43fde2501ffd1a32bd30b82b08844036a30bc6830ddb7dc6d5f49f is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482871 12185 image_gc_manager.go:310] Image ID sha256:f9a33b375a7575394ec2c3d43fb82f53c5ad23c5c5fb958137e953b75d6d640a is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482878 12185 image_gc_manager.go:310] Image ID sha256:a813b03690b5b20bbaaed50aae05f775d92f183af0a5a1b092f741274d24b4f8 is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482884 12185 image_gc_manager.go:310] Image ID sha256:59c9157427c469d52a4f9139d535d1469789a0ce75def34c88ae31a01690e214 is being used
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482892 12185 image_gc_manager.go:324] Evaluating image ID sha256:1e5b9c3d90fe478e4dde23a8bc25609e6ee71821049e15443577c7d6b0533814 for possible garbage collection
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: I0528 04:45:06.482900 12185 image_gc_manager.go:340] [imageGCManager]: Removing image "sha256:1e5b9c3d90fe478e4dde23a8bc25609e6ee71821049e15443577c7d6b0533814" to free 213938481 bytes
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: E0528 04:45:06.498629 12185 remote_image.go:130] RemoveImage "sha256:1e5b9c3d90fe478e4dde23a8bc25609e6ee71821049e15443577c7d6b0533814" from image service failed: rpc error: code = 2 desc = Error response from daemon: {"message":"conflict: unable to delete 1e5b9c3d90fe (cannot be forced) - image is being used by running container 24a612cac38e"}
May 28 04:45:06 ip-172-18-11-1.ec2.internal atomic-openshift-node[12185]: E0528 04:45:06.498658 12185 kuberuntime_image.go:126] Remove image "sha256:1e5b9c3d90fe478e4dde23a8bc25609e6ee71821049e15443577c7d6b0533814" failed: rpc error: code = 2 desc = Error response from daemon: {"message":"conflict: unable to delete 1e5b9c3d90fe (cannot be forced) - image is being used by running container 24a612cac38e"}
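The log traces the GC decision path: every image on the node is added to currentImages, images the kubelet can tie to a container are marked "is being used", and the remainder are evaluated for removal. Image 1e5b9c3d90fe never makes it into the in-use set, so it is selected for deletion and the docker daemon rejects the request because a container still references it. Below is a minimal, self-contained sketch of that flow; the types and helpers (Container, Image, imagesInUse, planRemovals) are simplified stand-ins rather than the actual image_gc_manager.go code, and the includeStopped flag only illustrates the gap named in the bug summary (images used by stopped containers not counted as in use).

```go
// Illustrative only: simplified stand-ins for the image GC decision path,
// not the actual kubelet image_gc_manager.go types. The point is that an
// "in use" set built only from running containers leaves images referenced
// by stopped containers eligible for deletion, which the docker daemon then
// rejects with the "conflict: unable to delete ..." error seen in the log.
package main

import "fmt"

type Container struct {
	ID      string
	ImageID string
	Running bool
}

type Image struct {
	ID   string
	Size int64
}

// imagesInUse builds the set of image IDs that GC must not delete.
// includeStopped=false models the pre-fix behavior; true models the fix,
// where images of stopped containers are also protected.
func imagesInUse(containers []Container, includeStopped bool) map[string]bool {
	used := make(map[string]bool)
	for _, c := range containers {
		if c.Running || includeStopped {
			used[c.ImageID] = true
		}
	}
	return used
}

// planRemovals returns every image not in the in-use set, i.e. the images
// a GC pass would try to delete to reclaim space.
func planRemovals(images []Image, used map[string]bool) []string {
	var victims []string
	for _, img := range images {
		if !used[img.ID] {
			victims = append(victims, img.ID)
		}
	}
	return victims
}

func main() {
	images := []Image{
		{ID: "sha256:1e5b9c3d90fe", Size: 213938481}, // still referenced by a container
		{ID: "sha256:8c811b4aec35", Size: 1146369},   // referenced by a running pod
	}
	containers := []Container{
		{ID: "24a612cac38e", ImageID: "sha256:1e5b9c3d90fe", Running: false}, // not counted pre-fix
		{ID: "0123456789ab", ImageID: "sha256:8c811b4aec35", Running: true},
	}

	fmt.Println("pre-fix removal plan: ", planRemovals(images, imagesInUse(containers, false)))
	fmt.Println("post-fix removal plan:", planRemovals(images, imagesInUse(containers, true)))
}
```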
The backport did not include the fix to exclude the infra/pause image from GC, which is the source of the FailedQA.

Kube PR: https://github.com/kubernetes/kubernetes/pull/58229
Origin pick to 3.9+: https://github.com/openshift/origin/pull/18189

That pick probably should have been included in the PRs for 3.8 and earlier as well; it is why the 3.9 QA verification passed. Working on backports for that as well.

In comment 9 the verification environment was RPM. Retested in a containerized environment: image GC tries to remove its own image, "openshift3/node:v3.7.52".

//Errors
wanted to free 2956879462 bytes, but freed 3669516238 bytes space with errors in image deletion: rpc error: code = 2 desc = Error response from daemon: {"message":"conflict: unable to delete ad19201eaf4c (cannot be forced) - image is being used by running container 6c3f8ec3da61"}

//Versions
oc v3.7.52
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.8:8443
openshift v3.7.52
kubernetes v1.7.6+a08f5eeb62

//Steps to verify
[root@host-172-16-120-8 ~]# oc describe node 172.16.120.97
Name:               172.16.120.97
Role:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=regionOne
                    failure-domain.beta.kubernetes.io/zone=nova
                    kubernetes.io/hostname=172.16.120.97
                    registry=enabled
                    role=node
                    router=enabled
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Tue, 05 Jun 2018 00:24:23 +0000
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Tue, 05 Jun 2018 01:49:17 +0000  Tue, 05 Jun 2018 00:24:23 +0000  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Tue, 05 Jun 2018 01:49:17 +0000  Tue, 05 Jun 2018 00:24:23 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Tue, 05 Jun 2018 01:49:17 +0000  Tue, 05 Jun 2018 00:24:23 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           True    Tue, 05 Jun 2018 01:49:17 +0000  Tue, 05 Jun 2018 00:24:33 +0000  KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  172.16.120.97
  ExternalIP:  10.8.249.105
  Hostname:    172.16.120.97
Capacity:
  cpu:     4
  memory:  8010112Ki
  pods:    250
Allocatable:
  cpu:     4
  memory:  7907712Ki
  pods:    250
System Info:
  Machine ID:                 cc75ab370c954aa08f9bdd6cf053fa4f
  System UUID:                CC75AB37-0C95-4AA0-8F9B-DD6CF053FA4F
  Boot ID:                    1cd4574b-84b3-429b-aef5-d3d30ba677b6
  Kernel Version:             3.10.0-693.17.1.el7.x86_64
  OS Image:                   Red Hat Enterprise Linux Server 7.5 (Maipo)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.12.6
  Kubelet Version:            v1.7.6+a08f5eeb62
  Kube-Proxy Version:         v1.7.6+a08f5eeb62
ExternalID:  cc75ab37-0c95-4aa0-8f9b-dd6cf053fa4f
Non-terminated Pods:  (5 in total)
  Namespace                          Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                          ----                     ------------  ----------  ---------------  -------------
  default                            docker-registry-1-9rnlw  100m (2%)     0 (0%)      256Mi (3%)       0 (0%)
  default                            router-1-cp8n9           100m (2%)     0 (0%)      256Mi (3%)       0 (0%)
  install-test                       mongodb-1-8mr49          0 (0%)        0 (0%)      512Mi (6%)       512Mi (6%)
  openshift-ansible-service-broker   asb-1-wx8mq              0 (0%)        0 (0%)      0 (0%)           0 (0%)
  openshift-template-service-broker  apiserver-hbs65          0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  200m (5%)     0 (0%)      1Gi (13%)        512Mi (6%)
Events:
  FirstSeen  LastSeen  Count  From                    SubObjectPath  Type     Reason                   Message
  ---------  --------  -----  ----                    -------------  -------  ------                   -------
  1h         1h        1      kubelet, 172.16.120.97                 Normal   Starting                 Starting kubelet.
  1h         1h        1      kubelet, 172.16.120.97                 Normal   NodeAllocatableEnforced  Updated Node Allocatable limit across pods
  1h         1h        2      kubelet, 172.16.120.97                 Normal   NodeHasSufficientDisk    Node 172.16.120.97 status is now: NodeHasSufficientDisk
  1h         1h        2      kubelet, 172.16.120.97                 Normal   NodeHasSufficientMemory  Node 172.16.120.97 status is now: NodeHasSufficientMemory
  1h         1h        2      kubelet, 172.16.120.97                 Normal   NodeHasNoDiskPressure    Node 172.16.120.97 status is now: NodeHasNoDiskPressure
  1h         1h        1      kubelet, 172.16.120.97                 Normal   NodeReady                Node 172.16.120.97 status is now: NodeReady
  5m         5m        1      kubelet, 172.16.120.97                 Normal   Starting                 Starting kubelet.
  5m         5m        1      kubelet, 172.16.120.97                 Normal   NodeAllocatableEnforced  Updated Node Allocatable limit across pods
  5m         5m        1      kubelet, 172.16.120.97                 Normal   NodeHasSufficientDisk    Node 172.16.120.97 status is now: NodeHasSufficientDisk
  5m         5m        1      kubelet, 172.16.120.97                 Normal   NodeHasSufficientMemory  Node 172.16.120.97 status is now: NodeHasSufficientMemory
  5m         5m        1      kubelet, 172.16.120.97                 Normal   NodeHasNoDiskPressure    Node 172.16.120.97 status is now: NodeHasNoDiskPressure
  22s        22s       1      kubelet, 172.16.120.97                 Warning  ImageGCFailed            wanted to free 2956879462 bytes, but freed 3669516238 bytes space with errors in image deletion: rpc error: code = 2 desc = Error response from daemon: {"message":"conflict: unable to delete ad19201eaf4c (cannot be forced) - image is being used by running container 6c3f8ec3da61"}

[root@host-172-16-120-97 ~]# docker ps |grep 6c3f8ec3da61
6c3f8ec3da61   openshift3/node:v3.7.52   "/usr/local/bin/origi"   5 minutes ago   Up 5 minutes   atomic-openshift-node

[root@host-172-16-120-97 ~]# docker images|grep ad19201eaf4c
registry.access.stage.redhat.com/openshift3/node   v3.7.52   ad19201eaf4c   3 days ago   1.28 GB

Created attachment 1447677 [details]
node.log with remove openshift3/node image
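The missing piece discussed above amounts to a small set of image references that garbage collection must never select, even when no kubelet-managed pod references them: the infra/pause image (the Kube PR and Origin pick linked earlier), and, on containerized installs, the node image itself, for which no exclusion exists yet per the later comments. The sketch below only illustrates that idea; pinnedRefs, matchesPinned, and the sample references are assumptions made for this example, not the upstream implementation or a shipped fix.

```go
// Illustration only: a hypothetical "pinned images" filter of the kind
// discussed above. pinnedRefs, matchesPinned and the sample references are
// assumptions for this sketch; the real infra/pause image reference and any
// eventual fix for the node image differ per install and release.
package main

import (
	"fmt"
	"strings"
)

// pinnedRefs lists image repositories that a GC pass should never touch,
// regardless of whether a kubelet-managed pod references them.
var pinnedRefs = []string{
	"openshift3/ose-pod", // infra/pause image (assumed reference)
	"openshift3/node",    // node image on containerized installs (assumed reference)
}

// matchesPinned reports whether an image repo:tag matches a pinned entry.
func matchesPinned(repoTag string) bool {
	for _, p := range pinnedRefs {
		if strings.Contains(repoTag, p) {
			return true
		}
	}
	return false
}

func main() {
	candidates := []string{
		"registry.access.stage.redhat.com/openshift3/node:v3.7.52", // the image GC tried to delete in comment 10
		"registry.example.com/myapp/frontend:latest",               // ordinary workload image (made up)
	}
	for _, c := range candidates {
		if matchesPinned(c) {
			fmt.Println("skip (pinned):", c)
		} else {
			fmt.Println("eligible for GC:", c)
		}
	}
}
```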
I think the last thing to close out this bug is simply to note the bug number where we are tracking the remaining work for containerized installs. From looking at the PRs and the comments, this has been fixed on 3.7 for RPM installs since 3.7.52.

Deshuai, do you have a BZ where we're tracking the fix for containerized installs? Those are still broken, right?

(In reply to Brenton Leanhardt from comment #18)
> Deshuai, do you have a BZ where we're tracking the fix for containerized
> installs? Those are still broken, right?

As noted in comment 10, image GC can still try to remove the openshift image (openshift3/node:v3.7.52) in a containerized environment. Because that image is used by the node service, it reports the error "conflict: unable to delete ad19201eaf4c (cannot be forced) - image is being used by running container 6c3f8ec3da61".

As Seth noted in comment 13, we currently don't have a fix to exclude the openshift images.

Cloned a new bug to track the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1619477