Bug 1287035 - Some images are not deleted successfully by garbage collector
Product: OpenShift Origin
Classification: Red Hat
Component: Pod
Unspecified Unspecified
Priority: medium Severity: medium
Assigned To: Jan Chaloupka
Jianwei Hou
Depends On:
Reported: 2015-12-01 06:14 EST by Jianwei Hou
Modified: 2015-12-01 07:33 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-12-01 07:33:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Jianwei Hou 2015-12-01 06:14:01 EST
Description of problem:
Trigger image garbage collection by creating big files until disk usage reaches image-gc-high-threshold, then monitor the node log. The log shows that the image was deleted; however, 'docker images' shows that the image still exists.
See log: http://pastebin.test.redhat.com/331780

Version-Release number of selected component (if applicable):
openshift v1.1-264-gfdff20d
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:

Steps to Reproduce:
1. In node-config.yaml, set image-gc-high-threshold to '30'
2. Start node
3. Create big files on the node to make the disk usage grow over 30%.
4. Monitor the node log:
I1201 10:41:25.455733    1517 image_manager.go:254] [ImageManager]: Removing image "33a68611c7343d78bdca68f0e680c28ca70bf871fdc8d4427191f1abcede64b5" to free 239376801 bytes
5. On node, run docker images to verify the image is removed.
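
To confirm in step 3 that disk usage has actually crossed the threshold, a small check like the following could be used. This is only a sketch: the helper names, the default path, and the `>=` comparison are assumptions for illustration, not taken from the node code, and the path should point at the filesystem that holds the Docker images.

```python
import shutil


def usage_percent(path="/"):
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(percent_used, high_threshold=30):
    """True when usage has reached the (assumed) image-gc-high-threshold value."""
    # Assumption: the threshold is treated as inclusive.
    return percent_used >= high_threshold


if __name__ == "__main__":
    percent = usage_percent("/")
    print(f"disk usage: {percent:.1f}%, GC expected: {over_threshold(percent)}")
```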

Actual results:
After step 5: The image is not removed
[fedora@ip-172-18-14-28 ~]$ docker images --no-trunc|grep 33a68611c7343d78bdca68f0e680c28ca70bf871fdc8d4427191f1abcede64b5
openshift/origin-base                    latest              33a68611c7343d78bdca68f0e680c28ca70bf871fdc8d4427191f1abcede64b5   10 hours ago        239.4 MB

If you look at http://pastebin.test.redhat.com/331780, you will find that the image 33a68611c7343d78bdca68f0e680c28ca70bf871fdc8d4427191f1abcede64b5 was deleted more than once, which indicates that the first deletion was not successful.

Expected results:
The image should be removed.

Additional info:
Comment 1 Jan Chaloupka 2015-12-01 07:33:38 EST
First, GC determines how much disk space has to be freed. Then it sorts the images and removes them one by one until enough space is freed. If an image cannot be deleted (for any reason), it is skipped and GC behaves as if the image had never existed, i.e. its size is not counted towards the freed space.

Thus, in practice, the required amount of space is still freed. Some older images are simply kept even though they were supposed to be deleted.

For example, an image is not deleted if there is a running container that was started from the image. Most likely, for each image that was not removed, you will find its corresponding container.
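
One way to look for such a container is to ask Docker for containers whose ancestor is the image. The sketch below wraps `docker ps` with its `ancestor` filter; the function names are hypothetical, and `--all` is included on the assumption that stopped containers should be listed as well, since they can also block an image removal.

```python
import subprocess
from typing import List


def containers_cmd(image_id: str) -> List[str]:
    """Build the docker CLI call that lists container IDs created from image_id."""
    return ["docker", "ps", "--all", "--quiet", "--filter", f"ancestor={image_id}"]


def containers_using(image_id: str) -> List[str]:
    """Run the command on the node and return the matching container IDs."""
    result = subprocess.run(
        containers_cmd(image_id), capture_output=True, text=True, check=True
    )
    return result.stdout.split()
```

Calling `containers_using("33a68611c7343d78bdca68f0e680c28ca70bf871fdc8d4427191f1abcede64b5")` on the node would return an empty list if no container refers to the image.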

If several image removals fail, only the last error is reported. What we could do is report each such error or combine all errors into one. However, I think this is not an issue. GC removes as many images as it can and reports the last error. From the error you can deduce that a removal failed, but the number of deleted images stays the same. What can be deleted is deleted. If that is not enough, GC will try again next time and maybe then it will succeed.
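
The behaviour described in this comment can be sketched roughly as follows. This is a simplified illustration, not the code in image_manager.go: the `Image` type, the `free_space` function, the `remove` callback, and the oldest-first sort order are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Image:
    image_id: str
    size: int         # bytes
    last_used: float  # timestamp; assumed sort key


def free_space(
    images: List[Image],
    bytes_to_free: int,
    remove: Callable[[str], None],
) -> Tuple[int, Optional[Exception]]:
    """Remove images until enough bytes are freed; skip failed removals."""
    freed = 0
    last_error: Optional[Exception] = None
    # Assumption: least recently used images are tried first.
    for image in sorted(images, key=lambda i: i.last_used):
        if freed >= bytes_to_free:
            break
        try:
            remove(image.image_id)
        except Exception as err:
            # A failed removal is skipped: its size is not counted as freed,
            # and only the most recent error is kept for reporting.
            last_error = err
            continue
        freed += image.size
    return freed, last_error
```

With this shape, an image that is still in use only produces an error; the loop moves on to the next candidate, so the target can still be met while the older image stays on disk.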
