Bug 1567657
Summary: Very large repositories make no progress pruning images if any image has an issue
Product: OpenShift Container Platform
Component: ImageStreams
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Version: 3.9.0
Target Release: 3.11.0
Reporter: Clayton Coleman <ccoleman>
Assignee: Michal Minar <miminar>
QA Contact: XiuJuan Wang <xiuwang>
CC: aos-bugs, bparees, jokerman, mifiedle, miminar, mmccomas, wzheng
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text: |
Cause: Image pruning stopped on encountering any unexpected error while deleting blobs.
Consequence: When an image deletion error occurred, image pruning failed to delete any image objects from etcd.
Fix: Images are now pruned concurrently in separate jobs.
Result: Image pruning does not stop on a single unexpected blob deletion failure.
|
Last Closed: 2018-10-11 07:19:09 UTC
Type: Bug
Description
Clayton Coleman
2018-04-15 21:08:57 UTC
While fixing this I'd also like to add an "--ignore-invalid-references" flag, defaulting to false (the current behavior). When set to true, it would allow pruning to proceed even if there are deploymentconfigs etc. that hold invalid image references. I believe that today a single invalid image reference in a DC prevents pruning from proceeding, because we're afraid we might prune something the user was trying to reference.

I'd be in favor of iterative deletions: first an image's blobs, then its layers, then the image itself, while sorting images by the number of unique layers they have (from highest to lowest). Each iteration deletes only the blobs referenced solely by the image being deleted. I find retries much harder to design and reason about. Are we aware of all the errors that may happen now or in the future? Which errors can be retried and which cannot? Which errors should always be retried, and which only a few times? How many times should that be?

@michal - Advice on how to verify this? How to force or simulate errors deleting image blobs?

@mike making part of the registry's storage read-only will do the job. The algorithm should attempt to prune all of it and not stop on the first permission error.

I am trying to verify this bug with the steps below:
1. Configure the registry storage backend to emptyDir.
2. Push an image to the internal registry.
3. Locate the image location, e.g.: /var/lib/origin/openshift.local.volumes/pods/82d40d38-acff-11e8-9e95-00163e008c9e/volumes/kubernetes.io~empty-dir
4. Change registry-storage to read-only, from
   drwxrwsrwx. 3 root 1000000000 37 Aug 31 05:29 registry-storage
   to
   dr--r-Sr--. 3 root 1000000000 37 Aug 31 05:29 registry-storage
5. Try to prune images; no error appears and the image prune finishes in the end.
Does this mean the bug has been fixed? If not, could you correct my steps if there is any mistake? Thanks!

Hi Wenjing, could you be more specific about the options you pass to the pruner and the output you see?
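The iterative scheme proposed above (delete each image's uniquely-referenced blobs, then the image itself, processing images from the most unique layers to the fewest, so one bad image cannot block the rest) can be illustrated with a minimal sketch. This is not the actual pruner code: the `images` mapping and the `delete_blob`/`delete_image` callbacks are hypothetical stand-ins for the registry and etcd operations.

```python
from collections import Counter

def prune_iteratively(images, delete_blob, delete_image):
    """Prune images one at a time, most uniquely-referenced layers first.

    `images` maps image name -> set of layer digests. A blob is deleted
    only when the image being pruned is its sole remaining referrer, so
    a failure affects one image instead of aborting the whole run.
    """
    # Count how many images reference each layer digest.
    refs = Counter(layer for layers in images.values() for layer in layers)

    def unique_layers(name):
        # Recomputed per image: earlier successful prunes release refs.
        return {l for l in images[name] if refs[l] == 1}

    # Sort by number of uniquely-referenced layers, highest first.
    order = sorted(images, key=lambda n: len(unique_layers(n)), reverse=True)

    failed = []
    for name in order:
        try:
            for layer in unique_layers(name):
                delete_blob(layer)
            delete_image(name)
        except OSError as err:  # e.g. read-only storage
            failed.append((name, err))
            continue  # keep pruning the remaining images
        for layer in images[name]:
            refs[layer] -= 1  # this image's references are released
    return failed
```

The key property matching this bug report: an `OSError` on one image lands in `failed` and the loop moves on, instead of the whole prune run stopping at the first error.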
There should definitely be some errors printed resulting from the blobs not being pruned.

I just ran the command below to prune images:
# oc adm prune images --token=dfc0j99FNURtMfZylGzGHGg6ziMWIvNGrL7E1I3D40g
After I ran the command below, I could see some warnings and errors, but I am not sure whether they are related; I will attach the output:
oc adm prune images --token=dfc0j99FNURtMfZylGzGHGg6ziMWIvNGrL7E1I3D40g --confirm=true --keep-tag-revisions=0 --keep-younger-than=0
I also have the questions below: what error am I supposed to see? The images pushed with read-only storage should not be pruned, right?

Created attachment 1480073 [details]
output of oc adm prune
On the read-only storage, I see a lot of errors like this:

Deleting layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f in repository openshift/jboss-webserver30-tomcat7-openshift
error deleting repository openshift/jboss-webserver30-tomcat7-openshift layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f from the registry: 500 Internal Server Error
Deleting layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f in repository openshift/redhat-sso71-openshift
error deleting repository openshift/redhat-sso71-openshift layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f from the registry: 500 Internal Server Error
Deleting layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f in repository openshift/jboss-amq-62
error deleting repository openshift/jboss-amq-62 layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f from the registry: 500 Internal Server Error
Deleting layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f in repository openshift/jboss-webserver31-tomcat8-openshift
error deleting repository openshift/jboss-webserver31-tomcat8-openshift layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f from the registry: 500 Internal Server Error
Deleting layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f in repository openshift/jboss-eap70-openshift
error deleting repository openshift/jboss-eap70-openshift layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f from the registry: 500 Internal Server Error
Deleting layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f in repository openshift/redhat-openjdk18-openshift
error deleting repository openshift/redhat-openjdk18-openshift layer link sha256:39fe8b1d3a9cb13a361204c23cf4e342d53184b4440492fa724f4aeb4eb1d64f from the registry: 500 Internal Server Error
Deleting layer link sha256:6b6ea5a6c4ac85e235d63e1326c3a6f624c8d83a1ae27429a34ebecd90cbe52c in repository openshift/redhat-openjdk18-openshift
error deleting repository openshift/redhat-openjdk18-openshift layer link sha256:6b6ea5a6c4ac85e235d63e1326c3a6f624c8d83a1ae27429a34ebecd90cbe52c from the registry: 500 Internal Server Error
Deleting blob sha256:6b6ea5a6c4ac85e235d63e1326c3a6f624c8d83a1ae27429a34ebecd90cbe52c
error deleting blob sha256:6b6ea5a6c4ac85e235d63e1326c3a6f624c8d83a1ae27429a34ebecd90cbe52c from the registry: 500 Internal Server Error
Deleting blob sha256:5ffd5b1ec8e4264cdd62a3063ee56e370a973e0777da9c8c6f3a5f12e22fe6d5
error deleting blob sha256:5ffd5b1ec8e4264cdd62a3063ee56e370a973e0777da9c8c6f3a5f12e22fe6d5 from the registry: 500 Internal Server Error
Deleting manifest link sha256:5ffd5b1ec8e4264cdd62a3063ee56e370a973e0777da9c8c6f3a5f12e22fe6d5 in repository openshift/redhat-openjdk18-openshift
error deleting manifest link sha256:5ffd5b1ec8e4264cdd62a3063ee56e370a973e0777da9c8c6f3a5f12e22fe6d5 from repository openshift/redhat-openjdk18-openshift: 500 Internal Server Error
Deleted 42 objects out of 5213. Failed to delete 5171 objects.
error: failed

I have a recent oc client but an older release of docker-registry deployed (I am not sure if it still returns 500 or some other error), but the output should be quite similar. Your standard error output seems to be redirected somewhere else. Anyway, if the pruner doesn't stop at the first error and continues to delete the other blobs, it means the bug has been addressed. Please compare with the older version of the oc client.

When I used oc v3.9.40 to test with the read-only repository, it stopped after a while (but not at the first error), and it never printed the summary below, unlike oc v3.11.0-0.25.0:
Deleted 728 objects out of 776. Failed to delete 48 objects.
error: failed
Can this be regarded as the issue having been fixed?

Please ignore my comment #13 above; I tried several times and found that oc v3.9.40 got stuck at the first error, like below:
error pruning manifest sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd in the repository docker-registry.default.svc:5000/sunny/myimage6: 500 Internal Server Error
But oc v3.11.0-0.25.0 finishes pruning after the error appears:
error deleting manifest link sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd from repository sunny/myimage7: 500 Internal Server Error
Deleting blob sha256:09cf45760aea204766c1668c497f8571c67fbfa8d81ec03e4293f3fa0b9945d6
W0903 02:30:09.020965 6113 prune.go:1681] Unable to prune layer https://docker-registry.default.svc:5000/v2/openshift/postgresql/blobs/sha256:8275392acc4a34b880bc61b1482eec5049e67ae82ddd10ac9450ad6fdfdf3b74, returned 404 Not Found
Deleting blob sha256:7bd78273b66657ac8b3e800506047866ce94eea0b50e23ecdb76b0a8fbc5cdcc
Deleting blob sha256:642d3edf81580395cbafe161ea49ff5d988134d3ff8fe2240a5e30dc884cfcc8
Deleting blob sha256:610da2480f27448225a79ca668b755d8a90ecd698d85044f3902e4461b9bbfe2
Deleting blob sha256:c196631bd9ac47f0e62cd3b0160159ccf34a88b47a9487a0c3dd3c55b457d607
Deleting manifest link sha256:642d3edf81580395cbafe161ea49ff5d988134d3ff8fe2240a5e30dc884cfcc8 in repository openshift/python
So I will verify this bug now. Thanks for your help, Michal!

> But oc v3.11.0-0.25.0 finishes pruning after the error appears:
Yes, that's what I meant. Glad it worked out.
Cheers.
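The behavior verified above (oc v3.11 continues past 500 errors and prints a "Deleted X objects out of Y. Failed to delete Z objects." summary, where v3.9 stopped at the first error) follows from running each deletion as an independent concurrent job, as the Doc Text describes. A minimal sketch of that pattern, assuming a hypothetical `delete` callback standing in for the registry call; this is not the actual oc implementation:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def prune_concurrently(objects, delete, workers=4):
    """Delete every object in its own job; a failure in one job
    never stops the others. Returns (deleted, failed) counts,
    mirroring the pruner's final summary line."""
    deleted = failed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One future per object: failures stay isolated to their job.
        futures = {pool.submit(delete, obj): obj for obj in objects}
        for fut in as_completed(futures):
            try:
                fut.result()
                deleted += 1
            except Exception as err:
                failed += 1
                print(f"error deleting {futures[fut]}: {err}")
    print(f"Deleted {deleted} objects out of {len(objects)}. "
          f"Failed to delete {failed} objects.")
    return deleted, failed
```

With a fail-fast loop, the first exception would abort the run with zero objects deleted from etcd; with per-object jobs, every deletable object is still removed and only the failures are tallied.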
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652