Bug 1408676 - [3.3+][Registry][Pruning] Orphaned blobs cannot be pruned
Summary: [3.3+][Registry][Pruning] Orphaned blobs cannot be pruned
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.3.1
Assignee: Michal Minar
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks: 1467340 1471844 1472438 1479340 1499314 1499315
Reported: 2016-12-26 10:57 UTC by Jaspreet Kaur
Modified: 2023-09-15 00:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: There was no way to prune orphaned blobs on the integrated registry's storage. Consequence: The orphaned blobs could pile up and consume a considerable amount of free space. Fix: We provide a new low-level utility that is run inside the registry's container and removes the orphaned blobs. Result: Customers are now able to remove orphaned blobs and reclaim storage space.
Clone Of:
: 1467340 1479340 1499314 1499315 (view as bug list)
Environment:
Last Closed: 2017-08-31 17:00:23 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1828 0 normal SHIPPED_LIVE OpenShift Container Platform 3.5, 3.4, and 3.3 bug fix update 2017-08-31 20:59:56 UTC

Description Jaspreet Kaur 2016-12-26 10:57:25 UTC
Description of problem:   The problem occurs when an image is being pushed while `oc adm prune images` is running. It looks like `oc adm prune images` can delete blobs of images being uploaded since they are not yet referenced by a manifest (the manifest is uploaded last). The push operation finishes successfully but pull operations on the affected images fail with "unexpected EOF". The registry happily serves the missing blobs with HTTP status 200 and size 0. But the Docker client first downloads the manifest and expects to download blobs with the original size, hence the "unexpected EOF" error. 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Log into the internal Docker registry
2. Run `oc adm prune images` in a loop
3. oc new-project test
4. oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
5. oc scale dc ruby-ex --replicas=0
6. Wait for build to complete
7. Delete any Docker images of the `test/ruby-ex` and `centos/ruby-22-centos7` repositories using `docker rmi` to ensure they are pulled from the registry
8. docker pull 172.30.1.1:5000/test/ruby-ex

The pull operation will now fail with a high probability. Should it succeed, delete the project and start again at step 3. In case of failure, you will find the manifest present in the registry but one or more blobs missing.

In most cases there is no easy way for end users to recover from the issue. The registry reports the blobs as already present since they are listed in the manifest. Therefore they will not be uploaded again by future push operations.
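
To confirm this state for a given image, a check along the following lines may help. This is only a minimal sketch: it assumes a schema 2 style manifest (a `layers` list whose entries carry `digest` and `size`), and `blob_size` is a hypothetical callback that returns the size the registry reports for a digest (for example the Content-Length of a HEAD request) or None when the blob is absent. Neither name comes from the registry code.

```python
from typing import Callable, List, Optional


def find_broken_layers(
    manifest: dict,
    blob_size: Callable[[str], Optional[int]],
) -> List[str]:
    """Return digests of layers whose stored size differs from the manifest.

    A blob that was pruned but is still served with size 0 shows up here,
    as does a blob that is missing altogether (blob_size returns None).
    """
    broken = []
    for layer in manifest.get("layers", []):
        expected = layer["size"]
        actual = blob_size(layer["digest"])
        if actual is None or actual != expected:
            broken.append(layer["digest"])
    return broken


# Illustrative data only: the first blob is served with size 0.
manifest = {
    "layers": [
        {"digest": "sha256:aaa", "size": 120},
        {"digest": "sha256:bbb", "size": 64},
    ]
}
reported_sizes = {"sha256:aaa": 0, "sha256:bbb": 64}

print(find_broken_layers(manifest, reported_sizes.get))  # ['sha256:aaa']
```

Any digest returned by such a check corresponds to a layer that the client would fail on with "unexpected EOF".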

Actual results: Fails to pull image


Expected results: The image should be pulled without any issue.


Additional info:

Comment 2 Michal Minar 2017-01-10 15:05:07 UTC
@Jaspreet I suspect this is a duplicate of bug 1410434. Could you please verify there is (or isn't) a panic [1] in docker daemon when the pull fails with EOF? 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1410434#c4

Comment 5 Michal Minar 2017-01-11 12:35:46 UTC
I was able to reproduce the problem, but with a different error. I did it like this:

1. run the following command in a loop:

    oadm prune images --confirm --keep-tag-revisions=1 \
        --keep-younger-than=1m --token=${token}

2. pushed the docker.io/golang:1.4.3 image (A) to 172.30.30.30:5000/pjoe/golang
3. built a new image (B) based on (A) adding a new data layer
4. And I pushed it again as 172.30.30.30:5000/pjoe/golang. The push succeeded.
   This shifted original golang image to the second position in a history of
   golang:latest imagestreamtag, making it a candidate for pruning.
5. removed all golang images from my docker daemon
6. pulled the golang image (B) back from the integrated registry:

    docker pull 172.30.30.30:5000/pjoe/golang
    Using default tag: latest
    Trying to pull repository 172.30.30.30:5000/pjoe/golang ...
    latest: Pulling from 172.30.30.30:5000/pjoe/golang
    7268d8f794c4: Downloading
    a3ed95caeb02: Download complete
    d9a49bc2b1b0: Download complete
    b965864d2d45: Download complete
    bad3f2daf720: Downloading
    61db11059a7f: Downloading
    25f4a8c55a9b: Downloading
    d45512a784a3: Downloading
    1bff6f993ec9: Download complete
    unknown blob

The pruner started producing the following output slightly before step 4
finished:

    I0111 11:16:47.291762   26846 prune.go:257] Creating image pruner with keepYoungerThan=1m0s, keepTagRevisions=1, pruneOverSizeLimit=<nil>, allImages=false
    I0111 11:16:47.291982   26846 prune.go:384] Unable to find image "sha256:61a80f918027452d07b988ad8add8f5fcc9780e45cda63c15ff866adc51b678e" in graph (from tag="stg", revision=0, dockerImageReference=registry.ops.openshift.com/ops/oso-rhel7-zagg-web@sha256:61a80f918027452d07b988ad8add8f5fcc9780e45cda63c15ff866adc51b678e
    ) - skipping
    I0111 11:16:47.292014   26846 prune.go:384] Unable to find image "sha256:13f460af40143865d32b44ef1e3fbf44034d4beacc67a4487cd0ef96361e9aa6" in graph (from tag="latest", revision=0, dockerImageReference=172.30.30.30:5000/pjoe/golang@sha256:13f460af40143865d32b44ef1e3fbf44034d4beacc67a4487cd0ef96361e9aa6) - skipping
    I0111 11:16:47.292327   26846 prune.go:835] Using registry: 172.30.30.30:5000
    Deleting references from image streams to images ...
    STREAM        IMAGE                                                                     TAGS
    pjoe/golang   sha256:89247599498eb346ec7c331d6fda457df2adc659b1db4f4ca69145c198689bab   latest

    Deleting registry repository layer links ...
    REPO          LAYER LINK
    pjoe/golang   sha256:7268d8f794c449e593d3a48f62e7e22b7c3a4b6e615caaf9494ec3cb2d48f503
    pjoe/golang   sha256:61db11059a7f7b24e125090c65f70b544236ee47090e1deadc5962969092f776
    pjoe/golang   sha256:25f4a8c55a9b41b38beed2e3e9e0d43e76655173450a9d4302bd0de73628878d
    pjoe/golang   sha256:b965864d2d455f06e4ad8165d12456219dcaeed2e49b0f13ada623aa00d9e822
    pjoe/golang   sha256:bad3f2daf720952bee23d5dc4baf526bfaac8f0629de7db640058c3d8f632c3e
    pjoe/golang   sha256:d45512a784a33b701cb3b02b025dec57ccedeb84e8fc2d907d8cf9ade1801559
    pjoe/golang   sha256:d9a49bc2b1b0cdba4093d4ef5d276883a81a3141f05bdb46eb8bacb5b5d94acf

    Deleting registry layer blobs ...
    BLOB
    sha256:7268d8f794c449e593d3a48f62e7e22b7c3a4b6e615caaf9494ec3cb2d48f503
    sha256:61db11059a7f7b24e125090c65f70b544236ee47090e1deadc5962969092f776
    sha256:25f4a8c55a9b41b38beed2e3e9e0d43e76655173450a9d4302bd0de73628878d
    sha256:b965864d2d455f06e4ad8165d12456219dcaeed2e49b0f13ada623aa00d9e822
    sha256:bad3f2daf720952bee23d5dc4baf526bfaac8f0629de7db640058c3d8f632c3e
    sha256:d45512a784a33b701cb3b02b025dec57ccedeb84e8fc2d907d8cf9ade1801559
    sha256:d9a49bc2b1b0cdba4093d4ef5d276883a81a3141f05bdb46eb8bacb5b5d94acf

    Deleting registry repository manifest data ...
    W0111 11:18:02.125464   26846 prune.go:1029] Unable to prune layer http://172.30.30.30:5000/v2/pjoe/golang/manifests/sha256:89247599498eb346ec7c331d6fda457df2adc659b1db4f4ca69145c198689bab, returned 404 Not Found
    REPO          IMAGE
    pjoe/golang   sha256:89247599498eb346ec7c331d6fda457df2adc659b1db4f4ca69145c198689bab

    Deleting images from server ...
    IMAGE
    sha256:89247599498eb346ec7c331d6fda457df2adc659b1db4f4ca69145c198689bab

It stopped executing after step 4 completed.

What happened under the hood:

1. registry received all the blobs of image B
2. pruner collected all the images (it found just A, not B)
3. registry received manifest and created image B in etcd
4. registry updated the golang image stream
   - it shifted image A to index 1
   - it inserted image B at index 0
5. pruner collected all the image streams
   - the golang:latest istag contains references to both images A and B
6. pruner marked image A as a candidate for pruning because
   - it's older than threshold
   - it occurs at index 1 in revision history of golang:latest istag, which is
     above the threshold
7. pruner removed image A and all its layers

Removing image A is correct. Removing its layers is not, though, since they are still referenced by B. B, however, is not known to the pruner.

This could be solved by fetching images that were not found while
processing image streams. I'll open a PR. Nevertheless, it may be unrelated to the customer's issue.
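
The race and the proposed fix can be modelled in a few lines of Python. This is a simplified sketch, not the pruner's code: the function and parameter names are made up for illustration, only a single tag history is considered, and the keep-younger-than check is left out. `fetch_image` stands in for "look up an image that the stream references but the earlier image listing did not contain".

```python
from typing import Callable, Dict, List, Optional, Set

Layers = Set[str]


def layers_to_prune(
    images_snapshot: Dict[str, Layers],
    tag_history: List[str],
    keep_tag_revisions: int,
    fetch_image: Optional[Callable[[str], Optional[Layers]]] = None,
) -> Layers:
    """Return layers the pruner would delete for one tag history.

    images_snapshot: image id -> layers, as collected before the streams.
    tag_history: image ids of one istag, newest first.
    fetch_image: hypothetical lookup for images missing from the snapshot.
    """
    known = dict(images_snapshot)
    if fetch_image is not None:
        # Proposed fix: fetch images referenced by the stream but not
        # present in the (older) image listing.
        for image_id in tag_history:
            if image_id not in known:
                layers = fetch_image(image_id)
                if layers is not None:
                    known[image_id] = layers

    # Images unknown to the pruner are skipped, as in the log above.
    kept = [i for i in tag_history[:keep_tag_revisions] if i in known]
    pruned = [i for i in tag_history[keep_tag_revisions:] if i in known]

    kept_layers: Layers = set().union(*(known[i] for i in kept))
    candidate_layers: Layers = set().union(*(known[i] for i in pruned))
    return candidate_layers - kept_layers


image_a = {"layer1", "layer2"}
image_b = {"layer1", "layer2", "layer3"}  # B is built on top of A
snapshot = {"A": image_a}                  # taken before B existed
history = ["B", "A"]                       # collected after B was pushed
etcd = {"A": image_a, "B": image_b}

# Without the fix, layers shared with B are scheduled for deletion.
print(sorted(layers_to_prune(snapshot, history, 1)))  # ['layer1', 'layer2']
# With the fix, B is fetched and its layers are protected.
print(sorted(layers_to_prune(snapshot, history, 1, fetch_image=etcd.get)))  # []
```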

Comment 12 Jaspreet Kaur 2017-03-30 09:41:42 UTC
Hello,

We need to keep this bugzilla open until the fix is officially released.

Regards,
Jaspreet

Comment 18 Michal Minar 2017-05-15 09:19:40 UTC
Sorry for the delay. I haven't made any progress on this and the pruning rework is still in the queue. As agreed on IRC, we will provide a temporary solution. I'll make a bash script that will be able to prune the no-longer-referenced blobs from the registry storage in read-only mode.

Note that the script won't address any of the 404 errors like:

   W0512 09:17:50.392748   96693 prune.go:972] Unable to prune layer http://172.30.128.186:5000/v2/zis-dev/angebot/blobs/sha256:... 

  as this needs to wait for the rework.

It will only be able to delete the blobs that the pruning command is not able to prune. So it will only reduce the occupied size of the registry storage.
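
Conceptually, finding such blobs is a set difference between what is on the storage and what the known images still reference. The Python sketch below only illustrates that idea; it is not the script or utility mentioned above, and the function name and data shapes are assumptions.

```python
from typing import Dict, Set, Tuple


def orphaned_blobs(
    stored_blobs: Dict[str, int],
    image_layers: Dict[str, Set[str]],
) -> Tuple[Set[str], int]:
    """Return (orphaned digests, bytes that deleting them would free).

    stored_blobs: digest -> size in bytes for every blob on the storage.
    image_layers: image id -> digests still referenced by that image.
    """
    referenced: Set[str] = set()
    for layers in image_layers.values():
        referenced.update(layers)

    orphans = set(stored_blobs) - referenced
    reclaimable = sum(stored_blobs[d] for d in orphans)
    return orphans, reclaimable


# Illustrative data only.
stored = {"sha256:a": 100, "sha256:b": 50, "sha256:c": 25}
images = {"img1": {"sha256:a"}, "img2": {"sha256:a", "sha256:c"}}

print(orphaned_blobs(stored, images))  # ({'sha256:b'}, 50)
```

The result is only safe to act on if no push is in progress while the two inputs are collected, which is why the registry should not be accepting writes at that point.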

Comment 27 Michal Minar 2017-07-13 19:33:04 UTC
Backport PR for 3.3: https://github.com/openshift/ose/pull/802

Comment 28 Michal Minar 2017-07-25 14:07:32 UTC
The PR has been merged.

Comment 30 ge liu 2017-08-02 02:22:54 UTC
The build is not ready for testing yet; changing status to MODIFIED.

Comment 37 hgomes 2017-08-20 13:12:50 UTC
(In reply to  Michal Minar from comment #35)


1.
\"sha256:bad8bb8186a329f4c29d96f0be2348cf75533df34c41bf8448d7732d3e86efda\" in the graph\nI0722 02:15:10.959332   40540 imagepruner.go:430] Unable to find image 



2.
Failing on "pathwayapi/salesforcedev to remove references to image sha256:c265689de055fa1a0fca8bebf613a1120c4cbd6afafb1378dc93a2f8c7678f65:"
***********************
\nerror updating image stream pathwayapi/salesforcedev to remove references to image sha256:c265689de055fa1a0fca8bebf613a1120c4cbd6afafb1378dc93a2f8c7678f65: imagestreams \"salesforcedev\" cannot be updated: the object has been modified; please apply your changes to the latest version and try again\nerror updating image stream pathwayapi/salesforcedev to remove references to image

Comment 41 errata-xmlrpc 2017-08-31 17:00:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1828

Comment 42 Michal Minar 2017-09-06 15:58:39 UTC
(In reply to hgomes from comment #37)
> \"sha256:bad8bb8186a329f4c29d96f0be2348cf75533df34c41bf8448d7732d3e86efda\"
> in the graph\nI0722 02:15:10.959332   40540 imagepruner.go:430] Unable to
> find image 
Should be easy to fix. The dangling references to deleted images can be safely removed from image streams. The warning should still be printed the first time the pruner hits them. The next time the pruner runs, these messages will be gone.
>
> 2.
> Failing on "pathwayapi/salesforcedev to remove references to image
> sha256:c265689de055fa1a0fca8bebf613a1120c4cbd6afafb1378dc93a2f8c7678f65:"
> ***********************
> \nerror updating image stream pathwayapi/salesforcedev to remove references
> to image
> sha256:c265689de055fa1a0fca8bebf613a1120c4cbd6afafb1378dc93a2f8c7678f65:
> imagestreams \"salesforcedev\" cannot be updated: the object has been
> modified; please apply your changes to the latest version and try
> again\nerror updating image stream pathwayapi/salesforcedev to remove
> references to image

Already addressed by [1]. It just needs to be back-ported.

[1] https://github.com/openshift/origin/pull/15899

Could you please open a separate bugzilla where we can address the remaining issues?

Comment 43 Red Hat Bugzilla 2023-09-15 00:00:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

