Bug 1662132 - When using oc new-app to create a new build, the builds create incomplete multipart uploads to S3, and the incomplete uploads are not cleaned up automatically
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.z
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1668411 1668412 1668413
 
Reported: 2018-12-26 10:37 UTC by Aditya Deshpande
Modified: 2022-03-13 16:37 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the cluster role system:image-pruner was required for all DELETE requests to the registry. Consequence: regular clients couldn't cancel their uploads, and S3 multipart uploads were piling up. Fix: accept DELETE requests for uploads from clients who are allowed to write to them. Result: clients are able to cancel their uploads.
Clone Of:
: 1668411 1668412 1668413
Environment:
Last Closed: 2019-02-20 08:46:56 UTC
Target Upstream Version:
Embargoed:


Attachments
Registry log from OCP v3.11.59 (624.07 KB, text/plain)
2019-02-12 08:47 UTC, Wenjing Zheng


Links
Red Hat Product Errata RHBA-2019:0331 - 2019-02-20 08:47:02 UTC

Description Aditya Deshpande 2018-12-26 10:37:03 UTC
Description of problem:

When using oc new-app to create a new build, the builds create incomplete multipart uploads to S3 (as well as on other storage).
The build itself completes successfully.
However, the incomplete uploads are not cleaned up automatically; currently, the incomplete multipart uploads have to be deleted manually. If too many incomplete uploads accumulate in the S3 storage, docker push stops working with an HTTP 500 error.


Version-Release number of selected component (if applicable):
OpenShift v3.9.51

How reproducible:
Reproducible with 'oc new-app --name e2e https://github.com/appuio/endtoend-docker-helloworld.git -n test'
It also occurs with the RHEL image registry.access.redhat.com/rhscl/httpd-24-rhel7:latest, but not always.

Actual results:
Incomplete uploads can be seen in the S3 storage and have to be cleaned up manually.

Expected results:
There should not be any incomplete multipart uploads of the image left in the S3 storage.
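
For reference, the leftover multipart uploads can also be listed and aborted directly against the bucket with the AWS CLI (a rough sketch; the bucket name, key, and upload ID are placeholders):

$ aws s3api list-multipart-uploads --bucket <registry-bucket> --prefix registry/docker/registry/v2/repositories/
$ aws s3api abort-multipart-upload --bucket <registry-bucket> --key <upload-key> --upload-id <upload-id>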

Comment 2 gabisoft 2018-12-28 16:27:22 UTC
Retested this and created a BuildConfig directly, with the same result. It makes no difference whether ImageStreams are used or not.

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: e2e-12
  namespace: test
spec:
  output:
    to:
      kind: DockerImage
      name: 'docker-registry.default.svc:5000/test/e2e-12'
  source:
    git:
      uri: 'https://github.com/appuio/endtoend-docker-helloworld.git'
    type: Git
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: 'httpd-24-centos7:latest'
    type: Docker

$ ./mc ls --recursive --incomplete bucket/registry
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/18550046-da65-4189-b96a-23b6998efe6a/data
[2018-12-28 17:19:56 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/1f89f3a9-9fda-4a84-b4a2-6f45ba4084c2/data
[2018-12-28 17:19:54 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/4b67e7af-ab05-40dc-9e02-ecb79bff2422/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/688ffddb-4633-49cb-84e5-85a4218b404d/data
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/6931175f-c3d4-4ca4-8e43-11dff2ec175e/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/768b3be9-a6df-4cf5-8dc3-60342705cfa1/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/7e0a3fba-eaeb-4363-a066-2036547ea6d1/data
[2018-12-28 17:19:50 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/8e91ccb4-90b2-4774-a8b5-8c0ff30ee0b9/data
[2018-12-28 17:19:53 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/916da2d6-5779-45f2-8ff3-94e458c4ccc8/data
[2018-12-28 17:19:49 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/a0697689-e978-4221-bc3e-4f1a4f6d477b/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/a726b6f0-ba4a-4cbc-aceb-176101a3b32b/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/cc87268e-1b2b-4b23-8134-b91a33f65cfe/data
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/e442d3dd-70bd-494b-811f-3d91752eb45c/data
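
For completeness, a rough sketch of how a build can be triggered from the BuildConfig above before re-running the listing (assuming the YAML is saved as bc.yaml; the file name is illustrative):

$ oc apply -f bc.yaml -n test
$ oc start-build e2e-12 -n test
$ ./mc ls --recursive --incomplete bucket/registry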

Comment 4 Corey Daley 2019-01-02 15:46:48 UTC
There is a setting that can be enabled on the S3 bucket to abort incomplete multipart uploads (you can read more about it here: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html#mpu-abort-incomplete-mpu-lifecycle-config)

We recommend that you set this option on the S3 bucket that is being used for the registry.

We have opened an issue on the GitHub repository to enable this option by default on S3 buckets that the registry operator creates.
https://github.com/openshift/cluster-image-registry-operator/issues/128
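
For buckets on AWS S3 itself, a minimal sketch of enabling that lifecycle setting with the AWS CLI (the bucket name, rule ID, and one-day expiry are illustrative):

$ aws s3api put-bucket-lifecycle-configuration --bucket <registry-bucket> --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-incomplete-mpu",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
    }]
  }'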

Comment 5 gabisoft 2019-01-02 17:05:55 UTC
The problem with an S3-bucket-based cleanup is that it has to be implemented differently for different S3 APIs. In this case, it is a Dell EMC ECS storage; Ceph S3 is another commonly used storage. So this should definitely be handled by the client of the docker-registry.

Comment 6 Corey Daley 2019-01-02 17:27:38 UTC
If those storage media are attempting to emulate the S3 APIs and features, then it is up to them to support those correctly and fully, or up to the user to configure the incomplete multipart upload cleanup.

Not every storage type (GCS/Azure/Filesystem) supports multipart uploads, so those uploads are dealt with appropriately by their respective drivers.

Comment 7 gabisoft 2019-01-03 08:52:36 UTC
I have seen something similar on other file-based storage backends (Gluster) as well, so I assume that this is a general issue that occurs from time to time.

At least a "dockerregistry -prune delete" should remove those leftovers, but it does not.
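
For reference, the hard prune mentioned above is typically run inside the registry pod, roughly as follows (a sketch based on the OpenShift 3.x hard-prune procedure; the exact flag syntax and prerequisites, such as putting the registry into read-only mode first, may differ by version):

$ registry_pod=$(oc -n default get pods -l deploymentconfig=docker-registry -o jsonpath='{.items[0].metadata.name}')
$ oc -n default exec -it "$registry_pod" -- /usr/bin/dockerregistry -prune=check
$ oc -n default exec -it "$registry_pod" -- /usr/bin/dockerregistry -prune=delete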

Comment 19 gabisoft 2019-01-25 15:04:19 UTC
I did configure a lifecycle configuration on the S3 bucket, but it doesn't delete the invalid multipart uploads fast enough. So I still have to delete them manually:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <ID>lifecycle-v2-abortmpu-per-day</ID>
        <Filter/>
        <Status>Enabled</Status>
        <AbortIncompleteMultipartUpload>
          <DaysAfterInitiation>1</DaysAfterInitiation>
        </AbortIncompleteMultipartUpload>
    </Rule>
</LifecycleConfiguration>

So thanks for the PR! Hopefully this will resolve the issue.
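
Until the lifecycle rule (or the registry fix) takes effect, the leftovers can also be removed manually with the same mc client used above (a sketch; flag availability depends on the mc version):

$ ./mc rm --recursive --incomplete --force bucket/registry/docker/registry/v2/repositories/test/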

Comment 20 Oleg Bulatov 2019-01-28 13:03:33 UTC
https://github.com/openshift/image-registry/pull/151 (waiting to be merged)

Comment 21 Oleg Bulatov 2019-01-29 11:51:25 UTC
Merged.
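
With that change in place, a client that has push access to a repository should be able to cancel its own blob upload through the registry's v2 API instead of needing the system:image-pruner role. A rough sketch (hostname, repository, and token handling are illustrative):

# Initiate an upload; the Location header of the response contains the upload UUID.
$ curl -i -X POST -H "Authorization: Bearer <token>" \
    https://docker-registry.default.svc:5000/v2/test/e2e-12/blobs/uploads/

# Cancel that upload with DELETE on the returned upload URL.
$ curl -i -X DELETE -H "Authorization: Bearer <token>" \
    "https://docker-registry.default.svc:5000/v2/test/e2e-12/blobs/uploads/<upload-uuid>"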

Comment 26 Wenjing Zheng 2019-02-12 08:47:07 UTC
Created attachment 1533936 [details]
Registry log from OCP v3.11.59

Comment 28 Wenjing Zheng 2019-02-13 10:12:18 UTC
Thanks Oleg! I can reproduce with your steps and will verify this bug as below:

Verified with below version:
openshift v3.9.68
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 34 errata-xmlrpc 2019-02-20 08:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331

