Bug 1662132

Summary: When using oc new-app to create a new build, the builds create incomplete multipart uploads to S3, and the incomplete uploads are not cleaned up automatically
Product: OpenShift Container Platform Reporter: Aditya Deshpande <adeshpan>
Component: Build    Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA QA Contact: Wenjing Zheng <wzheng>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.9.0    CC: adeshpan, agladkov, aos-bugs, bparees, gabisoft, obulatov, rkshirsa, wzheng
Target Milestone: ---   
Target Release: 3.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: the cluster role system:image-pruner was required for all DELETE requests to the registry. Consequence: regular clients couldn't cancel their uploads, and S3 multipart uploads were piling up. Fix: accept DELETE requests for uploads from clients who are allowed to write to them. Result: clients are able to cancel their uploads.
Story Points: ---
Clone Of:
: 1668411 1668412 1668413 (view as bug list) Environment:
Last Closed: 2019-02-20 08:46:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1668411, 1668412, 1668413    
Attachments:
Description Flags
Registry log from OCP v3.11.59 none

Description Aditya Deshpande 2018-12-26 10:37:03 UTC
Description of problem:

When using oc new-app to create a new build, the builds create incomplete multipart uploads to S3 (as well as on other storage backends).
The build itself completes successfully.
The incomplete uploads are also not cleaned up; currently they have to be deleted manually. If too many incomplete uploads accumulate in the S3 storage, docker push stops working with an HTTP 500 error.


Version-Release number of selected component (if applicable):
OpenShift v3.9.51

How reproducible:
Reproducible with 'oc new-app --name e2e https://github.com/appuio/endtoend-docker-helloworld.git -n test'
It also occurs with the RHEL image registry.access.redhat.com/rhscl/httpd-24-rhel7:latest, but not always.

Actual results:
Incomplete uploads can be seen in the S3 storage and have to be cleaned up manually.

Expected results:
There should not be any incomplete multipart uploads of images left in the S3 storage.

Comment 2 gabisoft 2018-12-28 16:27:22 UTC
Retested this and created a BuildConfig directly, with the same result. It makes no difference whether ImageStreams are used or not.

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: e2e-12
  namespace: test
spec:
  output:
    to:
      kind: DockerImage
      name: 'docker-registry.default.svc:5000/test/e2e-12'
  source:
    git:
      uri: 'https://github.com/appuio/endtoend-docker-helloworld.git'
    type: Git
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: 'httpd-24-centos7:latest'
    type: Docker

$ ./mc ls --recursive --incomplete bucket/registry
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/18550046-da65-4189-b96a-23b6998efe6a/data
[2018-12-28 17:19:56 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/1f89f3a9-9fda-4a84-b4a2-6f45ba4084c2/data
[2018-12-28 17:19:54 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/4b67e7af-ab05-40dc-9e02-ecb79bff2422/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/688ffddb-4633-49cb-84e5-85a4218b404d/data
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/6931175f-c3d4-4ca4-8e43-11dff2ec175e/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/768b3be9-a6df-4cf5-8dc3-60342705cfa1/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/7e0a3fba-eaeb-4363-a066-2036547ea6d1/data
[2018-12-28 17:19:50 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/8e91ccb4-90b2-4774-a8b5-8c0ff30ee0b9/data
[2018-12-28 17:19:53 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/916da2d6-5779-45f2-8ff3-94e458c4ccc8/data
[2018-12-28 17:19:49 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/a0697689-e978-4221-bc3e-4f1a4f6d477b/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/a726b6f0-ba4a-4cbc-aceb-176101a3b32b/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/cc87268e-1b2b-4b23-8134-b91a33f65cfe/data
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/e442d3dd-70bd-494b-811f-3d91752eb45c/data

Comment 4 Corey Daley 2019-01-02 15:46:48 UTC
There is a setting that can be enabled on the S3 bucket to abort incomplete multipart uploads (you can read more about it here: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html#mpu-abort-incomplete-mpu-lifecycle-config)

We recommend that you set this option on the S3 bucket that is being used for the registry.

We have opened an issue on the github repository to enable this option by default on S3 buckets that the registry operator creates.
https://github.com/openshift/cluster-image-registry-operator/issues/128
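For reference, the bucket setting recommended above can be expressed as a lifecycle rule. The sketch below builds the JSON document that `aws s3api put-bucket-lifecycle-configuration` accepts; the rule ID, the 1-day threshold, and the bucket name in the comment are illustrative assumptions, not values taken from this cluster.

```python
# Sketch: an S3 lifecycle rule that aborts incomplete multipart uploads,
# as the JSON document accepted by `aws s3api put-bucket-lifecycle-configuration`.
# Rule ID and the 1-day threshold are illustrative choices.
import json

lifecycle = {
    "Rules": [
        {
            "ID": "abort-incomplete-mpu",
            "Status": "Enabled",
            "Filter": {},  # apply to every object in the bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
        }
    ]
}

doc = json.dumps(lifecycle, indent=2)
print(doc)

# Applying it would look roughly like:
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket <registry-bucket> \
#       --lifecycle-configuration file://lifecycle.json
```

Whether this helps depends on the backend actually implementing lifecycle rules, which (as the next comments note) not every S3-compatible storage does.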

Comment 5 gabisoft 2019-01-02 17:05:55 UTC
The problem with an S3-bucket-based cleanup is that it is implemented differently across different S3 APIs. In this case, it is a Dell EMC ECS storage; Ceph S3 is another commonly used storage. So this should definitely be handled by the client of the docker-registry.

Comment 6 Corey Daley 2019-01-02 17:27:38 UTC
If those storage mediums attempt to emulate the S3 APIs and features, then it is up to them to support those features correctly and fully, or up to the user to configure the cleanup of incomplete multipart uploads.

Not every storage type (GCS/Azure/Filesystem) supports multipart uploads, so such uploads have to be dealt with appropriately by each driver.

Comment 7 gabisoft 2019-01-03 08:52:36 UTC
I have seen something similar on other file-based storage backends (Gluster) as well, so I assume this is a general issue that occurs from time to time.

At least "dockerregistry -prune delete" should remove those leftovers, but it does not.
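Until pruning handles this, the leftovers have to be removed by hand. A small sketch of scripting that cleanup from the `mc ls --recursive --incomplete` listing shown in Comment 2; the `mc rm --incomplete` invocation it emits is an assumption about the minio client's interface and may need adjusting for your mc version.

```python
# Sketch: turn `mc ls --recursive --incomplete` output into per-upload
# cleanup commands. The emitted `mc rm --incomplete` form is an assumption
# about the minio client; verify against your mc version before running.
import re

def cleanup_commands(listing: str, alias: str = "bucket") -> list[str]:
    cmds = []
    for line in listing.splitlines():
        # The object key is the last whitespace-separated field and
        # contains the registry's _uploads/ directory.
        m = re.search(r"(\S+/_uploads/\S+)$", line)
        if m:
            cmds.append(f"mc rm --incomplete {alias}/{m.group(1)}")
    return cmds

sample = (
    "[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/"
    "repositories/test/e2e-12/_uploads/18550046-da65-4189-b96a-23b6998efe6a/data"
)
for cmd in cleanup_commands(sample):
    print(cmd)
```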

Comment 19 gabisoft 2019-01-25 15:04:19 UTC
I configured a lifecycle configuration on the S3 bucket, but it doesn't delete the stale multipart uploads fast enough, so I still have to delete them manually:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <ID>lifecycle-v2-abortmpu-per-day</ID>
        <Filter/>
        <Status>Enabled</Status>
        <AbortIncompleteMultipartUpload>
          <DaysAfterInitiation>1</DaysAfterInitiation>
        </AbortIncompleteMultipartUpload>
    </Rule>
</LifecycleConfiguration>

So thanks for the PR! Hopefully this will resolve the issue.

Comment 20 Oleg Bulatov 2019-01-28 13:03:33 UTC
https://github.com/openshift/image-registry/pull/151 (waiting to be merged)

Comment 21 Oleg Bulatov 2019-01-29 11:51:25 UTC
Merged.
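For reference, cancelling an in-progress blob upload goes through the Docker Registry HTTP API V2 endpoint DELETE /v2/<name>/blobs/uploads/<uuid>, which the merged fix now permits for clients with write access to the repository. A minimal sketch that builds (but does not send) such a request, using the registry address, repository, and an upload UUID from Comment 2 as illustrative values:

```python
# Sketch: building the Docker Registry HTTP API V2 request that cancels an
# in-progress blob upload (DELETE /v2/<name>/blobs/uploads/<uuid>).
# Registry host, repository, and UUID are illustrative values from this
# report; a real call also needs authentication.
import urllib.request

def cancel_upload_request(registry: str, repo: str, uuid: str) -> urllib.request.Request:
    """Build, but do not send, the DELETE request that cancels an upload."""
    url = f"https://{registry}/v2/{repo}/blobs/uploads/{uuid}"
    return urllib.request.Request(url, method="DELETE")

req = cancel_upload_request(
    "docker-registry.default.svc:5000",
    "test/e2e-12",
    "18550046-da65-4189-b96a-23b6998efe6a",
)
print(req.get_method(), req.full_url)
```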

Comment 26 Wenjing Zheng 2019-02-12 08:47:07 UTC
Created attachment 1533936 [details]
Registry log from OCP v3.11.59

Comment 28 Wenjing Zheng 2019-02-13 10:12:18 UTC
Thanks Oleg! I can reproduce with your steps and will verify this bug as below:

Verified with below version:
openshift v3.9.68
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 34 errata-xmlrpc 2019-02-20 08:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331