Bug 1662132 - When using oc new-app to create a new build, the builds create incomplete multipart uploads to S3, and the incomplete uploads are not cleaned up automatically
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.z
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1668411 1668412 1668413
 
Reported: 2018-12-26 10:37 UTC by Aditya Deshpande
Modified: 2022-03-13 16:37 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the cluster role system:image-pruner was required for all DELETE requests to the registry. Consequence: regular clients couldn't cancel their uploads, and S3 multipart uploads were piling up. Fix: accept DELETE requests for uploads from clients who are allowed to write to them. Result: clients are able to cancel their uploads.
Clone Of:
: 1668411 1668412 1668413
Environment:
Last Closed: 2019-02-20 08:46:56 UTC
Target Upstream Version:
Embargoed:


Attachments
Registry log from OCP v3.11.59 (624.07 KB, text/plain)
2019-02-12 08:47 UTC, Wenjing Zheng


Links
Red Hat Product Errata RHBA-2019:0331 - 2019-02-20 08:47:02 UTC

Description Aditya Deshpande 2018-12-26 10:37:03 UTC
Description of problem:

When using oc new-app to create a new build, the builds create incomplete multipart uploads to S3 (as well as on other storage).
The build itself completes successfully.
However, the incomplete uploads are not cleaned up automatically; currently, the incomplete multipart uploads have to be deleted manually. If too many incomplete uploads accumulate in the S3 storage, docker push stops working with an HTTP 500 error.


Version-Release number of selected component (if applicable):
OpenShift v3.9.51

How reproducible:
Reproducible with 'oc new-app --name e2e https://github.com/appuio/endtoend-docker-helloworld.git -n test'
It also occurs with the RHEL image registry.access.redhat.com/rhscl/httpd-24-rhel7:latest, but not always.

Actual results:
Incomplete uploads can be seen in the S3 storage and have to be cleaned up manually.

Expected results:
There should not be any incomplete multipart uploads of the image left in the S3 storage.
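
For reference, the leftover multipart uploads can also be listed and aborted directly against the bucket with the AWS CLI (a rough sketch; the bucket name, key, and upload ID are placeholders):

$ aws s3api list-multipart-uploads --bucket <registry-bucket> --prefix registry/docker/registry/v2/repositories/
$ aws s3api abort-multipart-upload --bucket <registry-bucket> --key <upload-key> --upload-id <upload-id>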

Comment 2 gabisoft 2018-12-28 16:27:22 UTC
Retested this and created a BuildConfig directly, with the same result. It makes no difference whether ImageStreams are used or not.

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: e2e-12
  namespace: test
spec:
  output:
    to:
      kind: DockerImage
      name: 'docker-registry.default.svc:5000/test/e2e-12'
  source:
    git:
      uri: 'https://github.com/appuio/endtoend-docker-helloworld.git'
    type: Git
  strategy:
    dockerStrategy:
      from:
        kind: ImageStreamTag
        name: 'httpd-24-centos7:latest'
    type: Docker

$ ./mc ls --recursive --incomplete bucket/registry
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/18550046-da65-4189-b96a-23b6998efe6a/data
[2018-12-28 17:19:56 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/1f89f3a9-9fda-4a84-b4a2-6f45ba4084c2/data
[2018-12-28 17:19:54 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/4b67e7af-ab05-40dc-9e02-ecb79bff2422/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/688ffddb-4633-49cb-84e5-85a4218b404d/data
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/6931175f-c3d4-4ca4-8e43-11dff2ec175e/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/768b3be9-a6df-4cf5-8dc3-60342705cfa1/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/7e0a3fba-eaeb-4363-a066-2036547ea6d1/data
[2018-12-28 17:19:50 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/8e91ccb4-90b2-4774-a8b5-8c0ff30ee0b9/data
[2018-12-28 17:19:53 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/916da2d6-5779-45f2-8ff3-94e458c4ccc8/data
[2018-12-28 17:19:49 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/a0697689-e978-4221-bc3e-4f1a4f6d477b/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/a726b6f0-ba4a-4cbc-aceb-176101a3b32b/data
[2018-12-28 17:19:51 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/cc87268e-1b2b-4b23-8134-b91a33f65cfe/data
[2018-12-28 17:19:52 CET]     0B registry/docker/registry/v2/repositories/test/e2e-12/_uploads/e442d3dd-70bd-494b-811f-3d91752eb45c/data
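
For completeness, a rough sketch of how a build can be triggered from the BuildConfig above before re-running the listing (assuming the YAML is saved as bc.yaml; the file name is illustrative):

$ oc apply -f bc.yaml -n test
$ oc start-build e2e-12 -n test
$ ./mc ls --recursive --incomplete bucket/registry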

Comment 4 Corey Daley 2019-01-02 15:46:48 UTC
There is a setting that can be enabled on the S3 bucket to abort incomplete multipart uploads (you can read more about it here: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html#mpu-abort-incomplete-mpu-lifecycle-config)

We recommend that you set this option on the S3 bucket that is being used for the registry.

We have opened an issue on the GitHub repository to enable this option by default on S3 buckets that the registry operator creates.
https://github.com/openshift/cluster-image-registry-operator/issues/128
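
For buckets on AWS S3 itself, a minimal sketch of enabling that lifecycle setting with the AWS CLI (the bucket name, rule ID, and one-day expiry are illustrative):

$ aws s3api put-bucket-lifecycle-configuration --bucket <registry-bucket> --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-incomplete-mpu",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
    }]
  }'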

Comment 5 gabisoft 2019-01-02 17:05:55 UTC
The problem with an S3-bucket-based cleanup is that it has to be implemented differently for different S3 APIs. In this case, it is a Dell EMC ECS storage; Ceph S3 is another commonly used storage. So this should definitely be handled by the client of the docker-registry.

Comment 6 Corey Daley 2019-01-02 17:27:38 UTC
If those storage media are attempting to emulate the S3 APIs and features, then it is up to them to support those correctly and fully, or up to the user to configure the incomplete multipart upload cleanup.

Not every storage type (GCS/Azure/Filesystem) supports multipart uploads, so those uploads are dealt with appropriately by their respective drivers.

Comment 7 gabisoft 2019-01-03 08:52:36 UTC
I have seen something similar on other file-based storage backends (Gluster) as well, so I assume that this is a general issue that occurs from time to time.

At least a "dockerregistry -prune delete" should remove those leftovers, but it does not.
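
For reference, the hard prune mentioned above is typically run inside the registry pod, roughly as follows (a sketch based on the OpenShift 3.x hard-prune procedure; the exact flag syntax and prerequisites, such as putting the registry into read-only mode first, may differ by version):

$ registry_pod=$(oc -n default get pods -l deploymentconfig=docker-registry -o jsonpath='{.items[0].metadata.name}')
$ oc -n default exec -it "$registry_pod" -- /usr/bin/dockerregistry -prune=check
$ oc -n default exec -it "$registry_pod" -- /usr/bin/dockerregistry -prune=delete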

Comment 19 gabisoft 2019-01-25 15:04:19 UTC
I did configure a lifecycle configuration on the S3 bucket, but it doesn't delete the invalid multipart uploads fast enough. So I still have to delete them manually:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <ID>lifecycle-v2-abortmpu-per-day</ID>
        <Filter/>
        <Status>Enabled</Status>
        <AbortIncompleteMultipartUpload>
          <DaysAfterInitiation>1</DaysAfterInitiation>
        </AbortIncompleteMultipartUpload>
    </Rule>
</LifecycleConfiguration>

So thanks for the PR! Hopefully this will resolve the issue.
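
Until the lifecycle rule (or the registry fix) takes effect, the leftovers can also be removed manually with the same mc client used above (a sketch; flag availability depends on the mc version):

$ ./mc rm --recursive --incomplete --force bucket/registry/docker/registry/v2/repositories/test/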

Comment 20 Oleg Bulatov 2019-01-28 13:03:33 UTC
https://github.com/openshift/image-registry/pull/151 (waiting to be merged)

Comment 21 Oleg Bulatov 2019-01-29 11:51:25 UTC
Merged.
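
With that change in place, a client that has push access to a repository should be able to cancel its own blob upload through the registry's v2 API instead of needing the system:image-pruner role. A rough sketch (hostname, repository, and token handling are illustrative):

# Initiate an upload; the Location header of the response contains the upload UUID.
$ curl -i -X POST -H "Authorization: Bearer <token>" \
    https://docker-registry.default.svc:5000/v2/test/e2e-12/blobs/uploads/

# Cancel that upload with DELETE on the returned upload URL.
$ curl -i -X DELETE -H "Authorization: Bearer <token>" \
    "https://docker-registry.default.svc:5000/v2/test/e2e-12/blobs/uploads/<upload-uuid>"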

Comment 26 Wenjing Zheng 2019-02-12 08:47:07 UTC
Created attachment 1533936 [details]
Registry log from OCP v3.11.59

Comment 28 Wenjing Zheng 2019-02-13 10:12:18 UTC
Thanks Oleg! I can reproduce with your steps and will verify this bug as below:

Verified with below version:
openshift v3.9.68
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 34 errata-xmlrpc 2019-02-20 08:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331

