Bug 1655641 - Docker Registry AWS S3: Health Check will fail, if the Bucket is empty
Summary: Docker Registry AWS S3: Health Check will fail, if the Bucket is empty
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.7.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 3.11.z
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
Depends On:
Reported: 2018-12-03 15:37 UTC by Gabriel Stein
Modified: 2019-04-11 05:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The registry storage health check issues a HEAD request for /. Consequence: When the S3 bucket does not contain any objects, the request returns PathNotFound and the health check fails. Fix: Treat PathNotFound as a success. Result: The health check works for empty buckets.
Clone Of:
Last Closed: 2019-04-11 05:38:23 UTC
Target Upstream Version:

System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:0636 | None | None | None | 2019-04-11 05:38:38 UTC

Description Gabriel Stein 2018-12-03 15:37:48 UTC
Description of problem:

Version-Release number of selected component (if applicable):

- OpenShift Container Platform 3.7
- ose-docker-registry:v3.7.72 
- AWS S3 Bucket 

How reproducible:

Steps to Reproduce:
1. Deploy a new registry using an AWS S3 bucket
2. Make all the adjustments needed for the registry to use the S3 bucket
3. Start some pods using this registry

Actual results:

- For the first few seconds the customer receives a 200 return code, then a 503 error code

Expected results:

Images from this registry can be used with the S3 storage without problems.

I will ask the customer for more logs tomorrow and upload them here.

# Workaround:

1. Add the environment variable to disable the storage driver health check:
oc env dc/docker-registry REGISTRY_HEALTH_STORAGEDRIVER_ENABLED=false

2. Push an image into the registry

3. Remove the environment variable from the DeploymentConfig

4. Roll out the registry again; everything then runs well

# ToDo: Post all the logs!

Comment 1 Oleg Bulatov 2019-01-11 13:19:39 UTC
It looks like it was fixed in master: https://github.com/openshift/image-registry/blame/f3cb2b136514123ff11eab01fea2c25699cb2a5c/vendor/github.com/docker/distribution/registry/handlers/app.go#L375-L377

Ben, should we backport this to 3.11, 3.10, 3.9, 3.8 and 3.7?
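
For readers following along, the behavior described in the Doc Text (treat PathNotFound as a success) can be illustrated with a minimal, self-contained sketch. This is not the code at the link above: `PathNotFoundError`, `Statter`, `emptyBucket`, `unreachableBucket` and `storageDriverCheck` are hypothetical stand-ins defined here only so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// PathNotFoundError is a local, hypothetical stand-in for the storage
// driver's "path not found" error.
type PathNotFoundError struct {
	Path string
}

func (e PathNotFoundError) Error() string {
	return fmt.Sprintf("path not found: %s", e.Path)
}

// Statter is a hypothetical, minimal view of a storage driver.
type Statter interface {
	Stat(path string) error
}

// emptyBucket simulates a bucket with no objects: Stat always reports
// PathNotFoundError.
type emptyBucket struct{}

func (emptyBucket) Stat(path string) error { return PathNotFoundError{Path: path} }

// unreachableBucket simulates a backend that cannot be contacted.
type unreachableBucket struct{}

func (unreachableBucket) Stat(path string) error { return errors.New("connection refused") }

// storageDriverCheck returns nil when the backend answers, even if "/"
// does not exist; any other error is reported as a failure.
func storageDriverCheck(d Statter) error {
	err := d.Stat("/")
	var notFound PathNotFoundError
	if errors.As(err, &notFound) {
		return nil // backend responded; the bucket is just empty
	}
	return err
}

func main() {
	fmt.Println("empty bucket:", storageDriverCheck(emptyBucket{}))
	fmt.Println("unreachable backend:", storageDriverCheck(unreachableBucket{}))
}
```

The point of the sketch is that an empty bucket still proves the backend is reachable, so only errors other than "path not found" should mark the storage as unhealthy.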

Comment 2 Oleg Bulatov 2019-01-11 13:42:52 UTC

Comment 4 Ben Parees 2019-02-20 22:22:59 UTC
Oleg is this still on your radar to backport?

Comment 5 Oleg Bulatov 2019-02-21 10:29:59 UTC
Yes, it is.

Comment 14 Wenjing Zheng 2019-03-25 03:33:21 UTC
Thanks for your reply, Oleg! I tried manually deleting the existing registry folder in the openshift-qe bucket and enabling the health check; the error below appears when trying to push an image to the registry:

Pushed 4/6 layers, 73% complete
Pushed 4/6 layers, 82% complete
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: unknown blob

No such issue occurs after disabling the health check.

Comment 15 Wenjing Zheng 2019-03-25 05:46:56 UTC
Here is the log from the docker registry:
time="2019-03-25T03:26:53.969524477Z" level=info msg="response completed with error" err.code="blob unknown" err.detail="sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" err.message="blob unknown to registry" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=e87d80e2-f40b-401e-b201-79b17af178a0 http.request.method=HEAD http.request.remoteaddr="" http.request.uri="/v2/default/ruby-ex/blobs/sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" http.request.useragent="docker/1.13.1 go/go1.10.8 kernel/3.10.0-957.10.1.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" http.response.contenttype="application/json; charset=utf-8" http.response.duration=84.895517ms http.response.status=404 http.response.written=157 instance.id=3d36c0ef-506c-405f-b2e4-e9369a71236d openshift.auth.user="system:serviceaccount:default:builder" openshift.auth.userid=f87d3aa2-4ea7-11e9-a665-0e674847d292 vars.digest="sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" vars.name=default/ruby-ex

Comment 16 Oleg Bulatov 2019-03-25 12:41:09 UTC
The error from the build is about storage integrity. There should be another log line with level=error. A 404 response for HEAD requests is OK.

To check this BZ you don't need to push anything to the registry. When the storage health check is enabled and it fails, you should get an Unhealthy event for the pod: "Liveness probe failed: ..." [1].

Health checks don't affect anything else.

[1]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

Comment 20 Wenjing Zheng 2019-03-26 10:31:08 UTC
Verified this bug with the version below: openshift v3.11.98
openshift3/ose-docker-registry           v3.11               8be352e27714

A warning like this will appear:
time="2019-03-26T07:18:30.468524184Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.9.4 instance.id=ba533907-eaa9-46b0-a997-d193c225af01 trace.duration=53.683535ms trace.file=/builddir/build/BUILD/atomic-openshift-dockerregistry-git-0.8276247/_output/local/go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/storage/driver/base/base.go trace.func="github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=3365cce9-1f51-4a6d-8fdc-020511510784 trace.line=137

Comment 22 errata-xmlrpc 2019-04-11 05:38:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

