Bug 1655641

Summary: Docker Registry AWS S3: Health check will fail if the bucket is empty
Product: OpenShift Container Platform Reporter: Gabriel Stein <gferrazs>
Component: Image Registry Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA QA Contact: Wenjing Zheng <wzheng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.7.0 CC: aos-bugs, aos-storage-staff, bparees, jokerman, mmccomas, obulatov
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: the registry health check issues HEAD requests for "/". Consequence: when the S3 bucket doesn't contain any objects, the check gets PathNotFound and fails. Fix: treat PathNotFound as a success. Result: the health check works for empty buckets.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-11 05:38:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gabriel Stein 2018-12-03 15:37:48 UTC
Description of problem:

Version-Release number of selected component (if applicable):

- OpenShift Container Platform 3.7
- ose-docker-registry:v3.7.72 
- AWS S3 Bucket 

How reproducible:

Steps to Reproduce:
1. Deploy a new registry backed by an AWS S3 bucket
2. Make all the adjustments needed for the registry to use the S3 bucket
3. Start some pods that use images from this registry

Actual results:

- For the first few seconds the customer receives a 200 return code, then 503 error codes


Expected results:

Images from this registry, backed by S3 storage, can be used without problems.

I will ask the customer for more logs tomorrow and I will upload here.

# Workaround:

1. Add the environment variable to disable the storage driver health check:
oc env dc/docker-registry REGISTRY_HEALTH_STORAGEDRIVER_ENABLED=false

2. Push an image into the registry

3. Remove the environment variable from the DeploymentConfig:
oc env dc/docker-registry REGISTRY_HEALTH_STORAGEDRIVER_ENABLED-

4. Roll out the registry again; everything runs well afterwards

# ToDo: Post all the logs!

Comment 1 Oleg Bulatov 2019-01-11 13:19:39 UTC
It looks like it was fixed in master: https://github.com/openshift/image-registry/blame/f3cb2b136514123ff11eab01fea2c25699cb2a5c/vendor/github.com/docker/distribution/registry/handlers/app.go#L375-L377

Ben, should we backport this to 3.11, 3.10, 3.9, 3.8 and 3.7?

Comment 2 Oleg Bulatov 2019-01-11 13:42:52 UTC
https://github.com/docker/distribution/issues/2292

Comment 4 Ben Parees 2019-02-20 22:22:59 UTC
Oleg is this still on your radar to backport?

Comment 5 Oleg Bulatov 2019-02-21 10:29:59 UTC
Yes, it is.

Comment 14 Wenjing Zheng 2019-03-25 03:33:21 UTC
Thanks for your reply, Oleg! I tried manually deleting the existing registry folder in the openshift-qe bucket and enabling the health check; the error below appears when trying to push an image to the registry:

Pushed 4/6 layers, 73% complete
Pushed 4/6 layers, 82% complete
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: unknown blob

No such issue after disabling the health check.

Comment 15 Wenjing Zheng 2019-03-25 05:46:56 UTC
Here is log from docker registry:
time="2019-03-25T03:26:53.969524477Z" level=info msg="response completed with error" err.code="blob unknown" err.detail="sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" err.message="blob unknown to registry" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=e87d80e2-f40b-401e-b201-79b17af178a0 http.request.method=HEAD http.request.remoteaddr="10.128.0.1:60080" http.request.uri="/v2/default/ruby-ex/blobs/sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" http.request.useragent="docker/1.13.1 go/go1.10.8 kernel/3.10.0-957.10.1.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" http.response.contenttype="application/json; charset=utf-8" http.response.duration=84.895517ms http.response.status=404 http.response.written=157 instance.id=3d36c0ef-506c-405f-b2e4-e9369a71236d openshift.auth.user="system:serviceaccount:default:builder" openshift.auth.userid=f87d3aa2-4ea7-11e9-a665-0e674847d292 vars.digest="sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" vars.name=default/ruby-ex

Comment 16 Oleg Bulatov 2019-03-25 12:41:09 UTC
The error from the build is about storage integrity. There should be another log line with level=error. 404 responses for HEAD requests are OK.

To check this BZ you don't need to push anything to the registry. When the storage health check is enabled and it fails, you should get an Unhealthy event for the pod: "Liveness probe failed: ..." [1].

Health checks don't affect anything else.

[1]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
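For reference, the liveness probe that surfaces that Unhealthy event is part of the registry pod spec. A hedged sketch of what such a probe typically looks like (the path, port, and timing values here are illustrative assumptions, not taken from this deployment):

```yaml
# Hypothetical liveness probe on the registry container; when the storage
# driver health check fails, this endpoint returns 503 and the kubelet
# emits the "Liveness probe failed" event mentioned above.
livenessProbe:
  httpGet:
    path: /healthz
    port: 5000
  initialDelaySeconds: 10
  timeoutSeconds: 5
```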

Comment 20 Wenjing Zheng 2019-03-26 10:31:08 UTC
Verified this bug with the version below: openshift v3.11.98
openshift3/ose-docker-registry           v3.11               8be352e27714

A debug log line like this will appear:
time="2019-03-26T07:18:30.468524184Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.9.4 instance.id=ba533907-eaa9-46b0-a997-d193c225af01 trace.duration=53.683535ms trace.file=/builddir/build/BUILD/atomic-openshift-dockerregistry-git-0.8276247/_output/local/go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/storage/driver/base/base.go trace.func="github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=3365cce9-1f51-4a6d-8fdc-020511510784 trace.line=137

Comment 22 errata-xmlrpc 2019-04-11 05:38:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636