Bug 1655641

Summary: Docker Registry AWS S3: Health check will fail if the bucket is empty
Product: OpenShift Container Platform Reporter: Gabriel Stein <gferrazs>
Component: Image Registry Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA QA Contact: Wenjing Zheng <wzheng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.7.0 CC: aos-bugs, aos-storage-staff, bparees, jokerman, mmccomas, obulatov
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: the registry health check issues HEAD requests for "/". Consequence: when the S3 bucket doesn't contain any objects, the check gets PathNotFound and fails. Fix: treat PathNotFound as a success. Result: the health check works for empty buckets.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-11 05:38:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gabriel Stein 2018-12-03 15:37:48 UTC
Description of problem:

Version-Release number of selected component (if applicable):

- OpenShift Container Platform 3.7
- ose-docker-registry:v3.7.72 
- AWS S3 Bucket 

How reproducible:

Steps to Reproduce:
1. Deploy a new registry backed by an AWS S3 bucket
2. Make all the adjustments needed for the registry to use the S3 bucket
3. Start some pods that use images from this registry

Actual results:

- For the first few seconds the customer receives a 200 return code, then 503 error codes


Expected results:

Images from this registry, backed by S3 storage, can be used without problems.

I will ask the customer for more logs tomorrow and I will upload here.

# Workaround:

1. Add the environment variable to disable the storage driver health check:
oc env dc/docker-registry REGISTRY_HEALTH_STORAGEDRIVER_ENABLED=false

2. Push an image into the registry

3. Remove the environment variable from the DeploymentConfig:
oc env dc/docker-registry REGISTRY_HEALTH_STORAGEDRIVER_ENABLED-

4. Roll out the registry again; everything runs well afterwards

# ToDo: Post all the logs!

Comment 1 Oleg Bulatov 2019-01-11 13:19:39 UTC
It looks like it was fixed in master: https://github.com/openshift/image-registry/blame/f3cb2b136514123ff11eab01fea2c25699cb2a5c/vendor/github.com/docker/distribution/registry/handlers/app.go#L375-L377

Ben, should we backport this to 3.11, 3.10, 3.9, 3.8 and 3.7?

Comment 2 Oleg Bulatov 2019-01-11 13:42:52 UTC
https://github.com/docker/distribution/issues/2292

Comment 4 Ben Parees 2019-02-20 22:22:59 UTC
Oleg is this still on your radar to backport?

Comment 5 Oleg Bulatov 2019-02-21 10:29:59 UTC
Yes, it is.

Comment 14 Wenjing Zheng 2019-03-25 03:33:21 UTC
Thanks for your reply, Oleg! I tried manually deleting the existing registry folder in the openshift-qe bucket and enabling the health check; the error below appears when trying to push an image to the registry:

Pushed 4/6 layers, 73% complete
Pushed 4/6 layers, 82% complete
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: unknown blob

No such issue after disabling the health check.

Comment 15 Wenjing Zheng 2019-03-25 05:46:56 UTC
Here is log from docker registry:
time="2019-03-25T03:26:53.969524477Z" level=info msg="response completed with error" err.code="blob unknown" err.detail="sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" err.message="blob unknown to registry" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=e87d80e2-f40b-401e-b201-79b17af178a0 http.request.method=HEAD http.request.remoteaddr="10.128.0.1:60080" http.request.uri="/v2/default/ruby-ex/blobs/sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" http.request.useragent="docker/1.13.1 go/go1.10.8 kernel/3.10.0-957.10.1.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" http.response.contenttype="application/json; charset=utf-8" http.response.duration=84.895517ms http.response.status=404 http.response.written=157 instance.id=3d36c0ef-506c-405f-b2e4-e9369a71236d openshift.auth.user="system:serviceaccount:default:builder" openshift.auth.userid=f87d3aa2-4ea7-11e9-a665-0e674847d292 vars.digest="sha256:efca033ff9b91f228bfa7292a16ba6ad3cf3f12e130d515173b9358a9974f267" vars.name=default/ruby-ex

Comment 16 Oleg Bulatov 2019-03-25 12:41:09 UTC
The error from the build is about storage integrity. There should be another log line with level=error. 404 responses for HEAD requests are OK.

To check this BZ you don't need to push anything to the registry. When the storage health check is enabled and it fails, you should get an Unhealthy event for the pod: "Liveness probe failed: ..." [1].

Health checks don't affect anything else.

[1]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
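For reference, the liveness probe that surfaces that Unhealthy event is part of the registry pod spec. A hedged sketch of what such a probe typically looks like (the path, port, and timing values here are illustrative assumptions, not taken from this deployment):

```yaml
# Hypothetical liveness probe on the registry container; when the storage
# driver health check fails, this endpoint returns 503 and the kubelet
# emits the "Liveness probe failed" event mentioned above.
livenessProbe:
  httpGet:
    path: /healthz
    port: 5000
  initialDelaySeconds: 10
  timeoutSeconds: 5
```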

Comment 20 Wenjing Zheng 2019-03-26 10:31:08 UTC
Verified this bug with the version below: openshift v3.11.98
openshift3/ose-docker-registry           v3.11               8be352e27714

A debug log line like this will appear:
time="2019-03-26T07:18:30.468524184Z" level=debug msg="s3aws.Stat(\"/\")" go.version=go1.9.4 instance.id=ba533907-eaa9-46b0-a997-d193c225af01 trace.duration=53.683535ms trace.file=/builddir/build/BUILD/atomic-openshift-dockerregistry-git-0.8276247/_output/local/go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/storage/driver/base/base.go trace.func="github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=3365cce9-1f51-4a6d-8fdc-020511510784 trace.line=137

Comment 22 errata-xmlrpc 2019-04-11 05:38:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636