A moderate use registry is generating upwards of 2/gb of logs a day because of the debug log level. We should not be logging at debug by default in our production deploys. We should change the default in the image to :info or :warn.
Submitted pull request https://github.com/openshift/image-registry/pull/75
Copying the PR discussion to this bug: bparees: before we do this i'd love to know if there's a middle ground alternative because we have to debug registry push failures a lot (often not because of an actual issue in the registry that we could fix, but rather permission issues) and the debug info is super helpful when that happens. forcing people to redeploy their registry and recreate the failure so we can get logs is going to be painful. -------------------------------------------------------------------------------- ccoleman: Can we bump permission issues up a level so they show by default? We had 21Gb of logs from 10 days of runs - that’s high enough that I’d expect us to cause disk pressure in a lot of environments, which raises the possibility of other failures. -------------------------------------------------------------------------------- obulatov: I like the idea, but at first we need to fix how we log our errors. And unfortunately the upstream code needs to be fixed too. Otherwise we will not be able to explain the next "blob unknown to registry" error (did we tried to pull the blob from another registry? which one? and what about the local storage? etc.). The worst part: usually it's not the registry problem, but we still need to explain it. -------------------------------------------------------------------------------- agladov: All that is now in the log is necessary for investigations of various kinds of failures. Problems with the pullthrough, storage backend, authentication require the logging of different entities. In the current situation, all debug messages must be info, but all of them should remain in the log. If there are problems with the disk space or expiration, then we should think about using the log server (syslog). NAK -------------------------------------------------------------------------------- bparees: I'm guessing we don't have this level of control due to the docker distribution code, but it seems like a better solution would buffer the log messages and then only log the output if the overall request failed. Is such a thing possible for us here? --------------------------------------------------------------------------------
I would follow up on Alexey's suggestion... can we not get log rolling/expiration out of our underlying logging infrastructure?
*** Bug 1383816 has been marked as a duplicate of this bug. ***
current plan is to implement specific error logging in our registry code that rolls up all the failure information/context needed. being tracked in trello: https://trello.com/c/JmvVZj5L
Handing off to Alexey since he's working the trello card.
PR: https://github.com/openshift/image-registry/pull/110
I can still see some debug info for successful push and failed push as below with v3.11.0-0.24.0: time="2018-08-29T03:05:56.57009132Z" level=debug msg="authorizing request" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=048e45db-8725-4e96-916c-77bd5cbaab80 http.request.method=HEAD http.request.remoteaddr="10.128.0.1:42438" http.request.uri="/v2/wzheng3/myimage/blobs/sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185" http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" instance.id=6042c925-1c7e-470d-9a17-e4300616d3a8 vars.digest="sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185" vars.name=wzheng3/myimage >>>>>>>>>>>>>>> time="2018-08-29T03:08:10.815634543Z" level=debug msg="authorizing request" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=94300499-4323-4c6f-89cd-12b654c14bd0 http.request.method=GET http.request.remoteaddr="10.128.0.1:42828" http.request.uri=/v2/wzheng3/wonderful/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" instance.id=6042c925-1c7e-470d-9a17-e4300616d3a8 vars.name=wzheng3/wonderful vars.reference=latest time="2018-08-29T03:08:10.823906181Z" level=debug msg="Origin auth: checking for access to repository:wzheng3/wonderful:pull" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=94300499-4323-4c6f-89cd-12b654c14bd0 http.request.method=GET http.request.remoteaddr="10.128.0.1:42828" http.request.uri=/v2/wzheng3/wonderful/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" instance.id=6042c925-1c7e-470d-9a17-e4300616d3a8 openshift.auth.user=wzheng openshift.auth.userid=39597686-ab37-11e8-a078-fa163e27bb7b vars.name=wzheng3/wonderful vars.reference=latest
Who you deployed image-registry ? Using ansible, oc cluster up or just upgrade existing installation ?
*** Bug 1623577 has been marked as a duplicate of this bug. ***
according to this issue, ansible is still setting it to debug: https://github.com/openshift/openshift-ansible/issues/9820
fix to set the log level https://github.com/openshift/openshift-ansible/pull/9821
(In reply to Alexey Gladkov from comment #11) > Who you deployed image-registry ? Using ansible, oc cluster up or just > upgrade existing installation ? Like Ben's comment 13, I am using ansible.
Ryan's PR merged. Thanks Ryan!
Verified with below version: openshift-ansible-3.11.0-0.28.0.git.0.730d4be.el7.noarch.rpm Log level is not debug by default now: time="2018-09-06T06:44:51.405801723Z" level=error msg="response completed with error" err.code="name unknown" err.message="repository name not known to registry" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=543af9e7-1890-4468-a9ab-7c1f5e71d80b http.request.method=GET http.request.remoteaddr="10.2.4.1:38032" http.request.uri=/v2/wzheng1/123/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=23.75832ms http.response.status=404 http.response.written=87 instance.id=62a18abb-5999-4ddf-a1b6-db20c1b30b17 openshift.auth.user=wzheng openshift.auth.userid=0a38c0d4-b19d-11e8-a3b1-0e9bc0025290 vars.name=wzheng1/123 vars.reference=latest time="2018-09-06T06:44:51.405898872Z" level=info msg=response go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=33ecc63c-80d7-4d4f-a250-49e7d1be4af6 http.request.method=GET http.request.remoteaddr="10.2.4.1:38032" http.request.uri=/v2/wzheng1/123/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=23.896237ms http.response.status=404 http.response.written=87 instance.id=62a18abb-5999-4ddf-a1b6-db20c1b30b17
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652