Bug 1567658

Summary: The default image registry log level should not be :debug
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: Image RegistryAssignee: Alexey Gladkov <agladkov>
Status: CLOSED ERRATA QA Contact: Wenjing Zheng <wzheng>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.9.0CC: agladkov, aos-bugs, bparees, ccoleman, lmeyer, obulatov, rcook, wzheng
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Feature: Reduce the number of logs. Reason: Registry generates 2GB logs per day. Result: Log level changed to "info".
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-11 07:19:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Clayton Coleman 2018-04-15 21:14:45 UTC
A moderate use registry is generating upwards of 2/gb of logs a day because of the debug log level.  We should not be logging at debug by default in our production deploys.  We should change the default in the image to :info or :warn.

Comment 1 Corey Daley 2018-04-16 03:23:29 UTC
Submitted pull request https://github.com/openshift/image-registry/pull/75

Comment 2 Ben Parees 2018-04-23 13:42:39 UTC
Copying the PR discussion to this bug:

bparees:

before we do this i'd love to know if there's a middle ground alternative because we have to debug registry push failures a lot (often not because of an actual issue in the registry that we could fix, but rather permission issues) and the debug info is super helpful when that happens.

forcing people to redeploy their registry and recreate the failure so we can get logs is going to be painful.
--------------------------------------------------------------------------------


ccoleman:
Can we bump permission issues up a level so they show by default? We had 21Gb of logs from 10 days of runs - that’s high enough that I’d expect us to cause disk pressure in a lot of environments, which raises the possibility of other failures.
--------------------------------------------------------------------------------

obulatov:
I like the idea, but at first we need to fix how we log our errors. And unfortunately the upstream code needs to be fixed too. Otherwise we will not be able to explain the next "blob unknown to registry" error (did we tried to pull the blob from another registry? which one? and what about the local storage? etc.). The worst part: usually it's not the registry problem, but we still need to explain it.
--------------------------------------------------------------------------------


agladov:

All that is now in the log is necessary for investigations of various kinds of failures. Problems with the pullthrough, storage backend, authentication require the logging of different entities. In the current situation, all debug messages must be info, but all of them should remain in the log.

If there are problems with the disk space or expiration, then we should think about using the log server (syslog).

NAK

--------------------------------------------------------------------------------


bparees:
I'm guessing we don't have this level of control due to the docker distribution code, but it seems like a better solution would buffer the log messages and then only log the output if the overall request failed. Is such a thing possible for us here?
--------------------------------------------------------------------------------

Comment 3 Ben Parees 2018-04-23 13:43:47 UTC
I would follow up on Alexey's suggestion...  can we not get log rolling/expiration out of our underlying logging infrastructure?

Comment 4 Michal Minar 2018-04-26 14:53:43 UTC
*** Bug 1383816 has been marked as a duplicate of this bug. ***

Comment 5 Ben Parees 2018-05-23 15:16:30 UTC
current plan is to implement specific error logging in our registry code that rolls up all the failure information/context needed.

being tracked in trello:
https://trello.com/c/JmvVZj5L

Comment 6 Ben Parees 2018-06-13 21:28:46 UTC
Handing off to Alexey since he's working the trello card.

Comment 7 Ben Parees 2018-07-10 17:00:10 UTC
PR: https://github.com/openshift/image-registry/pull/110

Comment 10 Wenjing Zheng 2018-08-29 03:11:23 UTC
I can still see some debug info for successful push and failed push as below with v3.11.0-0.24.0:
time="2018-08-29T03:05:56.57009132Z" level=debug msg="authorizing request" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=048e45db-8725-4e96-916c-77bd5cbaab80 http.request.method=HEAD http.request.remoteaddr="10.128.0.1:42438" http.request.uri="/v2/wzheng3/myimage/blobs/sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185" http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" instance.id=6042c925-1c7e-470d-9a17-e4300616d3a8 vars.digest="sha256:8c5a7da1afbc602695fcb2cd6445743cec5ff32053ea589ea9bd8773b7068185" vars.name=wzheng3/myimage
>>>>>>>>>>>>>>>
time="2018-08-29T03:08:10.815634543Z" level=debug msg="authorizing request" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=94300499-4323-4c6f-89cd-12b654c14bd0 http.request.method=GET http.request.remoteaddr="10.128.0.1:42828" http.request.uri=/v2/wzheng3/wonderful/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" instance.id=6042c925-1c7e-470d-9a17-e4300616d3a8 vars.name=wzheng3/wonderful vars.reference=latest
time="2018-08-29T03:08:10.823906181Z" level=debug msg="Origin auth: checking for access to repository:wzheng3/wonderful:pull" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=94300499-4323-4c6f-89cd-12b654c14bd0 http.request.method=GET http.request.remoteaddr="10.128.0.1:42828" http.request.uri=/v2/wzheng3/wonderful/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" instance.id=6042c925-1c7e-470d-9a17-e4300616d3a8 openshift.auth.user=wzheng openshift.auth.userid=39597686-ab37-11e8-a078-fa163e27bb7b vars.name=wzheng3/wonderful vars.reference=latest

Comment 11 Alexey Gladkov 2018-08-29 15:19:45 UTC
Who you deployed image-registry ? Using ansible, oc cluster up or just upgrade existing installation ?

Comment 12 Ben Parees 2018-08-29 16:41:03 UTC
*** Bug 1623577 has been marked as a duplicate of this bug. ***

Comment 13 Ben Parees 2018-08-29 16:41:30 UTC
according to this issue, ansible is still setting it to debug:
https://github.com/openshift/openshift-ansible/issues/9820

Comment 14 Ryan Cook 2018-08-29 16:42:30 UTC
fix to set the log level 

https://github.com/openshift/openshift-ansible/pull/9821

Comment 15 Wenjing Zheng 2018-08-30 05:11:07 UTC
(In reply to Alexey Gladkov from comment #11)
> Who you deployed image-registry ? Using ansible, oc cluster up or just
> upgrade existing installation ?

Like Ben's comment 13, I am using ansible.

Comment 16 Ben Parees 2018-08-30 14:17:02 UTC
Ryan's PR merged.  Thanks Ryan!

Comment 18 Wenjing Zheng 2018-09-06 06:52:31 UTC
Verified with below version:
openshift-ansible-3.11.0-0.28.0.git.0.730d4be.el7.noarch.rpm

Log level is not debug by default now:
time="2018-09-06T06:44:51.405801723Z" level=error msg="response completed with error" err.code="name unknown" err.message="repository name not known to registry" go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=543af9e7-1890-4468-a9ab-7c1f5e71d80b http.request.method=GET http.request.remoteaddr="10.2.4.1:38032" http.request.uri=/v2/wzheng1/123/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=23.75832ms http.response.status=404 http.response.written=87 instance.id=62a18abb-5999-4ddf-a1b6-db20c1b30b17 openshift.auth.user=wzheng openshift.auth.userid=0a38c0d4-b19d-11e8-a3b1-0e9bc0025290 vars.name=wzheng1/123 vars.reference=latest 
time="2018-09-06T06:44:51.405898872Z" level=info msg=response go.version=go1.9.4 http.request.host="docker-registry.default.svc:5000" http.request.id=33ecc63c-80d7-4d4f-a250-49e7d1be4af6 http.request.method=GET http.request.remoteaddr="10.2.4.1:38032" http.request.uri=/v2/wzheng1/123/manifests/latest http.request.useragent="docker/1.13.1 go/go1.9.2 kernel/3.10.0-862.11.6.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=23.896237ms http.response.status=404 http.response.written=87 instance.id=62a18abb-5999-4ddf-a1b6-db20c1b30b17

Comment 20 errata-xmlrpc 2018-10-11 07:19:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652