Bug 1774353 - the readability of router pod logs is getting worse
Summary: the readability of router pod logs is getting worse
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.3.0
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-11-20 05:43 UTC by Hongan Li
Modified: 2022-08-04 22:24 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:13:20 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift router pull 64 0 'None' 'closed' 'Bug 1774353: fix version reporting' 2019-11-25 00:44:13 UTC
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:13:36 UTC

Description Hongan Li 2019-11-20 05:43:14 UTC
Description of problem:
the readability of router pod logs is getting worse than before (eg. 4.2).

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-18-175710

How reproducible:
100%

Steps to Reproduce:
1. oc -n openshift-ingress logs router-default-xx-xx


Actual results:
$ oc -n openshift-ingress logs router-default-fbf97cb4-cwjhj
I1119 12:32:25.506693       1 template.go:293] router "level"=0 "msg"="starting router"  "version"={"major":"","minor":"","gitVersion":"v0.0.0-master+$Format:%h$","gitCommit":"$Format:%H$","gitTreeState":"","buildDate":"1970-01-01T00:00:00Z","goVersion":"go1.12.12","compiler":"gc","platform":"linux/amd64"}
I1119 12:32:25.511682       1 metrics.go:153] metrics "level"=0 "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I1119 12:32:25.536140       1 router.go:294] template "level"=0 "msg"="watching for changes"  "path"="/etc/pki/tls/private"
E1119 12:32:25.540465       1 haproxy.go:395] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1119 12:32:25.610304       1 router.go:548] template "level"=0 "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1119 12:32:25.610372       1 router.go:257] router "level"=0 "msg"="router is including routes in all namespaces"  
I1119 12:32:25.916838       1 router.go:548] template "level"=0 "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

Expected results:
1. Comparing to 4.2, the logs is hard to read and needs improvement.
2. At least, the version should be correct instead of as below: 
"version"={"major":"","minor":"","gitVersion":"v0.0.0-master+$Format:%h$","gitCommit":"$Format:%H$","gitTreeState":"","buildDate":"1970-01-01T00:00:00Z","goVersion":"go1.12.12","compiler":"gc","platform":"linux/amd64"}

Additional info:
router logs in 4.2:

$ oc -n openshift-ingress logs router-default-5f6979c9df-2sgpg
I1120 02:11:39.017555       1 template.go:299] Starting template router (v4.2.5-201911121709)
I1120 02:11:39.034913       1 metrics.go:147] Router health and metrics port listening at 0.0.0.0:1936 on HTTP and HTTPS
I1120 02:11:39.044633       1 router.go:306] Watching "/etc/pki/tls/private" for changes
E1120 02:11:39.067224       1 haproxy.go:392] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1120 02:11:39.179675       1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1120 02:11:39.179790       1 router.go:255] Router is including routes in all namespaces
I1120 02:11:39.509543       1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).

Comment 1 Dan Mace 2019-11-20 12:53:58 UTC
We migrated the router to use logr/klog consistent with other projects, and that means we're moving to structured log messages. Generally, I think we should continue to reduce our reliance on log output, instead preferring to publish status, emit events, report metrics, etc. And for whatever logs must remain, optimize those for aggregation and machine analysis. The new log messages actually provide more context which are useful in filtering the logs. Unfortunately, this does mean at least some readability sacrifice for humans looking at the raw text.

I'm not sure what structured output would be better in this case. On one end of the spectrum is no structure, and perhaps on the other is pure JSON logging, which I think would be even less readable (not sure if you would agree).

I do agree the version string reporting is a regression of some kind that we should fix.

Adding a needinfo to make sure you agree that the log format is not actually a functional regression.

Thanks!

Comment 3 Hongan Li 2019-11-22 07:29:45 UTC
Thanks for your detail explanation, Dan.
Yes, agree with you that the log format is not a functional regression.

I also checked the latest build and the version string has been updated, see below


I1122 04:47:48.592128       1 template.go:297] router "level"=0 "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: bc1bd948174bdc2b7194a3aa093b30adffa5cea2\nversionFromGit: 4.0.0-83-gbc1bd94\ngitTreeState: clean\nbuildDate: 2019-11-21T02:52:26Z\n"

I1122 04:47:49.222150       1 router.go:548] template "level"=0 "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

But still has one question, is that possible to make the '\n' works as expected (insert a newline), or replace it with space or other delimiter?

Comment 4 Dan Mace 2019-11-22 14:11:48 UTC
Hongan,

Unfortunately in the case of the "router reloaded" message, the value of the "output" field in the log entry is actually raw haproxy stdout/stderror data, and not something I can clearly interpret and convert into log fields. Replacing the newlines in the raw output value with literal newlines in the log output would break the logging entry across lines and make the entry invalid for parsing.

If the router reload script returned structured data (e.g. JSON) then we could convert that into meaningful log fields, but I doubt we're going to be making that effort any time soon.

I do agree this output is not as visually nice as before, but I'm afraid further tweaking is just not something we can prioritize right now.

Thanks for talking through it with me.

Comment 5 Hongan Li 2019-11-25 00:46:32 UTC
That's make sense. Thank you Dan. So moving back to ON_QA.

Comment 6 Hongan Li 2019-11-25 02:40:46 UTC
verified with 4.3.0-0.nightly-2019-11-24-183610 and version has been updated.

Comment 8 errata-xmlrpc 2020-01-23 11:13:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062


Note You need to log in before you can comment on or make changes to this bug.