Bug 1774353

Summary: the readability of router pod logs is getting worse
Product: OpenShift Container Platform Reporter: Hongan Li <hongli>
Component: NetworkingAssignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs
Version: 4.3.0   
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-23 11:13:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hongan Li 2019-11-20 05:43:14 UTC
Description of problem:
the readability of router pod logs is getting worse than before (eg. 4.2).

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-18-175710

How reproducible:
100%

Steps to Reproduce:
1. oc -n openshift-ingress logs router-default-xx-xx


Actual results:
$ oc -n openshift-ingress logs router-default-fbf97cb4-cwjhj
I1119 12:32:25.506693       1 template.go:293] router "level"=0 "msg"="starting router"  "version"={"major":"","minor":"","gitVersion":"v0.0.0-master+$Format:%h$","gitCommit":"$Format:%H$","gitTreeState":"","buildDate":"1970-01-01T00:00:00Z","goVersion":"go1.12.12","compiler":"gc","platform":"linux/amd64"}
I1119 12:32:25.511682       1 metrics.go:153] metrics "level"=0 "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I1119 12:32:25.536140       1 router.go:294] template "level"=0 "msg"="watching for changes"  "path"="/etc/pki/tls/private"
E1119 12:32:25.540465       1 haproxy.go:395] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1119 12:32:25.610304       1 router.go:548] template "level"=0 "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1119 12:32:25.610372       1 router.go:257] router "level"=0 "msg"="router is including routes in all namespaces"  
I1119 12:32:25.916838       1 router.go:548] template "level"=0 "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

Expected results:
1. Comparing to 4.2, the logs is hard to read and needs improvement.
2. At least, the version should be correct instead of as below: 
"version"={"major":"","minor":"","gitVersion":"v0.0.0-master+$Format:%h$","gitCommit":"$Format:%H$","gitTreeState":"","buildDate":"1970-01-01T00:00:00Z","goVersion":"go1.12.12","compiler":"gc","platform":"linux/amd64"}

Additional info:
router logs in 4.2:

$ oc -n openshift-ingress logs router-default-5f6979c9df-2sgpg
I1120 02:11:39.017555       1 template.go:299] Starting template router (v4.2.5-201911121709)
I1120 02:11:39.034913       1 metrics.go:147] Router health and metrics port listening at 0.0.0.0:1936 on HTTP and HTTPS
I1120 02:11:39.044633       1 router.go:306] Watching "/etc/pki/tls/private" for changes
E1120 02:11:39.067224       1 haproxy.go:392] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1120 02:11:39.179675       1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1120 02:11:39.179790       1 router.go:255] Router is including routes in all namespaces
I1120 02:11:39.509543       1 router.go:561] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).

Comment 1 Dan Mace 2019-11-20 12:53:58 UTC
We migrated the router to use logr/klog consistent with other projects, and that means we're moving to structured log messages. Generally, I think we should continue to reduce our reliance on log output, instead preferring to publish status, emit events, report metrics, etc. And for whatever logs must remain, optimize those for aggregation and machine analysis. The new log messages actually provide more context which are useful in filtering the logs. Unfortunately, this does mean at least some readability sacrifice for humans looking at the raw text.

I'm not sure what structured output would be better in this case. On one end of the spectrum is no structure, and perhaps on the other is pure JSON logging, which I think would be even less readable (not sure if you would agree).

I do agree the version string reporting is a regression of some kind that we should fix.

Adding a needinfo to make sure you agree that the log format is not actually a functional regression.

Thanks!

Comment 3 Hongan Li 2019-11-22 07:29:45 UTC
Thanks for your detail explanation, Dan.
Yes, agree with you that the log format is not a functional regression.

I also checked the latest build and the version string has been updated, see below


I1122 04:47:48.592128       1 template.go:297] router "level"=0 "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: bc1bd948174bdc2b7194a3aa093b30adffa5cea2\nversionFromGit: 4.0.0-83-gbc1bd94\ngitTreeState: clean\nbuildDate: 2019-11-21T02:52:26Z\n"

I1122 04:47:49.222150       1 router.go:548] template "level"=0 "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

But still has one question, is that possible to make the '\n' works as expected (insert a newline), or replace it with space or other delimiter?

Comment 4 Dan Mace 2019-11-22 14:11:48 UTC
Hongan,

Unfortunately in the case of the "router reloaded" message, the value of the "output" field in the log entry is actually raw haproxy stdout/stderror data, and not something I can clearly interpret and convert into log fields. Replacing the newlines in the raw output value with literal newlines in the log output would break the logging entry across lines and make the entry invalid for parsing.

If the router reload script returned structured data (e.g. JSON) then we could convert that into meaningful log fields, but I doubt we're going to be making that effort any time soon.

I do agree this output is not as visually nice as before, but I'm afraid further tweaking is just not something we can prioritize right now.

Thanks for talking through it with me.

Comment 5 Hongan Li 2019-11-25 00:46:32 UTC
That's make sense. Thank you Dan. So moving back to ON_QA.

Comment 6 Hongan Li 2019-11-25 02:40:46 UTC
verified with 4.3.0-0.nightly-2019-11-24-183610 and version has been updated.

Comment 8 errata-xmlrpc 2020-01-23 11:13:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062