Description of problem: the readability of router pod logs is getting worse than before (eg. 4.2). Version-Release number of selected component (if applicable): 4.3.0-0.nightly-2019-11-18-175710 How reproducible: 100% Steps to Reproduce: 1. oc -n openshift-ingress logs router-default-xx-xx Actual results: $ oc -n openshift-ingress logs router-default-fbf97cb4-cwjhj I1119 12:32:25.506693 1 template.go:293] router "level"=0 "msg"="starting router" "version"={"major":"","minor":"","gitVersion":"v0.0.0-master+$Format:%h$","gitCommit":"$Format:%H$","gitTreeState":"","buildDate":"1970-01-01T00:00:00Z","goVersion":"go1.12.12","compiler":"gc","platform":"linux/amd64"} I1119 12:32:25.511682 1 metrics.go:153] metrics "level"=0 "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936" I1119 12:32:25.536140 1 router.go:294] template "level"=0 "msg"="watching for changes" "path"="/etc/pki/tls/private" E1119 12:32:25.540465 1 haproxy.go:395] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory I1119 12:32:25.610304 1 router.go:548] template "level"=0 "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I1119 12:32:25.610372 1 router.go:257] router "level"=0 "msg"="router is including routes in all namespaces" I1119 12:32:25.916838 1 router.go:548] template "level"=0 "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" Expected results: 1. Comparing to 4.2, the logs is hard to read and needs improvement. 2. At least, the version should be correct instead of as below: "version"={"major":"","minor":"","gitVersion":"v0.0.0-master+$Format:%h$","gitCommit":"$Format:%H$","gitTreeState":"","buildDate":"1970-01-01T00:00:00Z","goVersion":"go1.12.12","compiler":"gc","platform":"linux/amd64"} Additional info: router logs in 4.2: $ oc -n openshift-ingress logs router-default-5f6979c9df-2sgpg I1120 02:11:39.017555 1 template.go:299] Starting template router (v4.2.5-201911121709) I1120 02:11:39.034913 1 metrics.go:147] Router health and metrics port listening at 0.0.0.0:1936 on HTTP and HTTPS I1120 02:11:39.044633 1 router.go:306] Watching "/etc/pki/tls/private" for changes E1120 02:11:39.067224 1 haproxy.go:392] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory I1120 02:11:39.179675 1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s). I1120 02:11:39.179790 1 router.go:255] Router is including routes in all namespaces I1120 02:11:39.509543 1 router.go:561] Router reloaded: - Checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
We migrated the router to use logr/klog consistent with other projects, and that means we're moving to structured log messages. Generally, I think we should continue to reduce our reliance on log output, instead preferring to publish status, emit events, report metrics, etc. And for whatever logs must remain, optimize those for aggregation and machine analysis. The new log messages actually provide more context which are useful in filtering the logs. Unfortunately, this does mean at least some readability sacrifice for humans looking at the raw text. I'm not sure what structured output would be better in this case. On one end of the spectrum is no structure, and perhaps on the other is pure JSON logging, which I think would be even less readable (not sure if you would agree). I do agree the version string reporting is a regression of some kind that we should fix. Adding a needinfo to make sure you agree that the log format is not actually a functional regression. Thanks!
Thanks for your detail explanation, Dan. Yes, agree with you that the log format is not a functional regression. I also checked the latest build and the version string has been updated, see below I1122 04:47:48.592128 1 template.go:297] router "level"=0 "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: bc1bd948174bdc2b7194a3aa093b30adffa5cea2\nversionFromGit: 4.0.0-83-gbc1bd94\ngitTreeState: clean\nbuildDate: 2019-11-21T02:52:26Z\n" I1122 04:47:49.222150 1 router.go:548] template "level"=0 "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" But still has one question, is that possible to make the '\n' works as expected (insert a newline), or replace it with space or other delimiter?
Hongan, Unfortunately in the case of the "router reloaded" message, the value of the "output" field in the log entry is actually raw haproxy stdout/stderror data, and not something I can clearly interpret and convert into log fields. Replacing the newlines in the raw output value with literal newlines in the log output would break the logging entry across lines and make the entry invalid for parsing. If the router reload script returned structured data (e.g. JSON) then we could convert that into meaningful log fields, but I doubt we're going to be making that effort any time soon. I do agree this output is not as visually nice as before, but I'm afraid further tweaking is just not something we can prioritize right now. Thanks for talking through it with me.
That's make sense. Thank you Dan. So moving back to ON_QA.
verified with 4.3.0-0.nightly-2019-11-24-183610 and version has been updated.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062