Bug 2019532
| Summary: | Logger object in LSO does not log source location accurately | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
| Component: | Storage | Assignee: | Jonathan Dobson <jdobson> |
| Storage sub component: | Local Storage Operator | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aos-bugs, jsafrane |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2050420 (view as bug list) | Environment: | |
| Last Closed: | 2022-08-10 10:39:06 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2050420, 2050883 | | |
Description
Hemant Kumar
2021-11-02 18:45:51 UTC
I think that *sometimes* the logger object does not log source information accurately. For example, I tried adding a logging statement in https://github.com/openshift/local-storage-operator/blob/master/diskmaker/controllers/lv/reconcile.go#L219 and it logged:

```
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.Deleter controller/controller.go:298 Looking for released PVs to clean up {"Request.Namespace": "openshift-local-storage", "Request.Name": ""}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0018", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0007", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
I1102 18:33:16.207685 2670386 cache.go:55] Added pv "local-pv-77682331" to cache
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0024", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0030", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0029", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0006", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0021", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0020", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0009", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"}
```

Every line reports controller/controller.go:298 as the caller, which seems strange to me. This will probably require some investigation.
Overall, I wonder if we should just move back to klog everywhere, since that is what all OCP projects use.

The caller being logged (controller/controller.go:298) is the function that calls our Reconcile() from controller-runtime, rather than the actual log statements within Reconcile():
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go#L298

That incorrect caller is being logged due to this PR:
https://github.com/openshift/local-storage-operator/pull/267

Specifically these 2 lines:
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L81
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/diskmaker_manager/manager.go#L45

I think that should actually be zap.AddCallerSkip(1) instead of zaplog.AddCaller(). The part that doesn't make sense yet is that if I replace zaplog.AddCaller() with zap.AddCallerSkip(1), no caller gets logged at all. And further, we actually do have code in controller-runtime that adds zap.AddCallerSkip(1) by default:
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234

So the question now is: if I just remove this line:
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L81
why isn't the caller logged correctly due to the line here?
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234

(In reply to Jonathan Dobson from comment #2)
> I think that should actually be zap.AddCallerSkip(1) instead of
> zaplog.AddCaller().
> The part that doesn't make sense yet, is if I replace zaplog.AddCaller()
> with zap.AddCallerSkip(1), no caller gets logged at all.
> And further, we actually do have code in controller-runtime that adds
> zap.AddCallerSkip(1) by default

One thing that I misunderstood: we actually do need zaplog.AddCaller() to enable logging callers in the first place. The call depth is a separate parameter, and that parameter is wrong. If I remove this line:
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L81
then the logr.Logger object ends up with:

```
addCaller:  (bool) false,
callerSkip: (int) 2,
```

callerSkip should be set to 1, so it seems that we're somehow ending up with 2 different places calling zap.AddCallerSkip(1) on the same logger.

As a quick test, I changed ZapOpts to:

```
ZapOpts: []zaplog.Option{zaplog.AddCaller(), zaplog.AddCallerSkip(-1)},
```

and added a test log message, which logged the caller correctly:

```
{"level":"info","ts":1637780758.5367253,"logger":"controllers.NodeDaemon","caller":"nodedaemon/reconcile.go:63","msg":"jdd test","Request.Namespace":"openshift-local-storage","Request.Name":""}
```

That -1 changes callerSkip from 2 to 1, which solves the issue, but it's a hack. So why is zap.AddCallerSkip(1) getting called twice?
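As a side note on the mechanics, zap's caller-skip options compose additively, which is how a logger ends up at callerSkip 2 when two different layers each add 1. A minimal sketch (not LSO code; the variable names and the "analogous to" comments are assumptions for illustration):

```go
// Minimal sketch (not LSO code) of how the skip accumulates: each
// WithOptions(zap.AddCallerSkip(1)) adds to the logger's existing skip.
package main

import "go.uber.org/zap"

func main() {
	base, _ := zap.NewDevelopment() // caller annotation enabled, callerSkip 0

	// First +1, analogous to the skip added inside controller-runtime's zap.NewRaw.
	one := base.WithOptions(zap.AddCallerSkip(1))

	// Second +1, analogous to the skip added by the go-logr/zapr wrapper.
	two := one.WithOptions(zap.AddCallerSkip(1))

	// callerSkip is now 2: the reported caller is two frames above this
	// line rather than the line itself -- the same symptom seen above.
	two.Info("where does zap think I was called from?")
}
```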
(In reply to Jonathan Dobson from comment #3)
> So why is zap.AddCallerSkip(1) getting called twice?

I added debug.PrintStack to AddCallerSkip(), and these were the places where it was provided a non-zero argument:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234

```
goroutine 1 [running]:
runtime/debug.Stack(0x1ed08f0, 0xc0002fddd0, 0xc000590d00)
	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
	/usr/lib/golang/src/runtime/debug/stack.go:16 +0x25
go.uber.org/zap.AddCallerSkip(0x1, 0x1ec45e0, 0xc000010020)
	/go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25
sigs.k8s.io/controller-runtime/pkg/log/zap.NewRaw(0xc00020bcf8, 0x1, 0x1, 0xc0004a8320)
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:234 +0x106
sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0)
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x3f
main.main()
	/go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4
```

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/github.com/go-logr/zapr/zapr.go#L182

```
goroutine 1 [running]:
runtime/debug.Stack(0x6, 0x8, 0xc000339920)
	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
	/usr/lib/golang/src/runtime/debug/stack.go:16 +0x25
go.uber.org/zap.AddCallerSkip(0x1, 0xc00000ff50, 0x1ec5120)
	/go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25
github.com/go-logr/zapr.newLoggerWithExtraSkip(0xc000339920, 0x1, 0x1, 0xc000339920)
	/go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:172 +0x2f
github.com/go-logr/zapr.NewLogger(...)
	/go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:182
sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0)
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x56
main.main()
	/go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4
```

It seems the issue is that LSO creates the logger via controller-runtime, which adds an extra layer and increments callerSkip accordingly. But then LSO uses the go-logr functions directly, instead of the controller-runtime wrapper functions. This causes the parent caller to be logged instead of the line where the log message actually is.
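The wrapper-versus-direct distinction can be illustrated with a standalone sketch (illustrative only, not LSO code; infoViaWrapper is a hypothetical stand-in for the controller-runtime wrapper layer): a logger whose callerSkip assumes one intermediate wrapper frame is accurate only when every call actually goes through such a wrapper.

```go
// Illustrative sketch (not LSO code): a logger tuned to skip one wrapper
// frame is accurate through the wrapper and wrong when called directly.
package main

import "go.uber.org/zap"

// infoViaWrapper stands in for the wrapper layer; the logger's extra
// AddCallerSkip(1) exists precisely to skip this frame.
func infoViaWrapper(l *zap.Logger, msg string) {
	l.Info(msg)
}

func main() {
	base, _ := zap.NewDevelopment()
	l := base.WithOptions(zap.AddCallerSkip(1)) // assumes one wrapper frame

	infoViaWrapper(l, "correct caller") // reported caller: this line in main
	l.Info("wrong caller")              // direct call: reports a frame above main, one level too far up
}
```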
It seems a little strange that these DelegatingLogger methods all return logr.Logger:
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L183

Nonetheless, the following change does at least log the correct caller by creating a new DelegatingLogger from the logr.Logger object (a fuller sketch of this workaround follows the sample output below):

```diff
 func (r *DaemonReconciler) Reconcile(ctx context.Context, request ctrl.Request) (ctrl.Result, error) {
-	logger := r.Log.WithValues("Request.Namespace", request.Namespace, "Request.Name", request.Name)
+	logger := log.NewDelegatingLogger(log.FromContext(ctx))
+	logger.Info("jdd test")
```

This causes us to use the DelegatingLogger wrapper function instead of the go-logr methods:
https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L127

It's only when using the go-logr methods directly that callerSkip = 2 is incorrect. Sample output line from:

```
+ logger := log.NewDelegatingLogger(log.FromContext(ctx))
+ logger.Info("jdd test")
```

```
{"level":"info","ts":1637802914.009081,"logger":"controller-runtime.manager.controller.localvolume","caller":"nodedaemon/reconcile.go:66","msg":"jdd test","reconciler group":"local.storage.openshift.io","reconciler kind":"LocalVolume","name":"","namespace":"openshift-local-storage"}
```
Okay, it is not that bad. If that is the only available fix, let's go for it.

Alright, I filed a PR for this issue, but I reverted to klog for now, which seems to be the simplest and most reliable option at the moment (a minimal klog sketch is included at the end of this report):
https://github.com/openshift/local-storage-operator/pull/299

I stashed the NewDelegatingLogger workaround on a separate branch in case we end up going that direction instead (I'm still open to this, but somewhat prefer klog after seeing both implementations):
https://github.com/openshift/local-storage-operator/compare/master...dobsonj:bug2019532

controller-runtime uses go-logr while client-go uses klog, so we still get a mix of both in the log output no matter what, but I tried to at least make the LSO code use a single logger across the board. As for why controller-runtime is logging the incorrect source line, I filed an upstream issue:
https://github.com/kubernetes-sigs/controller-runtime/issues/1737

Passed with local-storage-operator.4.10.0-202202101606

Passed with local-storage-operator.4.11.0-202202172124

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
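For reference, a minimal sketch of structured klog usage, the direction the fix PR took (assumed k8s.io/klog/v2 calls, not the exact change in the PR): klog resolves the file and line at the call site itself, so no caller-skip bookkeeping is needed.

```go
// Minimal sketch of klog-based logging (assumed usage, not the exact LSO
// change): the reported source location is the call site of InfoS itself.
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil) // register -v and other klog flags
	flag.Parse()
	defer klog.Flush()

	// Logs something like:
	// I1102 18:33:16.207685 ... main.go:18] "ignoring pv" pvname="pv0018"
	klog.InfoS("ignoring pv", "pvname", "pv0018")
}
```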