+++ This bug was initially created as a clone of Bug #2019532 +++ Logger object in LSO does not log source location accurately --- Additional comment from hekumar on 2021-11-02 18:48:55 UTC --- I think that *sometimes* the logger object does not log source information accurately. For example, I tried adding a logging statement in https://github.com/openshift/local-storage-operator/blob/master/diskmaker/controllers/lv/reconcile.go#L219 and it logged: 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.Deleter controller/controller.go:298 Looking for released PVs to clean up {"Request.Namespace": "openshift-local-storage", "Request.Name": ""} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0018", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0007", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} I1102 18:33:16.207685 2670386 cache.go:55] Added pv "local-pv-77682331" to cache 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0024", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0030", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0029", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0006", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0021", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0020", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} 2021-11-02T18:33:16.207Z INFO diskmaker-controllers.LocalVolume controller/controller.go:298 ignoring pv {"request.namespace": "openshift-local-storage", "Request.Name": "local-disks", "pvname": "pv0009", "prov-by": "", "runtime-name": "local-volume-provisioner-crc-dzk9v-master-0-febf7407-d1ed-4a8a-a94b-77724408b477"} Which seems strange to me. This will perhaps require investigation. Overall, I wonder if we should just move back to klog everywhere since that is what all OCP projects use. --- Additional comment from jdobson on 2021-11-24 01:20:55 UTC --- The caller being logged (controller/controller.go:298) is the function that calls our Reconcile() from controller-runtime, rather than the actual log statements within Reconcile(): https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go#L298 That incorrect caller is being logged due this PR: https://github.com/openshift/local-storage-operator/pull/267 Specifically these 2 lines: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L81 https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/diskmaker_manager/manager.go#L45 I think that should actually be zap.AddCallerSkip(1) instead of zaplog.AddCaller(). The part that doesn't make sense yet, is if I replace zaplog.AddCaller() with zap.AddCallerSkip(1), no caller gets logged at all. And further, we actually do have code in controller-runtime that adds zap.AddCallerSkip(1) by default: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234 So the question now is... if I just remove this line: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L81 why isn't the caller logged correctly due to the line here? https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234 --- Additional comment from jdobson on 2021-11-24 19:19:52 UTC --- (In reply to Jonathan Dobson from comment #2) > I think that should actually be zap.AddCallerSkip(1) instead of > zaplog.AddCaller(). > The part that doesn't make sense yet, is if I replace zaplog.AddCaller() > with zap.AddCallerSkip(1), no caller gets logged at all. > And further, we actually do have code in controller-runtime that adds > zap.AddCallerSkip(1) by default One thing that I misunderstood: we actually do need zaplog.AddCaller() to enable logging callers in the first place. The call depth is a separate parameter, and that parameter is wrong. If I remove this line: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L81 Then the logr.Logger object ends up with: addCaller: (bool) false, callerSkip: (int) 2, callerSkip should be set to 1... so it seems that we're somehow ending up with 2 different places calling zap.AddCallerSkip(1) on the same logger. As a quick test, I changed ZapOpts to: ZapOpts: []zaplog.Option{zaplog.AddCaller(), zaplog.AddCallerSkip(-1)}, And added a test log message, which logged the caller correctly. {"level":"info","ts":1637780758.5367253,"logger":"controllers.NodeDaemon","caller":"nodedaemon/reconcile.go:63","msg":"jdd test","Request.Namespace":"openshift-local-storage","Request.Name":""} That -1 changes callerSkip from 2 to 1, which solves the issue, but it's a hack. So why is zap.AddCallerSkip(1) getting called twice? --- Additional comment from jdobson on 2021-11-24 19:57:46 UTC --- (In reply to Jonathan Dobson from comment #3) > So why is zap.AddCallerSkip(1) getting called twice? I added debug.PrintStack to AddCallerSkip(), and these were the places where it was provided a non-zero argument: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234 goroutine 1 [running]: runtime/debug.Stack(0x1ed08f0, 0xc0002fddd0, 0xc000590d00) /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f runtime/debug.PrintStack() /usr/lib/golang/src/runtime/debug/stack.go:16 +0x25 go.uber.org/zap.AddCallerSkip(0x1, 0x1ec45e0, 0xc000010020) /go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25 sigs.k8s.io/controller-runtime/pkg/log/zap.NewRaw(0xc00020bcf8, 0x1, 0x1, 0xc0004a8320) /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:234 +0x106 sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0) /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x3f main.main() /go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4 https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/github.com/go-logr/zapr/zapr.go#L182 goroutine 1 [running]: runtime/debug.Stack(0x6, 0x8, 0xc000339920) /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f runtime/debug.PrintStack() /usr/lib/golang/src/runtime/debug/stack.go:16 +0x25 go.uber.org/zap.AddCallerSkip(0x1, 0xc00000ff50, 0x1ec5120) /go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25 github.com/go-logr/zapr.newLoggerWithExtraSkip(0xc000339920, 0x1, 0x1, 0xc000339920) /go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:172 +0x2f github.com/go-logr/zapr.NewLogger(...) /go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:182 sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0) /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x56 main.main() /go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4 --- Additional comment from jdobson on 2021-11-24 20:11:44 UTC --- Seems the issue is that LSO creates the logger via controller-runtime, which adds an extra layer and increments callerSkip accordingly. But then LSO uses the go-logr functions directly, instead of the controller-runtime wrapper functions. This causes the parent caller to be logged instead of the line where the log message actually is. --- Additional comment from jdobson on 2021-11-25 01:28:10 UTC --- It seems a little strange that these DelegatingLogger methods all return logr.Logger: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L183 Nonetheless, the following change does at least log the correct caller by creating a new DelegatingLogger from the logr.Logger object: func (r *DaemonReconciler) Reconcile(ctx context.Context, request ctrl.Request) (ctrl.Result, error) { - logger := r.Log.WithValues("Request.Namespace", request.Namespace, "Request.Name", r equest.Name) + logger := log.NewDelegatingLogger(log.FromContext(ctx)) + logger.Info("jdd test") This causes us to use the DelegatingLogger wrapper function instead of the go-logr methods: https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L127 It's only when using the go-logr methods directly that callerSkip = 2 is incorrect. --- Additional comment from jdobson on 2021-11-30 15:55:35 UTC --- sample output line from: + logger := log.NewDelegatingLogger(log.FromContext(ctx)) + logger.Info("jdd test") {"level":"info","ts":1637802914.009081,"logger":"controller-runtime.manager.controller.localvolume","caller":"nodedaemon/reconcile.go:66","msg":"jdd test","reconciler group":"local.storage.openshift.io","reconciler kind":"LocalVolume","name":"","namespace":"openshift-local-storage"} --- Additional comment from hekumar on 2021-11-30 15:58:46 UTC --- okay it is not that bad. If that is only available fix, lets go for it. --- Additional comment from jdobson on 2021-12-03 00:52:55 UTC --- Alright, I filed a PR for this issue, but I reverted to klog for now. Seems to be the simplest and most reliable option at the moment. https://github.com/openshift/local-storage-operator/pull/299 I stashed the NewDelegatingLogger workaround on a separate branch if we end up going that direction instead (I'm still open to this, but somewhat prefer klog after seeing both implementations): https://github.com/openshift/local-storage-operator/compare/master...dobsonj:bug2019532 controller-runtime uses go-logr while client-go uses klog, so we still get a mix of both in the log output no matter what, but I tried to at least make the LSO code use a single logger across the board. As for why controller-runtime is logging the incorrect source line, I filed an upstream issue for this: https://github.com/kubernetes-sigs/controller-runtime/issues/1737
*** Bug 2050420 has been marked as a duplicate of this bug. ***
Passed with local-storage-operator.4.10.0-202202101606
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056