Bug 2028685 - LSO repeatedly reports errors while diskmaker-discovery pod is starting
Summary: LSO repeatedly reports errors while diskmaker-discovery pod is starting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.10.0
Assignee: Roman Bednář
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-12-02 23:15 UTC by Jonathan Dobson
Modified: 2022-03-10 16:31 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:31:36 UTC
Target Upstream Version:
Embargoed:
jdobson: needinfo-
jdobson: needinfo-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift local-storage-operator pull 303 0 None open Bug 2028685: LSO repeatedly reports errors while diskmaker-discovery pod is starting 2021-12-10 12:37:50 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:31:53 UTC

Description Jonathan Dobson 2021-12-02 23:15:34 UTC
I noticed this while testing changes for bug 2019532, so the log messages are different from master, but the basic problem appears to be applicable to master.

After enabling local volume discovery through the UI (Installed Operators -> Local Storage -> Local Volume Discovery -> Create Local Volume Discovery), I see the following log entries repeated in LSO (more than 10 times in this case):

I1202 22:38:37.116686       1 localvolumediscovery_controller.go:69] Reconciling LocalVolumeDiscovery, namespace = openshift-local-storage, name = auto-discover-devices
I1202 22:38:37.117090       1 exporter.go:99] Reconciling metrics service, namespace = openshift-local-storage, name = local-storage-discovery-metrics
I1202 22:38:37.312589       1 exporter.go:126] Reconciling metrics service monitor, namespace = openshift-local-storage, name = local-storage-discovery-metrics
I1202 22:38:37.784872       1 localvolumediscovery_controller.go:109] daemonset diskmaker-discovery changed with result updated
2021-12-02T22:38:37.794Z        ERROR   controller-runtime.manager.controller.localvolumediscovery  controller/controller.go:253    Reconciler error        {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "openshift-local-storage", "error": "running 0 out of 1 discovery daemons"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
I1202 22:38:40.425801       1 localvolumediscovery_controller.go:69] Reconciling LocalVolumeDiscovery, namespace = openshift-local-storage, name = auto-discover-devices
I1202 22:38:40.426271       1 exporter.go:99] Reconciling metrics service, namespace = openshift-local-storage, name = local-storage-discovery-metrics
I1202 22:38:40.464581       1 exporter.go:126] Reconciling metrics service monitor, namespace = openshift-local-storage, name = local-storage-discovery-metrics
I1202 22:38:40.487867       1 localvolumediscovery_controller.go:109] daemonset diskmaker-discovery changed with result updated
2021-12-02T22:38:40.487Z        ERROR   controller-runtime.manager.controller.localvolumediscovery  controller/controller.go:253    Reconciler error        {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "openshift-local-storage", "error": "running 0 out of 1 discovery daemons"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214

These log messages are coming from this chunk of code, where we report an error in the reconciler if !(desiredDaemons == readyDaemons)

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/controllers/localvolumediscovery/localvolumediscovery_controller.go#L127-L135

The diskmaker-discovery pod actually did start up:

diskmaker-discovery-qjh4p                                         2/2     Running     0     
           24m

The problem is that in the code above, each time we enter the reconcile loop, it reports an error back to the controller runtime framework. So until the diskmaker-discovery pod is running, it will continue to add those errors to the log in a loop.

This may be more of a "noisy log" problem than anything else, but LSO should not continue trying to update the daemonset every time and report an error while diskmaker-discovery is starting up.

Comment 1 Roman Bednář 2021-12-08 10:51:07 UTC
Log level is now changed from error to info. To verify this the operator can be run locally, I've documented it here if needed: https://github.com/openshift/local-storage-operator/blob/master/HACKING.md

BEFORE FIX:

2021-12-07T12:22:05.555+0100	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-07T12:22:05.664+0100	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-07T12:22:05.780+0100	ERROR	controller-runtime.manager.controller.localvolumediscovery	controller/controller.go:253	Reconciler error	{"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "openshift-local-storage", "error": "running 0 out of 3 discovery daemons"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/rbednar/git/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/rbednar/git/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-12-07T12:22:16.020+0100	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "openshift-local-storage", "Request.Name": "auto-discover-devices"}
2021-12-07T12:22:16.021+0100	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-07T12:22:16.134+0100	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-07T12:22:16.247+0100	ERROR	controller-runtime.manager.controller.localvolumediscovery	controller/controller.go:253	Reconciler error	{"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "openshift-local-storage", "error": "running 0 out of 3 discovery daemons"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/rbednar/git/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/rbednar/git/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-12-07T12:22:36.727+0100	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "openshift-local-storage", "Request.Name": "auto-discover-devices"}
2021-12-07T12:22:36.728+0100	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-07T12:22:36.840+0100	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor


AFTER FIX:

2021-12-07T15:00:34.499+0100	INFO	controller-runtime.manager.controller.localvolume	controller/controller.go:221	Starting workers	{"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "worker count": 1}
2021-12-07T15:00:34.502+0100	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-07T15:00:34.817+0100	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-07T15:00:35.157+0100	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	running 0 out of 3 discovery daemons	{"Request.Namespace": "openshift-local-storage", "Request.Name": "auto-discover-devices"}
2021-12-07T15:00:45.158+0100	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "openshift-local-storage", "Request.Name": "auto-discover-devices"}
2021-12-07T15:00:45.159+0100	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-07T15:00:45.266+0100	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-07T15:00:45.381+0100	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	running 0 out of 3 discovery daemons	{"Request.Namespace": "openshift-local-storage", "Request.Name": "auto-discover-devices"}

Comment 4 Chao Yang 2021-12-14 03:00:22 UTC
2021-12-13T05:39:45.254Z	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-13T05:39:45.267Z	ERROR	controller-runtime.manager.controller.localvolumediscovery	controller/controller.go:253	Reconciler error	{"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolumeDiscovery", "name": "auto-discover-devices", "namespace": "local-storage", "error": "no discovery daemons are scheduled for running"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-12-13T05:39:45.267Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}
2021-12-13T05:39:45.267Z	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-13T05:39:45.289Z	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-13T05:39:45.332Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	running 0 out of 1 discovery daemons	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}
2021-12-13T05:39:45.332Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}
2021-12-13T05:39:45.332Z	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-13T05:39:45.350Z	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-13T05:39:45.361Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	running 0 out of 1 discovery daemons	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}
2021-12-13T05:39:55.332Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	Reconciling LocalVolumeDiscovery	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}
2021-12-13T05:39:55.333Z	INFO	metrics-exporter	localmetrics/exporter.go:68	Reconciling metrics exporter service
2021-12-13T05:39:55.357Z	INFO	service-monitor	localmetrics/exporter.go:89	creating service monitor
2021-12-13T05:39:55.370Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	running 0 out of 1 discovery daemons	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}

Comment 5 Chao Yang 2021-12-14 03:03:41 UTC
"2021-12-13T05:39:45.332Z	INFO	controllers.LocalVolumeDiscovery	controller/controller.go:298	running 0 out of 1 discovery daemons	{"Request.Namespace": "local-storage", "Request.Name": "auto-discover-devices"}"

Now it is changed from error to info.
But this info log will pop every time. Is it ok? @jdobson@jdobson

Comment 6 Roman Bednář 2021-12-14 09:01:58 UTC
Improving the behavior so that the operator can stop trying to update the daemonset under some condition (timeout?) would be non-trivial to implement. Considering the effort needed vs relatively small gain (reduced log noise) I think it's sufficient to just change the error to info and keep logging the message.

Comment 9 errata-xmlrpc 2022-03-10 16:31:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056


Note You need to log in before you can comment on or make changes to this bug.