1890978 – [External] Improve error logging in ocs-operator

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	ODF 4.9.0
Assignee:	umanga
QA Contact:	shylesh
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-10-23 13:04 UTC by Rachael
Modified:	2024-06-13 23:16 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-13 17:44:23 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:5086	0	None	None	None	2021-12-13 17:44:44 UTC

Description Rachael 2020-10-23 13:04:01 UTC

Description of problem
======================

When an unreachable monitoring-endpoint is provided during OCS deployment in external mode, the ocs-operator logs an error message just once.


"level":"error","ts":"2020-10-23T08:03:09.344Z","logger":"controller_storagecluster","msg":"Monitoring Endpoint (1.2.3.4:9283) is not reachable","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","error":"dial tcp 1.2.3.4:9283: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster.validateMonitoringEndpoint\n\t/remote-source/app/pkg/controller/storagecluster/external_resources.go:398\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster


It would be good to have these error messages logged for each reconcile, to make debugging of the issue easier.

Raising the bug based on: https://bugzilla.redhat.com/show_bug.cgi?id=1888614#c9


Version of all relevant components
==================================

ocs-operator.v4.6.0-142.ci

Does this issue impact your ability to continue to work with the product?
=========================================================================

No

Is there any workaround?
========================

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
========================================

1

Is this issue reproducible?
============================

Yes

Can this issue be reproduced from the UI?
=====================================

If this is a regression
=======================

No

Steps to Reproduce
==================

1. Deploy an external mode cluster using an unreachable monitoring-endpoint
2. Check ocs-operator logs

Actual results
==============

The error message is logged once

Expected results
================

Error messages should be logged for each reconcile

Additional info
===============

Logs available here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/1888614/verification/

Comment 2 Jose A. Rivera 2020-10-23 14:39:41 UTC

This may make sense, but it is not critical for the product. Moving to OCS 4.7.

Comment 3 Jose A. Rivera 2021-02-08 15:24:31 UTC

This is still not critical for the product, though it should be done soon. Moving to OCS 4.8.

Comment 4 umanga 2021-06-01 08:06:53 UTC

Not critical enough to go into 4.8. Definitely fixing this for OCS 4.9 so providing devel_ack+.

The scope of the fix is clear logs for the monitoring endpoint. A larger refactor is outside the scope of this BZ.

Comment 5 Jose A. Rivera 2021-09-23 13:48:43 UTC

At this point we believe that this has been fixed over the course of a few PRs, and should already be in the DS builds. Moving to ON_QA.

Comment 15 errata-xmlrpc 2021-12-13 17:44:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086
