Bug 1890978 - [External] Improve error logging in ocs-operator
Summary: [External] Improve error logging in ocs-operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: umanga
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-23 13:04 UTC by Rachael
Modified: 2024-06-13 23:16 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-13 17:44:23 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:5086 0 None None None 2021-12-13 17:44:44 UTC

Description Rachael 2020-10-23 13:04:01 UTC
Description of problem
======================

When an unreachable monitoring-endpoint is provided during OCS deployment in external mode, the ocs-operator logs an error message just once.


"level":"error","ts":"2020-10-23T08:03:09.344Z","logger":"controller_storagecluster","msg":"Monitoring Endpoint (1.2.3.4:9283) is not reachable","Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster","error":"dial tcp 1.2.3.4:9283: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster.validateMonitoringEndpoint\n\t/remote-source/app/pkg/controller/storagecluster/external_resources.go:398\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster


It would be good to have these error messages logged for each reconcile, to make debugging of the issue easier.

Raising the bug based on: https://bugzilla.redhat.com/show_bug.cgi?id=1888614#c9


Version of all relevant components
==================================

ocs-operator.v4.6.0-142.ci

Does this issue impact your ability to continue to work with the product?
=========================================================================

No

Is there any workaround?
========================

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
========================================

1

Is this issue reproducible?
===========================

Yes

Can this issue be reproduced from the UI?
=========================================

If this is a regression
=======================

No

Steps to Reproduce
==================

1. Deploy an external mode cluster using an unreachable monitoring-endpoint
2. Check ocs-operator logs

Actual results
==============

The error message is logged once

Expected results
================

Error messages should be logged for each reconcile

Additional info
===============

Logs available here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/1888614/verification/

Comment 2 Jose A. Rivera 2020-10-23 14:39:41 UTC
This may make sense, but it is not critical for the product. Moving to OCS 4.7.

Comment 3 Jose A. Rivera 2021-02-08 15:24:31 UTC
This is still not critical for the product, though it should be done soon. Moving to OCS 4.8.

Comment 4 umanga 2021-06-01 08:06:53 UTC
Not critical enough to go into 4.8. Definitely fixing this for OCS 4.9 so providing devel_ack+.

Scope of the fix would be clear logs for monitoring endpoint. Larger refactor is outside the scope of this BZ.

Comment 5 Jose A. Rivera 2021-09-23 13:48:43 UTC
At this point we believe that this has been fixed over the course of a few PRs, and should already be in the DS builds. Moving to ON_QA.

Comment 15 errata-xmlrpc 2021-12-13 17:44:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086

