Description of problem (please be as detailed as possible and provide log snippets):

I am trying to debug one of the OSD nodes that is down, but the ocs-operator log is flooded with the messages below, which makes it difficult to catch the actual error message:

{"level":"info","ts":"2020-10-07T16:20:39.974Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:39.983Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:39.992Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:40.000Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:40.009Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:40.014Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:44.606Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:44.612Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:44.621Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:44.629Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:48.105Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:48.112Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:48.121Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:48.127Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:49.094Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:49.100Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:49.113Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:49.129Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:57.406Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:57.414Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:57.425Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:20:57.430Z","logger":"controller_storagecluster","msg":"No component operator reported negatively","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-10-07T16:21:06.631Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}

Version of all relevant components (if applicable):
Client Version: 4.5.0-0.nightly-2020-08-15-052753
Server Version: 4.5.0-0.nightly-2020-09-20-185910
Kubernetes Version: v1.18.3+47c0e71

sh-4.4# rook version
rook: 4.5-49.f5afaeeb.release_4.5
go: go1.13.4

sh-4.4# ceph version
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
not sure

Can this issue be reproduced from the UI?
not sure

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Keep an OCS 4.5 nightly build running for a couple of days (> 4 days).
2. Observe that the logs are flooded with the messages above.
Created attachment 1719806 [details] ocs-operator-logs
$ grep "Reconciling StorageCluster" ocs-operator-logs.txt | wc 44281 88562 8723357 $ wc ocs-operator-logs.txt 89101 325554 18958957 ocs-operator-logs.txt =~ 50% filled with same lines
This is the OCS operator log, rather than rook.
This is expected behavior, since we periodically reconcile the StorageCluster. I have a PR in the works that should at least reduce this chatter, though it will never (and should never) be completely eliminated: https://github.com/openshift/ocs-operator/pull/796

Still, this is not an urgent matter, so I'm moving this to OCS 4.7.
I think it is important to reduce the frequency of these messages so that the log is actually useful for debugging and leaves room for other important lines that help us debug better (like the OSD that went into the init stage and has missing logs). The current messages (see the attached operator logs) aren't really helpful, so I believe we should fix this sooner.
Jose, would it make sense to only log every 10th reconcile message or similar? That would reduce the number of messages and would be an easy change.
(In reply to N Balachandran from comment #7)
> Jose, would it make sense to only log every 10th reconcile message or
> similar? That would reduce the number of messages and would be an easy
> change.

That's how the logging is configured currently. The problem is that the logs themselves aren't formatted in a way that makes it easy for the logger to filter these out. We did a small refactor for Error-level logs; we need to do the same for Info and Warning.
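For illustration, something along these lines is what "only log every 10th reconcile" could look like. This is a minimal sketch, not the actual ocs-operator code; the names logEvery, reconcileCount, and logReconcile are hypothetical, and only the logr-style Info call mirrors what the operator already emits.

package storagecluster

import (
	"sync/atomic"

	"github.com/go-logr/logr"
)

// Hypothetical sampling factor: emit the periodic info line once per 10 reconciles.
const logEvery = 10

type ReconcileStorageCluster struct {
	log            logr.Logger
	reconcileCount uint64 // hypothetical counter, bumped on every Reconcile call
}

// logReconcile prints the "Reconciling StorageCluster" message only on the
// 1st, 11th, 21st, ... reconcile instead of on every single one.
func (r *ReconcileStorageCluster) logReconcile(namespace, name string) {
	n := atomic.AddUint64(&r.reconcileCount, 1)
	if n%logEvery == 1 {
		r.log.Info("Reconciling StorageCluster",
			"Request.Namespace", namespace,
			"Request.Name", name)
	}
}

The trade-off is that genuinely interesting reconciles can fall into the suppressed window, which is why level-based filtering (discussed below) is usually preferable to simple sampling.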
Umanga, are we targeting it for 4.7?
In OCS 4.6 we did https://github.com/openshift/ocs-operator/pull/717 and https://github.com/openshift/ocs-operator/pull/666 to try to reduce the logs. In OCS 4.7 some of these updates were removed and we now depend on the default Kubernetes-aware logger configuration, so this should already be testable. If this still doesn't meet expectations, we'll take it up as part of a bigger logging enhancement in 4.8.

There is also https://bugzilla.redhat.com/show_bug.cgi?id=1890978, which asks for logging on every reconcile (more frequently), which seems to be the opposite of what's expected in this BZ.
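For reference, here is a minimal sketch (assuming a controller-runtime based setup; not taken from the ocs-operator repo) of how the default Kubernetes-aware zap logger can be wired up so the per-reconcile Info lines are filtered by verbosity level. The chosen Error level is an example only.

package main

import (
	"flag"

	"go.uber.org/zap/zapcore"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Assumption: the operator uses controller-runtime's zap integration.
	opts := zap.Options{
		Development: false,
		// Example only: at Error level the periodic "Reconciling
		// StorageCluster" Info lines are dropped entirely.
		Level: zapcore.ErrorLevel,
	}
	// Exposes flags such as --zap-log-level, so verbosity can be changed
	// at deploy time without rebuilding the operator.
	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	log := ctrl.Log.WithName("controller_storagecluster")
	log.Info("Reconciling StorageCluster")        // suppressed at Error level
	log.Error(nil, "component reported failure")  // still emitted
}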
Moving to ON_QA based on the above comment. If we need more than what has been done, it's better to open a new BZ.
Explanation of the fix from today's bug triage meeting: logging frequency was reduced, so the number of the particular log lines listed in this bug report should go down (where there were 20 lines, we expect about 4 after the change). QE will test this bug based on this explanation.
It looks like about 1/8 of the lines are now "Reconciling StorageCluster" messages, so I changed this to VERIFIED.

(venv) wusui@localhost:~/Downloads/logs_1612481699/home/jenkins/current-cluster-dir/logs/failed_testcase_ocs_logs_1612435608$ find . -type f | grep current.log | grep ocs-operator | xargs wc
   25797  129167  6753438 ./test_monitoring_reporting_ok_when_idle_ocs_logs/ocs_must_gather/registry-redhat-io-ocs4-ocs-must-gather-rhel8-sha256-b3e2cbb55f8cfa82480b70e73576ed049cb4e9de984c46f247896951a1746184/namespaces/openshift-storage/pods/ocs-operator-66b79fb54f-fcncf/ocs-operator/ocs-operator/logs/current.log
    5438   27247  1425039 ./test_create_new_project_to_verify_logging_ocs_logs/ocs_must_gather/registry-redhat-io-ocs4-ocs-must-gather-rhel8-sha256-b3e2cbb55f8cfa82480b70e73576ed049cb4e9de984c46f247896951a1746184/namespaces/openshift-storage/pods/ocs-operator-66b79fb54f-fcncf/ocs-operator/ocs-operator/logs/current.log
    2137   10678   557983 ./test_monitoring_enabled_ocs_logs/ocs_must_gather/registry-redhat-io-ocs4-ocs-must-gather-rhel8-sha256-b3e2cbb55f8cfa82480b70e73576ed049cb4e9de984c46f247896951a1746184/namespaces/openshift-storage/pods/ocs-operator-66b79fb54f-fcncf/ocs-operator/ocs-operator/logs/current.log
   41643  208420 10893114 ./test_pvc_expand_expanded_pvc_ocs_logs/ocs_must_gather/registry-redhat-io-ocs4-ocs-must-gather-rhel8-sha256-b3e2cbb55f8cfa82480b70e73576ed049cb4e9de984c46f247896951a1746184/namespaces/openshift-storage/pods/ocs-operator-66b79fb54f-fcncf/ocs-operator/ocs-operator/logs/current.log
   14752   73878  3863131 ./test_namespace_bucket_creation_with_rgw_ocs_logs/ocs_must_gather/registry-redhat-io-ocs4-ocs-must-gather-rhel8-sha256-b3e2cbb55f8cfa82480b70e73576ed049cb4e9de984c46f247896951a1746184/namespaces/openshift-storage/pods/ocs-operator-66b79fb54f-fcncf/ocs-operator/ocs-operator/logs/current.log
   47635  238380 12458610 ./test_must_gather[OTHERS]_ocs_logs/ocs_must_gather/registry-redhat-io-ocs4-ocs-must-gather-rhel8-sha256-b3e2cbb55f8cfa82480b70e73576ed049cb4e9de984c46f247896951a1746184/namespaces/openshift-storage/pods/ocs-operator-66b79fb54f-fcncf/ocs-operator/ocs-operator/logs/current.log
  137402  687770 35951315 total

(venv) wusui@localhost:~/Downloads/logs_1612481699/home/jenkins/current-cluster-dir/logs/failed_testcase_ocs_logs_1612435608$ find . -type f | grep current.log | grep ocs-operator | xargs grep "Reconciling StorageCluster" | wc
   17145   51435  8846713
It looks like the changes from https://github.com/openshift/ocs-operator/pull/717 fix this.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days