Bug 1891098

Summary: Configure "ceph health detail" to run periodically and log output to cluster log.
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Christina Meno <gmeno>
Component: RADOS
Assignee: Prashant Dhange <pdhange>
Status: CLOSED ERRATA
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact: Amrita <asakthiv>
Priority: high
Version: 4.1
CC: akupczyk, asakthiv, assingh, bhubbard, ceph-eng-bugs, ceph-qe-bugs, dzafman, gsitlani, kchai, nojha, pdhange, pdhiran, rzarzyns, sseshasa, tserlin, vereddy, vumrao
Target Release: 4.2
Hardware: x86_64
OS: Linux
Fixed In Version: ceph-14.2.11-79.el8cp, ceph-14.2.11-79.el7cp
Doc Type: Enhancement
Doc Text:
.Ceph health details are logged in the cluster log
Previously, the cluster log did not include the Ceph health details, so it was difficult to determine the root cause of an issue. With this release, the Ceph health details are logged in the cluster log, which makes it possible to review issues that arise in the cluster.
Last Closed: 2021-01-12 14:58:09 UTC
Type: Bug
Bug Blocks: 1890121    

Description Christina Meno 2020-10-23 19:02:10 UTC
Description of problem:
We don't have detailed cluster health/sanity information available for customers and support to review when problems with the cluster arise.

Version-Release number of selected component (if applicable):
4.1
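
For context, the health detail this RFE asks to log periodically is what the following commands expose on demand today. A minimal sketch, assuming the default cluster name and the usual monitor-host log path (locations can differ per deployment):

  # Show the current health summary and the per-check detail
  ceph health
  ceph health detail

  # The cluster log that the detail output would be written to;
  # /var/log/ceph/ceph.log is the usual location on a monitor host
  tail -f /var/log/ceph/ceph.log

  # The recent cluster log can also be read through the mon
  ceph log last 20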

Comment 1 RHEL Program Management 2020-10-23 19:02:17 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 3 Prashant Dhange 2020-10-30 06:11:56 UTC
Should we log health detail on health check failure as well as every mon_health_to_clog_interval? Logging health detail too frequently does not make sense, as it will make the cluster log grow rapidly when the cluster is unhealthy.
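
For reference, these are the existing mon options under discussion; the values shown here are illustrative, not the shipped defaults:

  # Whether the mon sends health summaries to the cluster log at all
  ceph config set mon mon_health_to_clog true

  # How often (seconds) health is logged even when it has not changed
  ceph config set mon mon_health_to_clog_interval 3600

  # How often (seconds) health is logged when it has changed
  ceph config set mon mon_health_to_clog_tick_interval 60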

Comment 5 Vikhyat Umrao 2020-10-30 11:26:43 UTC
(In reply to Prashant Dhange from comment #3)
> Should we log health detail on health check failure as well as every
> mon_health_to_clog_interval? Logging health detail too frequently does
> not make sense, as it will make the cluster log grow rapidly when the
> cluster is unhealthy.

I think the idea is to log health detail during health WARN/ERR. Maybe logging every mon_health_to_clog_interval is not necessary? Let's wait for Neha's input.

Comment 6 Neha Ojha 2020-10-30 16:39:23 UTC
(In reply to Prashant Dhange from comment #3)
> Should we log health detail on health check failure as well as every
> mon_health_to_clog_interval? Logging health detail too frequently does
> not make sense, as it will make the cluster log grow rapidly when the
> cluster is unhealthy.

I don't think we need to log every mon_health_to_clog_interval either. Let's discuss implementation details in the PR.
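
Once the change lands, one way to exercise and verify it (a sketch, assuming the default log location and a test cluster where raising a warning is acceptable):

  # Raise a harmless health warning (OSDMAP_FLAGS)
  ceph osd set noout

  # Health check messages, and with this change their detail,
  # should appear in the cluster log on the monitor host
  grep -i health /var/log/ceph/ceph.log | tail -n 20

  # Clear the warning again
  ceph osd unset noout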

Comment 7 Yaniv Kaul 2020-11-10 20:43:42 UTC
devel-ack+ please?

Comment 18 errata-xmlrpc 2021-01-12 14:58:09 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0081