Bug 1891098 - Configure "ceph health detail" to run periodically and log output to cluster log.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2
Assignee: Prashant Dhange
QA Contact: Pawan
Docs Contact: Amrita
URL:
Whiteboard:
Depends On:
Blocks: 1890121
 
Reported: 2020-10-23 19:02 UTC by Christina Meno
Modified: 2021-06-09 16:16 UTC
CC List: 17 users

Fixed In Version: ceph-14.2.11-79.el8cp, ceph-14.2.11-79.el7cp
Doc Type: Enhancement
Doc Text:
.Ceph health details are logged in the cluster log
Previously, the cluster log did not include the Ceph health details, so it was difficult to determine the root cause of an issue. With this release, the Ceph health details are logged in the cluster log, which makes it possible to review issues that arise in the cluster.
Clone Of:
Environment:
Last Closed: 2021-01-12 14:58:09 UTC
Embargoed:


Attachments: None


Links
Ceph Project Bug Tracker 48042 - 2020-10-29 16:41:14 UTC
Github ceph/ceph pull 37902 (closed): mon: Log "ceph health detail" periodically in cluster log - 2021-01-25 09:38:38 UTC
Github ceph/ceph pull 38118 (closed): nautilus: mon: Log "ceph health detail" periodically in cluster log - 2021-01-25 09:38:39 UTC
Red Hat Product Errata RHSA-2021:0081 - 2021-01-12 14:58:33 UTC

Description Christina Meno 2020-10-23 19:02:10 UTC
Description of problem:
We don't have detailed cluster health/sanity information available for customers and support to review when problems with the cluster arise.

Version-Release number of selected component (if applicable):
4.1
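
For reference, the detail being requested is what "ceph health detail" already prints on demand; this RFE is about getting that level of detail into the cluster log as well. A minimal illustration (the output below is illustrative only; the actual checks and counts depend on cluster state):

  $ ceph health detail
  HEALTH_WARN 1 osds down; Degraded data redundancy: 32 pgs degraded
  OSD_DOWN 1 osds down
      osd.3 is down since epoch 214
  PG_DEGRADED Degraded data redundancy: 32 pgs degraded
      pg 2.1a is active+undersized+degraded, acting [1,4]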

Comment 1 RHEL Program Management 2020-10-23 19:02:17 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 3 Prashant Dhange 2020-10-30 06:11:56 UTC
Should we log health detail on health check failure as well as every mon_health_to_clog_interval? Logging health detail too frequently does not make sense, as it will make the cluster log grow rapidly when the cluster is unhealthy.
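
For context, periodic health logging to the cluster log is governed by existing mon options. A quick sketch of inspecting and adjusting them at runtime (the values shown are examples, not recommendations):

  $ ceph config get mon mon_health_to_clog
  $ ceph config get mon mon_health_to_clog_interval
  $ ceph config set mon mon_health_to_clog_interval 3600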

Comment 5 Vikhyat Umrao 2020-10-30 11:26:43 UTC
(In reply to Prashant Dhange from comment #3)
> Should we log health detail on health check failure as well as every
> mon_health_to_clog_interval? Logging health detail too frequently does not
> make sense, as it will make the cluster log grow rapidly when the cluster
> is unhealthy.

I think the idea is to log health detail while the cluster is in a health warn/err state. Maybe logging every mon_health_to_clog_interval is not necessary? Let's wait for Neha's input.

Comment 6 Neha Ojha 2020-10-30 16:39:23 UTC
(In reply to Prashant Dhange from comment #3)
> Should we log health detail on health check failure as well as every
> mon_health_to_clog_interval? Logging health detail too frequently does not
> make sense, as it will make the cluster log grow rapidly when the cluster
> is unhealthy.

I don't think we need to log every mon_health_to_clog_interval as well. Let's discuss implementation details in the PR.
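
Once the change lands, one way to verify the behavior is to watch the cluster log while a health warning is active. A sketch, assuming the default mon log location (the exact wording of the logged detail depends on the final implementation in the PR):

  $ ceph -w                        (watch cluster log entries live)
  $ ceph log last 50               (dump the most recent cluster log entries)
  $ less /var/log/ceph/ceph.log    (on a mon host)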

Comment 7 Yaniv Kaul 2020-11-10 20:43:42 UTC
devel-ack+ please?

Comment 18 errata-xmlrpc 2021-01-12 14:58:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0081

