Description of problem:
[17.2.0] Large cephadm health detail log message causing MONs to lose quorum

Version-Release number of selected component (if applicable):
Upstream Quincy

In a recent occurrence on one of the Ceph upstream clusters (the LRC), the cluster went into DU mode due to a bug: https://tracker.ceph.com/issues/54132

Issues caused by this problem:
- SSH errors logged massive hexdump output (a cephadm bug that has already been fixed)
- These logs were also stored as part of 'ceph health detail'
- 'ceph health detail' was periodically dumped to the cluster log (which goes through Paxos and into each mon's DB), causing the mons to lose quorum and become unresponsive

WORKAROUND:
Set the option mon_health_detail_to_clog to false.

If the MONs are in quorum:
ceph config set mon mon_health_detail_to_clog false

If the MONs are not in quorum:
The option needs to be set in /var/lib/ceph/$fsid/mon.$hostname/config on each mon host, and the monitor service restarted.
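For the out-of-quorum case, a rough sketch of the manual change is below. It assumes a cephadm-managed deployment where the mon's config file is a ceph.conf-style fragment and the daemon runs under a per-fsid systemd unit; the exact file contents, section name, and unit name may differ on your system, so verify them before applying.

# On each mon host, append the option to the mon's config file
# (substitute your cluster fsid and the mon's hostname)
cat >> /var/lib/ceph/$fsid/mon.$hostname/config <<'EOF'
[mon]
mon_health_detail_to_clog = false
EOF

# Restart the mon daemon so it picks up the new config
# (unit name follows the usual cephadm pattern; confirm with 'systemctl list-units | grep mon')
systemctl restart ceph-$fsid@mon.$hostname.service

# Once quorum is restored, persist the setting in the cluster config store
ceph config set mon mon_health_detail_to_clog false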
Quincy backport: https://github.com/ceph/ceph/pull/46055
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:1360