1593031 – MDSMonitor sends redundant health messages to the cluster log

Bug 1593031 - MDSMonitor sends redundant health messages to the cluster log

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	CephFS
Sub Component:
Version:	3.0
Hardware:	All
OS:	All
Priority:	medium
Severity:	low
Target Milestone:	z5
Target Release:	3.0
Assignee:	Patrick Donnelly
QA Contact:	subhash
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-06-19 22:10 UTC by Patrick Donnelly
Modified:	2018-08-09 18:27 UTC
CC List:	5 users
Fixed In Version:	RHEL: ceph-12.2.4-32 Ubuntu: ceph_12.2.4-36redhat1xenial
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-08-09 18:27:11 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	24331	0	None	None	None	2018-06-19 22:10:42 UTC
Red Hat Product Errata	RHBA-2018:2375	0	None	None	None	2018-08-09 18:27:42 UTC

Description Patrick Donnelly 2018-06-19 22:10:43 UTC

Description of problem:

Health warnings have their own logging infrastructure, which automatically writes warnings and errors to the cluster log. The MDSMonitor also sends the same messages to the cluster log directly, so each MDS health warning is logged twice.

Version-Release number of selected component (if applicable):

3.0

How reproducible:

100%

Steps to Reproduce:

Create a health warning:

1. In one ceph-fuse client: dd if=/dev/urandom of=foo bs=4096
2. Kill the first client: kill -9 client1
3. In a second client: ls foo

You should see a warning.

Actual results:

2018-06-19 18:07:31.318692 mon.a mon.0 127.0.0.1:40580/0 103 : cluster [WRN] MDS health message (mds.0): 1 slow requests are blocked > 30 sec
2018-06-19 18:07:32.165361 mon.a mon.0 127.0.0.1:40580/0 104 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)


Expected results:

2018-06-19 18:07:32.165361 mon.a mon.0 127.0.0.1:40580/0 104 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
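
To check a cluster log for the redundant entries, the comparison between the actual and expected results can be sketched in Python. This is a minimal sketch, not part of the fix: it assumes the redundant lines are exactly the monitor-originated WRN entries carrying the "MDS health message (mds.N):" prefix seen above, and the names REDUNDANT and find_redundant are hypothetical.

```python
import re

# Assumption: a redundant entry is a monitor-originated WRN line with the
# "MDS health message (mds.N):" prefix, as in the "Actual results" excerpt.
REDUNDANT = re.compile(
    r"\bmon\.\S+ mon\.\d+ .* cluster \[WRN\] MDS health message \(mds\.\d+\):"
)


def find_redundant(lines):
    """Return the log lines that match the assumed redundant pattern."""
    return [line for line in lines if REDUNDANT.search(line)]


# The two lines quoted under "Actual results".
sample = [
    "2018-06-19 18:07:31.318692 mon.a mon.0 127.0.0.1:40580/0 103 : cluster [WRN] "
    "MDS health message (mds.0): 1 slow requests are blocked > 30 sec",
    "2018-06-19 18:07:32.165361 mon.a mon.0 127.0.0.1:40580/0 104 : cluster [WRN] "
    "Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)",
]

redundant = find_redundant(sample)
print(len(redundant))  # number of redundant entries found in the sample
```

On a fixed build, the same check run over the cluster log should return an empty list for this scenario, while the "Health check failed" line remains.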

Comment 11 subhash 2018-08-02 11:29:42 UTC

Hi Patrick,

I followed the steps from the description to reproduce.

I checked the cluster log on the mon node at /var/log/ceph/{clustername}.log:

2018-08-02 11:00:00.000152 mon.magna021 mon.0 10.8.128.21:6789/0 235200 : cluster [WRN] overall HEALTH_WARN 1 pools have many more objects per pg than average; application not enabled on 3 pool(s)
2018-08-02 11:20:08.098977 mon.magna021 mon.0 10.8.128.21:6789/0 236277 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2018-08-02 11:20:07.147113 mds.magna030 mds.0 10.8.128.30:6800/2859325189 11 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.689844 secs
2018-08-02 11:20:07.147122 mds.magna030 mds.0 10.8.128.30:6800/2859325189 12 : cluster [WRN] slow request 30.689844 seconds old, received at 2018-08-02 11:19:36.457197: client_request(client.114589302:46 lookup #0x1/bar 2018-08-02 11:19:53.770012 caller_uid=1000, caller_gid=1148{6,1000,1148,10102,}) currently failed to rdlock, waiting
2018-08-02 11:20:24.023987 mon.magna021 mon.0 10.8.128.21:6789/0 236306 : cluster [INF] MDS health message cleared (mds.0): 1 slow requests are blocked > 30 sec
2018-08-02 11:20:25.048642 mon.magna021 mon.0 10.8.128.21:6789/0 236307 : cluster [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2018-08-02 11:24:14.779027 mds.magna033 mds.1 10.8.128.33:6800/2324001472 6 : cluster [WRN] evicting unresponsive client magna034.ceph.redhat.com (114566210), after 300.020233 seconds


Is this behaviour enough to verify this bug? The results are as expected, but I see two additional warnings from the MDS. Kindly clarify whether those log messages are expected; if so, I'll move this bug to the verified state.

----
ceph version 12.2.4-45redhat1xenial (9b4e0526cc76b0b086888b0e1f747cbebd990d56) luminous (stable)
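
The excerpt in this comment mixes entries written by the monitor with entries written by the MDS daemons themselves. To separate them, the lines can be grouped by originating daemon type, as in the Python sketch below. The field layout in the regular expression is inferred from the lines quoted in this report and is an assumption, not a format specification; LINE and group_by_daemon are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the quoted lines:
# <date> <time> <name> <rank> <addr> <seq> : <channel> [<LEVEL>] <message>
LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<name>\S+) (?P<rank>\S+) (?P<addr>\S+) (?P<seq>\d+) : "
    r"(?P<channel>\S+) \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def group_by_daemon(lines):
    """Map daemon type ("mon", "mds", ...) to a list of (level, message)."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that does not fit the assumed layout
        daemon = match.group("name").split(".", 1)[0]
        groups[daemon].append((match.group("level"), match.group("message")))
    return dict(groups)


# Three lines copied from the excerpt in this comment.
sample = [
    "2018-08-02 11:20:08.098977 mon.magna021 mon.0 10.8.128.21:6789/0 236277 : cluster [WRN] "
    "Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)",
    "2018-08-02 11:20:07.147113 mds.magna030 mds.0 10.8.128.30:6800/2859325189 11 : cluster [WRN] "
    "1 slow requests, 1 included below; oldest blocked for > 30.689844 secs",
    "2018-08-02 11:24:14.779027 mds.magna033 mds.1 10.8.128.33:6800/2324001472 6 : cluster [WRN] "
    "evicting unresponsive client magna034.ceph.redhat.com (114566210), after 300.020233 seconds",
]

for daemon, entries in sorted(group_by_daemon(sample).items()):
    for level, message in entries:
        print(daemon, level, message)
```

Grouping this way shows only where each entry originated; it does not by itself decide whether the MDS-originated warnings are expected.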

Comment 14 errata-xmlrpc 2018-08-09 18:27:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2375