Bug 1593031 - MDSMonitor sends redundant health messages to the cluster log
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: CephFS
Version: 3.0
Hardware: All
OS: All
Target Milestone: z5
Target Release: 3.0
Assignee: Patrick Donnelly
QA Contact: subhash
Depends On:
Reported: 2018-06-19 22:10 UTC by Patrick Donnelly
Modified: 2018-08-09 18:27 UTC (History)

Fixed In Version: RHEL: ceph-12.2.4-32 Ubuntu: ceph_12.2.4-36redhat1xenial
Doc Text:
Clone Of:
Last Closed: 2018-08-09 18:27:11 UTC
Target Upstream Version:

System                    ID              Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker  24331           None      None    None     2018-06-19 22:10:42 UTC
Red Hat Product Errata    RHBA-2018:2375  None      None    None     2018-08-09 18:27:42 UTC

Description Patrick Donnelly 2018-06-19 22:10:43 UTC
Description of problem:

Health checks have their own logging infrastructure, which automatically writes warnings and errors to the cluster log. The MDSMonitor also sends the same health messages to the cluster log at warning/error level, so each MDS health warning is logged twice.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

Create a health warning:

1. in one ceph-fuse client: dd if=/dev/urandom of=foo bs=4096
2. kill -9 client1
3. in second client: ls foo

You should see a warning.

Actual results:

2018-06-19 18:07:31.318692 mon.a mon.0 103 : cluster [WRN] MDS health message (mds.0): 1 slow requests are blocked > 30 sec
2018-06-19 18:07:32.165361 mon.a mon.0 104 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)

Expected results:

2018-06-19 18:07:32.165361 mon.a mon.0 104 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
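
To check a cluster log for the redundant entries, a small filter along the following lines may help. This is only a sketch: the line layout in the regular expression is inferred from the two sample lines above, and the function name redundant_mds_health_lines is a hypothetical helper, not part of Ceph.

```python
import re

# Cluster log layout as it appears in this report (an assumption, not a spec):
#   <date> <time> <entity name> <rank> <seq> : cluster [<LEVEL>] <message>
LINE_RE = re.compile(
    r"^(?P<stamp>\S+ \S+) (?P<name>\S+) (?P<rank>\S+) (?P<seq>\d+) : "
    r"cluster \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def redundant_mds_health_lines(lines):
    """Return monitor WRN/ERR lines that repeat an MDS health message.

    Hypothetical helper: it flags lines such as
    "MDS health message (mds.0): ..." and leaves the
    "Health check failed: ..." lines alone.
    """
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a cluster log line in the assumed layout
        if (
            match.group("name").startswith("mon.")
            and match.group("level") in ("WRN", "ERR")
            and match.group("message").startswith("MDS health message (")
        ):
            hits.append(line)
    return hits


# The two lines from "Actual results" above.
sample = [
    "2018-06-19 18:07:31.318692 mon.a mon.0 103 : cluster [WRN] "
    "MDS health message (mds.0): 1 slow requests are blocked > 30 sec",
    "2018-06-19 18:07:32.165361 mon.a mon.0 104 : cluster [WRN] "
    "Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)",
]

for hit in redundant_mds_health_lines(sample):
    print(hit)
```

On a build with the fix, the filter would be expected to return nothing for the same reproduction steps.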

Comment 11 subhash 2018-08-02 11:29:42 UTC
Hi Patrick,

I followed the steps from the description to reproduce.

I checked the cluster log on the mon node at /var/log/ceph/{clustername}.log:

2018-08-02 11:00:00.000152 mon.magna021 mon.0 235200 : cluster [WRN] overall HEALTH_WARN 1 pools have many more objects per pg than average; application not enabled on 3 pool(s)
2018-08-02 11:20:08.098977 mon.magna021 mon.0 236277 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2018-08-02 11:20:07.147113 mds.magna030 mds.0 11 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.689844 secs
2018-08-02 11:20:07.147122 mds.magna030 mds.0 12 : cluster [WRN] slow request 30.689844 seconds old, received at 2018-08-02 11:19:36.457197: client_request(client.114589302:46 lookup #0x1/bar 2018-08-02 11:19:53.770012 caller_uid=1000, caller_gid=1148{6,1000,1148,10102,}) currently failed to rdlock, waiting
2018-08-02 11:20:24.023987 mon.magna021 mon.0 236306 : cluster [INF] MDS health message cleared (mds.0): 1 slow requests are blocked > 30 sec
2018-08-02 11:20:25.048642 mon.magna021 mon.0 236307 : cluster [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2018-08-02 11:24:14.779027 mds.magna033 mds.1 6 : cluster [WRN] evicting unresponsive client magna034.ceph.redhat.com (114566210), after 300.020233 seconds

Will this behaviour be enough to verify this bug? The results are as expected, but I see two additional warnings from the MDS. Kindly clarify whether those log messages are expected; if so, I'll move this to the verified state.

ceph version 12.2.4-45redhat1xenial (9b4e0526cc76b0b086888b0e1f747cbebd990d56) luminous (stable)
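
When reading a log excerpt like the one above, it can help to separate entries by the daemon that emitted them, since the third field shows whether a line came from a monitor (mon.*) or an MDS (mds.*). The following is a minimal sketch of that grouping; group_by_source is a hypothetical helper and the field positions are assumed from the lines quoted in this comment.

```python
from collections import defaultdict


def group_by_source(lines):
    """Group cluster log lines by emitting daemon type ("mon", "mds", ...).

    Assumes the third whitespace-separated field is the entity name,
    for example "mon.magna021" or "mds.magna030".
    """
    groups = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        if len(fields) < 4:
            continue  # too short to be a cluster log line
        daemon_type = fields[2].split(".", 1)[0]
        groups[daemon_type].append(raw.strip())
    return dict(groups)


# Three lines copied from the log excerpt in this comment.
sample = [
    "2018-08-02 11:20:08.098977 mon.magna021 mon.0 236277 : cluster [WRN] "
    "Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)",
    "2018-08-02 11:20:07.147113 mds.magna030 mds.0 11 : cluster [WRN] "
    "1 slow requests, 1 included below; oldest blocked for > 30.689844 secs",
    "2018-08-02 11:20:24.023987 mon.magna021 mon.0 236306 : cluster [INF] "
    "MDS health message cleared (mds.0): 1 slow requests are blocked > 30 sec",
]

for daemon_type, entries in group_by_source(sample).items():
    print(daemon_type, len(entries))
```

Grouping this way only shows which daemon logged each line; whether the MDS-originated warnings are expected is still a question for the assignee.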

Comment 14 errata-xmlrpc 2018-08-09 18:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

