Description of problem:
Health warnings have their own logging infrastructure, which automatically logs warnings and errors to the cluster log. The MDSMonitor is also sending these same messages to the cluster log at warning/error level, so each MDS health warning is logged twice.

Version-Release number of selected component (if applicable):
3.0

How reproducible:
100%

Steps to Reproduce:
Create a health warning:
1. In one ceph-fuse client: dd if=/dev/urandom of=foo bs=4096
2. kill -9 client1 (the first ceph-fuse client)
3. In a second client: ls foo

A health warning should appear.

Actual results:
2018-06-19 18:07:31.318692 mon.a mon.0 127.0.0.1:40580/0 103 : cluster [WRN] MDS health message (mds.0): 1 slow requests are blocked > 30 sec
2018-06-19 18:07:32.165361 mon.a mon.0 127.0.0.1:40580/0 104 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)

Expected results:
2018-06-19 18:07:32.165361 mon.a mon.0 127.0.0.1:40580/0 104 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
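To check a cluster log against the actual and expected results above, a small script can help. This is a minimal Python sketch. It is not part of Ceph. The function names and regexes are hypothetical and assume the log-line format quoted in this report. It counts the redundant "MDS health message (mds.N)" warnings emitted by the monitor and confirms that the "Health check failed" entry is still present.

```python
import re
from typing import Iterable, List

# Hypothetical helpers: the regexes assume the cluster-log line format quoted
# in this report ("<timestamp> <daemon> ... : cluster [LEVEL] <message>").

# Redundant monitor line seen under "Actual results", e.g.
# "... cluster [WRN] MDS health message (mds.0): 1 slow requests are blocked > 30 sec"
# (the INF "MDS health message cleared (mds.0)" line does not match this pattern).
REDUNDANT_MDS_WRN = re.compile(r"cluster \[(?:WRN|ERR)\] MDS health message \(mds\.\d+\):")

# Line produced by the health-check infrastructure, e.g.
# "... cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)"
HEALTH_CHECK_FAILED = re.compile(
    r"cluster \[(?:WRN|ERR)\] Health check failed: .*\((MDS_[A-Z_]+)\)\s*$"
)


def redundant_mds_warnings(lines: Iterable[str]) -> List[str]:
    """Return the monitor lines that duplicate an MDS health warning."""
    return [line for line in lines if REDUNDANT_MDS_WRN.search(line)]


def failed_health_checks(lines: Iterable[str]) -> List[str]:
    """Return the MDS health-check codes (e.g. MDS_SLOW_REQUEST) that were raised."""
    codes = []
    for line in lines:
        match = HEALTH_CHECK_FAILED.search(line)
        if match:
            codes.append(match.group(1))
    return codes


if __name__ == "__main__":
    actual = [
        "2018-06-19 18:07:31.318692 mon.a mon.0 127.0.0.1:40580/0 103 : cluster [WRN] "
        "MDS health message (mds.0): 1 slow requests are blocked > 30 sec",
        "2018-06-19 18:07:32.165361 mon.a mon.0 127.0.0.1:40580/0 104 : cluster [WRN] "
        "Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)",
    ]
    expected = actual[1:]
    # Actual results contain one redundant line; expected results contain none,
    # while the MDS_SLOW_REQUEST health check is still logged.
    print(len(redundant_mds_warnings(actual)), len(redundant_mds_warnings(expected)))
    print(failed_health_checks(expected))
```

Run against a log such as /var/log/ceph/{clustername}.log, an empty result from redundant_mds_warnings() together with a non-empty result from failed_health_checks() would match the expected results. Warnings logged by the MDS daemons themselves (for example "slow request ..." from mds.N) are deliberately not counted.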
Hi Patrick,

I followed the steps from the description to reproduce the issue, then checked the logs on the mon node at /var/log/ceph/{clustername}.log:

2018-08-02 11:00:00.000152 mon.magna021 mon.0 10.8.128.21:6789/0 235200 : cluster [WRN] overall HEALTH_WARN 1 pools have many more objects per pg than average; application not enabled on 3 pool(s)
2018-08-02 11:20:08.098977 mon.magna021 mon.0 10.8.128.21:6789/0 236277 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2018-08-02 11:20:07.147113 mds.magna030 mds.0 10.8.128.30:6800/2859325189 11 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.689844 secs
2018-08-02 11:20:07.147122 mds.magna030 mds.0 10.8.128.30:6800/2859325189 12 : cluster [WRN] slow request 30.689844 seconds old, received at 2018-08-02 11:19:36.457197: client_request(client.114589302:46 lookup #0x1/bar 2018-08-02 11:19:53.770012 caller_uid=1000, caller_gid=1148{6,1000,1148,10102,}) currently failed to rdlock, waiting
2018-08-02 11:20:24.023987 mon.magna021 mon.0 10.8.128.21:6789/0 236306 : cluster [INF] MDS health message cleared (mds.0): 1 slow requests are blocked > 30 sec
2018-08-02 11:20:25.048642 mon.magna021 mon.0 10.8.128.21:6789/0 236307 : cluster [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2018-08-02 11:24:14.779027 mds.magna033 mds.1 10.8.128.33:6800/2324001472 6 : cluster [WRN] evicting unresponsive client magna034.ceph.redhat.com (114566210), after 300.020233 seconds

Is this behaviour enough to verify this bug? The results are as expected, but I see two additional warnings from the MDS. Kindly clarify whether those log messages are expected; if so, I'll move this to the VERIFIED state.

----
ceph version 12.2.4-45redhat1xenial (9b4e0526cc76b0b086888b0e1f747cbebd990d56) luminous (stable)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2375