Bug 1624527 - MDS spams is_laggy message at log level 1
Summary: MDS spams is_laggy message at log level 1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.0
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: z1
Target Release: 3.1
Assignee: Patrick Donnelly
QA Contact: Ramakrishnan Periyasamy
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Duplicates: 1626912
Depends On:
Blocks: 1584264
Reported: 2018-08-31 22:33 UTC by Patrick Donnelly
Modified: 2021-12-10 17:24 UTC (History)

Fixed In Version: RHEL: ceph-12.2.5-46.el7cp Ubuntu: ceph_12.2.5-31redhat1
Doc Type: Bug Fix
Doc Text:
.The "is_laggy" messages no longer cause the debug log to grow to several GB per day
When the MDS detected that the connection to Monitors was laggy due to missing beacon acks, the MDS logged "is_laggy" messages to the debug log at level 1. Consequently, these messages caused the debug log to grow to several GB per day. With this update, the MDS outputs the log message once for each event of lagginess.
Clone Of:
Environment:
Last Closed: 2018-11-09 00:59:32 UTC
Embargoed:


Attachments
Hotfix_1624527_report (24.19 KB, text/plain)
2018-09-08 05:41 UTC, Ramakrishnan Periyasamy


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 35718 0 None None None 2018-09-26 23:28:47 UTC
Ceph Project Bug Tracker 35859 0 None None None 2018-09-10 20:49:56 UTC
Red Hat Issue Tracker RHCEPH-2697 0 None None None 2021-12-10 17:24:20 UTC
Red Hat Product Errata RHBA-2018:3530 0 None None None 2018-11-09 01:00:29 UTC

Description Patrick Donnelly 2018-08-31 22:33:25 UTC
Description of problem:

When the MDS detects that the mds-mon connection is laggy due to missing beacon acks, it spams the debug log with "is_laggy" messages at level 1:

> 2018-08-31 07:15:26.991594 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000025 > 15 since last acked beacon
> 2018-08-31 07:15:26.991602 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000032 > 15 since last acked beacon
> 2018-08-31 07:15:26.991603 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000034 > 15 since last acked beacon
> 2018-08-31 07:15:26.991633 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000063 > 15 since last acked beacon
> 2018-08-31 07:15:26.991641 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000071 > 15 since last acked beacon
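
To gauge how bad the spam is on an affected cluster, the "is_laggy" lines can be counted per daemon and per second. The following is a minimal sketch, not a Ceph tool: the regular expression is inferred only from the excerpt above (it is an assumption, not an official log grammar), and `count_laggy_per_second` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this bug; an assumption, not an
# official Ceph log format definition.
LAGGY_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ "
    r"\S+\s+(?P<level>\d+) (?P<daemon>mds\.beacon\.\S+) is_laggy "
)


def count_laggy_per_second(lines):
    """Count is_laggy messages, keyed by (daemon, timestamp truncated to 1s)."""
    counts = Counter()
    for line in lines:
        # Tolerate the "> " quoting used when the log is pasted into a comment.
        match = LAGGY_RE.match(line.lstrip("> ").strip())
        if match:
            second = f"{match['date']} {match['time']}"
            counts[(match["daemon"], second)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_laggy.py < ceph-mds.log
    for (daemon, second), n in sorted(count_laggy_per_second(sys.stdin).items()):
        print(f"{second} {daemon} {n}")
```

Any second with more than a handful of hits for the same daemon matches the behavior reported here.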

How reproducible:

100% when the MDS connection to the mons is partitioned.
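
The Doc Text for this bug says the fixed MDS logs the message once for each event of lagginess. One way to get that behavior is to log only on the transition from "not laggy" to "laggy". The sketch below illustrates that idea under stated assumptions: it is not the actual Ceph patch, `LaggyReporter` and its parameters are hypothetical names, and the 15-second threshold is taken from the log excerpt above.

```python
class LaggyReporter:
    """Hypothetical helper: report lagginess once per laggy episode."""

    def __init__(self, grace_seconds, log):
        self.grace = grace_seconds  # threshold; 15 in the excerpt above
        self.log = log              # any callable that accepts a string
        self.was_laggy = False      # state remembered from the previous check

    def check(self, since_last_ack):
        """Return True if laggy; log only when a laggy episode begins."""
        laggy = since_last_ack > self.grace
        if laggy and not self.was_laggy:
            self.log(
                f"is_laggy {since_last_ack:.6f} > {self.grace} "
                "since last acked beacon"
            )
        self.was_laggy = laggy
        return laggy


if __name__ == "__main__":
    reporter = LaggyReporter(15, print)
    # Repeated checks while laggy produce one line; after a beacon ack brings
    # the value back under the threshold, the next episode logs again.
    for elapsed in (14.0, 15.000025, 15.000032, 15.000071, 1.0, 16.0):
        reporter.check(elapsed)
```

Callers still get the laggy/not-laggy result on every check; only the log output is limited to one line per episode.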

Comment 21 Ramakrishnan Periyasamy 2018-09-08 05:41:59 UTC
Created attachment 1481699
Hotfix_1624527_report

Comment 24 Patrick Donnelly 2018-09-10 21:04:17 UTC
*** Bug 1626912 has been marked as a duplicate of this bug. ***

Comment 34 Ramakrishnan Periyasamy 2018-09-12 17:17:50 UTC
Manual testing of the hotfix is complete. Automation CI results from Shreekar are still outstanding: 3-4 tests are pending, and no failures have been observed so far.

Shreekar will update the CI run link once automation is complete.

Comment 47 Ramakrishnan Periyasamy 2018-10-22 09:09:32 UTC
Thanks Patrick.

Moving this bug to verified.

Comment 49 errata-xmlrpc 2018-11-09 00:59:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3530
