Bug 1624527

Summary: MDS spams is_laggy message at log level 1
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Patrick Donnelly <pdonnell>
Component: CephFS
Assignee: Patrick Donnelly <pdonnell>
Status: CLOSED ERRATA
QA Contact: Ramakrishnan Periyasamy <rperiyas>
Severity: high
Docs Contact: Bara Ancincova <bancinco>
Priority: urgent
Version: 3.0
CC: ceph-eng-bugs, john.spray, mmuir, nobody+410372, pdonnell, rperiyas, tchandra, tserlin, vumrao
Target Milestone: z1   
Target Release: 3.1   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.5-46.el7cp Ubuntu: ceph_12.2.5-31redhat1
Doc Type: Bug Fix
Doc Text:
.The "is_laggy" messages no longer cause the debug log to grow to several GB per day

When the MDS detected that the connection to Monitors was laggy due to missing beacon acks, the MDS logged "is_laggy" messages to the debug log at level 1. Consequently, these messages caused the debug log to grow to several GB per day. With this update, the MDS outputs the log message once for each event of lagginess.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-09 00:59:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1584264    
Attachments:
Description Flags
Hotfix_1624527_report none

Description Patrick Donnelly 2018-08-31 22:33:25 UTC
Description of problem:

When the MDS detects that the mds-mon connection is laggy due to missing beacon acks, it spams the debug log with "is_laggy" messages at level 1:

> 2018-08-31 07:15:26.991594 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000025 > 15 since last acked beacon
> 2018-08-31 07:15:26.991602 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000032 > 15 since last acked beacon
> 2018-08-31 07:15:26.991603 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000034 > 15 since last acked beacon
> 2018-08-31 07:15:26.991633 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000063 > 15 since last acked beacon
> 2018-08-31 07:15:26.991641 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000071 > 15 since last acked beacon
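
To gauge how heavy the spam is on an affected MDS, the log lines can be tallied per daemon and per second. The following is only a minimal sketch, assuming the line layout seen in the sample above (timestamp, thread id, log level, `mds.beacon.<name>`, message). `count_is_laggy` and `LINE_RE` are hypothetical helpers written for this report, not part of Ceph.

```python
import re
from collections import Counter

# Layout assumed from the sample lines in this report:
# <date> <time.usec> <thread>  <level> mds.beacon.<name> is_laggy <elapsed> > <grace> since last acked beacon
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ "
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+) "
    r"mds\.beacon\.(?P<name>\S+) is_laggy "
    r"(?P<elapsed>[\d.]+) > (?P<grace>[\d.]+) since last acked beacon$"
)


def count_is_laggy(lines):
    """Count is_laggy messages per (daemon name, second)."""
    counts = Counter()
    for line in lines:
        # Tolerate the "> " quote prefix used in this report.
        m = LINE_RE.match(line.lstrip("> ").rstrip())
        if m:
            key = (m.group("name"), m.group("date") + " " + m.group("time"))
            counts[key] += 1
    return counts


SAMPLE = [
    "> 2018-08-31 07:15:26.991594 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000025 > 15 since last acked beacon",
    "> 2018-08-31 07:15:26.991602 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000032 > 15 since last acked beacon",
    "> 2018-08-31 07:15:26.991603 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000034 > 15 since last acked beacon",
    "> 2018-08-31 07:15:26.991633 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000063 > 15 since last acked beacon",
    "> 2018-08-31 07:15:26.991641 7f6209c1d700  1 mds.beacon.storagem4-ngn1 is_laggy 15.000071 > 15 since last acked beacon",
]

if __name__ == "__main__":
    for (name, second), n in sorted(count_is_laggy(SAMPLE).items()):
        print(name, second, n)
```

All five sample lines fall within the same second, which is the pattern that makes the log grow so quickly.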

How reproducible:

100% when the MDS connection to the mons is partitioned.
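
The behavior the fix aims for, one message per lagginess event instead of one per check, amounts to logging only on the transition into the laggy state. The sketch below illustrates that idea in Python; it is not the actual Ceph change (the MDS is written in C++). The `Beacon` class, its `was_laggy` flag, and the injected `log` callable are illustrative assumptions, and the 15-second grace value simply mirrors the threshold in the sample log lines.

```python
class Beacon:
    """Toy model of a beacon sender that reports lagginess once per event."""

    def __init__(self, grace=15.0, log=print):
        self.grace = grace        # seconds without an ack before we count as laggy
        self.last_acked = 0.0     # time of the last acked beacon
        self.was_laggy = False    # state at the previous check
        self.log = log            # hypothetical logging hook

    def ack(self, now):
        """Record that a beacon was acked at time `now`."""
        self.last_acked = now

    def is_laggy(self, now):
        since = now - self.last_acked
        laggy = since > self.grace
        # Log only when entering the laggy state, not on every check.
        if laggy and not self.was_laggy:
            self.log(f"is_laggy {since:.6f} > {self.grace:g} since last acked beacon")
        self.was_laggy = laggy
        return laggy


if __name__ == "__main__":
    b = Beacon()
    b.ack(0.0)
    for t in (10.0, 15.000025, 15.000032, 15.000034):
        b.is_laggy(t)  # only the first laggy check prints a message
```

Because the flag is cleared as soon as a check finds the beacon healthy again, a later partition is logged as a new event.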

Comment 21 Ramakrishnan Periyasamy 2018-09-08 05:41:59 UTC
Created attachment 1481699
Hotfix_1624527_report

Comment 24 Patrick Donnelly 2018-09-10 21:04:17 UTC
*** Bug 1626912 has been marked as a duplicate of this bug. ***

Comment 34 Ramakrishnan Periyasamy 2018-09-12 17:17:50 UTC
Manual testing of the hotfix is complete. Automation CI results from Shreekar are still outstanding: 3-4 tests are pending as of now, and no failures have been observed so far.

Shreekar will update the CI run link once automation is complete.

Comment 47 Ramakrishnan Periyasamy 2018-10-22 09:09:32 UTC
Thanks Patrick.

Moving this bug to verified.

Comment 49 errata-xmlrpc 2018-11-09 00:59:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3530