Bug 1664468 - MDS hangs and is removed when doing a significant shrink of a large cache
Summary: MDS hangs and is removed when doing a significant shrink of a large cache
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.1
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: z1
Target Release: 3.2
Assignee: Patrick Donnelly
QA Contact: Persona non grata
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1629656
Reported: 2019-01-08 22:07 UTC by Patrick Donnelly
Modified: 2019-03-07 15:51 UTC

Fixed In Version: RHEL: ceph-2:12.2.8-72.el7cp Ubuntu: ceph_12.2.8-58redhat1
Doc Type: Bug Fix
Doc Text:
.Shrinking a large MDS cache no longer causes the MDS daemon to appear to hang

Previously, an attempt to shrink a large Metadata Server (MDS) cache caused the primary MDS daemon to become unresponsive. Consequently, the Monitors removed the unresponsive MDS, and a standby MDS became the primary MDS. With this update, shrinking a large MDS cache no longer causes the primary MDS daemon to hang.
Clone Of:
Environment:
Last Closed: 2019-03-07 15:51:27 UTC
Embargoed:



Links
System                     ID              Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker   38102           0        None      None    None     2019-02-04 15:36:41 UTC
Ceph Project Bug Tracker   38104           0        None      None    None     2019-02-04 15:36:41 UTC
Ceph Project Bug Tracker   38132           0        None      None    None     2019-02-04 02:23:45 UTC
Red Hat Product Errata     RHBA-2019:0475  0        None      None    None     2019-03-07 15:51:36 UTC

Internal Links: 1669628

Description Patrick Donnelly 2019-01-08 22:07:06 UTC
Description of problem:

When the MDS performs a significant shrink of a large cache, e.g. 96GB -> 64GB, it spins trying to trim cached objects and recall caps from clients. The monitors then remove the MDS because they stop receiving its heartbeat beacons.

Version-Release number of selected component (if applicable):

3.0

How reproducible:

100%

Steps to Reproduce:
1. Create a file system and fill it with ~10 million files. Then have 4-5 clients load those files into memory.
2. Reduce the MDS cache using the `config set` admin socket command.
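
For step 2, a minimal sketch of building the admin socket command. The report does not name the option or the daemon, so `mds_cache_memory_limit` as the setting being reduced, the daemon name `a`, and reading "64GB" as 64 GiB are all assumptions made for illustration. The snippet only prints the command; it does not run it.

```python
GIB = 1024 ** 3  # assumption: the report's "GB" is treated as GiB here


def cache_limit_command(mds_name, limit_bytes):
    """Return the argv for setting the MDS cache limit over the admin socket.

    mds_name is a hypothetical daemon name; mds_cache_memory_limit is assumed
    to be the option that step 2 refers to.
    """
    return [
        "ceph", "daemon", f"mds.{mds_name}",
        "config", "set", "mds_cache_memory_limit", str(limit_bytes),
    ]


if __name__ == "__main__":
    # Shrink target from the description: 64GB (the cache was at 96GB).
    print(" ".join(cache_limit_command("a", 64 * GIB)))
```

The command has to be run on the host where the MDS daemon's admin socket lives.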

Actual results:

The MDS is removed from the MDSMap and a standby takes over.

Expected results:

The MDS slowly reduces its cache without service interruption.
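
To illustrate what a gradual reduction could look like from the operator side, here is a rough sketch that splits one large shrink into smaller intermediate limits. This is not the fix shipped in the erratum, which changes the MDS itself; the function name `staged_limits` and the 8-unit step size are hypothetical and chosen only for the example.

```python
def staged_limits(current, target, step):
    """Return successively smaller cache limits ending exactly at target.

    All three values use the same unit (for example GB). Hypothetical helper,
    not part of Ceph.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if target > current:
        raise ValueError("target must not exceed current")
    limits = []
    limit = current
    # Step down while a full step still stays above the target.
    while limit - step > target:
        limit -= step
        limits.append(limit)
    limits.append(target)  # final stage lands exactly on the target
    return limits


if __name__ == "__main__":
    # The 96GB -> 64GB shrink from the description, in 8GB stages.
    print(staged_limits(96, 64, 8))  # [88, 80, 72, 64]
```

Each stage would be applied as a separate `config set` call, waiting for the cache to settle before the next one.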

Comment 12 errata-xmlrpc 2019-03-07 15:51:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475

