Bug 1714810
| Summary: | MDS may hang during up:rejoin while iterating inodes | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Patrick Donnelly <pdonnell> |
| Component: | CephFS | Assignee: | Yan, Zheng <zyan> |
| Status: | CLOSED ERRATA | QA Contact: | subhash <vpoliset> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.1 | CC: | ceph-eng-bugs, ceph-qe-bugs, edonnell, pdhange, roemerso, sweil, tchandra, tserlin, zyan |
| Target Milestone: | rc | | |
| Target Release: | 3.3 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | RHEL: ceph-12.2.12-23.el7cp Ubuntu: ceph_12.2.12-19redhat1xenial | Doc Type: | Bug Fix |
| Doc Text: | Heartbeat packets are reset as expected. Previously, the Ceph Metadata Server (MDS) did not reset heartbeat packets while it was busy in a large loop. This prevented the MDS from sending a beacon to the Monitor, so the Monitor replaced the busy MDS. With this update, heartbeat packets are reset even when the MDS is busy in a large loop, and the busy MDS is no longer replaced. (An illustrative sketch of this pattern follows the table.) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-21 15:11:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1726135 | | |
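
The fix described in the Doc Text comes down to one pattern: a long-running loop must periodically reset its heartbeat so that the monitoring side does not conclude the daemon has died. The following is a minimal, self-contained C++ sketch of that pattern; the names (`Heartbeat`, `process_inodes`) and the 5-second grace period are hypothetical illustrations, not the actual Ceph MDS code or its internal API.

```cpp
// Hypothetical illustration of "reset the heartbeat inside a large loop".
// Not Ceph code; names and timings are invented for the example.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

struct Heartbeat {
    std::atomic<Clock::rep> last{Clock::now().time_since_epoch().count()};

    // Called by the busy thread to show it is still making progress.
    void reset() { last = Clock::now().time_since_epoch().count(); }

    // Called by a watchdog thread; true if the worker has gone silent.
    bool expired(std::chrono::seconds grace) const {
        auto then = Clock::time_point(Clock::duration(last.load()));
        return Clock::now() - then > grace;
    }
};

// Stand-in for iterating a huge number of inodes during up:rejoin.
void process_inodes(Heartbeat& hb, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        // ... per-inode work would happen here ...
        if (i % 100000 == 0) {
            hb.reset();  // the fix: keep the heartbeat fresh inside the loop
        }
    }
    hb.reset();
}

int main() {
    Heartbeat hb;
    std::atomic<bool> done{false};

    // Watchdog playing the role of the Monitor waiting for MDS beacons.
    std::thread watchdog([&] {
        while (!done) {
            if (hb.expired(std::chrono::seconds(5)))
                std::puts("worker looks hung; it would be replaced");
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    });

    process_inodes(hb, 50'000'000);  // large loop, heartbeat stays fresh
    done = true;
    watchdog.join();
    std::puts("finished without being declared dead");
}
```

The same idea applied to the MDS is to refresh its heartbeat from inside expensive loops (such as iterating inodes during up:rejoin) rather than only between work items, so the beacon to the Monitor keeps flowing while the loop runs.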
Description
Patrick Donnelly
2019-05-28 23:01:06 UTC
Might be related to bz1614498. That fix would have been in their cluster.

Support comment to customer:

Hi Shri,

The MDS was hung during up:rejoin, and the MDS being removed from the MDSMap seems to be because it was busy iterating over inodes; refer to BZ [1] and patch [2], which was cherry-picked into luminous. This has been fixed in RHCS 3.1, i.e. in ceph-mds-12.2.5-59.el7cp.x86_64. We have opened BZ [3] to investigate why OSDs were flapping when mds_beacon_grace was set to 3600.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1714810
[2] https://github.com/ceph/ceph/pull/21366
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1714848

Let us know if you have any further queries.

-------------------------------

Customer response:

Hi,

We have a specific hotfix version on our Ceph servers; does it have the fixes for [1] and [2]?

ceph version 12.2.4-42.2.hotfix.nvidia.el7cp (2ae8fcd75c666ffc9badac24707996801ac24fd0) luminous (stable)

Thanks,
Shri

------------------------------

Can engineering assist in validating the customer query above related to their specific hotfix version?

(In reply to Bob Emerson from comment #3)
> Hi,
> we have specific hotfix version on our CEPH servers, does it have fixes for
> 1 &2?
>
> ceph version 12.2.4-42.2.hotfix.nvidia.el7cp
> (2ae8fcd75c666ffc9badac24707996801ac24fd0) luminous (stable)

Yes, that release has a942cc479c0df10cefe08d1eefac8bee20a39a2e (the fix from [2]). This must be a different problem.

*** Bug 1713527 has been marked as a duplicate of this bug. ***

This is not easy to reproduce: you need to set up lots of CephFS clients, have each client open lots of files, then restart the MDS (a rough client-side sketch is included below).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538
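
The reproduction note above (many clients, each holding many files open, then an MDS restart) suggests a simple client-side load generator. The sketch below is a hypothetical approximation under assumed parameters (the `/mnt/cephfs/openfile-test` path and the default file count are inventions for the example); it is not a script taken from this bug or from QE.

```cpp
// Hypothetical load generator for the reproduction note: run one instance per
// client machine against the same CephFS mount, then restart the active MDS
// while the files are still held open. The mount path and counts are assumed.
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    std::string dir = argc > 1 ? argv[1] : "/mnt/cephfs/openfile-test";
    int nfiles = argc > 2 ? std::atoi(argv[2]) : 1000;  // raise ulimit -n for more

    ::mkdir(dir.c_str(), 0755);  // ignore failure if it already exists

    std::vector<int> fds;
    fds.reserve(nfiles);

    for (int i = 0; i < nfiles; ++i) {
        std::string path = dir + "/f" + std::to_string(i);
        int fd = ::open(path.c_str(), O_CREAT | O_RDWR, 0644);
        if (fd < 0) { std::perror(path.c_str()); break; }
        fds.push_back(fd);  // keep the file open so the MDS keeps tracking it
    }

    std::printf("holding %zu files open; restart the MDS now\n", fds.size());
    ::pause();  // keep the opens alive until the process is interrupted

    for (int fd : fds) ::close(fd);
    return 0;
}
```

Run one instance per client against the same CephFS mount (raising the open-file ulimit as needed), then fail over or restart the active MDS while the processes are still holding their files open, so the replacement MDS has a large amount of open-file state to work through in up:rejoin.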