Bug 1601138
| Summary: | MDS stuck in up:resolve during many concurrent failovers | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Patrick Donnelly <pdonnell> |
| Component: | CephFS | Assignee: | Yan, Zheng <zyan> |
| Status: | CLOSED ERRATA | QA Contact: | Vasu Kulkarni <vakulkar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 3.0 | CC: | anharris, ceph-eng-bugs, edonnell, john.spray, pdonnell, rperiyas, tchandra, tserlin, vumrao |
| Target Milestone: | z5 | ||
| Target Release: | 3.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | RHEL: ceph-12.2.4-39.el7cp Ubuntu: ceph_12.2.4-44redhat1 | Doc Type: | Bug Fix |
| Doc Text: | Previously, in cluster configurations with multiple active Metadata Servers (MDSs), an MDS could become stuck in the "up:resolve" state during recovery. This generally happened when several MDSs were recovering concurrently while other active MDSs were blocked on long-running operations, such as balancing metadata load. The only way to recover the stuck MDS was to restart it. With this update, the code has been fixed so that an MDS no longer misses updates from the Monitors indicating that another MDS has failed. As a result, MDSs no longer become stuck in "up:resolve", and recovery continues. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-09 18:27:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
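The Doc Text attributes the hang to an MDS missing Monitor updates that reported another MDS as failed. The following is a hypothetical toy model of one way such a failure can go unnoticed. It is not the Ceph code and not the change in pull request 23169. If a busy MDS compares only the last map it processed with the newest one, a peer that failed and was replaced during the skipped epochs still appears to hold its rank. `MapSnapshot`, the per-rank "incarnation" counter, and both helper functions are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MapSnapshot:
    """Hypothetical, simplified stand-in for one epoch of the MDS map."""
    epoch: int
    # rank -> incarnation number of the daemon currently holding that rank
    # (assumed here to increase every time a new daemon takes over the rank)
    ranks: dict = field(default_factory=dict)


def failed_by_presence(old: MapSnapshot, new: MapSnapshot) -> set:
    """Naive check: a rank counts as failed only if it is absent from the new map."""
    return {rank for rank in old.ranks if rank not in new.ranks}


def failed_by_incarnation(old: MapSnapshot, new: MapSnapshot) -> set:
    """Also treats a rank as failed if its incarnation changed, which catches a
    fail-and-replace cycle that happened entirely in epochs this reader skipped."""
    return {rank for rank, inc in old.ranks.items() if new.ranks.get(rank) != inc}


# This MDS last processed epoch 10, where ranks 0 and 1 are up. While it was busy,
# rank 1 failed and was taken over by a new daemon in epochs it never processed.
old = MapSnapshot(epoch=10, ranks={0: 1, 1: 1})
new = MapSnapshot(epoch=13, ranks={0: 1, 1: 2})

print(failed_by_presence(old, new))     # prints set(): the failure is missed
print(failed_by_incarnation(old, new))  # prints {1}: the failure is detected
```

Tracking an identity that changes on every replacement, rather than mere presence of the rank, lets a late reader detect failures that began and ended within epochs it never saw. This report does not state whether the actual fix works this way.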
Description
Patrick Donnelly
2018-07-14 03:43:30 UTC
Cherry-picked https://github.com/ceph/ceph/pull/23169 and ran it on downstream; looks good: http://pulpito.ceph.redhat.com/vasu-2018-07-27_19:33:34-fs-luminous-distro-basic-argo/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2375