Bug 2235753
| Summary: | Support snapshot crash consistency across clients | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Greg Farnum <gfarnum> |
| Component: | CephFS | Assignee: | Patrick Donnelly <pdonnell> |
| Status: | CLOSED ERRATA | QA Contact: | sumr |
| Severity: | high | Docs Contact: | Akash Raj <akraj> |
| Priority: | high | | |
| Version: | 6.0 | CC: | akraj, amk, ceph-eng-bugs, cephqe-warriors, hyelloji, lusov, ngangadh, pdonnell, tserlin |
| Target Milestone: | --- | | |
| Target Release: | 7.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-18.2.1-58.el9cp | Doc Type: | Enhancement |
| Doc Text: |
.CephFS supports quiescing of subvolumes or directory trees
Previously, multiple clients would interleave reads and writes across a consistent snapshot barrier when out-of-band communication existed between the clients. This communication led clients to wrongly believe that they had reached a checkpoint that was mutually recoverable via a snapshot.
With this enhancement, CephFS supports quiescing of subvolumes or directory trees to enable the execution of crash-consistent snapshots. Clients are now forced to quiesce all I/O before the MDS executes the snapshot, which enforces a checkpoint across all clients of the subtree.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-06-13 14:20:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2267614, 2298578, 2298579 | | |
Description
Greg Farnum
2023-08-29 15:46:15 UTC
Our approach to the problem is based on the observation that the consistency of our snapshots boils down to whether clients are allowed to perform write I/Os while snapshots are being taken. After analyzing the requirements we concluded that we should create and expose a new API to manage a pause of I/O to a set of file system paths. We refer to this as a "quiesce set" database.

By exposing such an API we cover a variety of enterprise use cases. Given an active pause, one can schedule any maintenance that should appear atomic to the enterprise applications operating within the quiesced roots. This enables consistent FS snapshots by scheduling them during the pause, and it also makes it possible to achieve consistent snapshots across the FS and RBD volumes: run the RBD snapshot(s) while an FS pause is active, then run the FS snapshots before releasing the pause.

As Greg suggested initially, our first implementation of the pause involves revoking write capabilities from the clients. This approach is backward compatible with all existing clients, but it carries the overhead of the caps ping-pong and redundant write cache flushes. NB: applications are required to issue flushes if they want crash-consistency guarantees from the system, so the latter overhead is not a complete waste, but since it is asynchronous to the application it will have some performance impact in the general case. Another drawback is that the MDS servers will have to deal with the added pressure of pending I/Os from clients trying to claim the capabilities back while the pause is active.

We also considered a new client quiesce protocol that would avoid all of these overheads by implementing the pause on the client side. This would require client-side changes and hence will have to go into a later release, subject to future planning.

The overall design is detailed in this slide deck: https://docs.google.com/presentation/d/1wE3-e9AAme7Q3qmeshUSthJoQGw7-fKTrtS9PsdAIVo/edit#slide=id.p

Ongoing work is tracked by the subtasks of the feature ticket: https://tracker.ceph.com/issues/63663

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925
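To make the pause-then-snapshot ordering described above concrete, here is a minimal orchestration sketch. It assumes the upstream `ceph fs quiesce` command and flags (`--set-id`, `--timeout`, `--expiration`, `--await`, `--release`) developed under the feature ticket linked above; the volume, subvolume, image, and snapshot names are hypothetical placeholders, and exact CLI syntax may differ between releases.

```python
#!/usr/bin/env python3
# Sketch: crash-consistent snapshots across CephFS subvolumes and an RBD image
# by taking all snapshots inside an active quiesce ("pause") window.
# Assumes the upstream `ceph fs quiesce` CLI and its flags (--set-id, --timeout,
# --expiration, --await, --release); vol1, sub1, sub2, rbdpool/app-data and the
# snapshot/set names are hypothetical placeholders.
import subprocess

def run(*cmd: str) -> None:
    """Run a CLI command, raising on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

VOL = "vol1"                      # CephFS volume (file system) name
SUBVOLS = ["sub1", "sub2"]        # quiesce roots: the subvolumes to pause
RBD_IMAGE = "rbdpool/app-data"    # RBD image snapshotted inside the same pause
SET_ID = "nightly-backup"         # quiesce-set identifier
SNAP = "consistent-2024-06-13"    # snapshot name used for both FS and RBD

# 1. Quiesce: block write I/O on all members of the set before proceeding.
#    --await returns only once every root reports quiesced (or the timeout hits).
run("ceph", "fs", "quiesce", VOL, *SUBVOLS,
    "--set-id", SET_ID, "--timeout", "120", "--expiration", "600", "--await")

try:
    # 2. With the pause active, the RBD and CephFS snapshots observe the same
    #    application-level checkpoint.
    run("rbd", "snap", "create", f"{RBD_IMAGE}@{SNAP}")
    for sub in SUBVOLS:
        run("ceph", "fs", "subvolume", "snapshot", "create", VOL, sub, SNAP)
finally:
    # 3. Release the quiesce set so clients may resume writes.
    run("ceph", "fs", "quiesce", VOL, "--set-id", SET_ID, "--release", "--await")
```

The `finally` block reflects that the pause must be explicitly released; the `--expiration` value is there to bound how long clients can stay paused if the orchestrator dies before releasing the set.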