Bug 2235753 - Support snapshot crash consistency across clients
Summary: Support snapshot crash consistency across clients
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 7.1
Assignee: Patrick Donnelly
QA Contact: sumr
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2267614 2298578 2298579
 
Reported: 2023-08-29 15:46 UTC by Greg Farnum
Modified: 2024-11-16 04:25 UTC
CC: 9 users

Fixed In Version: ceph-18.2.1-58.el9cp
Doc Type: Enhancement
Doc Text:
.CephFS supports quiescing of subvolumes or directory trees
Previously, multiple clients would interleave reads and writes across a consistent snapshot barrier where out-of-band communication existed between the clients. This communication led clients to wrongly believe that they had reached a checkpoint that was mutually recoverable via a snapshot. With this enhancement, CephFS supports quiescing of subvolumes or directory trees to enable the execution of crash-consistent snapshots. Clients are now forced to quiesce all I/O before the MDS executes the snapshot, which enforces a checkpoint across all clients of the subtree.
Clone Of:
Environment:
Last Closed: 2024-06-13 14:20:57 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 62083 0 None None None 2023-08-29 15:46:14 UTC
Red Hat Issue Tracker RHCEPH-7283 0 None None None 2023-08-29 15:51:12 UTC
Red Hat Product Errata RHSA-2024:3925 0 None None None 2024-06-13 14:21:03 UTC

Description Greg Farnum 2023-08-29 15:46:15 UTC
Right now, when you create a CephFS snapshot, the MDS simply sends all affected clients a message saying the snapshot now exists.
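
For context, a CephFS snapshot is created with a plain mkdir under the special ".snap" directory of the tree being snapshotted; the mount point and names below are only illustrative.

    # Creating a CephFS snapshot today: a mkdir inside the ".snap"
    # directory of the directory tree being snapshotted.
    # (Mount point and snapshot name are illustrative.)
    import os
    os.mkdir("/mnt/cephfs/projects/app/.snap/before-upgrade")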

This is incredibly fast, but does not maintain crash consistency if multiple clients are accessing the snapshotted tree.

We need to build a mechanism which lets us take crash-consistent snapshots.


I suspect the simplest mechanism will be to just revoke all client exclusive and write caps, then take the snapshot, and then let clients take back whatever caps they want. It has the huge advantage of not requiring any client updates. But we'll need to see how much work that is to implement.
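
A rough outline of that sequence, using hypothetical helper names rather than the actual MDS interfaces:

    # Conceptual sketch of the cap-revocation approach; every helper below
    # is hypothetical and only illustrates the ordering of the steps.
    def crash_consistent_snapshot(mds, subtree, snap_name):
        clients = mds.clients_with_caps(subtree)   # hypothetical lookup

        # 1. Revoke exclusive/write caps so clients flush dirty data and
        #    stop issuing new writes under the subtree.
        for client in clients:
            mds.revoke_caps(client, subtree, caps={"Fw", "Fx"})

        # 2. Wait until every client has returned the revoked caps; this
        #    barrier is what makes the snapshot crash consistent.
        mds.wait_for_cap_release(clients, subtree)

        # 3. Take the snapshot while no client can mutate the subtree.
        mds.mksnap(subtree, snap_name)

        # 4. No explicit resume step: clients re-request whatever caps they
        #    need afterwards, which is why no client changes are required.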

Comment 1 RHEL Program Management 2023-08-29 15:46:24 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 5 Leonid Usov 2023-12-26 13:55:48 UTC
Our approach starts from the observation that the consistency of our snapshots boils down to whether clients are allowed to perform write IO while the snapshots are being taken. After analyzing the requirements, we concluded that we should create and expose a new API for pausing IO to a set of file system paths. We refer to this as a "quiesce set" database.

By exposing such an API we cover a variety of enterprise use cases. Given an active pause, one can schedule any maintenance that should appear atomic to the enterprise applications operating within the quiesced roots. This enables consistent FS snapshots by scheduling them during the pause, and it also makes it possible to take consistent snapshots across FS and RBD volumes by running the RBD snapshot(s) while an FS pause is active and then running the FS snapshots before releasing the pause.
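
As a sketch of that workflow, assuming the "ceph fs quiesce" interface this work introduces (command and flag spellings follow the upstream documentation and may differ in the shipped release), the pause / snapshot / release sequence could be driven like this:

    # Orchestration sketch: pause a subvolume, snapshot RBD and CephFS
    # under the pause, then release. Volume, pool, and subvolume names
    # are placeholders.
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    VOL, SUBVOL, SET_ID = "cephfs", "sub0", "app-checkpoint-1"

    # 1. Pause write IO on the subvolume (a "quiesce set"); --await blocks
    #    until all members of the set are actually quiesced.
    run("ceph", "fs", "quiesce", VOL, SUBVOL, "--set-id", SET_ID, "--await")
    try:
        # 2. While the pause is active, snapshots are crash consistent with
        #    respect to the quiesced roots: RBD first, if the application
        #    also uses block volumes...
        run("rbd", "snap", "create", "rbd_pool/app-data@app-checkpoint-1")

        # ...then the CephFS subvolume snapshot, still under the same pause.
        run("ceph", "fs", "subvolume", "snapshot", "create",
            VOL, SUBVOL, "snap-app-checkpoint-1")
    finally:
        # 3. Release the quiesce set so clients can resume writing.
        run("ceph", "fs", "quiesce", VOL, "--set-id", SET_ID,
            "--release", "--await")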

As Greg suggested initially, our first implementation of the pause involves revoking write capabilities from the clients. This approach is backward compatible with all existing clients, but it carries the overhead of the caps ping-pong and of redundant write-cache flushes. NB: applications are required to issue flushes anyway if they want crash-consistency guarantees from the system, so the latter overhead is not a complete waste; but since these flushes are asynchronous to the application, they will have some performance impact in the general case. Another drawback is that the MDS servers will have to absorb the added pressure of pending IOs from clients trying to reclaim their capabilities while the pause is active.

We also considered a new client quiesce protocol that would avoid these overheads by implementing the pause on the client side. That would require client-side changes and hence will have to go into a later release, subject to future planning.

The overall design is detailed in this slide deck: https://docs.google.com/presentation/d/1wE3-e9AAme7Q3qmeshUSthJoQGw7-fKTrtS9PsdAIVo/edit#slide=id.p
Ongoing work is tracked by the subtasks of the feature ticket: https://tracker.ceph.com/issues/63663

Comment 20 errata-xmlrpc 2024-06-13 14:20:57 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

Comment 21 Red Hat Bugzilla 2024-11-16 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

