Bug 1906627 - In-use snapshot can be prematurely deleted if replay is backlogged
Summary: In-use snapshot can be prematurely deleted if replay is backlogged
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD-Mirror
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2z2
Assignee: Ilya Dryomov
QA Contact: Harish Munjulur
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-11 00:41 UTC by Jason Dillaman
Modified: 2021-06-15 17:13 UTC

Fixed In Version: ceph-14.2.11-162.el8cp, ceph-14.2.11-162.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-15 17:13:09 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 48553 0 None None None 2020-12-11 00:41:24 UTC
Red Hat Product Errata RHSA-2021:2445 0 None None None 2021-06-15 17:13:33 UTC

Description Jason Dillaman 2020-12-11 00:41:25 UTC
Description of problem:
The default limit permits three mirror snapshots per image. Once that limit is reached, the "limit - 1" mirror snapshot is removed (oldest->newest ordering). Normally the rbd-mirror daemon deletes all but the most recent snapshot once it has performed its sync. However, if the limit is reached while rbd-mirror is still syncing between the oldest and next-oldest snapshots, the next-oldest snapshot is removed while it is in use, potentially leading to data corruption.
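
The pruning race described above can be illustrated with a small, self-contained model. This is only a sketch of my reading of the description, not Ceph code: the function name `pick_prune_victim`, the snapshot names, and the `in_use` set are all hypothetical, and the real fix may choose a different snapshot rather than simply deferring the prune.

```python
from typing import List, Optional, Set

DEFAULT_LIMIT = 3  # per-image mirror snapshot limit stated in the description


def pick_prune_victim(snaps: List[str], limit: int = DEFAULT_LIMIT,
                      in_use: Optional[Set[str]] = None) -> Optional[str]:
    """Return the snapshot a prune would remove, or None if nothing is removed.

    `snaps` is ordered oldest -> newest. With `in_use=None` this models the
    behaviour reported in the bug; with a set, in-use snapshots are never
    selected (hypothetical guard, shown only to contrast with the bug).
    """
    if limit < 2 or len(snaps) < limit:
        return None  # limit not reached yet, nothing to prune
    victim = snaps[limit - 2]  # 1-based position "limit - 1"
    if in_use is not None and victim in in_use:
        return None  # defer pruning instead of removing an in-use snapshot
    return victim


snaps = ["s1", "s2", "s3"]  # the limit of three has been reached
syncing = {"s1", "s2"}      # backlogged replay: delta-sync from s1 to s2

buggy_victim = pick_prune_victim(snaps)
guarded_victim = pick_prune_victim(snaps, in_use=syncing)
print(buggy_victim, guarded_victim)  # s2 None
```

In this model the unguarded call selects `s2`, the end point of the in-flight delta-sync, which corresponds to the premature deletion reported here.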

Version-Release number of selected component (if applicable):
4.2

How reproducible:
100% under a loaded system with new snapshots being generated

Steps to Reproduce:
1. Load the system so that snapshot pruning is occurring

Actual results:
Potential for data corruption if the OSDs can act on the removed snapshot before the delta-sync completes. In upstream, it can lead to an assertion failure due to other bug fixes.

Expected results:
An in-use snapshot will not be removed.

Additional info:

Comment 12 Harish Munjulur 2021-06-10 12:12:18 UTC
Thanks for the comments, Ilya. Will move to QA verified.

QA did not see any backlogged snapshots while creating hundreds of snapshots and checking the status. Hence moving to Verified.

Comment 14 errata-xmlrpc 2021-06-15 17:13:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445

