Bug 1906627

Summary: In-use snapshot can be prematurely deleted if replay is backlogged
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Jason Dillaman <jdillama>
Component: RBD-Mirror Assignee: Ilya Dryomov <idryomov>
Status: CLOSED ERRATA QA Contact: Harish Munjulur <hmunjulu>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2 CC: ceph-eng-bugs, ceph-qe-bugs, gpatta, hmunjulu, idryomov, mmurthy, tserlin, vereddy
Target Milestone: ---   
Target Release: 4.2z2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-14.2.11-162.el8cp, ceph-14.2.11-162.el7cp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-15 17:13:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jason Dillaman 2020-12-11 00:41:25 UTC
Description of problem:
The default limits permit three mirror snapshots per image. Once that limit is reached, the "limit - 1" mirror snapshot (in oldest-to-newest order) is removed. Normally, the rbd-mirror daemon deletes all but the most recent snapshot once it has completed its sync. However, if the limit is reached while rbd-mirror is still syncing between the oldest and next-oldest snapshots, the next-oldest snapshot is removed while it is in use, potentially leading to data corruption.
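
The scenario above can be illustrated with a small toy model. This is only a sketch and not the librbd or rbd-mirror implementation: the `MirrorSnapshot` class, the `in_use` flag, and both function names are hypothetical, and the model assumes the pruning victim is the second-oldest snapshot, as in the case described. The guarded variant shows one possible way to express the expected behavior; it is not necessarily how the actual fix works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirrorSnapshot:
    snap_id: int          # larger id == newer snapshot (assumption of this model)
    in_use: bool = False  # hypothetical flag: True while a delta-sync depends on it


def naive_prune_candidate(snaps: List[MirrorSnapshot],
                          limit: int = 3) -> Optional[MirrorSnapshot]:
    """Pick the snapshot to remove once the limit is reached.

    Ignores whether the snapshot is in use, which models the bug.
    """
    if len(snaps) < limit:
        return None
    ordered = sorted(snaps, key=lambda s: s.snap_id)  # oldest -> newest
    return ordered[1]  # second-oldest, per the scenario in the description


def guarded_prune_candidate(snaps: List[MirrorSnapshot],
                            limit: int = 3) -> Optional[MirrorSnapshot]:
    """Same selection, but refuse to remove a snapshot that is still in use."""
    candidate = naive_prune_candidate(snaps, limit)
    if candidate is not None and candidate.in_use:
        return None  # defer pruning until the sync has moved on
    return candidate


if __name__ == "__main__":
    # rbd-mirror is backlogged: it is still syncing from snapshot 10 to 11
    # when a third mirror snapshot (12) is created and the limit is hit.
    snaps = [MirrorSnapshot(10), MirrorSnapshot(11, in_use=True), MirrorSnapshot(12)]
    print("naive picks:  ", naive_prune_candidate(snaps))
    print("guarded picks:", guarded_prune_candidate(snaps))
```

In this model the naive selection returns the snapshot that the backlogged sync still depends on, while the guarded selection returns nothing until the `in_use` flag is cleared.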

Version-Release number of selected component (if applicable):
4.2

How reproducible:
100% under a loaded system with new snapshots being generated

Steps to Reproduce:
1. Load the system so that snapshot pruning is occurring

Actual results:
Potential for data corruption if the OSDs can act on the removed snapshot before the delta-sync completes. In upstream, it can lead to an assertion failure due to other bug fixes.

Expected results:
An in-use snapshot will not be removed.

Additional info:

Comment 12 Harish Munjulur 2021-06-10 12:12:18 UTC
Thanks for the comments, Ilya. Will move to QA verified.

QA did not see any backlogged snapshots while creating hundreds of snapshots and checking the status. Hence moving to Verified.

Comment 14 errata-xmlrpc 2021-06-15 17:13:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445