Bug 2210790

Summary: [rbd-mirror] : reusage of thick-provisioned image did not mirror properly despite copied snap id matches : snapshot-based mirroring doesn't propagate discards
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: RBD-Mirror
Assignee: Ilya Dryomov <idryomov>
Status: NEW
QA Contact: Sunil Angadi <sangadi>
Severity: high
Priority: unspecified
Version: 6.1
CC: ceph-eng-bugs, cephqe-warriors, idryomov, jdurgin, nibalach, sangadi, tserlin
Target Milestone: ---
Target Release: 9.0
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Vasishta 2023-05-29 13:52:33 UTC
Description of problem:

We have two clusters with two-way RBD mirroring configured between the mirror_pool pool on both clusters (mirror pool info and status from both sites: http://pastebin.test.redhat.com/1101267).

I created a **thick-provisioned** 10G image - mirror_pool/cr4m_p9_b2_1 - on the site - 1d1a43ac-f3f5-11ed-9e26-b49691cee2a0
image info on both sites - http://pastebin.test.redhat.com/1101269

Snapshot-based mirroring was enabled on the image, and a mirror snapshot schedule of 3 minutes was added at the image level.
mirror image status at both clusters - http://pastebin.test.redhat.com/1101271

I mapped the image, created an XFS filesystem on it, and wrote a 100 MB file.
Although the primary snapshot ID matched the copied snapshot ID on the secondary,
rbd du output differed between the sites.
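The setup and write steps above can be sketched roughly as follows; the schedule syntax, device variable, mount point, and dd parameters are assumptions for illustration, not taken from the report:

```shell
# Sketch of the reproduction steps; run on the primary site.
rbd create mirror_pool/cr4m_p9_b2_1 --size 10G --thick-provision
rbd mirror image enable mirror_pool/cr4m_p9_b2_1 snapshot
rbd mirror snapshot schedule add --pool mirror_pool --image cr4m_p9_b2_1 3m
dev=$(rbd map mirror_pool/cr4m_p9_b2_1)   # prints the mapped /dev/rbdX path
mkfs.xfs "$dev"
mount "$dev" /mnt
dd if=/dev/urandom of=/mnt/file1 bs=1M count=100
```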
Primary
#  rbd du mirror_pool/cr4m_p9_b2_1  --debug-rbd 0
NAME          PROVISIONED  USED
cr4m_p9_b2_1       10 GiB  168 MiB
Secondary
~]#  rbd du mirror_pool/cr4m_p9_b2_1  --debug-rbd 0
NAME          PROVISIONED  USED
cr4m_p9_b2_1       10 GiB  10 GiB
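For context, the USED column reflects allocated extents. A minimal sketch on a local filesystem (not Ceph, file names invented) of the same provisioned-vs-used distinction between a thin and a fully written "thick" file:

```shell
# Not Ceph: illustrate provisioned vs. used space with sparse files.
truncate -s 10M thin.img                                 # provision 10 MiB, allocate nothing
dd if=/dev/zero of=thick.img bs=1M count=10 status=none  # actually write every block
du -k --apparent-size thin.img thick.img                 # "provisioned": 10240 KiB each
du -k thin.img thick.img                                 # "used": ~0 for thin, ~10240 for thick
```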

Tried exporting the image on both clusters and observed that the primary exported a 168 MB file, while the secondary was exporting a 10 GiB file.
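The export comparison can be sketched as below (output path assumed); rbd export writes zero extents sparsely, so the file's allocated size on disk differs between sites even though the apparent size is the full 10 GiB:

```shell
# Run on each site, then compare the resulting file's allocated size.
rbd export mirror_pool/cr4m_p9_b2_1 /tmp/cr4m_p9_b2_1.img
ls -ls /tmp/cr4m_p9_b2_1.img   # first column: allocated 1K blocks; size column: provisioned
```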

Version-Release number of selected component (if applicable):
 "rbd-mirror": {
        "ceph version 17.2.6-69.el9cp (d62b1a5d46b7355ca8b5056f78b7ebe3581e0d53) quincy (stable)": 1
    },

How reproducible:
Observed once

Steps to Reproduce:
(Clearly mentioned in description)

Actual results:
Discards (deletions of data) on the primary image do not appear to be propagated to the secondary, leaving the secondary fully allocated.

Expected results:
Mirroring should ensure that the primary and secondary images are identical, including deallocated (discarded) extents.

Additional info: