
Bug 2363632

Summary: Data unavailability on failover during in-complete mirror group snapshot process (N-1 behavior on group rollback)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: aarsharm
Component: RBD-Mirror
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED UPSTREAM
QA Contact: aarsharm
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 8.1
CC: ceph-eng-bugs, cephqe-warriors, idryomov, mmurthy, prasanna.kalever
Target Milestone: ---
Target Release: 9.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2374515 (view as bug list)
Environment:
Last Closed: 2026-03-04 09:52:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description aarsharm 2025-05-02 08:18:54 UTC
Description of problem:
File integrity is compromised when a failover is triggered while a mirror group snapshot is still incomplete.
For consistency groups, the expectation is that the group rolls back to the previous complete snapshot, but I am not seeing that happen on failover.

I have tried 3 options, listed below, using partial group replication to verify consistency.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Option 1: Reverting from user/manual group snapshot to system group snapshot
1) Create 5 images (2 small, 1G each; 3 large, 10G each)
[ceph: root@tala001 /]#  rbd create image_small1 --size 1G --pool pool_1 --debug_rbd 0
 rbd create image_small2 --size 1G --pool pool_1 --debug_rbd 0
 rbd create image_large1 --size 10G --pool pool_1 --debug_rbd 0
 rbd create image_large2 --size 10G --pool pool_1 --debug_rbd 0
 rbd create image_large3 --size 10G --pool pool_1 --debug_rbd 0
[ceph: root@tala001 /]#
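
The five creates in step 1 could be generated from a short list instead of being typed out. This is a minimal sketch, assuming the same image names and sizes as above. `print_create_cmds` is a hypothetical helper that only prints the commands so they can be reviewed before being run.

```shell
# Hypothetical helper: print the "rbd create" commands for one pool.
# Image names and sizes mirror step 1; nothing is executed here.
print_create_cmds() {
    pool="$1"
    for spec in image_small1:1G image_small2:1G \
                image_large1:10G image_large2:10G image_large3:10G; do
        name="${spec%%:*}"   # text before the colon
        size="${spec##*:}"   # text after the colon
        echo "rbd create $name --size $size --pool $pool --debug_rbd 0"
    done
}

print_create_cmds pool_1
```

Piping the output to `sh` inside the container would run the commands. Generating them from one list also makes it harder to drop an image by accident.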

2) Create group
[ceph: root@tala001 /]# rbd group create --pool pool_1 --group group_1 --debug_rbd 0
[ceph: root@tala001 /]#

3) Add all images to group
[ceph: root@tala001 /]# rbd group image add --image image_small1 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_small2 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_large1 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_large2 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_large3 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
[ceph: root@tala001 /]#

4) Write 10% of each image's size (100M to each small image, 1G to each large image)
[ceph: root@tala001 /]#  rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_1/image_small1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_1/image_small2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_1/image_large1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_1/image_large2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_1/image_large3 --debug_rbd 0
bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
elapsed: 0   ops: 25600   ops/sec: 63681.2   bytes/sec: 249 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
elapsed: 0   ops: 25600   ops/sec: 66320.9   bytes/sec: 259 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     59744   59819.5   234 MiB/s
    2    121728   60902.1   238 MiB/s
    3    176368   58813.9   230 MiB/s
    4    231040   57778.1   226 MiB/s
elapsed: 4   ops: 262144   ops/sec: 57651.7   bytes/sec: 225 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     52896   52964.7   207 MiB/s
    2    110848   55459.4   217 MiB/s
    3    168960   56343.8   220 MiB/s
    4    227040   56777.9   222 MiB/s
elapsed: 4   ops: 262144   ops/sec: 54375.1   bytes/sec: 212 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     50224     50290   196 MiB/s
    2    106928   53498.4   209 MiB/s
    3    162752   54273.8   212 MiB/s
    4    219200   54817.4   214 MiB/s
elapsed: 4   ops: 262144   ops/sec: 55118.3   bytes/sec: 215 MiB/s
[ceph: root@tala001 /]#
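
As a cross-check on the bench output above, the final `ops` count in each run equals the `bytes` value divided by `io_size`. The sketch below reproduces that arithmetic. `expected_ops` is a hypothetical helper name, not part of the rbd CLI.

```shell
# Hypothetical helper: number of I/Os needed to write total_bytes
# in io_size-byte requests (integer division).
expected_ops() {
    total_bytes="$1"
    io_size="$2"
    echo $(( $total_bytes / $io_size ))
}

expected_ops 104857600 4096    # 100M runs on the small images
expected_ops 1073741824 4096   # 1G runs on the large images
```

If a run's final `ops` count differs from this value, that run did not complete and the image holds less data than the step assumes.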

5) Enable mirroring on group
[ceph: root@tala001 /]# rbd mirror group enable --group group_1 --pool pool_1 --debug_rbd 0
Mirroring enabled
[ceph: root@tala001 /]#

6) Wait for system group snapshot to complete on site-b
site-a:
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]#
Site-b:
[ceph: root@tala005 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala005 /]#
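
The manual polling in step 6 could be wrapped in a loop. This is a sketch only: `count_syncing` and `wait_for_group_sync` are hypothetical helper names, the 5-second default interval is an assumption, and the only rbd command used is the `rbd mirror group status` call shown above.

```shell
# Count the lines containing "syncing" in status text read from stdin.
# Same check as `grep "syncing" | wc` above, but prints only the line count.
count_syncing() {
    grep -c "syncing" || true   # grep exits non-zero on no match; keep going
}

# Poll until no line of the group status mentions "syncing".
# Defined only; call it as: wait_for_group_sync pool_1/group_1
wait_for_group_sync() {
    group_spec="$1"
    interval="${2:-5}"   # seconds between polls (assumed default)
    while [ "$(rbd mirror group status "$group_spec" --debug_rbd 0 | count_syncing)" -ne 0 ]; do
        sleep "$interval"
    done
}
```

A zero count alone may not prove that a snapshot has been mirrored. In step 9 of Option 2, the first poll after the snapshot was created returned 0 before the count rose to 5. A stricter check would also confirm on site-b that the snapshot state is `complete`.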

7) Check the md5sum of every image on site-a and site-b; they should match
Site-a:
[ceph: root@tala001 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
b60925d09bf1896ef9603ed7bd2b690f  file1.txt
Exporting image: 100% complete...done.
cb9569fe9df63eda7b0c46e6c9e4e526  file1.txt
Exporting image: 100% complete...done.
ccc1b513e7d67c510ad0b08f73ac9d35  file1.txt
Exporting image: 100% complete...done.
43c7e57673fe2c86a4b7b200cf3090f5  file1.txt
Exporting image: 100% complete...done.
4c8ce577e80528d47a96832f61c2288a  file1.txt
[ceph: root@tala001 /]#


site-b:
[ceph: root@tala005 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
b60925d09bf1896ef9603ed7bd2b690f  file1.txt
Exporting image: 100% complete...done.
cb9569fe9df63eda7b0c46e6c9e4e526  file1.txt
Exporting image: 100% complete...done.
ccc1b513e7d67c510ad0b08f73ac9d35  file1.txt
Exporting image: 100% complete...done.
43c7e57673fe2c86a4b7b200cf3090f5  file1.txt
Exporting image: 100% complete...done.
4c8ce577e80528d47a96832f61c2288a  file1.txt
[ceph: root@tala005 /]# 
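
Comparing five checksums per site by eye is error-prone, so the check in step 7 could be scripted. This is a sketch under stated assumptions: `image_md5` and `compare_sums` are hypothetical helpers, the temporary file name `export.img` is arbitrary, and `image_md5` uses the same export, md5sum, and remove sequence as the commands above.

```shell
# Export one image to a temporary file and print "<image> <md5>".
# Defined only; it needs a working rbd client to run.
image_md5() {
    pool="$1"
    image="$2"
    dir="$(mktemp -d)" || return 1
    if rbd export "$pool/$image" "$dir/export.img" --debug_rbd 0 >/dev/null 2>&1; then
        echo "$image $(md5sum "$dir/export.img" | awk '{print $1}')"
        rm -rf "$dir"
    else
        rm -rf "$dir"
        return 1
    fi
}

# Compare two "<image> <md5>" listings, one per site.
# Prints the differing lines and returns non-zero on any mismatch.
compare_sums() {
    diff "$1" "$2"
}
```

One possible use is to run `image_md5` for each image on both sites, redirect the output to one file per site, copy the files to the same host, and run `compare_sums` on them.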


8) Write a further 90% of each image's size on site-a (900M to each small image, 9G to each large image)
[ceph: root@tala001 /]#  rbd bench --io-type write --io-threads 16 --io-total 900M --io-pattern rand --io-size 4096 pool_1/image_small1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 900M --io-pattern rand --io-size 4096 pool_1/image_small2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 9G --io-pattern rand --io-size 4096 pool_1/image_large1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 9G --io-pattern rand --io-size 4096 pool_1/image_large2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 9G --io-pattern rand --io-size 4096 pool_1/image_large3 --debug_rbd 0
bench  type write io_size 4096 io_threads 16 bytes 943718400 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     52624   52692.4   206 MiB/s
    2    104672   52369.9   205 MiB/s
    3    157280   52449.2   205 MiB/s
    4    213648     53429   209 MiB/s
elapsed: 4   ops: 230400   ops/sec: 53382.4   bytes/sec: 209 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 943718400 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     55328   55399.1   216 MiB/s
    2    107664   53866.6   210 MiB/s
    3    160544   53537.5   209 MiB/s
    4    215584   53913.2   211 MiB/s
elapsed: 4   ops: 230400   ops/sec: 53781.2   bytes/sec: 210 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 9663676416 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     48128   48191.9   188 MiB/s
    2    103280   51673.5   202 MiB/s
    3    158176   52747.9   206 MiB/s
    4    230176   57562.1   225 MiB/s
    5    308784     61772   241 MiB/s
    6    363328   63039.6   246 MiB/s
    7    420192     63382   248 MiB/s
    8    477360   63836.4   249 MiB/s
    9    533360   60636.4   237 MiB/s
   10    588976   56038.1   219 MiB/s
   11    646560   56646.1   221 MiB/s
   12    703440   56649.3   221 MiB/s
   13    759520   56431.7   220 MiB/s
   14    814736   56274.9   220 MiB/s
   15    872000   56604.5   221 MiB/s
   16    929008   56489.3   221 MiB/s
   17    985248   56361.3   220 MiB/s
   18   1040816   56258.9   220 MiB/s
   19   1096672   56386.9   220 MiB/s
   20   1152400   56079.7   219 MiB/s
   21   1206016   55401.3   216 MiB/s
   22   1259584   54866.9   214 MiB/s
   23   1312432   54322.9   212 MiB/s
   24   1367632   54191.7   212 MiB/s
   25   1421552   53830.1   210 MiB/s
   26   1476256   54047.7   211 MiB/s
   27   1531952   54473.3   213 MiB/s
   28   1587904   55094.1   215 MiB/s
   29   1643312   55135.7   215 MiB/s
   30   1698560   55401.3   216 MiB/s
   31   1753360   55420.5   216 MiB/s
   32   1808624   55334.1   216 MiB/s
   33   1861024   54623.7   213 MiB/s
   34   1914144   54166.1   212 MiB/s
   35   1967744   53836.5   210 MiB/s
   36   2023392   54006.1   211 MiB/s
   37   2078192   53913.3   211 MiB/s
   38   2134096   54614.1   213 MiB/s
   39   2189664   55103.7   215 MiB/s
   40   2244272   55305.3   216 MiB/s
   41   2297952   54911.7   214 MiB/s
   42   2351584   54678.1   214 MiB/s
elapsed: 42   ops: 2359296   ops/sec: 55992.1   bytes/sec: 219 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 9663676416 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     64608   64688.3   253 MiB/s
    2    130096   65088.2   254 MiB/s
    3    190080   63386.1   248 MiB/s
    4    252576   63163.4   247 MiB/s
    5    326448   65305.5   255 MiB/s
    6    396352   66348.4   259 MiB/s
    7    450816   64143.6   251 MiB/s
    8    505680   63119.6   247 MiB/s
    9    562240   61932.4   242 MiB/s
   10    616720   58054.1   227 MiB/s
   11    672288   55186.9   216 MiB/s
   12    727968   55430.1   217 MiB/s
   13    784480   55759.7   218 MiB/s
   14    841280   55807.7   218 MiB/s
   15    898176   56290.9   220 MiB/s
   16    956032   56748.5   222 MiB/s
   17   1013424   57090.9   223 MiB/s
   18   1071440   57391.7   224 MiB/s
   19   1128784   57500.5   225 MiB/s
   20   1186144   57593.3   225 MiB/s
   21   1259360   60665.2   237 MiB/s
   22   1341680   65650.8   256 MiB/s
   23   1415168   68745.2   269 MiB/s
   24   1498992   74041.2   289 MiB/s
   25   1583344   79439.5   310 MiB/s
   26   1665584   81244.3   317 MiB/s
   27   1740688   79801.1   312 MiB/s
   28   1815248   80015.5   313 MiB/s
   29   1887296   77660.3   303 MiB/s
   30   1963520   76034.8   297 MiB/s
   31   2047760   76434.8   299 MiB/s
   32   2126208   77103.5   301 MiB/s
   33   2201728   77295.5   302 MiB/s
   34   2281968   78933.9   308 MiB/s
elapsed: 35   ops: 2359296   ops/sec: 67252.4   bytes/sec: 263 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 9663676416 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     59424   59499.2   232 MiB/s
    2    131872   65976.6   258 MiB/s
    3    206256   68779.9   269 MiB/s
    4    284000   71021.3   277 MiB/s
    5    361872   72391.7   283 MiB/s
    6    439136     75942   297 MiB/s
    7    516496   76924.3   300 MiB/s
    8    598272   78402.7   306 MiB/s
    9    678592   78917.9   308 MiB/s
   10    748208   77266.7   302 MiB/s
   11    812816   74735.6   292 MiB/s
   12    881936   73087.6   285 MiB/s
   13    947792   69903.6   273 MiB/s
   14   1010944     66470   260 MiB/s
   15   1075872   65532.4   256 MiB/s
   16   1141376   65711.6   257 MiB/s
   17   1208304   65273.2   255 MiB/s
   18   1275376   65516.4   256 MiB/s
   19   1342224   66255.6   259 MiB/s
   20   1405984     66022   258 MiB/s
   21   1468864   65497.2   256 MiB/s
   22   1529216     64182   251 MiB/s
   23   1599440   64812.4   253 MiB/s
   24   1672800   66114.8   258 MiB/s
   25   1746288   68060.4   266 MiB/s
   26   1820112   70249.2   274 MiB/s
   27   1893232   72802.8   284 MiB/s
   28   1965600   73231.6   286 MiB/s
   29   2036240   72687.6   284 MiB/s
   30   2109280     72598   284 MiB/s
   31   2177328   71442.8   279 MiB/s
   32   2251104     71574   280 MiB/s
   33   2325984   72076.4   282 MiB/s
elapsed: 33   ops: 2359296   ops/sec: 70548.5   bytes/sec: 276 MiB/s
[ceph: root@tala001 /]#

9) Check the md5sum of all images on site-a
Site-a:
[ceph: root@tala001 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
f5f9c66265ce628301c9028755684f9e  file1.txt
Exporting image: 100% complete...done.
7222ade8d98194a1f2e6d34dbf420432  file1.txt
Exporting image: 100% complete...done.
58c591442230893c36e3525435a18951  file1.txt
Exporting image: 100% complete...done.
8c2eca3b46c1641450bf9adc51b7bb86  file1.txt
Exporting image: 100% complete...done.
122c2e7b6aefb73fd86442d57c2718ad  file1.txt
[ceph: root@tala001 /]#

10) Create manual mirror group snapshot
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_1 --debug_rbd 0
ID             NAME                                                                STATE     NAMESPACE
2aee8a5b76b4b  .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2aee8a5b76b4b  complete  mirror (primary peer_uuids:[5738c991-3a0c-49b8-9174-6d7e60ae2719])
[ceph: root@tala001 /]# rbd mirror group snapshot -p pool_1 --group group_1 --debug_rbd 0
Snapshot ID: 2af60e0a4263a
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_1 --debug_rbd 0
ID             NAME                                                                STATE     NAMESPACE
2aee8a5b76b4b  .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2aee8a5b76b4b  complete  mirror (primary peer_uuids:[5738c991-3a0c-49b8-9174-6d7e60ae2719])
2af60e0a4263a  .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2af60e0a4263a  complete  mirror (primary peer_uuids:[5738c991-3a0c-49b8-9174-6d7e60ae2719])
[ceph: root@tala001 /]#

11) Once the 2 smaller images have synced to site-b, check their md5sums on site-b (they should match the md5sums from step 9)
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
      3       9    1053 <<<<<<<<<<< the 2 smaller images have finished syncing, so the count drops from 5 to 3
[ceph: root@tala001 /]#

site-b:
[ceph: root@tala005 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

Exporting image: 100% complete...done.
f5f9c66265ce628301c9028755684f9e  file1.txt
Exporting image: 100% complete...done.
7222ade8d98194a1f2e6d34dbf420432  file1.txt
[ceph: root@tala005 /]#

12) While the larger images are still syncing, force-promote the group on site-b
[ceph: root@tala005 /]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID             NAME                                                                    STATE       NAMESPACE
2aee8a5b76b4b  .mirror.non-primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2aee8a5b76b4b  complete    mirror (non-primary peer_uuids:[] ee75d1c2-4d1e-4915-be63-4f3d8dd36a53:2aee8a5b76b4b)
2af60e0a4263a  .mirror.non-primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2af60e0a4263a  incomplete  mirror (non-primary peer_uuids:[] ee75d1c2-4d1e-4915-be63-4f3d8dd36a53:2af60e0a4263a)
[ceph: root@tala005 /]# rbd mirror group promote --pool pool_1 --group group_1 --force --debug_rbd 0
Group promoted to primary
[ceph: root@tala005 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0
group_1:
  global_id:   fc12c7f4-fe10-4a20-96b4-73ec95410ee2
  state:       up+stopped
  description: local group is primary
  service:     tala005.jwqeia on tala005
  last_update: 2025-04-30 12:17:05
  images:
    image:       12/2e595353-56f9-4e59-bb9e-97e42d0f5a4c
    state:       up+stopped
    description: local image is primary

    image:       12/52309cf9-d56a-490e-bef2-837fb398f40d
    state:       up+stopped
    description: local image is primary

    image:       12/6febbdf6-e56e-40e3-8117-769a402e0d0d
    state:       up+stopped
    description: local image is primary

    image:       12/856a6b7d-6b6c-4ef7-b6fb-e561f6ff1d0e
    state:       up+stopped
    description: local image is primary

    image:       12/febbcc3e-71d7-4ec1-81b8-3a3c74a91c9d
    state:       up+stopped
    description: local image is primary
  peer_sites:
    name: site-a
    state: up+stopped
    description: local group is primary
    last_update: 2025-04-30 12:17:09
    images:
      image:       12/2e595353-56f9-4e59-bb9e-97e42d0f5a4c
      state:       up+stopped
      description: local image is primary

      image:       12/52309cf9-d56a-490e-bef2-837fb398f40d
      state:       up+stopped
      description: local image is primary

      image:       12/6febbdf6-e56e-40e3-8117-769a402e0d0d
      state:       up+stopped
      description: local image is primary

      image:       12/856a6b7d-6b6c-4ef7-b6fb-e561f6ff1d0e
      state:       up+stopped
      description: local image is primary

      image:       12/febbcc3e-71d7-4ec1-81b8-3a3c74a91c9d
      state:       up+stopped
      description: local image is primary
  snapshots:
    .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.1cb6fa778e08e
[ceph: root@tala005 /]#
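
The precondition for this step, an incomplete group snapshot on the secondary at the moment of the forced promote, could be checked mechanically. This is a small sketch: `incomplete_snaps` is a hypothetical helper, and it assumes the `rbd group snap list` column order shown above (ID, NAME, STATE, NAMESPACE).

```shell
# Read `rbd group snap list` output on stdin and print the IDs of
# snapshots whose STATE column is "incomplete".
# NR > 1 skips the header row.
incomplete_snaps() {
    awk 'NR > 1 && $3 == "incomplete" { print $1 }'
}
```

Piping the site-b `rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0` output from this step through `incomplete_snaps` would print `2af60e0a4263a`, the snapshot that had not finished syncing when the group was promoted.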

13) Check the md5sum of all 5 images on site-b. (The expectation is that the group rolls back to the last consistent point, i.e. step 7.)
site-b:  (the md5sums look like those of empty images)
[ceph: root@tala005 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_1/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
cd573cfaace07e7949bc0c46028904ff  file1.txt
Exporting image: 100% complete...done.
cd573cfaace07e7949bc0c46028904ff  file1.txt
Exporting image: 100% complete...done.
2dd26c4d4799ebd29fa31e48d49e8e53  file1.txt
Exporting image: 100% complete...done.
2dd26c4d4799ebd29fa31e48d49e8e53  file1.txt
Exporting image: 100% complete...done.
2dd26c4d4799ebd29fa31e48d49e8e53  file1.txt
[ceph: root@tala005 /]#

Here the md5sum of both smaller images is "cd573cfaace07e7949bc0c46028904ff" and the md5sum of all three larger images is "2dd26c4d4799ebd29fa31e48d49e8e53"; these are most likely the md5sums of empty images. I expected the md5sums of all 5 images to match those from step 7.
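
The suspicion that these are checksums of empty images could be confirmed by hashing the same number of zero bytes. This is a sketch: `zero_md5` is a hypothetical helper, and it assumes that exporting a never-written image of size N yields N zero bytes.

```shell
# Print the md5sum of N zero bytes, i.e. what an export of an
# all-zero image of that size would be expected to hash to.
zero_md5() {
    head -c "$1" /dev/zero | md5sum | awk '{print $1}'
}

# 1G and 10G in bytes, matching the image sizes in step 1.
# Uncomment to run; the 10G case takes a while.
# zero_md5 1073741824
# zero_md5 10737418240
```

If the two results equal the checksums reported in step 13, the promoted images contain no data at all rather than the step 7 contents.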


XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx

Option 2: Reverting from user/manual group snapshot to another user/manual group snapshot (group mirroring enabled on images that already contain data)
1) Create 5 images (2 small, 1G each; 3 large, 10G each)
[ceph: root@tala001 /]#  rbd create image_small1 --size 1G --pool pool_2 --debug_rbd 0
 rbd create image_small2 --size 1G --pool pool_2 --debug_rbd 0
 rbd create image_large1 --size 10G --pool pool_2 --debug_rbd 0
 rbd create image_large2 --size 10G --pool pool_2 --debug_rbd 0
 rbd create image_large3 --size 10G --pool pool_2 --debug_rbd 0

2) Create group
[ceph: root@tala001 /]# rbd group create --pool pool_2 --group group_1 --debug_rbd 0

3) Add all images to group
[ceph: root@tala001 /]# rbd group image add --image image_small1 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_small2 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_large1 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_large2 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_large3 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
[ceph: root@tala001 /]#

4) Write 10% of each image's size (100M to each small image, 1G to each large image)
 [ceph: root@tala001 /]#  rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large3 --debug_rbd 0
bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
elapsed: 0   ops: 25600   ops/sec: 54935.3   bytes/sec: 215 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
elapsed: 0   ops: 25600   ops/sec: 66149.5   bytes/sec: 258 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     50064   50129.8   196 MiB/s
    2    121840   60958.1   238 MiB/s
    3    193360   64479.8   252 MiB/s
    4    253328   63351.5   247 MiB/s
elapsed: 4   ops: 262144   ops/sec: 63106   bytes/sec: 247 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     52960   53028.7   207 MiB/s
    2    109488   54779.1   214 MiB/s
    3    163984   54684.6   214 MiB/s
    4    218768   54709.4   214 MiB/s
elapsed: 4   ops: 262144   ops/sec: 54945   bytes/sec: 215 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     54640   54710.4   214 MiB/s
    2    134016   67049.1   262 MiB/s
    3    203696   67926.2   265 MiB/s
elapsed: 3   ops: 262144   ops/sec: 65666.9   bytes/sec: 257 MiB/s
[ceph: root@tala001 /]#

5) Enable mirroring on group 
[ceph: root@tala001 /]# rbd mirror group enable --group group_1 --pool pool_2 --debug_rbd 0
Mirroring enabled
[ceph: root@tala001 /]#

6) Wait for system group snapshot to complete on site-b
site-a:
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]#
site-b:
[ceph: root@tala005 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala005 /]#

7) Check the md5sum of every image on both sites; they should match
site-a:
[ceph: root@tala001 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
b766f3ffb18a40013f99ee3a5f274c4e  file1.txt
Exporting image: 100% complete...done.
3c9f58793a78877dd0a024d451358edc  file1.txt
Exporting image: 100% complete...done.
e31145888d2505bb64af72ef83eb41ea  file1.txt
Exporting image: 100% complete...done.
90a481e5cb78972495b7dd2061146ce5  file1.txt
Exporting image: 100% complete...done.
67873822f8af199145c221633b14cd3b  file1.txt
[ceph: root@tala001 /]# 

Site-b:
[ceph: root@tala005 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
b766f3ffb18a40013f99ee3a5f274c4e  file1.txt
Exporting image: 100% complete...done.
3c9f58793a78877dd0a024d451358edc  file1.txt
Exporting image: 100% complete...done.
e31145888d2505bb64af72ef83eb41ea  file1.txt
Exporting image: 100% complete...done.
90a481e5cb78972495b7dd2061146ce5  file1.txt
Exporting image: 100% complete...done.
67873822f8af199145c221633b14cd3b  file1.txt
[ceph: root@tala005 /]#

8) Write 10% more data to all images
 [ceph: root@tala001 /]#  rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large3 --debug_rbd 0
bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
elapsed: 0   ops: 25600   ops/sec: 55053.4   bytes/sec: 215 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
elapsed: 0   ops: 25600   ops/sec: 59396.4   bytes/sec: 232 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     51360   51427.1   201 MiB/s
    2    116816   58444.9   228 MiB/s
    3    170384   56818.6   222 MiB/s
    4    224288   56089.7   219 MiB/s
elapsed: 4   ops: 262144   ops/sec: 55775   bytes/sec: 218 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     47120   47182.9   184 MiB/s
    2    109360     54715   214 MiB/s
    3    163408   54492.5   213 MiB/s
    4    218096   54541.3   213 MiB/s
elapsed: 4   ops: 262144   ops/sec: 55456.4   bytes/sec: 217 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     50832   50898.6   199 MiB/s
    2    106112   53090.2   207 MiB/s
    3    161136   53734.9   210 MiB/s
    4    216256   54081.2   211 MiB/s
elapsed: 4   ops: 262144   ops/sec: 54128.1   bytes/sec: 211 MiB/s
[ceph: root@tala001 /]#

9) Take a manual group snapshot and wait for it to complete on site-b
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID             NAME                                                                STATE     NAMESPACE
2b0622ae17422  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422  complete  mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
[ceph: root@tala001 /]# rbd mirror group snapshot -p pool_2 --group group_1 --debug_rbd 0
Snapshot ID: 2b0d4c69ab36a
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID             NAME                                                                STATE     NAMESPACE
2b0622ae17422  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422  complete  mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
2b0d4c69ab36a  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a  complete  mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
Wait for the snapshot to complete:
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1613
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1613
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1613
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1613
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1613
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1613
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      3       9    1051
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      3       9    1051
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      3       9    1051
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]#
[ceph: root@tala005 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID             NAME                                                                    STATE     NAMESPACE                                                         
2b0622ae17422  .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422  complete  mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b0622ae17422)
2b0d4c69ab36a  .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a  complete  mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b0d4c69ab36a)

10) Calculate the md5sum of all images on site-a and site-b (they should match)
Site-a:
[ceph: root@tala001 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
7bc9b378bc8f719570345ff414da70ff  file1.txt
Exporting image: 100% complete...done.
56c78f46fa48d3889797feead6b85eed  file1.txt
Exporting image: 100% complete...done.
c7106d377219338d0c401eb59178e87a  file1.txt
Exporting image: 100% complete...done.
709a56ea0c3b9530136a51bdf2cd29a2  file1.txt
Exporting image: 100% complete...done.
a91243dfa1165c49e9cd3dc316547f15  file1.txt
[ceph: root@tala001 /]#

Site-b:
[ceph: root@tala005 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
7bc9b378bc8f719570345ff414da70ff  file1.txt
Exporting image: 100% complete...done.
56c78f46fa48d3889797feead6b85eed  file1.txt
Exporting image: 100% complete...done.
c7106d377219338d0c401eb59178e87a  file1.txt
Exporting image: 100% complete...done.
709a56ea0c3b9530136a51bdf2cd29a2  file1.txt
Exporting image: 100% complete...done.
a91243dfa1165c49e9cd3dc316547f15  file1.txt
[ceph: root@tala005 /]#
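
The export, md5sum, and rm sequence above is repeated once per image. A more compact sketch is shown here, assuming `rbd export` is given `-` as the destination so the image is streamed to stdout instead of to a temporary file. The helper name `checksum_images` is hypothetical.

```shell
#!/usr/bin/env bash

# Print "<md5>  <image>" for every image name given after the pool name.
# The export is piped straight into md5sum, so no file1.txt is created.
checksum_images() {
    local pool="$1" image sum
    shift
    for image in "$@"; do
        sum=$(rbd export "$pool/$image" - --debug_rbd 0 | md5sum | cut -d' ' -f1)
        printf '%s  %s\n' "$sum" "$image"
    done
}

# Example: run on each site and compare the two listings.
# checksum_images pool_2 image_small1 image_small2 image_large1 image_large2 image_large3
```

Labelling each checksum with its image name, rather than the reused `file1.txt`, makes the site-a and site-b listings directly comparable.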

11) Write to about 80% of each image's capacity (800M to each 1G image, 8G to each 10G image)
[ceph: root@tala001 /]#  rbd bench --io-type write --io-threads 16 --io-total 800M --io-pattern rand --io-size 4096 pool_2/image_small1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 800M --io-pattern rand --io-size 4096 pool_2/image_small2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 8G --io-pattern rand --io-size 4096 pool_2/image_large1 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 8G --io-pattern rand --io-size 4096 pool_2/image_large2 --debug_rbd 0
 rbd bench --io-type write --io-threads 16 --io-total 8G --io-pattern rand --io-size 4096 pool_2/image_large3 --debug_rbd 0
bench  type write io_size 4096 io_threads 16 bytes 838860800 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     64944   65024.6   254 MiB/s
    2    129088   64583.9   252 MiB/s
    3    195728   65269.4   255 MiB/s
elapsed: 3   ops: 204800   ops/sec: 65077.5   bytes/sec: 254 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 838860800 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     63904   63983.6   250 MiB/s
    2    125888   62983.1   246 MiB/s
    3    190592   63556.8   248 MiB/s
elapsed: 3   ops: 204800   ops/sec: 63523.2   bytes/sec: 248 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 8589934592 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     46144   46205.9   180 MiB/s
    2    114640   57356.3   224 MiB/s
    3    182416   60830.6   238 MiB/s
    4    241760   60458.8   236 MiB/s
    5    296544   59323.5   232 MiB/s
    6    350880   60946.8   238 MiB/s
    7    406592   58390.1   228 MiB/s
    8    473008   58118.1   227 MiB/s
    9    542864   60220.4   235 MiB/s
   10    609360   62562.8   244 MiB/s
   11    678320   65487.6   256 MiB/s
   12    745552   67791.6   265 MiB/s
   13    815072   68412.4   267 MiB/s
   14    885648   68556.4   268 MiB/s
   15    953312     68790   269 MiB/s
   16   1020720   68479.6   267 MiB/s
   17   1088528   68594.8   268 MiB/s
   18   1155888   68162.8   266 MiB/s
   19   1223680     67606   264 MiB/s
   20   1290688   67474.8   264 MiB/s
   21   1356832     67222   263 MiB/s
   22   1424416   67177.2   262 MiB/s
   23   1492016   67225.2   263 MiB/s
   24   1560768   67417.2   263 MiB/s
   25   1631600     68182   266 MiB/s
   26   1708832   70385.5   275 MiB/s
   27   1787376   72591.6   284 MiB/s
   28   1865744   74745.2   292 MiB/s
   29   1940960   76037.9   297 MiB/s
   30   2019248   77529.1   303 MiB/s
elapsed: 31   ops: 2097152   ops/sec: 67647.5   bytes/sec: 264 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 8589934592 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     49616   49681.4   194 MiB/s
    2    107904   53986.7   211 MiB/s
    3    172400   57490.8   225 MiB/s
    4    231040   57778.1   226 MiB/s
    5    286960   57406.3   224 MiB/s
    6    343024   58681.3   229 MiB/s
    7    401056   58630.1   229 MiB/s
    8    459632   57446.1   224 MiB/s
    9    518272   57446.1   224 MiB/s
   10    577120   58031.7   227 MiB/s
   11    638816     59158   231 MiB/s
   12    706672   61122.8   239 MiB/s
   13    774368   62946.8   246 MiB/s
   14    843168   64978.8   254 MiB/s
   15    911920   66959.6   262 MiB/s
   16    980736   68383.6   267 MiB/s
   17   1052224     69110   270 MiB/s
   18   1124464   70018.8   274 MiB/s
   19   1196496   70665.2   276 MiB/s
   20   1268288   71273.2   278 MiB/s
   21   1339792   71810.8   281 MiB/s
   22   1410880   71730.8   280 MiB/s
   23   1479072   70921.2   277 MiB/s
   24   1549968     70694   276 MiB/s
   25   1620576   70457.2   275 MiB/s
   26   1691360   70313.2   275 MiB/s
   27   1761360   70095.6   274 MiB/s
   28   1826112   69407.6   271 MiB/s
   29   1891424   68290.8   267 MiB/s
   30   1959328     67750   265 MiB/s
   31   2026784   67084.4   262 MiB/s
   32   2094448   66617.2   260 MiB/s
elapsed: 32   ops: 2097152   ops/sec: 65451.8   bytes/sec: 256 MiB/s
bench  type write io_size 4096 io_threads 16 bytes 8589934592 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     57760   57833.5   226 MiB/s
    2    125536     62807   245 MiB/s
    3    181248   60441.1   236 MiB/s
    4    239104   59794.6   234 MiB/s
    5    294416   58897.8   230 MiB/s
    6    349904   58428.5   228 MiB/s
    7    408896   56671.7   221 MiB/s
    8    476720     59094   231 MiB/s
    9    543792   60937.2   238 MiB/s
   10    611696   63455.6   248 MiB/s
   11    678768   65772.4   257 MiB/s
   12    749728     68166   266 MiB/s
   13    823680   69391.6   271 MiB/s
   14    898288   70898.8   277 MiB/s
   15    970544   71769.2   280 MiB/s
   16   1045216   73289.2   286 MiB/s
   17   1119392   73932.4   289 MiB/s
   18   1189616   73186.8   286 MiB/s
   19   1261104   72562.8   283 MiB/s
   20   1333712   72633.2   284 MiB/s
   21   1406800   72316.4   282 MiB/s
   22   1480528   72226.8   282 MiB/s
   23   1556112   73298.8   286 MiB/s
   24   1631280   74034.8   289 MiB/s
   25   1706080   74473.2   291 MiB/s
   26   1779504   74540.4   291 MiB/s
   27   1854704   74834.8   292 MiB/s
   28   1929392   74655.6   292 MiB/s
   29   2005104   74764.4   292 MiB/s
   30   2079760   74735.6   292 MiB/s
elapsed: 30   ops: 2097152   ops/sec: 69363.6   bytes/sec: 271 MiB/s
[ceph: root@tala001 /]#
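
The byte and op totals in the bench output follow from the `--io-total` and `--io-size` values. The short arithmetic check below assumes `M` and `G` are binary multiples (2^20 and 2^30 bytes), which is what the `bytes` field in the bench headers indicates. `ops_for` is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Number of write ops needed to cover io_total bytes at io_size bytes per op.
ops_for() {
    echo $(($1 / $2))
}

small_total=$((800 * 1024 * 1024))       # --io-total 800M
large_total=$((8 * 1024 * 1024 * 1024))  # --io-total 8G

echo "small: $small_total bytes, $(ops_for "$small_total" 4096) ops"
echo "large: $large_total bytes, $(ops_for "$large_total" 4096) ops"
# small: 838860800 bytes, 204800 ops
# large: 8589934592 bytes, 2097152 ops
```

These values agree with the `bytes 838860800` / `ops: 204800` and `bytes 8589934592` / `ops: 2097152` figures in the log above.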

12) Take a manual group snapshot
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID             NAME                                                                STATE     NAMESPACE
2b0622ae17422  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422  complete  mirror (primary peer_uuids:[])
2b0d4c69ab36a  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a  complete  mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
[ceph: root@tala001 /]# rbd mirror group snapshot -p pool_2 --group group_1 --debug_rbd 0
Snapshot ID: 2b164e0c03694
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID             NAME                                                                STATE     NAMESPACE
2b0d4c69ab36a  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a  complete  mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
2b164e0c03694  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b164e0c03694  complete  mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
[ceph: root@tala001 /]#

13) Calculate the md5sum of all images on site-a:
Site-a:
[ceph: root@tala001 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
3b7848fac2830f7b4704dbcc68d334b1  file1.txt
Exporting image: 100% complete...done.
0ef21589ce944b0feaa51d1cd7d65bef  file1.txt
Exporting image: 100% complete...done.
0ebf7c09ee1f025370f8649a125deb6a  file1.txt
Exporting image: 100% complete...done.
fadad2860dece90cb7129a52e498b6c8  file1.txt
Exporting image: 100% complete...done.
c117982a1830506533ffb8dfe88e5385  file1.txt
[ceph: root@tala001 /]#

13b) Once the 2 smaller images are completely synced, compare their md5sum on site-b with site-a. In the output below they match the post-write site-a values from the previous step, not the step-10 values.
Site-b:
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      0       0       0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]#
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      5      15    1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      3       9    1051
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
      3       9    1051
[ceph: root@tala001 /]#

[ceph: root@tala005 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
3b7848fac2830f7b4704dbcc68d334b1  file1.txt
Exporting image: 100% complete...done.
0ef21589ce944b0feaa51d1cd7d65bef  file1.txt
[ceph: root@tala005 /]#

14) While the larger images are still syncing, force promote the group on site-b:
[ceph: root@tala005 /]# rbd group snap list --pool pool_2 --group group_1 --debug_rbd 0
ID             NAME                                                                    STATE       NAMESPACE
2b0d4c69ab36a  .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a  complete    mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b0d4c69ab36a)
2b164e0c03694  .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b164e0c03694  incomplete  mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b164e0c03694)
[ceph: root@tala005 /]# rbd mirror group promote --pool pool_2 --group group_1 --force --debug_rbd 0
Group promoted to primary
[ceph: root@tala005 /]#  rbd group snap list --pool pool_2 --group group_1 --debug_rbd 0
ID            NAME                                                               STATE     NAMESPACE
1cc41ae0ab3e  .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.1cc41ae0ab3e  complete  mirror (primary peer_uuids:[cda556b9-9c5d-42c1-a595-df0c0fcb056d])
[ceph: root@tala005 /]#
[ceph: root@tala005 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0
group_1:
  global_id:   04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d
  state:       up+stopped
  description: local group is primary
  service:     tala005.jwqeia on tala005
  last_update: 2025-04-30 12:41:05
  images:
    image:       13/53ce4bf9-5977-4a09-81a8-e869ab65b61b
    state:       up+stopped
    description: local image is primary

    image:       13/99f4d143-0453-416f-aabe-55585d553bd5
    state:       up+stopped
    description: local image is primary

    image:       13/a1b7e1f6-e474-480e-971e-46ec1f2c2d8a
    state:       up+stopped
    description: local image is primary

    image:       13/a1f3931f-a27c-4456-8df6-a1f39442229a
    state:       up+stopped
    description: local image is primary

    image:       13/f378f4c8-8bc6-4f87-9206-2bf9488d96fa
    state:       up+stopped
    description: local image is primary
  peer_sites:
    name: site-a
    state: up+stopped
    description: local group is primary
    last_update: 2025-04-30 12:41:09
    images:
      image:       13/53ce4bf9-5977-4a09-81a8-e869ab65b61b
      state:       up+stopped
      description: local image is primary

      image:       13/99f4d143-0453-416f-aabe-55585d553bd5
      state:       up+stopped
      description: local image is primary

      image:       13/a1b7e1f6-e474-480e-971e-46ec1f2c2d8a
      state:       up+stopped
      description: local image is primary

      image:       13/a1f3931f-a27c-4456-8df6-a1f39442229a
      state:       up+stopped
      description: local image is primary

      image:       13/f378f4c8-8bc6-4f87-9206-2bf9488d96fa
      state:       up+stopped
      description: local image is primary
  snapshots:
    .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.1cc41ae0ab3e
[ceph: root@tala005 /]#
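
The expectation in this report is that a forced promote falls back to the newest group snapshot whose STATE is `complete`. The sketch below picks that snapshot out of `rbd group snap list` output. It assumes the whitespace-separated column layout shown in step 14 (ID, NAME, STATE, ...) and that rows are listed oldest first. `latest_complete_snap` is a hypothetical helper, not an rbd subcommand.

```shell
#!/usr/bin/env bash

# Read `rbd group snap list` output on stdin and print the ID of the last
# row whose third column (STATE) is exactly "complete". The header row is
# skipped. Nothing is printed if no row qualifies.
latest_complete_snap() {
    awk 'NR > 1 && $3 == "complete" { id = $1 } END { if (id != "") print id }'
}

# Example: run on the secondary before promoting.
# rbd group snap list --pool pool_2 --group group_1 --debug_rbd 0 | latest_complete_snap
```

Applied to the pre-promote listing in step 14, this selects `2b0d4c69ab36a`, because `2b164e0c03694` is still `incomplete`.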

15) Calculate the md5sum of all 5 images on site-b
Site-b: (Expected the images to roll back to the state from step-10, but the rollback goes to the state from step-7)
rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt

rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
b766f3ffb18a40013f99ee3a5f274c4e  file1.txt
Exporting image: 100% complete...done.
3c9f58793a78877dd0a024d451358edc  file1.txt
Exporting image: 100% complete...done.
e31145888d2505bb64af72ef83eb41ea  file1.txt
Exporting image: 100% complete...done.
90a481e5cb78972495b7dd2061146ce5  file1.txt
Exporting image: 100% complete...done.
67873822f8af199145c221633b14cd3b  file1.txt
[ceph: root@tala005 /]#
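
To check which earlier state the promoted images correspond to, the checksum listings from each step can be saved to files and compared by position. This is a sketch, assuming each file holds md5sum-style lines (`<md5>  <name>`) with one line per image, in the same image order and with the same number of lines. The helper name and the file names are hypothetical.

```shell
#!/usr/bin/env bash

# Print the 1-based line numbers at which the checksums in two listings differ.
# Only the first column (the md5) is compared. Both files are assumed to be
# non-empty and to have the same number of lines.
mismatched_positions() {
    awk 'NR == FNR { sum[FNR] = $1; next } sum[FNR] != $1 { print FNR }' "$1" "$2"
}

# Example: compare the step-10 site-b listing with the step-15 listing.
# mismatched_positions step10_site_b.md5 step15_site_b.md5
```

For the step-10 and step-15 values shown in this report, all five positions differ.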

As the text is long, the Option 3 logs are in the attached file.

Version-Release number of selected component (if applicable):
ceph version 19.2.1-167.el9cp (3e3ca3a16912abfd58b473e2ae724703f9a0415d) squid (stable)

How reproducible: All the time

Steps to Reproduce: Listed above

Actual results: The md5sums do not match and the files are empty

Expected results: The md5sums should match the latest consistent (complete) group snapshot


Additional info: NA

Comment 16 Red Hat Bugzilla 2026-03-04 09:52:42 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.