Bug 2363632
| Summary: | Data unavailability on failover during in-complete mirror group snapshot process (N-1 behavior on group rollback) | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | aarsharm | |
| Component: | RBD-Mirror | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> | |
| Status: | CLOSED UPSTREAM | QA Contact: | aarsharm | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 8.1 | CC: | ceph-eng-bugs, cephqe-warriors, idryomov, mmurthy, prasanna.kalever | |
| Target Milestone: | --- | |||
| Target Release: | 9.1 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2374515 | Environment: | ||
| Last Closed: | 2026-03-04 09:52:42 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
This product has been discontinued or is no longer tracked in Red Hat Bugzilla. |
Description of problem:
File integrity is compromised after a failover is triggered during an incomplete group snapshot operation. The expectation for consistency groups is to roll back to the previous complete snapshot, but I am not seeing that happen on failover.
I have tried 3 options, listed below:

Partial Group Replication for verifying consistency.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Option 1: Reverting from user/manual group snapshot to system group snapshot

1) Create 5 images (2 small, 1G each; 3 large, 10G each)
[ceph: root@tala001 /]# rbd create image_small1 --size 1G --pool pool_1 --debug_rbd 0
rbd create image_small2 --size 1G --pool pool_1 --debug_rbd 0
rbd create image_large1 --size 10G --pool pool_1 --debug_rbd 0
rbd create image_large2 --size 10G --pool pool_1 --debug_rbd 0
rbd create image_large3 --size 10G --pool pool_1 --debug_rbd 0
[ceph: root@tala001 /]#

2) Create group
[ceph: root@tala001 /]# rbd group create --pool pool_1 --group group_1 --debug_rbd 0
[ceph: root@tala001 /]#

3) Add all images to group
[ceph: root@tala001 /]# rbd group image add --image image_small1 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_small2 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_large1 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_large2 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
rbd group image add --image image_large3 --group group_1 --group-pool pool_1 --image-pool pool_1 --debug_rbd 0
[ceph: root@tala001 /]#

4) Write 10% IO on all the images
[ceph: root@tala001 /]# rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_1/image_small1 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_1/image_small2 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_1/image_large1 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_1/image_large2 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_1/image_large3 --debug_rbd 0
bench type write io_size 4096 io_threads 16 bytes 104857600 pattern random
SEC OPS OPS/SEC BYTES/SEC
elapsed: 0 ops: 25600 ops/sec: 63681.2 bytes/sec: 249 MiB/s
bench type write io_size 4096 io_threads 16 bytes 104857600 pattern random
SEC OPS OPS/SEC BYTES/SEC
elapsed: 0 ops: 25600 ops/sec: 66320.9 bytes/sec: 259 MiB/s
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 59744 59819.5 234 MiB/s
2 121728 60902.1 238 MiB/s
3 176368 58813.9 230 MiB/s
4 231040 57778.1 226 MiB/s
elapsed: 4 ops: 262144 ops/sec: 57651.7 bytes/sec: 225 MiB/s
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 52896 52964.7 207 MiB/s
2 110848 55459.4 217 MiB/s
3 168960 56343.8 220 MiB/s
4 227040 56777.9 222 MiB/s
elapsed: 4 ops: 262144 ops/sec: 54375.1 bytes/sec: 212 MiB/s
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 50224 50290 196 MiB/s
2 106928 53498.4 209 MiB/s
3 162752 54273.8 212 MiB/s
4 219200 54817.4 214 MiB/s
elapsed: 4 ops: 262144 ops/sec: 55118.3 bytes/sec: 215 MiB/s
[ceph: root@tala001 /]#

5) Enable mirroring on group
[ceph: root@tala001 /]# rbd mirror group enable --group group_1 --pool pool_1 --debug_rbd 0
Mirroring enabled
[ceph: root@tala001 /]#

6) Wait for system group snapshot to complete on site-b
site-a:
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala001 /]#
site-b:
[ceph: root@tala005 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala005 /]#

7)
Check md5sum of all images on both site-a and site-b should match Site-a: [ceph: root@tala001 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. b60925d09bf1896ef9603ed7bd2b690f file1.txt Exporting image: 100% complete...done. cb9569fe9df63eda7b0c46e6c9e4e526 file1.txt Exporting image: 100% complete...done. ccc1b513e7d67c510ad0b08f73ac9d35 file1.txt Exporting image: 100% complete...done. 43c7e57673fe2c86a4b7b200cf3090f5 file1.txt Exporting image: 100% complete...done. 4c8ce577e80528d47a96832f61c2288a file1.txt [ceph: root@tala001 /]# site-b: [ceph: root@tala005 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. b60925d09bf1896ef9603ed7bd2b690f file1.txt Exporting image: 100% complete...done. cb9569fe9df63eda7b0c46e6c9e4e526 file1.txt Exporting image: 100% complete...done. ccc1b513e7d67c510ad0b08f73ac9d35 file1.txt Exporting image: 100% complete...done. 43c7e57673fe2c86a4b7b200cf3090f5 file1.txt Exporting image: 100% complete...done. 
4c8ce577e80528d47a96832f61c2288a file1.txt [ceph: root@tala005 /]# 8) Write 90% of the data on site-a [ceph: root@tala001 /]# rbd bench --io-type write --io-threads 16 --io-total 900M --io-pattern rand --io-size 4096 pool_1/image_small1 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 900M --io-pattern rand --io-size 4096 pool_1/image_small2 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 9G --io-pattern rand --io-size 4096 pool_1/image_large1 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 9G --io-pattern rand --io-size 4096 pool_1/image_large2 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 9G --io-pattern rand --io-size 4096 pool_1/image_large3 --debug_rbd 0 bench type write io_size 4096 io_threads 16 bytes 943718400 pattern random SEC OPS OPS/SEC BYTES/SEC 1 52624 52692.4 206 MiB/s 2 104672 52369.9 205 MiB/s 3 157280 52449.2 205 MiB/s 4 213648 53429 209 MiB/s elapsed: 4 ops: 230400 ops/sec: 53382.4 bytes/sec: 209 MiB/s bench type write io_size 4096 io_threads 16 bytes 943718400 pattern random SEC OPS OPS/SEC BYTES/SEC 1 55328 55399.1 216 MiB/s 2 107664 53866.6 210 MiB/s 3 160544 53537.5 209 MiB/s 4 215584 53913.2 211 MiB/s elapsed: 4 ops: 230400 ops/sec: 53781.2 bytes/sec: 210 MiB/s bench type write io_size 4096 io_threads 16 bytes 9663676416 pattern random SEC OPS OPS/SEC BYTES/SEC 1 48128 48191.9 188 MiB/s 2 103280 51673.5 202 MiB/s 3 158176 52747.9 206 MiB/s 4 230176 57562.1 225 MiB/s 5 308784 61772 241 MiB/s 6 363328 63039.6 246 MiB/s 7 420192 63382 248 MiB/s 8 477360 63836.4 249 MiB/s 9 533360 60636.4 237 MiB/s 10 588976 56038.1 219 MiB/s 11 646560 56646.1 221 MiB/s 12 703440 56649.3 221 MiB/s 13 759520 56431.7 220 MiB/s 14 814736 56274.9 220 MiB/s 15 872000 56604.5 221 MiB/s 16 929008 56489.3 221 MiB/s 17 985248 56361.3 220 MiB/s 18 1040816 56258.9 220 MiB/s 19 1096672 56386.9 220 MiB/s 20 1152400 56079.7 219 MiB/s 21 1206016 55401.3 216 MiB/s 22 1259584 54866.9 214 MiB/s 23 
1312432 54322.9 212 MiB/s 24 1367632 54191.7 212 MiB/s 25 1421552 53830.1 210 MiB/s 26 1476256 54047.7 211 MiB/s 27 1531952 54473.3 213 MiB/s 28 1587904 55094.1 215 MiB/s 29 1643312 55135.7 215 MiB/s 30 1698560 55401.3 216 MiB/s 31 1753360 55420.5 216 MiB/s 32 1808624 55334.1 216 MiB/s 33 1861024 54623.7 213 MiB/s 34 1914144 54166.1 212 MiB/s 35 1967744 53836.5 210 MiB/s 36 2023392 54006.1 211 MiB/s 37 2078192 53913.3 211 MiB/s 38 2134096 54614.1 213 MiB/s 39 2189664 55103.7 215 MiB/s 40 2244272 55305.3 216 MiB/s 41 2297952 54911.7 214 MiB/s 42 2351584 54678.1 214 MiB/s elapsed: 42 ops: 2359296 ops/sec: 55992.1 bytes/sec: 219 MiB/s bench type write io_size 4096 io_threads 16 bytes 9663676416 pattern random SEC OPS OPS/SEC BYTES/SEC 1 64608 64688.3 253 MiB/s 2 130096 65088.2 254 MiB/s 3 190080 63386.1 248 MiB/s 4 252576 63163.4 247 MiB/s 5 326448 65305.5 255 MiB/s 6 396352 66348.4 259 MiB/s 7 450816 64143.6 251 MiB/s 8 505680 63119.6 247 MiB/s 9 562240 61932.4 242 MiB/s 10 616720 58054.1 227 MiB/s 11 672288 55186.9 216 MiB/s 12 727968 55430.1 217 MiB/s 13 784480 55759.7 218 MiB/s 14 841280 55807.7 218 MiB/s 15 898176 56290.9 220 MiB/s 16 956032 56748.5 222 MiB/s 17 1013424 57090.9 223 MiB/s 18 1071440 57391.7 224 MiB/s 19 1128784 57500.5 225 MiB/s 20 1186144 57593.3 225 MiB/s 21 1259360 60665.2 237 MiB/s 22 1341680 65650.8 256 MiB/s 23 1415168 68745.2 269 MiB/s 24 1498992 74041.2 289 MiB/s 25 1583344 79439.5 310 MiB/s 26 1665584 81244.3 317 MiB/s 27 1740688 79801.1 312 MiB/s 28 1815248 80015.5 313 MiB/s 29 1887296 77660.3 303 MiB/s 30 1963520 76034.8 297 MiB/s 31 2047760 76434.8 299 MiB/s 32 2126208 77103.5 301 MiB/s 33 2201728 77295.5 302 MiB/s 34 2281968 78933.9 308 MiB/s elapsed: 35 ops: 2359296 ops/sec: 67252.4 bytes/sec: 263 MiB/s bench type write io_size 4096 io_threads 16 bytes 9663676416 pattern random SEC OPS OPS/SEC BYTES/SEC 1 59424 59499.2 232 MiB/s 2 131872 65976.6 258 MiB/s 3 206256 68779.9 269 MiB/s 4 284000 71021.3 277 MiB/s 5 361872 72391.7 283 
MiB/s 6 439136 75942 297 MiB/s 7 516496 76924.3 300 MiB/s 8 598272 78402.7 306 MiB/s 9 678592 78917.9 308 MiB/s 10 748208 77266.7 302 MiB/s 11 812816 74735.6 292 MiB/s 12 881936 73087.6 285 MiB/s 13 947792 69903.6 273 MiB/s 14 1010944 66470 260 MiB/s 15 1075872 65532.4 256 MiB/s 16 1141376 65711.6 257 MiB/s 17 1208304 65273.2 255 MiB/s 18 1275376 65516.4 256 MiB/s 19 1342224 66255.6 259 MiB/s 20 1405984 66022 258 MiB/s 21 1468864 65497.2 256 MiB/s 22 1529216 64182 251 MiB/s 23 1599440 64812.4 253 MiB/s 24 1672800 66114.8 258 MiB/s 25 1746288 68060.4 266 MiB/s 26 1820112 70249.2 274 MiB/s 27 1893232 72802.8 284 MiB/s 28 1965600 73231.6 286 MiB/s 29 2036240 72687.6 284 MiB/s 30 2109280 72598 284 MiB/s 31 2177328 71442.8 279 MiB/s 32 2251104 71574 280 MiB/s 33 2325984 72076.4 282 MiB/s elapsed: 33 ops: 2359296 ops/sec: 70548.5 bytes/sec: 276 MiB/s [ceph: root@tala001 /]# 9) Check md5sum of all files on site-a Site-a: [ceph: root@tala001 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_1/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. f5f9c66265ce628301c9028755684f9e file1.txt Exporting image: 100% complete...done. 7222ade8d98194a1f2e6d34dbf420432 file1.txt Exporting image: 100% complete...done. 58c591442230893c36e3525435a18951 file1.txt Exporting image: 100% complete...done. 8c2eca3b46c1641450bf9adc51b7bb86 file1.txt Exporting image: 100% complete...done. 
122c2e7b6aefb73fd86442d57c2718ad file1.txt
[ceph: root@tala001 /]#

10) Create manual mirror group snapshot
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_1 --debug_rbd 0
ID NAME STATE NAMESPACE
2aee8a5b76b4b .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2aee8a5b76b4b complete mirror (primary peer_uuids:[5738c991-3a0c-49b8-9174-6d7e60ae2719])
[ceph: root@tala001 /]# rbd mirror group snapshot -p pool_1 --group group_1 --debug_rbd 0
Snapshot ID: 2af60e0a4263a
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_1 --debug_rbd 0
ID NAME STATE NAMESPACE
2aee8a5b76b4b .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2aee8a5b76b4b complete mirror (primary peer_uuids:[5738c991-3a0c-49b8-9174-6d7e60ae2719])
2af60e0a4263a .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2af60e0a4263a complete mirror (primary peer_uuids:[5738c991-3a0c-49b8-9174-6d7e60ae2719])
[ceph: root@tala001 /]#

11) Once the 2 smaller images are synced to site-b, check the md5sum of both smaller images on site-b (it should match the md5sum from step 9)
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1612
[ceph: root@tala001 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0 | grep "syncing" | wc
3 9 1053
(Here the 2 smaller images are synced, which is why the count has changed from 5 to 3.)
[ceph: root@tala001 /]#
site-b:
[ceph: root@tala005 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
f5f9c66265ce628301c9028755684f9e file1.txt
Exporting image: 100% complete...done.
7222ade8d98194a1f2e6d34dbf420432 file1.txt
[ceph: root@tala005 /]#

12) While the larger images are still in progress, do a force promote on site-b
[ceph: root@tala005 /]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID NAME STATE NAMESPACE
2aee8a5b76b4b .mirror.non-primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2aee8a5b76b4b complete mirror (non-primary peer_uuids:[] ee75d1c2-4d1e-4915-be63-4f3d8dd36a53:2aee8a5b76b4b)
2af60e0a4263a .mirror.non-primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.2af60e0a4263a incomplete mirror (non-primary peer_uuids:[] ee75d1c2-4d1e-4915-be63-4f3d8dd36a53:2af60e0a4263a)
[ceph: root@tala005 /]# rbd mirror group promote --pool pool_1 --group group_1 --force --debug_rbd 0
Group promoted to primary
[ceph: root@tala005 /]# rbd mirror group status pool_1/group_1 --debug_rbd 0
group_1:
  global_id: fc12c7f4-fe10-4a20-96b4-73ec95410ee2
  state: up+stopped
  description: local group is primary
  service: tala005.jwqeia on tala005
  last_update: 2025-04-30 12:17:05
  images:
    image: 12/2e595353-56f9-4e59-bb9e-97e42d0f5a4c
    state: up+stopped
    description: local image is primary
    image: 12/52309cf9-d56a-490e-bef2-837fb398f40d
    state: up+stopped
    description: local image is primary
    image: 12/6febbdf6-e56e-40e3-8117-769a402e0d0d
    state: up+stopped
    description: local image is primary
    image: 12/856a6b7d-6b6c-4ef7-b6fb-e561f6ff1d0e
    state: up+stopped
    description: local image is primary
    image: 12/febbcc3e-71d7-4ec1-81b8-3a3c74a91c9d
    state: up+stopped
    description: local image is primary
  peer_sites:
    name: site-a
    state: up+stopped
    description: local group is primary
    last_update: 2025-04-30 12:17:09
    images:
      image: 12/2e595353-56f9-4e59-bb9e-97e42d0f5a4c
      state: up+stopped
      description: local image is primary
      image: 12/52309cf9-d56a-490e-bef2-837fb398f40d
      state: up+stopped
      description: local image is primary
      image: 12/6febbdf6-e56e-40e3-8117-769a402e0d0d
      state: up+stopped
      description: local image is primary
      image: 12/856a6b7d-6b6c-4ef7-b6fb-e561f6ff1d0e
      state: up+stopped
      description: local image is primary
      image: 12/febbcc3e-71d7-4ec1-81b8-3a3c74a91c9d
      state: up+stopped
      description: local image is primary
  snapshots:
    .mirror.primary.fc12c7f4-fe10-4a20-96b4-73ec95410ee2.1cb6fa778e08e
[ceph: root@tala005 /]#

13) Check md5sum on site-b for all 5 images. (The expectation is that it should roll back to the consistent point, i.e. step 7.)
site-b: (the md5sums are those of empty files)
[ceph: root@tala005 /]# rbd export pool_1/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_1/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_1/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_1/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_1/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
cd573cfaace07e7949bc0c46028904ff file1.txt
Exporting image: 100% complete...done.
cd573cfaace07e7949bc0c46028904ff file1.txt
Exporting image: 100% complete...done.
2dd26c4d4799ebd29fa31e48d49e8e53 file1.txt
Exporting image: 100% complete...done.
2dd26c4d4799ebd29fa31e48d49e8e53 file1.txt
Exporting image: 100% complete...done.
2dd26c4d4799ebd29fa31e48d49e8e53 file1.txt
[ceph: root@tala005 /]#

Here I am seeing that the md5sum of both smaller images is "cd573cfaace07e7949bc0c46028904ff" and that of the larger images is "2dd26c4d4799ebd29fa31e48d49e8e53". Most likely these are the md5sums of empty files. But I am expecting the md5sums of all 5 images to match those from step 7.
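
For reference, the step 13 versus step 7 comparison could be scripted roughly as below. This is only a sketch under stated assumptions: the function names count_syncing, image_checksums and diff_checksums are made up for illustration and are not part of the rbd CLI, and image_checksums assumes that "rbd export <image-spec> -" streams the image to stdout, so no temporary file1.txt is needed. Note that count_syncing prints a single line count, not the three wc columns shown in the logs above.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; none of these function names exist in the rbd CLI.

# Count the lines mentioning "syncing" in `rbd mirror group status` output
# read from stdin. Prints 0 (instead of failing) when nothing matches.
count_syncing() {
    grep -c "syncing" || true
}

# Print "<md5> <image>" for each named image in a pool.
# Assumption: `rbd export <spec> -` writes the image data to stdout.
image_checksums() {
    local pool=$1 image sum
    shift
    for image in "$@"; do
        sum=$(rbd export "$pool/$image" - --no-progress --debug_rbd 0 | md5sum | cut -d' ' -f1)
        printf '%s %s\n' "$sum" "$image"
    done
}

# Compare two "<md5> <image>" listings (expected first, actual second).
# Prints each image whose checksum differs; returns non-zero if any differ.
diff_checksums() {
    awk '
        BEGIN { bad = 0 }
        NR == FNR { want[$2] = $1; next }     # first file: expected checksums
        want[$2] != $1 { print $2; bad = 1 }  # second file: report mismatches
        END { exit bad }
    ' "$1" "$2"
}
```

A possible use: save the output of image_checksums for pool_1 on site-b at step 7 into one file and at step 13 into another, then run diff_checksums on the two files. An empty result with exit status 0 would mean the group rolled back to the consistent point. In the run above it would list all 5 images.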
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx
Option 2: Reverting from user/manual group snapshot to another user/manual group snapshot (group mirroring enabled on images that already have data)

1) Create 5 images (2 small, 1G each; 3 large, 10G each)
rbd create image_small2 --size 1G --pool pool_2 --debug_rbd 0
rbd create image_large1 --size 10G --pool pool_2 --debug_rbd 0
rbd create image_large2 --size 10G --pool pool_2 --debug_rbd 0
rbd create image_large3 --size 10G --pool pool_2 --debug_rbd 0

2) Create group
[ceph: root@tala001 /]# rbd group create --pool pool_2 --group group_1 --debug_rbd 0

3) Add all images to group
[ceph: root@tala001 /]# rbd group image add --image image_small1 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_small2 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_large1 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_large2 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
rbd group image add --image image_large3 --group group_1 --group-pool pool_2 --image-pool pool_2 --debug_rbd 0
[ceph: root@tala001 /]#

4) Write 10% IO on all the images
[ceph: root@tala001 /]# rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small1 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small2 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large1 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large2 --debug_rbd 0
rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large3 --debug_rbd 0
bench type write io_size 4096 io_threads 16 bytes 104857600 pattern
random SEC OPS OPS/SEC BYTES/SEC elapsed: 0 ops: 25600 ops/sec: 54935.3 bytes/sec: 215 MiB/s bench type write io_size 4096 io_threads 16 bytes 104857600 pattern random SEC OPS OPS/SEC BYTES/SEC elapsed: 0 ops: 25600 ops/sec: 66149.5 bytes/sec: 258 MiB/s bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random SEC OPS OPS/SEC BYTES/SEC 1 50064 50129.8 196 MiB/s 2 121840 60958.1 238 MiB/s 3 193360 64479.8 252 MiB/s 4 253328 63351.5 247 MiB/s elapsed: 4 ops: 262144 ops/sec: 63106 bytes/sec: 247 MiB/s bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random SEC OPS OPS/SEC BYTES/SEC 1 52960 53028.7 207 MiB/s 2 109488 54779.1 214 MiB/s 3 163984 54684.6 214 MiB/s 4 218768 54709.4 214 MiB/s elapsed: 4 ops: 262144 ops/sec: 54945 bytes/sec: 215 MiB/s bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random SEC OPS OPS/SEC BYTES/SEC 1 54640 54710.4 214 MiB/s 2 134016 67049.1 262 MiB/s 3 203696 67926.2 265 MiB/s elapsed: 3 ops: 262144 ops/sec: 65666.9 bytes/sec: 257 MiB/s [ceph: root@tala001 /]# 5) Enable mirroring on group [ceph: root@tala001 /]# rbd mirror group enable --group group_1 --pool pool_2 --debug_rbd 0 Mirroring enabled [ceph: root@tala001 /]# 6) Wait for system group snapshot to complete on site-b site-a: [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 0 0 0 [ceph: root@tala001 /]# site-b: [ceph: root@tala005 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 0 0 0 [ceph: root@tala005 /]# 7) Check md5sum should match on both sites for all files site-a: [ceph: root@tala001 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd 
export pool_2/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. b766f3ffb18a40013f99ee3a5f274c4e file1.txt Exporting image: 100% complete...done. 3c9f58793a78877dd0a024d451358edc file1.txt Exporting image: 100% complete...done. e31145888d2505bb64af72ef83eb41ea file1.txt Exporting image: 100% complete...done. 90a481e5cb78972495b7dd2061146ce5 file1.txt Exporting image: 100% complete...done. 67873822f8af199145c221633b14cd3b file1.txt [ceph: root@tala001 /]# Site-b: [ceph: root@tala005 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. b766f3ffb18a40013f99ee3a5f274c4e file1.txt Exporting image: 100% complete...done. 3c9f58793a78877dd0a024d451358edc file1.txt Exporting image: 100% complete...done. e31145888d2505bb64af72ef83eb41ea file1.txt Exporting image: 100% complete...done. 90a481e5cb78972495b7dd2061146ce5 file1.txt Exporting image: 100% complete...done. 
67873822f8af199145c221633b14cd3b file1.txt [ceph: root@tala005 /]# 8) Write 10% more data to all images [ceph: root@tala001 /]# rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small1 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 100M --io-pattern rand --io-size 4096 pool_2/image_small2 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large1 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large2 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 1G --io-pattern rand --io-size 4096 pool_2/image_large3 --debug_rbd 0 bench type write io_size 4096 io_threads 16 bytes 104857600 pattern random SEC OPS OPS/SEC BYTES/SEC elapsed: 0 ops: 25600 ops/sec: 55053.4 bytes/sec: 215 MiB/s bench type write io_size 4096 io_threads 16 bytes 104857600 pattern random SEC OPS OPS/SEC BYTES/SEC elapsed: 0 ops: 25600 ops/sec: 59396.4 bytes/sec: 232 MiB/s bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random SEC OPS OPS/SEC BYTES/SEC 1 51360 51427.1 201 MiB/s 2 116816 58444.9 228 MiB/s 3 170384 56818.6 222 MiB/s 4 224288 56089.7 219 MiB/s elapsed: 4 ops: 262144 ops/sec: 55775 bytes/sec: 218 MiB/s bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random SEC OPS OPS/SEC BYTES/SEC 1 47120 47182.9 184 MiB/s 2 109360 54715 214 MiB/s 3 163408 54492.5 213 MiB/s 4 218096 54541.3 213 MiB/s elapsed: 4 ops: 262144 ops/sec: 55456.4 bytes/sec: 217 MiB/s bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern random SEC OPS OPS/SEC BYTES/SEC 1 50832 50898.6 199 MiB/s 2 106112 53090.2 207 MiB/s 3 161136 53734.9 210 MiB/s 4 216256 54081.2 211 MiB/s elapsed: 4 ops: 262144 ops/sec: 54128.1 bytes/sec: 211 MiB/s [ceph: root@tala001 /]# 9) Take manual group snapshot and wait for snapshot to complete on site-b [ceph: root@tala001 /]# 
rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0 ID NAME STATE NAMESPACE 2b0622ae17422 .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422 complete mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b]) [ceph: root@tala001 /]# rbd mirror group snapshot -p pool_2 --group group_1 --debug_rbd 0 Snapshot ID: 2b0d4c69ab36a [ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0 ID NAME STATE NAMESPACE 2b0622ae17422 .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422 complete mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b]) 2b0d4c69ab36a .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a complete mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b]) [ceph: root@tala001 /]# Wait for snapshot to complete [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 0 0 0 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 5 15 1613 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 5 15 1613 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 5 15 1613 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 5 15 1613 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 5 15 1613 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 5 15 1613 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 3 9 1051 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 3 9 1051 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc 3 9 1051 [ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 
0 | grep "syncing" | wc 0 0 0 [ceph: root@tala001 /]# [ceph: root@tala005 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0 ID NAME STATE NAMESPACE 2b0622ae17422 .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422 complete mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b0622ae17422) 2b0d4c69ab36a .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a complete mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b0d4c69ab36a) 10) Calculate md5sum of all files on site-a and site-b (Should match) Site-a: [ceph: root@tala001 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. 7bc9b378bc8f719570345ff414da70ff file1.txt Exporting image: 100% complete...done. 56c78f46fa48d3889797feead6b85eed file1.txt Exporting image: 100% complete...done. c7106d377219338d0c401eb59178e87a file1.txt Exporting image: 100% complete...done. 709a56ea0c3b9530136a51bdf2cd29a2 file1.txt Exporting image: 100% complete...done. 
a91243dfa1165c49e9cd3dc316547f15 file1.txt [ceph: root@tala001 /]# Site-b: [ceph: root@tala005 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_small2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large1 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large2 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt rbd export pool_2/image_large3 file1.txt --debug_rbd 0 md5sum file1.txt rm -rf file1.txt Exporting image: 100% complete...done. 7bc9b378bc8f719570345ff414da70ff file1.txt Exporting image: 100% complete...done. 56c78f46fa48d3889797feead6b85eed file1.txt Exporting image: 100% complete...done. c7106d377219338d0c401eb59178e87a file1.txt Exporting image: 100% complete...done. 709a56ea0c3b9530136a51bdf2cd29a2 file1.txt Exporting image: 100% complete...done. a91243dfa1165c49e9cd3dc316547f15 file1.txt [ceph: root@tala005 /]# 11) Write 80% of the data [ceph: root@tala001 /]# rbd bench --io-type write --io-threads 16 --io-total 800M --io-pattern rand --io-size 4096 pool_2/image_small1 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 800M --io-pattern rand --io-size 4096 pool_2/image_small2 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 8G --io-pattern rand --io-size 4096 pool_2/image_large1 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 8G --io-pattern rand --io-size 4096 pool_2/image_large2 --debug_rbd 0 rbd bench --io-type write --io-threads 16 --io-total 8G --io-pattern rand --io-size 4096 pool_2/image_large3 --debug_rbd 0 bench type write io_size 4096 io_threads 16 bytes 838860800 pattern random SEC OPS OPS/SEC BYTES/SEC 1 64944 65024.6 254 MiB/s 2 129088 64583.9 252 MiB/s 3 195728 65269.4 255 MiB/s elapsed: 3 ops: 204800 ops/sec: 65077.5 bytes/sec: 254 MiB/s bench type write io_size 4096 io_threads 16 bytes 838860800 pattern random SEC OPS OPS/SEC BYTES/SEC 1 63904 
63983.6 250 MiB/s
2 125888 62983.1 246 MiB/s
3 190592 63556.8 248 MiB/s
elapsed: 3 ops: 204800 ops/sec: 63523.2 bytes/sec: 248 MiB/s
bench type write io_size 4096 io_threads 16 bytes 8589934592 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 46144 46205.9 180 MiB/s
2 114640 57356.3 224 MiB/s
3 182416 60830.6 238 MiB/s
4 241760 60458.8 236 MiB/s
5 296544 59323.5 232 MiB/s
6 350880 60946.8 238 MiB/s
7 406592 58390.1 228 MiB/s
8 473008 58118.1 227 MiB/s
9 542864 60220.4 235 MiB/s
10 609360 62562.8 244 MiB/s
11 678320 65487.6 256 MiB/s
12 745552 67791.6 265 MiB/s
13 815072 68412.4 267 MiB/s
14 885648 68556.4 268 MiB/s
15 953312 68790 269 MiB/s
16 1020720 68479.6 267 MiB/s
17 1088528 68594.8 268 MiB/s
18 1155888 68162.8 266 MiB/s
19 1223680 67606 264 MiB/s
20 1290688 67474.8 264 MiB/s
21 1356832 67222 263 MiB/s
22 1424416 67177.2 262 MiB/s
23 1492016 67225.2 263 MiB/s
24 1560768 67417.2 263 MiB/s
25 1631600 68182 266 MiB/s
26 1708832 70385.5 275 MiB/s
27 1787376 72591.6 284 MiB/s
28 1865744 74745.2 292 MiB/s
29 1940960 76037.9 297 MiB/s
30 2019248 77529.1 303 MiB/s
elapsed: 31 ops: 2097152 ops/sec: 67647.5 bytes/sec: 264 MiB/s
bench type write io_size 4096 io_threads 16 bytes 8589934592 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 49616 49681.4 194 MiB/s
2 107904 53986.7 211 MiB/s
3 172400 57490.8 225 MiB/s
4 231040 57778.1 226 MiB/s
5 286960 57406.3 224 MiB/s
6 343024 58681.3 229 MiB/s
7 401056 58630.1 229 MiB/s
8 459632 57446.1 224 MiB/s
9 518272 57446.1 224 MiB/s
10 577120 58031.7 227 MiB/s
11 638816 59158 231 MiB/s
12 706672 61122.8 239 MiB/s
13 774368 62946.8 246 MiB/s
14 843168 64978.8 254 MiB/s
15 911920 66959.6 262 MiB/s
16 980736 68383.6 267 MiB/s
17 1052224 69110 270 MiB/s
18 1124464 70018.8 274 MiB/s
19 1196496 70665.2 276 MiB/s
20 1268288 71273.2 278 MiB/s
21 1339792 71810.8 281 MiB/s
22 1410880 71730.8 280 MiB/s
23 1479072 70921.2 277 MiB/s
24 1549968 70694 276 MiB/s
25 1620576 70457.2 275 MiB/s
26 1691360 70313.2 275 MiB/s
27 1761360 70095.6 274 MiB/s
28 1826112 69407.6 271 MiB/s
29 1891424 68290.8 267 MiB/s
30 1959328 67750 265 MiB/s
31 2026784 67084.4 262 MiB/s
32 2094448 66617.2 260 MiB/s
elapsed: 32 ops: 2097152 ops/sec: 65451.8 bytes/sec: 256 MiB/s
bench type write io_size 4096 io_threads 16 bytes 8589934592 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 57760 57833.5 226 MiB/s
2 125536 62807 245 MiB/s
3 181248 60441.1 236 MiB/s
4 239104 59794.6 234 MiB/s
5 294416 58897.8 230 MiB/s
6 349904 58428.5 228 MiB/s
7 408896 56671.7 221 MiB/s
8 476720 59094 231 MiB/s
9 543792 60937.2 238 MiB/s
10 611696 63455.6 248 MiB/s
11 678768 65772.4 257 MiB/s
12 749728 68166 266 MiB/s
13 823680 69391.6 271 MiB/s
14 898288 70898.8 277 MiB/s
15 970544 71769.2 280 MiB/s
16 1045216 73289.2 286 MiB/s
17 1119392 73932.4 289 MiB/s
18 1189616 73186.8 286 MiB/s
19 1261104 72562.8 283 MiB/s
20 1333712 72633.2 284 MiB/s
21 1406800 72316.4 282 MiB/s
22 1480528 72226.8 282 MiB/s
23 1556112 73298.8 286 MiB/s
24 1631280 74034.8 289 MiB/s
25 1706080 74473.2 291 MiB/s
26 1779504 74540.4 291 MiB/s
27 1854704 74834.8 292 MiB/s
28 1929392 74655.6 292 MiB/s
29 2005104 74764.4 292 MiB/s
30 2079760 74735.6 292 MiB/s
elapsed: 30 ops: 2097152 ops/sec: 69363.6 bytes/sec: 271 MiB/s
[ceph: root@tala001 /]#

12) Take manual group snapshot
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID NAME STATE NAMESPACE
2b0622ae17422 .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0622ae17422 complete mirror (primary peer_uuids:[])
2b0d4c69ab36a .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a complete mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
[ceph: root@tala001 /]# rbd mirror group snapshot -p pool_2 --group group_1 --debug_rbd 0
Snapshot ID: 2b164e0c03694
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala001 /]# rbd group snap list --group group_1 --pool pool_2 --debug_rbd 0
ID NAME STATE NAMESPACE
2b0d4c69ab36a .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a complete mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
2b164e0c03694 .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b164e0c03694 complete mirror (primary peer_uuids:[e41d994b-5428-4a4c-a96f-e7c91a1a270b])
[ceph: root@tala001 /]#

13) Calculate md5sum of all files on site-a:
site-a:
[ceph: root@tala001 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
3b7848fac2830f7b4704dbcc68d334b1 file1.txt
Exporting image: 100% complete...done.
0ef21589ce944b0feaa51d1cd7d65bef file1.txt
Exporting image: 100% complete...done.
0ebf7c09ee1f025370f8649a125deb6a file1.txt
Exporting image: 100% complete...done.
fadad2860dece90cb7129a52e498b6c8 file1.txt
Exporting image: 100% complete...done.
c117982a1830506533ffb8dfe88e5385 file1.txt
[ceph: root@tala001 /]#

13) Once the 2 smaller files are completely synced, compare the md5sum of the smaller files on site-b; they should match step-10
Site-b:
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
0 0 0
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]#
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
5 15 1749
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
3 9 1051
[ceph: root@tala001 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0 | grep "syncing" | wc
3 9 1051
[ceph: root@tala001 /]#
[ceph: root@tala005 /]# rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
3b7848fac2830f7b4704dbcc68d334b1 file1.txt
Exporting image: 100% complete...done.
0ef21589ce944b0feaa51d1cd7d65bef file1.txt
[ceph: root@tala005 /]#

14) While the larger files are still syncing, force promote on site-b:
[ceph: root@tala005 /]# rbd group snap list --pool pool_2 --group group_1 --debug_rbd 0
ID NAME STATE NAMESPACE
2b0d4c69ab36a .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b0d4c69ab36a complete mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b0d4c69ab36a)
2b164e0c03694 .mirror.non-primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.2b164e0c03694 incomplete mirror (non-primary peer_uuids:[] f47a67ba-9c21-4d6d-9e07-31917c44f3b9:2b164e0c03694)
[ceph: root@tala005 /]# rbd mirror group promote --pool pool_2 --group group_1 --force --debug_rbd 0
Group promoted to primary
[ceph: root@tala005 /]# rbd group snap list --pool pool_2 --group group_1 --debug_rbd 0
ID NAME STATE NAMESPACE
1cc41ae0ab3e .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.1cc41ae0ab3e complete mirror (primary peer_uuids:[cda556b9-9c5d-42c1-a595-df0c0fcb056d])
[ceph: root@tala005 /]#
[ceph: root@tala005 /]# rbd mirror group status pool_2/group_1 --debug_rbd 0
group_1:
  global_id: 04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d
  state: up+stopped
  description: local group is primary
  service: tala005.jwqeia on tala005
  last_update: 2025-04-30 12:41:05
  images:
    image: 13/53ce4bf9-5977-4a09-81a8-e869ab65b61b
    state: up+stopped
    description: local image is primary
    image: 13/99f4d143-0453-416f-aabe-55585d553bd5
    state: up+stopped
    description: local image is primary
    image: 13/a1b7e1f6-e474-480e-971e-46ec1f2c2d8a
    state: up+stopped
    description: local image is primary
    image: 13/a1f3931f-a27c-4456-8df6-a1f39442229a
    state: up+stopped
    description: local image is primary
    image: 13/f378f4c8-8bc6-4f87-9206-2bf9488d96fa
    state: up+stopped
    description: local image is primary
  peer_sites:
    name: site-a
    state: up+stopped
    description: local group is primary
    last_update: 2025-04-30 12:41:09
    images:
      image: 13/53ce4bf9-5977-4a09-81a8-e869ab65b61b
      state: up+stopped
      description: local image is primary
      image: 13/99f4d143-0453-416f-aabe-55585d553bd5
      state: up+stopped
      description: local image is primary
      image: 13/a1b7e1f6-e474-480e-971e-46ec1f2c2d8a
      state: up+stopped
      description: local image is primary
      image: 13/a1f3931f-a27c-4456-8df6-a1f39442229a
      state: up+stopped
      description: local image is primary
      image: 13/f378f4c8-8bc6-4f87-9206-2bf9488d96fa
      state: up+stopped
      description: local image is primary
  snapshots:
    .mirror.primary.04cbe7ba-c59a-4b86-b0b6-94c4bf07cd1d.1cc41ae0ab3e
[ceph: root@tala005 /]#

15) Calculate md5sum of all 5 files on site-b
Site-b: (Expected the data to revert to match step-10, but the rollback goes to step-7)
rbd export pool_2/image_small1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_small2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_large1 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_large2 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
rbd export pool_2/image_large3 file1.txt --debug_rbd 0
md5sum file1.txt
rm -rf file1.txt
Exporting image: 100% complete...done.
b766f3ffb18a40013f99ee3a5f274c4e file1.txt
Exporting image: 100% complete...done.
3c9f58793a78877dd0a024d451358edc file1.txt
Exporting image: 100% complete...done.
e31145888d2505bb64af72ef83eb41ea file1.txt
Exporting image: 100% complete...done.
90a481e5cb78972495b7dd2061146ce5 file1.txt
Exporting image: 100% complete...done.
67873822f8af199145c221633b14cd3b file1.txt
[ceph: root@tala005 /]#

As the text is long, the Option 3 logs are pasted in a file. Kindly find it attached.
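The export, md5sum, rm sequence repeated in steps 13 and 15 could be condensed into a loop, which makes the per-site checksum lists easier to diff. This is only a sketch: the function name `image_md5sums` is hypothetical, and it assumes that `rbd export` accepts `-` as the destination to stream the image to stdout and that `--no-progress` suppresses the progress line.

```shell
# Hypothetical helper, not part of the original reproduction steps.
# Prints "<md5>  <image>" for every image name given after the pool name.
# Assumes `rbd export <pool>/<image> -` streams the image to stdout.
image_md5sums() {
    pool="$1"
    shift
    for image in "$@"; do
        # Only the checksum field of md5sum's output is kept.
        sum=$(rbd export --no-progress --debug_rbd 0 "${pool}/${image}" - \
              | md5sum | awk '{print $1}')
        printf '%s  %s\n' "$sum" "$image"
    done
}

# Usage with the pool and image names from this report (run once per site):
# image_md5sums pool_2 image_small1 image_small2 image_large1 image_large2 image_large3 > site-b.md5
# diff site-a.md5 site-b.md5
```

Labelling each checksum with its image name avoids the ambiguity of five identical `file1.txt` lines when comparing site-a against site-b.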
Version-Release number of selected component (if applicable):
ceph version 19.2.1-167.el9cp (3e3ca3a16912abfd58b473e2ae724703f9a0415d) squid (stable)

How reproducible:
All the time

Steps to Reproduce:
Listed above

Actual results:
md5sum not matching, files are empty

Expected results:
md5sum should match with latest consistent snapshot

Additional info:
NA
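To make the expected result concrete, the latest group snapshot in the `complete` state can be picked out of the `rbd group snap list` output before the force promote. The sketch below is an illustration only: the function name `latest_complete_snap` is hypothetical, and it assumes the four-column layout (ID NAME STATE NAMESPACE) shown in step 14 and that snapshots are listed oldest first.

```shell
# Hypothetical helper, not part of the original report.
# Reads `rbd group snap list` output on stdin and prints the ID of the
# last listed snapshot whose STATE column is exactly "complete".
# Assumes the listing is ordered oldest first.
latest_complete_snap() {
    awk 'NR > 1 && $3 == "complete" { id = $1 }
         END { if (id != "") print id }'
}

# Usage on site-b before promoting:
# rbd group snap list --pool pool_2 --group group_1 --debug_rbd 0 | latest_complete_snap
```

Fed the site-b listing from step 14, this prints `2b0d4c69ab36a`, the snapshot the promoted group was expected to roll back to.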