Description of problem: Nithya was looking into something else when she ran these steps: 1. Created a group with 2 images, enabled it for mirroring and waited until it synced to the secondary. 2. Created 3 user group snaps snap1, snap2, snap3 and then created a mirror snapshot on the primary. 3. Listed the group and image snaps for 1 image on the secondary every second: Fri May 16 12:07:45 PM IST 2025 ID NAME STATE NAMESPACE 106376cf19a1 .mirror.non-primary.37f42794-30d3-48ef-a9f4-adb1c35f7ac8.106376cf19a1 complete mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:106376cf19a1) 1065ecbf3075 snap1 incomplete user Image snaps :SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 15 .mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 12 MiB Fri May 16 12:03:45 2025 mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:13 copied) ------------------------------------------ ####### The second user group snap snap2 is created 5 s later. Fri May 16 12:07:50 PM IST 2025 ID NAME STATE NAMESPACE 106376cf19a1 .mirror.non-primary.37f42794-30d3-48ef-a9f4-adb1c35f7ac8.106376cf19a1 complete mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:106376cf19a1) 1065ecbf3075 snap1 incomplete user 106793333556 snap2 incomplete user Image snaps :SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 15 .mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 12 MiB Fri May 16 12:03:45 2025 mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:13 copied) ------------------------------------------ ####### The third user group snap snap3 is created 5 s later. Fri May 16 12:07:56 PM IST 2025 ID NAME STATE NAMESPACE 106376cf19a1 .mirror.non-primary.37f42794-30d3-48ef-a9f4-adb1c35f7ac8.106376cf19a1 complete mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:106376cf19a1) 1065ecbf3075 snap1 incomplete user 106793333556 snap2 incomplete user 10697c05fdec snap3 incomplete user Image snaps :SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 15 .mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 12 MiB Fri May 16 12:03:45 2025 mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:13 copied) ------------------------------------------ ### Finally the incomplete mirror snap is created after another 5s. And now a new Image mirror snap is finally created and starts syncing Fri May 16 12:08:01 PM IST 2025 ID NAME STATE NAMESPACE 106376cf19a1 .mirror.non-primary.37f42794-30d3-48ef-a9f4-adb1c35f7ac8.106376cf19a1 complete mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:106376cf19a1) 1065ecbf3075 snap1 incomplete user 106793333556 snap2 incomplete user 10697c05fdec snap3 incomplete user 106bc744cc77 .mirror.non-primary.37f42794-30d3-48ef-a9f4-adb1c35f7ac8.106bc744cc77 incomplete mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:106bc744cc77) Image snaps :SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 18 .mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 12 MiB Fri May 16 12:08:01 2025 mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:13 33% copied) ------------------------------------------ Initial observations: This is something really weird going in there: snap 13 was already synced to the secondary, attached to snap 15. It looks like the act of syncing a mirror snapshot that logically contains user snapshots removed the already synced mirror snapshot only to recreate it a few moments later? Notice 15 .mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 12 MiB Fri May 16 12:03:45 2025 mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:13 copied) and then in the next output 18 .mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 12 MiB Fri May 16 12:08:01 2025 mirror (non-primary peer_uuids:[] f4822494-dcad-4052-968d-06ece8573023:13 33% copied) Two different non-primary snapshots (15 and 18) for the same group snapshot (106376cf19a1) attached to the same primary snapshot (13) at different points in time Relevant log snippets: 2025-05-16T12:08:00.662+0530 7f4ca01cd6c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 load_local_image_meta: 2025-05-16T12:08:00.662+0530 7f4ca01cd6c0 15 librbd::mirror::snapshot::ImageMeta: 0x561656365ea0 load: oid=rbd_header.10352cde925c, key=.rbd_mirror.71ae63ed-8eb1-48e5-88a0-40c462d34478 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 15 librbd::mirror::snapshot::ImageMeta: 0x561656365ea0 handle_load: r=-2 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 15 librbd::mirror::snapshot::ImageMeta: 0x561656365ea0 handle_load: no snapshot-based mirroring image-meta: (2) No such file or directory 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 handle_load_local_image_meta: r=-2 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 scan_local_mirror_snapshots: 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 15 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 scan_local_mirror_snapshots: local mirror snapshot: id=15, mirror_ns=[mirror state=non-primary, complete=1, mirror_peer_uuids=, primary_mirror_uuid=f4822494-dcad-4052-968d-06ece8573023, primary_snap_id=d, last_copied_object_number=3, snap_seqs={13=18446744073709551614}, group_spec={pool_id=2, group_id=10358a37e370}, group_snap_id=106376cf19a1] 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 5 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 scan_local_mirror_snapshots: pruning unused non-primary snapshot 15 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 prune_non_primary_snapshot: snap_id=15 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x561656b38000 prune_non_primary_snapshot: removing from group list snap_id=15 2025-05-16T12:08:00.664+0530 7f4c9a1c16c0 5 librbd::Operations: 0x561656063790 snap_remove: snap_name=.mirror.non_primary.079c83d3-f96e-48ca-8bfe-c733f74301e9.2_10358a37e370_106376cf19a1 The image snap with id 15 is being pruned even though it is the only local image mirror snap. The group replayer is pruning these here: 2025-05-16T12:07:55.627+0530 7f4c999c06c0 10 librbd::group::ListSnapshotsRequest: 0x561656bb0b60 handle_list_snaps: r=0 2025-05-16T12:07:55.628+0530 7f4c999c06c0 10 librbd::group::ListSnapshotsRequest: 0x561656bb0b60 sort_snaps: 2025-05-16T12:07:55.628+0530 7f4c999c06c0 10 librbd::group::ListSnapshotsRequest: 0x561656bb0b60 finish: r=0 2025-05-16T12:07:55.628+0530 7f4c999c06c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 handle_load_local_group_snapshots: r=0 2025-05-16T12:07:55.628+0530 7f4c999c06c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 load_remote_group_snapshots: m_remote_group_id=1059ed525071 2025-05-16T12:07:55.630+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 handle_load_remote_group_snapshots: r=0 2025-05-16T12:07:55.630+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 is_resync_requested: m_local_group_id=10358a37e370 2025-05-16T12:07:55.631+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 prune_all_image_snapshots: attempting to unlink image snaps from group snap: 106376cf19a1 2025-05-16T12:07:55.632+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 prune_all_image_snapshots: pruning: f 2025-05-16T12:07:55.633+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 prune_all_image_snapshots: pruning: 10 2025-05-16T12:07:55.633+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 scan_for_unsynced_group_snapshots: 2025-05-16T12:07:55.633+0530 7f4c931b36c0 10 rbd::mirror::group_replayer::Replayer: 0x561656177680 try_create_group_snapshot: 10697c05fdec How reproducible: Always (later reproduced by myself and Prasanna)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 8.1 security and bug fix updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2025:14015