
Bug 2368629

Summary: rbd-mirror daemon goes to UNKNOWN state on deleting a user group snapshot and recreating it with the same name
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: aarsharm
Component: RBD-Mirror
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED ERRATA
QA Contact: aarsharm
Severity: high
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 8.1
CC: ceph-eng-bugs, cephqe-warriors, idryomov, prasanna.kalever, rpollack, sangadi, tserlin
Target Milestone: ---
Target Release: 8.1z2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-19.2.1-240.el9cp
Doc Type: Bug Fix
Doc Text:
.rbd-mirror daemon no longer enters the `UNKNOWN` state when reusing a user group snapshot name
Previously, when a group snapshot was deleted on the primary cluster and another was created with the same name before the deletion had been mirrored to the secondary cluster, the rbd-mirror daemon would enter an `UNKNOWN` state. With this fix, group snapshot IDs are compared instead of names, so a snapshot is preserved only if its ID matches. As a result, sync issues caused by reusing group snapshot names are resolved, and rbd-mirror daemons no longer enter the `UNKNOWN` state due to stale remote group snapshots.
Last Closed: 2025-08-18 14:02:27 UTC
Type: Bug

Description aarsharm 2025-05-26 18:20:22 UTC
Description of problem:

This ad hoc scenario exercises snapshot sequencing with both user-created group snapshots and system-created mirror group snapshots.

Test Steps:
1. Create an RBD image.
2. Add it to the group.
3. Create group snapshot 'snap_1'.
4. Enable mirroring on the group (a command sketch follows).
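For reference, a minimal sketch of the commands behind steps 1-4, using the pool and group names from the outputs below; the image name and size are assumed for illustration, and the exact group-mirroring enable syntax may vary by release:

# Step 1: create an RBD image (name and size assumed)
rbd create pool_1/image_1 --size 1G
# Step 2: create the group and add the image to it
rbd group create pool_1/group_1
rbd group image add pool_1/group_1 pool_1/image_1
# Step 3: create the user group snapshot
rbd group snap create pool_1/group_1@snap_1
# Step 4: enable snapshot-based mirroring on the group
rbd mirror group enable pool_1/group_1 snapshot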
The group now has two snapshots (one user-created group snapshot and one system-created mirror group snapshot):
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID            NAME                                                               STATE     NAMESPACE
4056f9d2de72  snap_1                                                             complete  user
4062a3fd2d59  .mirror.primary.59744227-253d-4352-a195-042c8b544655.4062a3fd2d59  complete  mirror (primary peer_uuids:[fd3be1a4-06a4-4c0d-9f24-59f93dff3fdb])
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]#

5. Wait for both snapshots to reach the complete state on site-b:
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID            NAME                                                                   STATE     NAMESPACE
4056f9d2de72  snap_1                                                                 complete  user
4062a3fd2d59  .mirror.non-primary.59744227-253d-4352-a195-042c8b544655.4062a3fd2d59  complete  mirror (non-primary peer_uuids:[] 51835de5-c34b-4658-874d-8eee505d2681:4062a3fd2d59)
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#

6. Delete user group snapshot snap_1 from site-a:
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd group snap remove --pool pool_1 --group group_1 --snap snap_1 --debug_rbd 0
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID            NAME                                                               STATE     NAMESPACE
4062a3fd2d59  .mirror.primary.59744227-253d-4352-a195-042c8b544655.4062a3fd2d59  complete  mirror (primary peer_uuids:[fd3be1a4-06a4-4c0d-9f24-59f93dff3fdb])
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]#

7. Create user group snapshot snap_1 again on site-a (note in the listing that it gets a new ID):
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd group snap create --pool pool_1 --group group_1 --snap snap_1 --debug_rbd 0
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID            NAME                                                               STATE     NAMESPACE
4062a3fd2d59  .mirror.primary.59744227-253d-4352-a195-042c8b544655.4062a3fd2d59  complete  mirror (primary peer_uuids:[fd3be1a4-06a4-4c0d-9f24-59f93dff3fdb])
407a981c5c65  snap_1                                                             complete  user
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]#

8. Create another mirror group snapshot:
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd mirror group snapshot --pool pool_1 --group group_1 --debug_rbd 0
Snapshot ID: 408c15feb27f
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID            NAME                                                               STATE     NAMESPACE
4062a3fd2d59  .mirror.primary.59744227-253d-4352-a195-042c8b544655.4062a3fd2d59  complete  mirror (primary peer_uuids:[fd3be1a4-06a4-4c0d-9f24-59f93dff3fdb])
407a981c5c65  snap_1                                                             complete  user
408c15feb27f  .mirror.primary.59744227-253d-4352-a195-042c8b544655.408c15feb27f  complete  mirror (primary peer_uuids:[fd3be1a4-06a4-4c0d-9f24-59f93dff3fdb])
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]#

9. The mirror group snapshot created in step 8 is not propagated to site-b. The group status on site-b also toggles between "up+starting_replay" and "down+starting_replay". Additionally, after some time the rbd-mirror daemon goes down into the UNKNOWN state:
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#  rbd group snap list --pool pool_1 --group group_1 --debug_rbd 0
ID            NAME                                                                   STATE     NAMESPACE
4056f9d2de72  snap_1                                                                 complete  user
4062a3fd2d59  .mirror.non-primary.59744227-253d-4352-a195-042c8b544655.4062a3fd2d59  complete  mirror (non-primary peer_uuids:[] 51835de5-c34b-4658-874d-8eee505d2681:4062a3fd2d59)
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#
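Note the ID mismatch at this point: site-b still holds snap_1 with the old ID 4056f9d2de72, which no longer exists on site-a, while the recreated snap_1 on site-a has ID 407a981c5c65. A quick way to compare group snapshot IDs across sites, assuming jq is available and that the JSON output exposes the same id field shown in the ID column above:

# Run on each site, then diff the resulting files
rbd group snap list pool_1/group_1 --format json | jq -r '.[].id' | sort > /tmp/group_snap_ids.txt
diff /tmp/group_snap_ids.txt /tmp/group_snap_ids.peer.txt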
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# rbd mirror group status --pool pool_1 --group group_1 --debug_rbd 0
group_1:
  global_id:   59744227-253d-4352-a195-042c8b544655
  state:       up+starting_replay
  description: starting replay
  service:     ceph-rbd2-aarti-1-z8kn5p-node5.mfqqkl on ceph-rbd2-aarti-1-z8kn5p-node5
  last_update: 2025-05-26 17:20:21
  images:
    image:       6/80ac99a8-3fd9-4990-8065-32576e15ac5a
    state:       up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"last_snapshot_bytes":0,"last_snapshot_sync_seconds":0,"local_snapshot_timestamp":1748279761,"remote_snapshot_timestamp":1748279916,"replay_state":"idle"}
  peer_sites:
    name: site-a
    state: up+stopped
    description: local group is primary
    last_update: 2025-05-26 17:20:35
    images:
      image:       6/80ac99a8-3fd9-4990-8065-32576e15ac5a
      state:       up+stopped
      description: local image is primary
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# rbd mirror group status --pool pool_1 --group group_1 --debug_rbd 0
group_1:
  global_id:   59744227-253d-4352-a195-042c8b544655
  state:       down+starting_replay
  description: starting replay
  last_update: 2025-05-26 17:20:21
  images:
    image:       6/80ac99a8-3fd9-4990-8065-32576e15ac5a
    state:       down+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"last_snapshot_bytes":0,"last_snapshot_sync_seconds":0,"local_snapshot_timestamp":1748279761,"remote_snapshot_timestamp":1748279916,"replay_state":"idle"}
  peer_sites:
    name: site-a
    state: up+stopped
    description: local group is primary
    last_update: 2025-05-26 17:20:35
    images:
      image:       6/80ac99a8-3fd9-4990-8065-32576e15ac5a
      state:       up+stopped
      description: local image is primary
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#
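The toggling between up+starting_replay and down+starting_replay is easiest to observe by polling the same status command in a loop, for example:

watch -n 5 "rbd mirror group status --pool pool_1 --group group_1"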
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# rbd mirror pool status pool_1
2025-05-26T17:19:34.962+0000 7f7c179653c0 20 librbd::api::mirror: mode_get:
2025-05-26T17:19:34.967+0000 7f7c179653c0 20 librbd::api::mirror: group_status_summary:
health: WARNING
daemon health: OK
image health: OK
group health: WARNING
images: 1 total
    1 replaying
groups: 1 total
    1 starting_replay
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# ceph orch ps
NAME                                                 HOST                                      PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION           IMAGE ID      CONTAINER ID
mgr.ceph-rbd2-aarti-1-z8kn5p-node1-installer.vilieq  ceph-rbd2-aarti-1-z8kn5p-node1-installer  *:9283,8765  running (3h)     3m ago   3h     526M        -  19.2.1-210.el9cp  56436d51c94e  66010229e45d
mgr.ceph-rbd2-aarti-1-z8kn5p-node3.fkxxkx            ceph-rbd2-aarti-1-z8kn5p-node3            *:8443,8765  running (3h)     3m ago   3h     451M        -  19.2.1-210.el9cp  56436d51c94e  427c83a169cd
mon.ceph-rbd2-aarti-1-z8kn5p-node1-installer         ceph-rbd2-aarti-1-z8kn5p-node1-installer               running (3h)     3m ago   3h     129M    2048M  19.2.1-210.el9cp  56436d51c94e  28c1ea5e0168
mon.ceph-rbd2-aarti-1-z8kn5p-node3                   ceph-rbd2-aarti-1-z8kn5p-node3                         running (3h)     3m ago   3h     123M    2048M  19.2.1-210.el9cp  56436d51c94e  73ae26336379
mon.ceph-rbd2-aarti-1-z8kn5p-node4                   ceph-rbd2-aarti-1-z8kn5p-node4                         running (3h)     3m ago   3h     122M    2048M  19.2.1-210.el9cp  56436d51c94e  8cb6c6f761c4
osd.0                                                ceph-rbd2-aarti-1-z8kn5p-node3                         running (3h)     3m ago   3h     901M    4096M  19.2.1-210.el9cp  56436d51c94e  d031a0724f27
osd.1                                                ceph-rbd2-aarti-1-z8kn5p-node4                         running (3h)     3m ago   3h     692M    1088M  19.2.1-210.el9cp  56436d51c94e  df2827a8729d
osd.2                                                ceph-rbd2-aarti-1-z8kn5p-node5                         running (3h)     2m ago   3h     400M    1088M  19.2.1-210.el9cp  56436d51c94e  b48fece77aca
osd.3                                                ceph-rbd2-aarti-1-z8kn5p-node3                         running (3h)     3m ago   3h     603M    4096M  19.2.1-210.el9cp  56436d51c94e  a6a1a59520cd
osd.4                                                ceph-rbd2-aarti-1-z8kn5p-node4                         running (3h)     3m ago   3h     547M    1088M  19.2.1-210.el9cp  56436d51c94e  687350318d43
osd.5                                                ceph-rbd2-aarti-1-z8kn5p-node5                         running (3h)     2m ago   3h     715M    1088M  19.2.1-210.el9cp  56436d51c94e  9c8d23226e0a
osd.6                                                ceph-rbd2-aarti-1-z8kn5p-node3                         running (3h)     3m ago   3h     290M    4096M  19.2.1-210.el9cp  56436d51c94e  7207903165e5
osd.7                                                ceph-rbd2-aarti-1-z8kn5p-node4                         running (3h)     3m ago   3h     536M    1088M  19.2.1-210.el9cp  56436d51c94e  1c575003c0c0
osd.8                                                ceph-rbd2-aarti-1-z8kn5p-node5                         running (3h)     2m ago   3h     329M    1088M  19.2.1-210.el9cp  56436d51c94e  e173691d4dd0
osd.9                                                ceph-rbd2-aarti-1-z8kn5p-node3                         running (3h)     3m ago   3h     607M    4096M  19.2.1-210.el9cp  56436d51c94e  8d29da8e785c
osd.10                                               ceph-rbd2-aarti-1-z8kn5p-node4                         running (3h)     3m ago   3h     764M    1088M  19.2.1-210.el9cp  56436d51c94e  51839ad897ce
osd.11                                               ceph-rbd2-aarti-1-z8kn5p-node5                         running (3h)     2m ago   3h     633M    1088M  19.2.1-210.el9cp  56436d51c94e  38f1e0758edd
rbd-mirror.ceph-rbd2-aarti-1-z8kn5p-node5.mfqqkl     ceph-rbd2-aarti-1-z8kn5p-node5                         error            2m ago   3h        -        -  <unknown>         <unknown>     <unknown>
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#

Version-Release number of selected component (if applicable):
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]# ceph version
ceph version 19.2.1-210.el9cp (d7ac9a7e698531c972a3567a67da1f0a9a266075) squid (stable)
[ceph: root@ceph-rbd1-aarti-1-z8kn5p-node1-installer /]#


Note: This issue was seen in 8.1, but the BZ is being raised against 9.0 as we are near RC. Kindly assess and move the target release if required and applicable.


How reproducible: Always


Steps to Reproduce: as above


Actual results: The rbd-mirror daemon goes into an error state.


Expected results: The rbd-mirror daemon should not error out.


Additional info:
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon rbd-mirror.ceph-rbd2-aarti-1-z8kn5p-node5.mfqqkl on ceph-rbd2-aarti-1-z8kn5p-node5 is in error state
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr                                   2/2  9m ago     4h   label:mgr
mon                                   3/3  9m ago     4h   label:mon
osd.all-available-devices              12  9m ago     4h   *
rbd-mirror                            0/1  8m ago     4h   ceph-rbd2-aarti-1-z8kn5p-node5
[root@ceph-rbd2-aarti-1-z8kn5p-node1-installer ~]#
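For triaging the failed daemon, the standard cephadm workflow applies (daemon name taken from the ceph orch ps output above):

# Confirm the daemon state as cephadm sees it
ceph orch ps --daemon-type rbd-mirror
# On the daemon's host, inspect its journal/container logs
cephadm logs --name rbd-mirror.ceph-rbd2-aarti-1-z8kn5p-node5.mfqqkl
# Restart once the underlying cause is addressed
ceph orch daemon restart rbd-mirror.ceph-rbd2-aarti-1-z8kn5p-node5.mfqqkl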

Comment 14 errata-xmlrpc 2025-08-18 14:02:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.1 security and bug fix updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:14015