
Bug 2406794

Summary: [RDR] A few RBD images report error due to incomplete group snapshots on the secondary cluster after workload deployment
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Aman Agrawal <amagrawa>
Component: RBD-Mirror
Assignee: Ilya Dryomov <idryomov>
Status: CLOSED ERRATA
QA Contact: Chaitanya <cdommeti>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 8.1
CC: ceph-eng-bugs, cephqe-warriors, idryomov, nladha, sangadi, tserlin
Target Milestone: ---
Target Release: 9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-20.1.0-132
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2416777
Environment:
Last Closed: 2026-01-29 07:02:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Aman Agrawal 2025-10-28 13:20:20 UTC
Description of problem:


Version-Release number of selected component (if applicable):

OADP 1.5.2
ODF 4.20.0-115.stable
ACM 2.15.0-135
MCE 2.10.0-124
OCP  4.20.0-0.nightly-2025-10-07-014413 
GitOps 1.16.1
Virtualization 4.20.0-207
Submariner 0.21.0  
ceph version 19.2.1-274.el9cp (3a2f1cec313e6abbd90d9260bd5e0e866817c3c7) squid (stable)


How reproducible: Not sure yet; hit for the first time with automation.


Steps to Reproduce:
1. Deploy an RBD appset (pull) and a subscription busybox workload on an RDR setup, DR-protect them, and let IO continue. In this case, IO ran for about 10-15 minutes before the issue was hit (a sketch of the basic verification commands follows the test reference below).

Workloads were deployed via automation.
Console logs- https://url.corp.redhat.com/f616441

Test- tests/functional/disaster-recovery/regional-dr/test_site_failure_recovery_and_failover.py
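
As a rough sketch (not part of the original run), the basic post-deployment checks would be along these lines; the namespace and resource names are taken from the outputs captured below and may differ on another setup:

oc get pods -n busybox-workloads-1          # workload pods are running
oc get drpc -A -o wide                      # on the hub: DR protection state and progression
oc get vgr -A -o yaml | grep lastSyncTime   # on the managed cluster: last group sync time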


Actual results: 

From hub-

 

echo "////////////////////////////////";date -u; echo "*******";oc get drpc -o wide -A
////////////////////////////////
Tue Oct 28 08:54:00 UTC 2025
*******
NAMESPACE             NAME                       AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION        PEER READY
busybox-workloads-1   busybox-placement-drpc     17h   amagrawa-hr-c1                                      Deployed       Completed     2025-10-27T15:38:18Z   26.331031576s   True
openshift-gitops      busybox-1-placement-drpc   17h   amagrawa-hr-c1                                      Deployed       Completed     2025-10-27T15:42:49Z   16.457398556s   True 
 





From C1-


last sync time for both VGRs is:
oc get vgr -A -oyaml | grep lastSyncTime
    lastSyncTime: "2025-10-28T07:10:01Z"
    lastSyncTime: "2025-10-28T07:10:01Z"


 

mirroringStatus:
      lastChecked: "2025-10-28T08:55:40Z"
      summary:
        daemon_health: OK
        group_health: WARNING
        group_states:
          replaying: 1
          stopping_replay: 1
        health: ERROR
        image_health: ERROR
        image_states:
          error: 2
          replaying: 18
        states:
          error: 2
          replaying: 18
    phase: Ready 
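
For context, the summary above is taken from the CephBlockPool CR's mirroring status. A hedged way to pull just that block (the pool and namespace names are the usual ODF defaults and are assumptions here):

oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage \
  -o jsonpath='{.status.mirroringStatus.summary}'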
 

 

csi-vol-6896405b-b1c5-45eb-bc52-81a1791f5382
csi-vol-6896405b-b1c5-45eb-bc52-81a1791f5382:
  global_id:   619124f6-703a-4a53-863a-5b15cd252c55
  state:       up+stopped
  description: local image is primary
  service:     a on compute-2
  last_update: 2025-10-28 09:23:16
  peer_sites:
    name: 62b4febe-7ab7-4aed-9a88-3f1fd167d3a9
    state: up+error
    description: failed to refresh remote image
    last_update: 2025-10-28 09:22:56



csi-vol-68569653-519d-45c9-b4e0-5c70e94463c6
csi-vol-68569653-519d-45c9-b4e0-5c70e94463c6:
  global_id:   3aecb3f7-5823-4a00-bdad-fef5e4795363
  state:       up+stopped
  description: local image is primary
  service:     a on compute-2
  last_update: 2025-10-28 09:23:16
  peer_sites:
    name: 62b4febe-7ab7-4aed-9a88-3f1fd167d3a9
    state: up+error
    description: failed to refresh remote image
    last_update: 2025-10-28 09:22:56 
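
The per-image records above correspond to "rbd mirror image status" output. A hedged example of querying one of the errored images from the Ceph toolbox on C1 (the pool name is the usual ODF default and is an assumption here):

oc rsh -n openshift-storage deploy/rook-ceph-tools \
  rbd mirror image status ocs-storagecluster-cephblockpool/csi-vol-6896405b-b1c5-45eb-bc52-81a1791f5382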
 


From C2-

 

 mirroringStatus:
      lastChecked: "2025-10-28T08:57:23Z"
      summary:
        daemon_health: OK
        group_health: OK
        group_states:
          replaying: 2
        health: ERROR
        image_health: ERROR
        image_states:
          error: 2
          replaying: 18
        states:
          error: 2
          replaying: 18
    phase: Ready



csi-vol-6896405b-b1c5-45eb-bc52-81a1791f5382
csi-vol-6896405b-b1c5-45eb-bc52-81a1791f5382:
  global_id:   619124f6-703a-4a53-863a-5b15cd252c55
  state:       up+error
  description: failed to refresh remote image
  service:     a on compute-1
  last_update: 2025-10-28 07:15:56
  peer_sites:
    name: e40ca6e1-43bb-4426-aa99-aa9f3d1a7d1b
    state: up+stopped
    description: local image is primary
    last_update: 2025-10-28 09:22:41


csi-vol-68569653-519d-45c9-b4e0-5c70e94463c6
csi-vol-68569653-519d-45c9-b4e0-5c70e94463c6:
  global_id:   3aecb3f7-5823-4a00-bdad-fef5e4795363
  state:       up+error
  description: failed to refresh remote image
  service:     a on compute-1
  last_update: 2025-10-28 07:15:56
  peer_sites:
    name: e40ca6e1-43bb-4426-aa99-aa9f3d1a7d1b
    state: up+stopped
    description: local image is primary
    last_update: 2025-10-28 09:22:16 
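
Since the suspected cause is incomplete group snapshots on the secondary, a hedged way to inspect the group snapshots from the C2 toolbox; the pool name is the usual ODF default and <rbd-group-name> is a placeholder (both are assumptions, the real group name would come from the VolumeGroupReplication resources):

oc rsh -n openshift-storage deploy/rook-ceph-tools \
  rbd group snap list ocs-storagecluster-cephblockpool/<rbd-group-name>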


Expected results: RBD images shouldn't report errors due to incomplete group snapshots on the secondary cluster.


Additional info: Relevant thread- https://ibm-systems-storage.slack.com/archives/C06GQDKEVGT/p1761641896967529

No node reboot or DR operation was performed. This was a fresh setup.

Overall cluster health was okay.

ODF tracker bug- DFBUGS-4392

Comment 1 Storage PM bot 2025-10-28 13:20:33 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 14 errata-xmlrpc 2026-01-29 07:02:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536