Bug 2067095

Summary: [RDR] [tracker for BZ 2111364 and BZ 2211290] rbd mirror scheduling is getting stopped for some images
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Pratik Surve <prsurve>
Component: ceph
Assignee: Ram Raja <rraja>
ceph sub component: RBD-Mirror
QA Contact: Pratik Surve <prsurve>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
CC: amagrawa, bniver, ebenahar, ekuric, flucifre, idryomov, jdurgin, jespy, kramdoss, kseeger, mmuench, muagarwa, odf-bz-bot, olakra, owasserm, rraja, sagrawal, sheggodu, sostapov, srangana, uchapaga
Version: 4.10
Keywords: TestBlocker, Tracking
Target Milestone: ---   
Target Release: ODF 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: 4.14.0-130
Doc Type: No Doc Update
Clones: 2069720 2102221 2111364 2111375 2116900 2120624 2229303
Last Closed: 2023-11-08 18:49:50 UTC
Type: Bug
Bug Depends On: 1882534, 2069720, 2111364, 2111375, 2120624, 2121514, 2211290    
Bug Blocks: 2094357    

Description Pratik Surve 2022-03-23 09:57:15 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[DR] rbd mirror scheduling stops for some images

Version of all relevant components (if applicable):

OCP version: 4.10.0-0.nightly-2022-03-17-204457
ODF version: 4.10.0-199
Ceph version: {
    "mon": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 12
    }
}

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes. There is a possibility of data loss: images whose mirror snapshots are no longer scheduled stop being replicated to the secondary site.

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy RDR cluster
2. Run I/O for 2-3 days
3. Check the rbd snap ls output for all images on both sites
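Step 3 amounts to spotting images whose newest mirror snapshot has stopped advancing. A minimal Python sketch of that check follows. It assumes the newest mirror-snapshot timestamp per image has already been collected on each site (for example from rbd snap ls --all, which also lists mirror-namespace snapshots). The function name find_stalled_images, the image names, the 5-minute schedule interval, and the rule of flagging an image after two missed intervals are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_stalled_images(
    last_snapshot: Dict[str, Optional[datetime]],
    now: datetime,
    interval: timedelta,
    tolerance: int = 2,
) -> List[str]:
    """Return images whose newest mirror snapshot is older than
    `tolerance` schedule intervals, or that have no mirror snapshot.

    The staleness rule (tolerance * interval) is a hypothetical choice,
    not something defined by rbd-mirror itself.
    """
    cutoff = now - interval * tolerance
    return sorted(
        name
        for name, taken_at in last_snapshot.items()
        if taken_at is None or taken_at < cutoff
    )


# Hypothetical sample data: newest mirror snapshot time per image.
now = datetime(2022, 3, 23, 9, 0)
last = {
    "csi-vol-a": datetime(2022, 3, 23, 8, 57),  # within schedule
    "csi-vol-b": datetime(2022, 3, 22, 14, 0),  # stopped many hours ago
    "csi-vol-c": None,                          # no mirror snapshot found
}
print(find_stalled_images(last, now, timedelta(minutes=5)))
# ['csi-vol-b', 'csi-vol-c']
```

Running the same check against data from both sites can help separate a schedule that stopped on the primary (both sites stall) from snapshots that are still being created but have not yet synced to the secondary (only the secondary stalls).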


Actual results:
$rbd snap ls output from the secondary site 

http://pastebin.test.redhat.com/1039155

$rbd mirror image status from the primary site

http://pastebin.test.redhat.com/1039160

$rbd snap ls output from the primary site

http://pastebin.test.redhat.com/1039156

$rbd mirror image status from the primary site

http://pastebin.test.redhat.com/1039157


Expected results:


Additional info:

Comment 5 Josh Durgin 2022-03-29 15:26:13 UTC
Matching the assignment of the RHCS bz

Comment 11 Mudit Agarwal 2022-04-05 13:45:16 UTC
Moving DR BZs to 4.10.z/4.11

Comment 53 Mudit Agarwal 2022-08-11 05:03:00 UTC
Please provide doc text

Comment 109 Ilya Dryomov 2023-04-11 12:38:55 UTC
*** Bug 2155753 has been marked as a duplicate of this bug. ***

Comment 110 Mudit Agarwal 2023-05-15 17:48:45 UTC
Based on 17.2.6-47

Comment 114 Mudit Agarwal 2023-06-09 02:49:01 UTC
Please add doc text.

Comment 120 Elad 2023-06-19 06:01:23 UTC
Moving to 4.13.z for verification purposes

Comment 123 Ilya Dryomov 2023-07-11 12:55:43 UTC
*** Bug 2215982 has been marked as a duplicate of this bug. ***

Comment 137 errata-xmlrpc 2023-11-08 18:49:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

Comment 138 Red Hat Bugzilla 2024-03-08 04:25:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days