Bug 2067095 - [RDR] [tracker for BZ 2111364 and BZ 2211290] rbd mirror scheduling is getting stopped for some images [NEEDINFO]
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Ram Raja
QA Contact: Pratik Surve
URL:
Whiteboard:
Duplicates: 2155753 2215982
Depends On: 2121514 1882534 2069720 2111364 2111375 2120624
Blocks: 2094357
Reported: 2022-03-23 09:57 UTC by Pratik Surve
Modified: 2023-08-11 15:05 UTC
CC List: 21 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2069720 2102221 2111364 2111375 2116900 2120624 2229303
Environment:
Last Closed:
Embargoed:
idryomov: needinfo? (ekuric)



Description Pratik Surve 2022-03-23 09:57:15 UTC
Description of problem (please be detailed as possible and provide log
snippets):

[DR] rbd mirror scheduling is getting stopped for some images

Version of all relevant components (if applicable):

OCP version:- 4.10.0-0.nightly-2022-03-17-204457
ODF version:- 4.10.0-199
CEPH version:- {
    "mon": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)": 12
    }
}

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, there is a possibility of data loss.

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an RDR cluster
2. Run I/O for 2-3 days
3. Check the rbd snap ls output for all images on both sites
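
Step 3 can be scripted. The following is a minimal sketch, not the reporter's actual procedure: it assumes the rbd CLI is reachable (for example from the rook-ceph toolbox pod on each site) and that the images live in the default ODF block pool, ocs-storagecluster-cephblockpool. The pool name and the helper names rbd_commands and collect are illustrative assumptions.

```python
import subprocess

# Assumed pool name (the ODF default); adjust for the cluster under test.
POOL = "ocs-storagecluster-cephblockpool"


def rbd_commands(pool, image):
    """Return the two rbd invocations used in this report for one image."""
    spec = f"{pool}/{image}"
    return [
        # --all also lists snapshots outside the user namespace,
        # which includes the mirror snapshots.
        ["rbd", "snap", "ls", "--all", spec],
        ["rbd", "mirror", "image", "status", spec],
    ]


def collect(pool, run=subprocess.check_output):
    """Run both commands for every image in the pool.

    `run` is injectable so the function can be exercised without a cluster.
    Returns {image_name: [snap_ls_output, mirror_status_output]}.
    """
    images = run(["rbd", "ls", pool], text=True).split()
    report = {}
    for image in images:
        report[image] = [run(cmd, text=True) for cmd in rbd_commands(pool, image)]
    return report
```

Calling collect(POOL) on the primary and on the secondary site and comparing the newest mirror snapshot per image shows which images have stopped receiving scheduled snapshots.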


Actual results:
$rbd snap ls output from the secondary site 

http://pastebin.test.redhat.com/1039155

$rbd mirror image status from the secondary site

http://pastebin.test.redhat.com/1039160

$rbd snap ls output from the primary site

http://pastebin.test.redhat.com/1039156

$rbd mirror image status from the primary site

http://pastebin.test.redhat.com/1039157


Expected results:


Additional info:

Comment 5 Josh Durgin 2022-03-29 15:26:13 UTC
Matching the assignment of the RHCS bz

Comment 11 Mudit Agarwal 2022-04-05 13:45:16 UTC
Moving DR BZs to 4.10.z/4.11

Comment 53 Mudit Agarwal 2022-08-11 05:03:00 UTC
Please provide doc text

Comment 109 Ilya Dryomov 2023-04-11 12:38:55 UTC
*** Bug 2155753 has been marked as a duplicate of this bug. ***

Comment 110 Mudit Agarwal 2023-05-15 17:48:45 UTC
Based on 17.2.6-47

Comment 114 Mudit Agarwal 2023-06-09 02:49:01 UTC
Please add doc text.

Comment 120 Elad 2023-06-19 06:01:23 UTC
Moving to 4.13.z for verification purposes

Comment 123 Ilya Dryomov 2023-07-11 12:55:43 UTC
*** Bug 2215982 has been marked as a duplicate of this bug. ***

