Bug 2033455 - [RDR] OSD Blocklist entries added during failover and fallback operations prevent rbd-mirror communication
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ilya Dryomov
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-16 21:26 UTC by Jean-Charles Lopez
Modified: 2023-08-11 15:13 UTC


Description Jean-Charles Lopez 2021-12-16 21:26:35 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
- Deployed a test application on Cluster 1 via ACM and Ramen
- Failed Over to cluster 2
- Relocated application on Cluster 1
- Failed Over to cluster 2
- Let the application run for the entire night
- Relocated application on Cluster
- Deleted application via ACM and Ramen
- Deployed a test application on Cluster 1 via ACM and Ramen

On the new deployment, the RBD images are created on cluster 1, but mirroring does not happen.

rbd-mirror reports daemon_health OK, but the images are in error or unknown status.

Version of all relevant components (if applicable):
OCP 4.9
ODF 4.9.1 Build 252


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
Yes
Remove the OSD block list entries from both clusters
Restart the rbd-mirror pod on each cluster
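
The two workaround steps above could be sketched roughly as follows. This is only a sketch under assumptions that are not stated in this report: the `openshift-storage` namespace and the `app=rook-ceph-rbd-mirror` pod label are assumed defaults, and the function names are hypothetical. The `ceph` commands need a shell with Ceph admin access (for example a toolbox pod), while `oc` is normally run from outside the cluster. Note that `ceph osd blocklist clear` removes every entry, so the listing should be reviewed first.

```shell
# Sketch only: run on each cluster. The names below are assumptions.
NAMESPACE="openshift-storage"            # assumed ODF namespace
MIRROR_LABEL="app=rook-ceph-rbd-mirror"  # assumed rbd-mirror pod label

clear_blocklist() {
    # Show the current entries before removing anything.
    ceph osd blocklist ls
    # Remove all entries; "ceph osd blocklist rm <addr>" removes a single one.
    ceph osd blocklist clear
}

restart_rbd_mirror() {
    # Deleting the pod lets its owning deployment recreate it.
    oc delete pod -n "$NAMESPACE" -l "$MIRROR_LABEL"
}
```

Calling `clear_blocklist` and then `restart_rbd_mirror` on each cluster corresponds to the two steps listed above.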

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
5

Is this issue reproducible?
Unsure at this point

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:
Unsure at this point


Actual results:
rbd-mirror reports errors
RBD images are created on cluster 1
RBD images are NOT created on cluster 2


Expected results:
rbd-mirror reports health OK
RBD images are created on cluster 1
RBD images are created on cluster 2

Additional info:
We identified the problem through the rbd-mirror log:
debug 2021-12-16T17:11:47.801+0000 7f4fc10b4700 -1 rbd::mirror::InstanceReplayer: 0x55f75227a140 start_image_replayer: global_image_id=c1cf27bc-4046-4269-a4e3-52404211f945: blocklisted detected during image replay

The error appears on both clusters.
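
To check whether a given rbd-mirror log contains this error, something like the following could be used. It is a minimal sketch that reads log text on standard input and prints the distinct global_image_id values reported as blocklisted. The function name is hypothetical, and the pattern is based only on the single log line quoted above, so other log formats may not match.

```shell
blocklisted_image_ids() {
    # Keep only the lines that report the blocklist condition,
    # extract the global_image_id value, and de-duplicate.
    grep 'blocklisted detected during image replay' \
        | sed -n 's/.*global_image_id=\([0-9a-f-]*\):.*/\1/p' \
        | sort -u
}
```

It could be fed from the pod log, for example `oc logs -n openshift-storage <rbd-mirror-pod> | blocklisted_image_ids`, where the namespace is an assumption and the pod name is a placeholder.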

The RBD image status shows the following:
sh-4.4$ rbd mirror image status ocs-storagecluster-cephblockpool/csi-vol-5e5431d9-5e90-11ec-ad05-0a580a83001c
csi-vol-5e5431d9-5e90-11ec-ad05-0a580a83001c:
  global_id:   599661e3-4255-40c3-8b9a-50be309b7cd0
  state:       up+stopped
  description: local image is primary
  last_update: 2021-12-16 17:09:09
  peer_sites:
    name: 93712e2c-0253-4dae-914e-6418b0df74bb
    state: down+unknown
    description: status not found
    last_update: 
  snapshots:
    3380 .mirror.primary.599661e3-4255-40c3-8b9a-50be309b7cd0.e6856604-763a-4072-9316-e16e9df07cc9 (peer_uuids:[7d3d7527-9ed2-49e4-8b9a-6fa791c8ae84])
    3395 .mirror.primary.599661e3-4255-40c3-8b9a-50be309b7cd0.03e0ee1e-0116-43f1-880c-02de74518869 (peer_uuids:[7d3d7527-9ed2-49e4-8b9a-6fa791c8ae84])
    3417 .mirror.primary.599661e3-4255-40c3-8b9a-50be309b7cd0.4fba24c5-8963-4d02-a186-82c7590c8067 (peer_uuids:[7d3d7527-9ed2-49e4-8b9a-6fa791c8ae84])
sh-4.4$ rbd mirror image status ocs-storagecluster-cephblockpool/csi-vol-5e5ba8b6-5e90-11ec-ad05-0a580a83001c
csi-vol-5e5ba8b6-5e90-11ec-ad05-0a580a83001c:
  global_id:   0f2ca8df-4d22-4261-aec7-fa5705d11f0d
  state:       up+stopped
  description: local image is primary
  service:     a on ip-10-0-198-218.us-east-2.compute.internal
  last_update: 2021-12-16 17:09:11
  peer_sites:
    name: 93712e2c-0253-4dae-914e-6418b0df74bb
    state: down+unknown
    description: status not found
    last_update: 
  snapshots:
    3385 .mirror.primary.0f2ca8df-4d22-4261-aec7-fa5705d11f0d.2aad366c-ba10-41d1-bafe-5c1c86185b59 (peer_uuids:[7d3d7527-9ed2-49e4-8b9a-6fa791c8ae84])
    3393 .mirror.primary.0f2ca8df-4d22-4261-aec7-fa5705d11f0d.8c2c90ed-dd39-4820-905b-f5c67f240923 (peer_uuids:[7d3d7527-9ed2-49e4-8b9a-6fa791c8ae84])
    3418 .mirror.primary.0f2ca8df-4d22-4261-aec7-fa5705d11f0d.f5f6011c-a43c-4a84-afe8-ca40383d445a (peer_uuids:[7d3d7527-9ed2-49e4-8b9a-6fa791c8ae84])
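
The status output above can be summarized mechanically. The sketch below reads `rbd mirror image status` text on standard input and prints the local state and each peer state. It relies only on the field layout shown above (two-space indent for the local `state:` line, four-space indent for the one under `peer_sites:`), and the function name is hypothetical.

```shell
mirror_states() {
    awk '
        /^  state:/   { print "local " $2 }   # image state on this cluster
        /^    state:/ { print "peer " $2 }    # state reported for a peer site
    '
}
```

For either image shown above, `rbd mirror image status <pool>/<image> | mirror_states` would print `local up+stopped` followed by `peer down+unknown`.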

Comment 7 Mudit Agarwal 2022-06-29 13:34:19 UTC
Not a TP blocker, moving it out of 4.11

Comment 24 Mudit Agarwal 2023-04-06 12:44:31 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2034283 is moved to 4.14

