Description of problem (please be detailed as possible and provide log snippets):

[DR] CephBlockPool resource reports the wrong mirroringStatus

Version of all relevant components (if applicable):
ODF version: 4.9.0-248.ci
OCP version: 4.9.0-0.nightly-2021-11-12-222121

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Can this issue be reproduced?
yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy a DR cluster
2. Deploy workloads
3. Perform a failover
4. Delete the workload
5. Check the CephBlockPool resource

Actual results:
  mirroringStatus:
    lastChecked: "2021-11-30T11:24:04Z"
    summary:
      daemon_health: OK
      health: ERROR
      image_health: ERROR
      states:
        error: 2
        replaying: 3

Expected results:
The mirroringStatus should reflect the actual state of the RBD images; no images should be reported in an error state when none of them are in error.

Additional info:
When we run "rbd mirror image status" for all RBD images in the pool, we don't see any image in an error state:

bash-4.4$ for i in $(rbd ls -p ocs-storagecluster-cephblockpool); do rbd mirror image status ocs-storagecluster-cephblockpool/$i 2>/dev/null; done
2021-11-30T11:21:34.492+0000 7f6fb909c2c0 20 librbd::api::Image: list_images: list 0x7ffdfc8cc080
2021-11-30T11:21:34.495+0000 7f6fb909c2c0 20 librbd::api::Image: list_images_v2: io_ctx=0x7ffdfc8cc080
2021-11-30T11:21:34.496+0000 7f6fb909c2c0 20 librbd::api::Trash: list: list 0x7ffdfc8cc080
2021-11-30T11:21:34.496+0000 7f6fb909c2c0 20 librbd::api::Trash: list_trash_image_specs: list_trash_image_specs 0x7ffdfc8cc080

csi-vol-dummy-34c13019-232f-42ca-9102-9050ce1eea88:
  global_id:   8782ec3e-7b16-4d9e-9299-b115f40bafb0
  state:       up+replaying
  description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1638271140,"remote_snapshot_timestamp":1638271200,"replay_state":"syncing","syncing_percent":30,"syncing_snapshot_timestamp":1638271200}
  service:     a on prsurve-vm-dev-v6775-worker-528v4
  last_update: 2021-11-30 11:21:08
  peer_sites:
    name: 34c13019-232f-42ca-9102-9050ce1eea88
    state: up+stopped
    description: local image is primary
    last_update: 2021-11-30 11:21:21

csi-vol-dummy-436a1e97-7d6a-41f2-8420-3cc2cdfae539:
  global_id:   230b088e-c2e7-450f-8e9b-ddf6b1ffee98
  state:       up+stopped
  description: local image is primary
  service:     a on prsurve-vm-dev-v6775-worker-528v4
  last_update: 2021-11-30 11:21:10
  peer_sites:
    name: 34c13019-232f-42ca-9102-9050ce1eea88
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1638271260,"remote_snapshot_timestamp":1638271260,"replay_state":"idle"}
    last_update: 2021-11-30 11:21:19
  snapshots:
    9424 .mirror.primary.230b088e-c2e7-450f-8e9b-ddf6b1ffee98.48108f1e-2ba0-4c97-b7d6-bfc5eb1ca0ee (peer_uuids:[e1fbbb13-a7bc-4c57-8128-e7a8a033b7b2])

test_1110:
  global_id:   24a3c8b6-e6e1-4739-9319-6408c3e6b38f
  state:       up+replaying
  description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1638212565,"remote_snapshot_timestamp":1638212565,"replay_state":"idle"}
  service:     a on prsurve-vm-dev-v6775-worker-528v4
  last_update: 2021-11-30 11:21:08
  peer_sites:
    name: 34c13019-232f-42ca-9102-9050ce1eea88
    state: up+stopped
    description: local image is primary
    last_update: 2021-11-30 11:21:21
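For reference, a minimal way to print only the mirroring summary that the operator has written onto the CephBlockPool CR (a sketch; the openshift-storage namespace is an assumption based on a default ODF install):

  # Show the reported mirroring summary from the CephBlockPool status
  oc get cephblockpool ocs-storagecluster-cephblockpool \
    -n openshift-storage \
    -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'

This is the value that disagrees with the per-image "rbd mirror image status" output above.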
The output comes from the command "rbd mirror pool status <poolName>" without any changes, so if there is a discrepancy we should move this to the "ceph" component. Please move or close. Thanks.
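For comparison, a way to fetch that raw pool-level output directly from the Ceph toolbox (a sketch; the rook-ceph-tools deployment name and namespace are assumptions for a typical ODF cluster):

  # Raw output the operator copies into the CR's mirroringStatus
  oc -n openshift-storage exec deploy/rook-ceph-tools -- \
    rbd mirror pool status ocs-storagecluster-cephblockpool --verbose

If this output also reports images in error while the per-image status does not, the discrepancy is on the Ceph side rather than in the operator.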
Closing due to inactivity; feel free to re-open if you have any concerns. If so, do it under the "ceph" component if you feel someone needs to investigate further. Thanks.
Proposing as a blocker for 4.10.0, as this is definitely something we can't have at the GA support level.
Sure Ilya, if this is fixed with BZ #2008587 then I can move this to ON_QA as well. QE team, please retest with the latest ODF build which has the fix for BZ #2008587. If you can still see the problem, please re-open or raise a new BZ.
Ilya, as this bug is a blocker for 4.9.z TP, we need the fix for BZ #2008587 to be backported to 5.0z as well, since 4.9.z uses 5.0z. Unless we get that backport, QE can't test it in 4.9.z.
Moving to ASSIGNED based on comment #52.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372