Bug 2005919 - [DR] [Tracker for BZ #2008587] when Relocate action is performed and the Application is deleted completely rbd image is not getting deleted on secondary site
Summary: [DR] [Tracker for BZ #2008587] when Relocate action is performed and the Appl...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Ilya Dryomov
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Duplicates: 2037650
Depends On: 2008587 2011326 2047279
Blocks:
 
Reported: 2021-09-20 13:37 UTC by Pratik Surve
Modified: 2023-08-09 16:37 UTC
CC: 23 users

Fixed In Version: 4.10.0-124
Doc Type: Bug Fix
Doc Text:
.Image deletion events on secondary clusters are handled correctly
Previously, RBD images deleted by the user on the primary cluster were not subsequently deleted by the `rbd-mirror` daemon on the secondary cluster. This was caused by an error in the rbd-mirror daemon that prevented image deletion events from being properly propagated to the secondary cluster. With this update, RBD images are handled properly on both the primary and secondary clusters when deleted.
Clone Of:
Clones: 2008587
Environment:
Last Closed: 2022-04-13 18:49:43 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github RamenDR ramen issues 264 0 None open Secondary images are not always garbage collected when VRG and CR are deleted 2021-09-21 10:54:02 UTC
Red Hat Product Errata RHSA-2022:1372 0 None None None 2022-04-13 18:50:25 UTC

Description Pratik Surve 2021-09-20 13:37:16 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[DR] when Relocate action is performed rbd image is not getting deleted on secondary site

Version of all relevant components (if applicable):

odf-operator.v4.9.0-138.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy 2 DR clusters
2. Deploy workload
3. Perform Failover
4. After some time, perform Relocate
5. Delete the Application completely
6. Check for pv, pvc, vrc
7. Check for the rbd image on the secondary site (see the verification sketch after this list)
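A minimal sketch of how steps 6 and 7 could be verified from the CLI, assuming the csi-addons VolumeReplication CRD is installed and the rook-ceph toolbox pod carries the usual app=rook-ceph-tools label in openshift-storage (namespaces, pool, and image names are placeholders to adjust for your deployment):

# On the managed cluster: look for leftover PVC/VolumeReplication resources
$ oc get pvc,volumereplication -n <application-namespace>
$ oc get pv | grep <application-namespace>

# On the secondary cluster: list rbd images in the pool via the Ceph toolbox
$ TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
$ oc rsh -n openshift-storage $TOOLS_POD rbd ls ocs-storagecluster-cephblockpool
$ oc rsh -n openshift-storage $TOOLS_POD rbd info ocs-storagecluster-cephblockpool/<csi-vol-image-name>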


Actual results:

rbd image still present on the secondary cluster


 rbd info ocs-storagecluster-cephblockpool/csi-vol-b0211025-1611-11ec-962a-0a580a8301dd
rbd image 'csi-vol-b0211025-1611-11ec-962a-0a580a8301dd':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 2
	id: 62b137f3f8bc
	block_name_prefix: rbd_data.62b137f3f8bc
	format: 2
	features: layering, non-primary
	op_features: 
	flags: 
	create_timestamp: Wed Sep 15 10:42:59 2021
	access_timestamp: Wed Sep 15 14:43:54 2021
	modify_timestamp: Wed Sep 15 10:42:59 2021
	mirroring state: enabled
	mirroring mode: snapshot
	mirroring global id: e8c61a2f-ddaf-4df9-ba93-d352483efe44
	mirroring primary: false
bash-4.4$ rbd mirror image status  ocs-storagecluster-cephblockpool/csi-vol-b0211025-1611-11ec-962a-0a580a8301dd
csi-vol-b0211025-1611-11ec-962a-0a580a8301dd:
  global_id:   e8c61a2f-ddaf-4df9-ba93-d352483efe44
  state:       up+error
  description: split-brain
  last_update: 2021-09-16 11:08:14
  peer_sites:
    name: 3401ff21-accc-4fbe-9cd7-34c9e729aa0d
    state: up+unknown
    description: remote image is non-primary
    last_update: 2021-09-20 13:35:29


The same rbd image is not present on secondary site

Expected results:
rbd image should be deleted on the secondary site once the application is deleted

Additional info:

Comment 3 Shyamsundar 2021-09-21 00:38:55 UTC
An RBD image on the secondary site, post a relocation or failover, would still exist. The image would be garbage collected when the application and its PVCs are deleted on the primary site.

Hence I assume this is not an issue.

However, in our testing we have noticed that the remote-site image is not always garbage collected when the primary-site application (PVC and resources) is deleted.

@pratik, looking for clarification: was the application deleted and the remote image still present, or is the expectation here that post relocation the remote image would not be present? If the former, we need to track this and get it fixed; if the latter, there is an expectation mismatch on the feature.
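A minimal sketch of how the garbage-collection state described above could be inspected, assuming the Ramen VolumeReplicationGroup and csi-addons VolumeReplication CRDs are installed and the default pool name is used (resource and image names are placeholders):

# On the primary/managed cluster: confirm the application's DR resources are gone
$ oc get volumereplicationgroup -n <application-namespace>
$ oc get volumereplication -n <application-namespace>

# On the secondary cluster (from the toolbox): check whether rbd-mirror removed the image
$ rbd mirror image status ocs-storagecluster-cephblockpool/<csi-vol-image-name>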

Comment 14 Scott Ostapovicz 2021-10-20 14:14:57 UTC
This will be targeted for RHCS 5.0 z2 and will therefore be available for the ODF 4.9 z-stream.

Comment 15 Mudit Agarwal 2021-10-21 12:04:48 UTC
This BZ was discussed in the last DR sync-up, and it was agreed to move it out of 4.9 because the ceph fix is out of scope for 5.0z1.
It is an image garbage collection issue. It does not prevent a user from trying out the feature and should be OK to be part of a subsequent z release.
Users who have tried this out in their clusters may need to use the toolbox (with the help of support) to garbage collect the image on the secondary cluster.
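One possible manual cleanup path from the toolbox on the secondary cluster, sketched here purely as an assumption and only to be run with support guidance, since a non-primary mirrored image cannot simply be removed while mirroring is still enabled (the image name is hypothetical):

# Inspect the stale secondary copy
$ rbd mirror image status ocs-storagecluster-cephblockpool/csi-vol-<uuid>
# Force-promote it to primary locally, disable mirroring, then remove it
$ rbd mirror image promote --force ocs-storagecluster-cephblockpool/csi-vol-<uuid>
$ rbd mirror image disable ocs-storagecluster-cephblockpool/csi-vol-<uuid>
$ rbd rm ocs-storagecluster-cephblockpool/csi-vol-<uuid>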

Moving this out and marking it as a known issue.

Comment 19 Mudit Agarwal 2021-11-16 13:34:11 UTC
Shyam, please add doc text

Comment 24 Shyamsundar 2022-01-06 12:36:29 UTC
*** Bug 2037650 has been marked as a duplicate of this bug. ***

Comment 30 krishnaram Karthick 2022-03-10 05:08:41 UTC
The intention was to close the 4.9.z bug, but I did not notice that it is also targeted for 4.10. So I am removing the 4.9 flag and moving the bug back to on_qa -> verified for correctness.

Comment 31 Mudit Agarwal 2022-04-05 07:19:47 UTC
Please add doc text

Comment 34 errata-xmlrpc 2022-04-13 18:49:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

