Bug 2248824 - [4.14 Backport][RDR] [Hub recovery] After hub recovery, MCO didn't recreate the VolumeReplicationClass
Summary: [4.14 Backport][RDR] [Hub recovery] After hub recovery, MCO didn't recreate the VolumeReplicationClass
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.14.1
Assignee: Vineet
QA Contact: Aman Agrawal
URL:
Whiteboard:
Duplicates: 2249009 (view as bug list)
Depends On: 2246186 2249009
Blocks:
 
Reported: 2023-11-09 08:09 UTC by krishnaram Karthick
Modified: 2023-12-07 13:21 UTC (History)
CC List: 5 users

Fixed In Version: 4.14.1-4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2246186
Environment:
Last Closed: 2023-12-07 13:21:25 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage odf-multicluster-orchestrator pull 184 0 None open Bug 2248824: [release-4.14] Requeue DRPolicy if no mirrorpeers found 2023-11-10 08:25:48 UTC
Red Hat Product Errata RHBA-2023:7696 0 None None None 2023-12-07 13:21:26 UTC
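
The PR title above ("Requeue DRPolicy if no mirrorpeers found") points at the usual controller-runtime remedy for this class of problem: when the MirrorPeers a DRPolicy depends on aren't there yet, return a requeue result so the reconcile is retried later, instead of dropping the event until a pod restart. The Go sketch below only illustrates that general pattern; the reconciler shape and the MirrorPeer group/version/kind used here are assumptions and not the actual odf-multicluster-orchestrator change in pull 184.

// General requeue pattern suggested by the PR title; NOT the actual PR 184 code.
package controllers

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type DRPolicyReconciler struct {
	client.Client
}

func (r *DRPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// List MirrorPeers via an unstructured list; the GVK is an assumed value.
	peers := &unstructured.UnstructuredList{}
	peers.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "multicluster.odf.openshift.io", // assumed MirrorPeer API group
		Version: "v1alpha1",
		Kind:    "MirrorPeerList",
	})
	if err := r.List(ctx, peers); err != nil {
		return ctrl.Result{}, err
	}

	if len(peers.Items) == 0 {
		// No MirrorPeers yet: retry later instead of giving up, so the
		// vrc* ManifestWork is eventually created without a pod restart.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}

	// ... ensure the VolumeReplicationClass ManifestWork here ...
	return ctrl.Result{}, nil
}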

Description krishnaram Karthick 2023-11-09 08:09:11 UTC
Cloning this bug for the 4.14 backport.

+++ This bug was initially created as a clone of Bug #2246186 +++

Description of problem (please be as detailed as possible and provide log
snippets): This BZ extends the issue reported in BZ2246084, where it was identified that MCO isn't creating the VolumeReplicationClass for RBD-backed workloads (if I am not wrong). This leads to unstable workload resource status, which affects failover of those workloads (failover & cleanup won't complete as expected).

Requesting @bmekhiss to add more details to the BZ for better understanding and to correct the above observations if needed.
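
As a triage aid, one way to confirm the symptom is to list VolumeReplicationClasses directly on a managed cluster. A minimal Go sketch follows; the group/version/resource used for VolumeReplicationClass is assumed from the csi-addons replication API and is not taken from this bug.

// Minimal sketch: list VolumeReplicationClasses on a managed cluster to
// confirm whether MCO's ManifestWork actually created them.
// The GVR below is an assumption based on the csi-addons replication API.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "replication.storage.openshift.io",
		Version:  "v1alpha1",
		Resource: "volumereplicationclasses",
	}

	// VolumeReplicationClass is cluster-scoped, so no namespace is given.
	list, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	if len(list.Items) == 0 {
		fmt.Println("no VolumeReplicationClass found - RBD workloads cannot use VolumeReplication")
		return
	}
	for _, item := range list.Items {
		fmt.Println("found VolumeReplicationClass:", item.GetName())
	}
}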


Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-10-18-004928
advanced-cluster-management.v2.9.0-188 
ODF 4.14.0-156
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
Submariner   image: brew.registry.redhat.io/rh-osbs/iib:599799
ACM 2.9.0-DOWNSTREAM-2023-10-18-17-59-25
Latency 50ms RTT


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Can this issue reproducible?


Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Read the Description above
2.
3.


Actual results:


Expected results:


Additional info:

--- Additional comment from RHEL Program Management on 2023-10-25 17:54:29 UTC ---

This bug, having no release flag set previously, is now set with release flag 'odf-4.14.0' at '?', and so is being proposed to be fixed in the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-10-25 17:54:29 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.

--- Additional comment from Benamar Mekhissi on 2023-10-25 19:45:25 UTC ---

After hub recovery completed, we found that the VolumeReplicationClass was missing on the managed clusters. We checked the new active hub for a ManifestWork whose name starts with vrc[...] and didn't find one. This caused the RBD volumes to be targeted for VolSync replication instead. That worked, but it wasn't the desired outcome.

Restarting the multicluster operator pod regenerated the ManifestWork, which created the VolumeReplicationClass(es). I am attaching the MCO log file from before the restart.
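
For reference, the check described above (looking on the new active hub for a ManifestWork whose name starts with vrc) can be scripted roughly as below. The ManifestWork GVR is assumed from the open-cluster-management work API, not taken from this bug.

// Sketch: scan the hub for ManifestWorks whose name starts with "vrc".
// The GVR is an assumption based on the open-cluster-management work API.
package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "work.open-cluster-management.io",
		Version:  "v1",
		Resource: "manifestworks",
	}

	// ManifestWorks live in the managed-cluster namespaces on the hub,
	// so list across all namespaces and filter by name prefix.
	works, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	found := false
	for _, w := range works.Items {
		if strings.HasPrefix(w.GetName(), "vrc") {
			found = true
			fmt.Printf("found %s/%s\n", w.GetNamespace(), w.GetName())
		}
	}
	if !found {
		fmt.Println("no vrc* ManifestWork found on the hub")
	}
}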

--- Additional comment from Benamar Mekhissi on 2023-10-25 19:47:20 UTC ---



--- Additional comment from Vineet on 2023-10-30 12:05:46 UTC ---

Can we account for the frequency at which this happens? Restarting the pod resolved the issue immediately, since it was an update conflict.
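
For context on the "update conflict" point: a Kubernetes Update against a stale resourceVersion fails with a Conflict error, and the caller is expected to re-read the object and retry rather than stop. A generic sketch of that retry pattern with client-go follows (the ConfigMap here is only a stand-in object, not anything MCO actually updates).

// Generic conflict-retry pattern with client-go; the ConfigMap is a stand-in.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/retry"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Re-read the object and retry the update on every Conflict error,
	// instead of failing once and waiting for a pod restart to recover.
	err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, getErr := cs.CoreV1().ConfigMaps("default").Get(context.TODO(), "example", metav1.GetOptions{})
		if getErr != nil {
			return getErr
		}
		if cm.Data == nil {
			cm.Data = map[string]string{}
		}
		cm.Data["state"] = "reconciled"
		_, updateErr := cs.CoreV1().ConfigMaps("default").Update(context.TODO(), cm, metav1.UpdateOptions{})
		return updateErr
	})
	if err != nil {
		panic(err)
	}
}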

--- Additional comment from Aman Agrawal on 2023-10-30 12:37:43 UTC ---

(In reply to Vineet from comment #5)
> Can we account for the frequency at which this happens? Restarting the pod
> resolved the issue immediately, since it was an update conflict.

Please check https://bugzilla.redhat.com/show_bug.cgi?id=2246084#c6

--- Additional comment from RHEL Program Management on 2023-11-04 11:25:47 UTC ---

This BZ is being approved for the ODF 4.14.0 release, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.14.0'.

--- Additional comment from RHEL Program Management on 2023-11-04 11:25:47 UTC ---

Since this bug has been approved for the ODF 4.14.0 release, through release flag 'odf-4.14.0+', the Target Release is being set to 'ODF 4.14.0'.

--- Additional comment from Mudit Agarwal on 2023-11-07 11:47:18 UTC ---

Moving hub recovery issues out to 4.15 based on offline discussion.

--- Additional comment from RHEL Program Management on 2023-11-07 12:04:11 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product.

The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from RHEL Program Management on 2023-11-07 12:04:11 UTC ---

This BZ is being approved for the ODF 4.15.0 release, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.15.0'.

--- Additional comment from RHEL Program Management on 2023-11-07 12:04:11 UTC ---

Since this bug has been approved for the ODF 4.15.0 release, through release flag 'odf-4.15.0+', the Target Release is being set to 'ODF 4.15.0'.

--- Additional comment from Aman Agrawal on 2023-11-07 16:02:06 UTC ---

Hi Mudit, could we please re-target this bug to 4.14.z and not 4.15?

Comment 6 Vineet 2023-11-10 08:25:02 UTC
*** Bug 2249009 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2023-12-07 13:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.14.1 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7696

