Bug 2211643
Summary: | [MDR][ACM Tracker] After zone failure(c1+h1 cluster) and hub recovery, apps on c2 cluster are cleaned up as application namespace Manifestwork isn't backed up | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Parikshith <pbyregow> |
Component: | odf-dr | Assignee: | Benamar Mekhissi <bmekhiss> |
odf-dr sub component: | ramen | QA Contact: | Shrivaibavi Raghaventhiran <sraghave> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | unspecified | CC: | akandath, bmekhiss, edonnell, hnallurv, jpacker, kseeger, leyan, muagarwa, odf-bz-bot, owasserm, rtalur, xiangli |
Version: | 4.13 | Keywords: | Regression |
Target Milestone: | --- | Flags: | xiangli: needinfo- |
Target Release: | ODF 4.14.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 4.13.0-218 | Doc Type: | Bug Fix |
Doc Text: | Previously, during hub recovery, OpenShift Data Foundation encountered a known issue with Red Hat Advanced Cluster Management version 2.7.4 (or higher) where certain managed resources associated with the subscription-based workload might have been unintentionally deleted. This issue has been fixed, and no managed resources are deleted during hub recovery. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2023-11-08 18:50:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2154341, 2173907, 2173997, 2176028, 2183153, 2213472, 2244409 |
Description    Parikshith    2023-06-01 10:33:43 UTC
Hi Benamar,

Does this fix cover the upgrade scenario that we discussed in yesterday's meeting? If not, what are the steps to cover it manually? Please let us know.

Harish

@hnallurv To update the namespace ManifestWork after upgrade, follow these steps:

0. Find where the application is running:

```
oc get drpc -A
NAMESPACE        NAME           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-sample   busybox-drpc   68d   c1                 c2                Failover       FailedOver
```

1. Find the ManifestWork for the namespace:

```
oc get manifestwork -n c2 | grep ns
NAME                                AGE
busybox-drpc-busybox-sample-ns-mw   69d
```

2. Identify the namespace ManifestWork for the application. It is named using the format "%1-%2-%3-mw", where:
   - %1: name of the application
   - %2: namespace of the application
   - %3: the word 'ns'

   Example: busybox-drpc-busybox-sample-ns-mw --> [busybox-drpc]-[busybox-sample]-ns-mw

3. Edit the ManifestWork:

```
oc edit manifestwork -n c2 busybox-drpc-busybox-sample-ns-mw
```

4. Add the following label to the .spec.workload.manifests section:

```
labels:
  cluster.open-cluster-management.io/backup: resource
```

5. Here is an example (a scripted variant of this edit is sketched at the end of this report):

```
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/drpc-name: busybox-drpc
    drplacementcontrol.ramendr.openshift.io/drpc-namespace: busybox-sample
  creationTimestamp: "2023-03-30T19:50:25Z"
  finalizers:
  - cluster.open-cluster-management.io/manifest-work-cleanup
  generation: 2
  name: busybox-drpc-busybox-sample-ns-mw
  namespace: c2
  resourceVersion: "910332"
  uid: 788ff2c3-4d2e-49dc-b222-61f581131866
spec:
  workload:
    manifests:
    - apiVersion: v1
      kind: Namespace
      labels:
        cluster.open-cluster-management.io/backup: resource
      metadata:
        name: busybox-sample
      spec: {}
      status: {}
```

ACM issue https://issues.redhat.com/browse/ACM-5795 is fixed with 2.7.7.

*** Bug 2222706 has been marked as a duplicate of this bug. ***

Tested versions:
----------------
OCP - 4.14.0-0.nightly-2023-10-08-220853
ODF - 4.14.0-146.stable
ACM - 2.9.0-180

Steps performed:
----------------
1. Configured a 4.14 MetroDR setup with ACM 2.9.0
   zone-a: c1, hub-active; zone-b: c2, hub-passive
2. Deployed subscription and appset apps on both managed clusters (c1, c2)
3. Applied the DR policy to the apps and had apps in the Deployed, FailedOver, and Relocated states
4. Created a backup
5. Brought down zone-a (c1, hub-active, ceph nodes)
6. Restored on the passive hub

Observations:
-------------
1. After restoring, had to manually import the c1 managed cluster (using auto-import-secret).
2. After a few minutes the DR policy reached the Validated state. All applications were running and were not cleaned up on the managed clusters.
3. The openshift-storage namespace and other app resources were intact.

With the above observations, moving the BZ to Verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
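Editor's note: for readers who prefer a one-shot command over `oc edit`, below is a minimal sketch of the same label edit described in the manual steps above. It is not part of the original report. The resource names are taken from the busybox example, and it assumes the namespace ManifestWork follows the `<drpc-name>-<app-namespace>-ns-mw` naming convention, that the Namespace object is the first entry in `.spec.workload.manifests`, and that the label sits at the position shown in the example ManifestWork.

```
#!/usr/bin/env bash
# Sketch only: apply the backup label to the namespace ManifestWork with a
# JSON patch instead of editing it interactively. Values below are taken from
# the busybox example in the steps above and must be adjusted per application.

DRPC_NAME=busybox-drpc          # from `oc get drpc -A` (NAME column)
APP_NAMESPACE=busybox-sample    # from `oc get drpc -A` (NAMESPACE column)
TARGET_CLUSTER=c2               # cluster the application currently runs on

# Assumes the Namespace manifest is the first (index 0) entry in
# .spec.workload.manifests, as in the example ManifestWork above.
oc patch manifestwork "${DRPC_NAME}-${APP_NAMESPACE}-ns-mw" \
  -n "${TARGET_CLUSTER}" \
  --type=json \
  -p '[{"op": "add",
        "path": "/spec/workload/manifests/0/labels",
        "value": {"cluster.open-cluster-management.io/backup": "resource"}}]'
```

As with the manual steps, verify the result afterwards with `oc get manifestwork -n <cluster> <name> -o yaml` and confirm the label appears on the namespace manifest entry.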