Description of problem (please be as detailed as possible and provide log snippets):

Discovered apps are stuck in the WaitForReadiness progression after initiating a failover. The PVCs do get failed over to the secondary cluster, but the app pods do not.

Version of all relevant components:
OCP: 4.16.0-0.nightly-2024-06-20-005834
ODF/MCO: 4.16.0-130
ACM: 2.11.0-122
OADP: 1.4

$ oc get drpc -A -o wide
NAMESPACE          NAME          AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION        START TIME             DURATION   PEER READY
openshift-dr-ops   test-disc-1   65m   pbyregow-c1        pbyregow-c2       Failover       FailedOver     WaitForReadiness   2024-06-21T11:04:57Z              False

ramen-dr-cluster-operator logs on the managed cluster:

2024-06-21T11:42:51.971Z INFO controllers.VolumeReplicationGroup.vrginstance controllers/vrg_kubeobjects.go:674 Kube object protection {"VolumeReplicationGroup": {"name":"test-disc-1","namespace":"openshift-dr-ops"}, "rid": "603016c3-fa29-4be3-9fed-b2b1c4003d9e", "State": "primary", "disabled": false, "VRG": false, "configMap": false, "for": "recovery"}

2024-06-21T11:42:51.996Z INFO controllers.VolumeReplicationGroup.vrginstance velero/requests.go:152 Kube objects recover {"VolumeReplicationGroup": {"name":"test-disc-1","namespace":"openshift-dr-ops"}, "rid": "603016c3-fa29-4be3-9fed-b2b1c4003d9e", "State": "primary", "s3 url": "https://s3-openshift-storage.apps.pbyregow-c2.qe.rh-ocs.com", "s3 bucket": "odrbucket-b365d656fd21", "s3 region": "noobaa", "s3 key prefix": "openshift-dr-ops/test-disc-1/kube-objects/1/", "secret key ref": "&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:v34290e3a72502c50dfabf8e2c8feb0b3ee39883,},Key:ramengenerated,Optional:nil,}", "CA certificates": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR1ekNDQXFPZ0F3SUJBZ0lVZEM5R2MwYkhTTGx6UVJzV2Jkam5zL1ZjWExzd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2JURWZNQjBHQTFVRUNnd1dUME5USUZGRklDMGdVbVZrSUVoaGRDd2dTVzVqTGpFUE1BMEdBMVVFQ3d3RwpUME5USUZGRk1SY3dGUVlEVlFRRERBNVBRMU1nVVVVZ1VtOXZkQ0JEUVRFZ01CNEdDU3FHU0liM0RRRUpBUllSCmIyTnpMWEZsUUhKbFpHaGhkQzVqYjIwd0hoY05NakV3T1RFME1UQTBNekExV2hjTk16RXd..."} [CA certificate value truncated]

2024-06-21T11:42:51.996Z INFO controllers.VolumeReplicationGroup.vrginstance velero/requests.go:658 Backup {"VolumeReplicationGroup": {"name":"test-disc-1","namespace":"openshift-dr-ops"}, "rid": "603016c3-fa29-4be3-9fed-b2b1c4003d9e", "State": "primary", "phase": "FailedValidation", "warnings": 0, "errors": 0, "failure": "", "validation errors": ["Invalid included/excluded namespace lists: Namespace \"dummy\" not found"]}

2024-06-21T11:42:51.996Z ERROR controllers.VolumeReplicationGroup.vrginstance controllers/vrg_kubeobjects.go:626 Kube objects group recover error {"VolumeReplicationGroup": {"name":"test-disc-1","namespace":"openshift-dr-ops"}, "rid": "603016c3-fa29-4be3-9fed-b2b1c4003d9e", "State": "primary", "number": 1, "profile": "s3profile-pbyregow-c2-ocs-external-storagecluster", "group": 0, "name": "", "error": "backupFailedValidation"}
github.com/ramendr/ramen/controllers.(*VRGInstance).kubeObjectsRecoveryStartOrResume
        /remote-source/app/controllers/vrg_kubeobjects.go:626
github.com/ramendr/ramen/controllers.(*VRGInstance).kubeObjectsRecover
        /remote-source/app/controllers/vrg_kubeobjects.go:501
github.com/ramendr/ramen/controllers.(*VRGInstance).restorePVsAndPVCsFromS3
        /remote-source/app/controllers/vrg_volrep.go:1899
github.com/ramendr/ramen/controllers.(*VRGInstance).restorePVsAndPVCsForVolRep
        /remote-source/app/controllers/vrg_volrep.go:1837
github.com/ramendr/ramen/controllers.(*VRGInstance).clusterDataRestore
        /remote-source/app/controllers/volumereplicationgroup_controller.go:613
github.com/ramendr/ramen/controllers.(*VRGInstance).processAsPrimary
        /remote-source/app/controllers/volumereplicationgroup_controller.go:888
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRG
        /remote-source/app/controllers/volumereplicationgroup_controller.go:561
github.com/ramendr/ramen/controllers.(*VolumeReplicationGroupReconciler).Reconcile
        /remote-source/app/controllers/volumereplicationgroup_controller.go:448
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:227

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, discovered apps cannot be failed over successfully.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
1/1

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Configure a 4.16.0-130 MDR cluster with ACM 2.11 and OADP 1.4.
2. Create a discovered app on c1 and protect it.
3. Fence the c1 cluster.
4. Fail over the discovered app from c1 to the c2 managed cluster.

Actual results:
Failover of the discovered app did not complete successfully.

Expected results:
Failover of discovered apps should complete successfully.
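When triaging a stuck failover like this, the actionable detail is in the JSON payload that ramen appends to each log message; in the logs above it is the "validation errors" field of the "Backup" entry. A small self-contained sketch (plain Python, operating on an inlined, trimmed copy of the fields from that entry, not on a live log stream) of pulling that field out of a captured payload:

```python
import json

# Trimmed copy of the structured fields from the "Backup" log entry above
# (illustrative; the real entry carries more fields).
raw = ('{"phase": "FailedValidation", "validation errors": '
       '["Invalid included/excluded namespace lists: '
       'Namespace \\"dummy\\" not found"]}')

log_fields = json.loads(raw)

# A backup stuck in FailedValidation is why the failover sits in
# WaitForReadiness; surface its validation errors for triage.
if log_fields.get("phase") == "FailedValidation":
    errors = log_fields.get("validation errors", [])
else:
    errors = []

for err in errors:
    print(err)
```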
Additional info:
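The "FailedValidation" phase in the log above is Velero's namespace-list validation: before running a backup, Velero checks that every namespace named in the Backup spec's included/excluded namespace lists exists on the cluster, and fails the backup with exactly this error when one does not. A minimal sketch of a Backup manifest that would trip the same validation on the failover cluster (this manifest and the `openshift-adp` namespace are illustrative assumptions, not the object Ramen actually creates):

```yaml
# Hypothetical minimal Velero Backup, for illustration only.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: example-backup        # illustrative name
  namespace: openshift-adp    # assumption: the OADP/Velero install namespace
spec:
  includedNamespaces:
    - dummy                   # a namespace absent on this cluster; Velero moves
                              # the Backup to phase FailedValidation with
                              # 'Namespace "dummy" not found'
```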
(In reply to Raghavendra Talur from comment #6)
> This is a blocker for 4.16. Can we have the acks please?

Still working on the fix. Placing needinfo.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591