Description of problem (please be as detailed as possible and provide log snippets):

[RDR][Discovered Apps] ramen proceeds to recover kubeObjects from a capture even if the capture is invalid.

Version of all relevant components (if applicable):
OCP version: 4.16.0-0.nightly-2024-05-19-083311
ODF version: 4.16.0-102
CEPH version: ceph version 18.2.1-167.el9cp (e8c836edb24adb7717a6c8ba1e93a07e3efede29) reef (stable)
ACM version: 2.11.0-86
SUBMARINER version: v0.18.0
VOLSYNC version: volsync-product.v0.9.0
VOLSYNC method: destinationCopyMethod: Direct

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced? Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to reproduce:
1. Deploy an RDR cluster.
2. Deploy a discovered-app workload and DR-protect it.
3. Perform a failover operation.

Actual results:

2024-05-21T14:13:20.185Z INFO controllers.VolumeReplicationGroup.vrginstance velero/requests.go:657 Backup {"VolumeReplicationGroup": {"name":"busybox-disc-rbd-1","namespace":"openshift-dr-ops"}, "rid": "17ae1901-a8fc-4b74-8d5b-411ae8741507", "State": "primary", "phase": "FailedValidation", "warnings": 0, "errors": 0, "failure": "", "validation errors": ["an existing backup storage location wasn't specified at backup creation time and the default 'openshift-dr-ops--busybox-disc-rbd-1--1----s3profile-prsurve-c1-ocs-storagecluster' wasn't found. Please address this issue (see `velero backup-location -h` for options) and create a new backup.
Error: BackupStorageLocation.velero.io \"openshift-dr-ops--busybox-disc-rbd-1--1----s3profile-prsurve-c1-ocs-storagecluster\" not found"]}

2024-05-21T14:13:20.185Z ERROR controllers.VolumeReplicationGroup.vrginstance controllers/vrg_kubeobjects.go:611 Kube objects group recover error {"VolumeReplicationGroup": {"name":"busybox-disc-rbd-1","namespace":"openshift-dr-ops"}, "rid": "17ae1901-a8fc-4b74-8d5b-411ae8741507", "State": "primary", "number": 1, "profile": "s3profile-prsurve-c1-ocs-storagecluster", "group": 0, "name": "", "error": "backupFailedValidation"}
github.com/ramendr/ramen/controllers.(*VRGInstance).kubeObjectsRecoveryStartOrResume
	/remote-source/app/controllers/vrg_kubeobjects.go:611
github.com/ramendr/ramen/controllers.(*VRGInstance).kubeObjectsRecover
	/remote-source/app/controllers/vrg_kubeobjects.go:494
github.com/ramendr/ramen/controllers.(*VRGInstance).restorePVsAndPVCsFromS3
	/remote-source/app/controllers/vrg_volrep.go:1899
github.com/ramendr/ramen/controllers.(*VRGInstance).restorePVsAndPVCsForVolRep
	/remote-source/app/controllers/vrg_volrep.go:1837
github.com/ramendr/ramen/controllers.(*VRGInstance).clusterDataRestore
	/remote-source/app/controllers/volumereplicationgroup_controller.go:603
github.com/ramendr/ramen/controllers.(*VRGInstance).processAsPrimary
	/remote-source/app/controllers/volumereplicationgroup_controller.go:872
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRG
	/remote-source/app/controllers/volumereplicationgroup_controller.go:551
github.com/ramendr/ramen/controllers.(*VolumeReplicationGroupReconciler).Reconcile
	/remote-source/app/controllers/volumereplicationgroup_controller.go:438
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:227

2024-05-21T14:13:20.185Z INFO controllers.VolumeReplicationGroup.vrginstance runtime/panic.go:914 Exiting processing VolumeReplicationGroup {"VolumeReplicationGroup": {"name":"busybox-disc-rbd-1","namespace":"openshift-dr-ops"}, "rid": "17ae1901-a8fc-4b74-8d5b-411ae8741507", "State": "primary"}
2024-05-21T14:13:20.185Z INFO controllers.VolumeReplicationGroup runtime/panic.go:914 Exiting reconcile loop {"VolumeReplicationGroup": {"name":"busybox-disc-rbd-1","namespace":"openshift-dr-ops"}, "rid": "17ae1901-a8fc-4b74-8d5b-411ae8741507"}
2024-05-21T14:13:20.185Z INFO controller/controller.go:115 Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "volumereplicationgroup", "controllerGroup": "ramendr.openshift.io", "controllerKind": "VolumeReplicationGroup", "VolumeReplicationGroup": {"name":"busybox-disc-rbd-1","namespace":"openshift-dr-ops"}, "namespace": "openshift-dr-ops", "name": "busybox-disc-rbd-1", "reconcileID": "22361a5c-d9e4-4b96-96fe-505f3fcdfa75"}

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x19b708d]

goroutine 405 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1bc6c80?, 0x3313c30?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/ramendr/ramen/controllers.(*VRGInstance).getRecoverOrProtectRequest.func4({0x0, 0x0})
	/remote-source/app/controllers/vrg_kubeobjects.go:561 +0x8d
github.com/ramendr/ramen/controllers.(*VRGInstance).kubeObjectsRecoveryStartOrResume(0xc0006e63c0, 0xc0006e6580, {{0x23de158, 0xc00392f260}, {{0xc003c13a10, 0x27}, {0xc003af1c50, 0x16}, {0xc003ba1fc0, 0x3a}, ...}}, ...)
	/remote-source/app/controllers/vrg_kubeobjects.go:612 +0x72d
github.com/ramendr/ramen/controllers.(*VRGInstance).kubeObjectsRecover(0xc0006e63c0, 0x54a?, {{0xc003c13a10, 0x27}, {0xc003af1c50, 0x16}, {0xc003ba1fc0, 0x3a}, {0xc003c681a0, 0x6}, ...}, ...)
	/remote-source/app/controllers/vrg_kubeobjects.go:494 +0x5dc
github.com/ramendr/ramen/controllers.(*VRGInstance).restorePVsAndPVCsFromS3(0xc0006e63c0, 0xc0006e6580)
	/remote-source/app/controllers/vrg_volrep.go:1899 +0x638
github.com/ramendr/ramen/controllers.(*VRGInstance).restorePVsAndPVCsForVolRep(0xc0006e63c0, 0xc0006e6580)
	/remote-source/app/controllers/vrg_volrep.go:1837 +0x10e
github.com/ramendr/ramen/controllers.(*VRGInstance).clusterDataRestore(0xc0006e63c0, 0xc003bae930?)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:603 +0x130
github.com/ramendr/ramen/controllers.(*VRGInstance).processAsPrimary(0xc0006e63c0)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:872 +0x105
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRG(0xc0006e63c0)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:551 +0x630
github.com/ramendr/ramen/controllers.(*VolumeReplicationGroupReconciler).Reconcile(0xc0006d3500, {0x23db4a0?, 0xc000cb01e0}, {{{0xc000b014b0, 0x10}, {0xc000b1e780, 0x12}}})
	/remote-source/app/controllers/volumereplicationgroup_controller.go:438 +0xae5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x23dedb8?, {0x23db4a0?, 0xc000cb01e0?}, {{{0xc000b014b0?, 0xb?}, {0xc000b1e780?, 0x0?}}})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000c0a00, {0x23db4d8, 0xc0007cc410}, {0x1c72840?, 0xc001704d20?})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000c0a00, {0x23db4d8, 0xc0007cc410})
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 90
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:223 +0x565

Expected results:

Additional info:
Deployment and the failover operation were done using automation.
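For clarity on what "invalid capture" means above: the Velero Backup behind the capture is in phase FailedValidation because its BackupStorageLocation does not exist, yet kube-object recovery from it is still started and then panics. The sketch below is a minimal, hypothetical illustration of checking a capture's Backup before recovering from it; it assumes only the upstream Velero v1 API types and a controller-runtime client, and none of the names are actual ramen code.

// Hypothetical helper (not ramen code): decide whether a captured Velero
// Backup is usable before starting kube-object recovery.
package recovery

import (
	"context"
	"fmt"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// usableBackup returns the Backup only if it exists and did not fail
// validation (for example because its BackupStorageLocation is missing).
func usableBackup(ctx context.Context, c client.Client, namespace, name string) (*velerov1.Backup, error) {
	backup := &velerov1.Backup{}
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, backup); err != nil {
		if apierrors.IsNotFound(err) {
			return nil, fmt.Errorf("capture backup %s/%s not found, skipping kube-object recovery", namespace, name)
		}

		return nil, err
	}

	if backup.Status.Phase == velerov1.BackupPhaseFailedValidation {
		// Invalid capture: surface the validation errors instead of recovering from it.
		return nil, fmt.Errorf("capture backup %s/%s failed validation: %v",
			namespace, name, backup.Status.ValidationErrors)
	}

	return backup, nil
}

With a check of this kind, a FailedValidation capture would surface as an ordinary reconcile error instead of reaching the code path that crashed above.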
Pratik, you did not describe the problem and the expected results. The title gives some hints: "ramen proceeds to recover kubeObjects from a capture even if the capture is invalid", but there is no detail on the invalid capture and what it means.

In the "actual results" you show that ramen was terminated after dereferencing a nil pointer; this should never happen and is easy to fix for this code path (see the sketch after this comment).

For "Is this reproducible" you answered Yes. This is not detailed enough. We need to understand whether this is a random error or it happens every time. Reporting how many runs you did and how many runs failed will help.

Please complete:
- Description of the issue
- Expected results
- How many times you tested / how many times it failed
- Complete configuration for reproducing this issue on another system
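Regarding the nil pointer dereference called out above: a minimal sketch of the kind of guard that would avoid the crash on this code path, assuming the recovery callback can receive a nil capture request; the type and function names are illustrative and are not the actual ramen change.

package recovery

import "fmt"

// CaptureRequest stands in for whatever interface the recovery callback
// receives; the name is hypothetical.
type CaptureRequest interface {
	Name() string
}

// recoverFromCapture shows the defensive check: previously a nil request
// was dereferenced directly, which is what produced the SIGSEGV above.
func recoverFromCapture(request CaptureRequest) error {
	if request == nil {
		// Turn the crash into an ordinary, reportable reconcile error.
		return fmt.Errorf("kube objects recovery: no capture request available (backup failed validation?)")
	}

	// The normal recovery path would continue here, using request.Name() and so on.
	return nil
}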
We think we understand the problem and it is fixed now. No more info needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591