Description of problem (please be as detailed as possible and provide log snippets):

On a 4.13 MDR setup, the DR clusters cannot be fenced; they get stuck in the 'Fencing' state.

```
oc describe networkfence network-fence-pbyregow-c1

Message: failed to add finalizer (csiaddons.openshift.io/network-fence) to NetworkFence resource (network-fence-pbyregow-c1): admission webhook "vnetworkfence.kb.io" denied the request: NetworkFence.csiaddons.openshift.io "network-fence-pbyregow-c1" is invalid: spec.parameters: Invalid value: map[string]string{"clusterID":"openshift-storage"}: parameters cannot be changed
```

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-03-23-204038
odf: 4.13.0-110
acm: 2.7.2
mco: 4.13.0-110

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, failover is not possible without fencing.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
2/2 times. First noticed on 4.13.0-108, reproduced on 4.13.0-110.

If this is a regression, please provide more details to justify this:
Yes, this works on 4.12.1 and 4.12.2 MDR configs.

Steps to Reproduce:
1. Create a Metro-DR setup with 3 OCP clusters, i.e. hub, c1, and c2
2. Configure the DR policy and fencing
3. Create an application on the managed cluster c1
4. Fence c1

Steps 1-4 are done by following doc [1].

[1] https://dxp-docp-prod.apps.ext-waf.spoke.prod.us-west-2.aws.paas.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/index?lb_target=preview#configure-drclusters-for-fencing-automation

Actual results:
The c1 cluster is stuck in the Fencing state.

Expected results:
The cluster should move to the Fenced state.

Additional info:
The error comes from https://github.com/csi-addons/kubernetes-csi-addons/blob/main/apis/csiaddons/v1alpha1/networkfence_webhook.go#L64-L66

```
if reflect.DeepEqual(n.Spec.Parameters, oldNetworkFence.Spec.Parameters) {
	allErrs = append(allErrs, field.Invalid(field.NewPath("spec").Child("parameters"), n.Spec.Parameters, "parameters cannot be changed"))
}
```

So the error is returned when reflect.DeepEqual() returns true, i.e. when the parameters are unchanged? I think it is missing a !; the update should only be rejected when the parameters actually differ.
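A minimal, self-contained sketch of the suspected logic error (the variable names are hypothetical; the real webhook compares the old and new NetworkFence spec.parameters maps). The unnegated check fires precisely when nothing has changed, which is why the finalizer update, a metadata-only change, is denied:

```
package main

import (
	"fmt"
	"reflect"
)

func main() {
	// Identical parameter maps, mirroring the metadata-only update
	// csi-addons performs when it adds the network-fence finalizer.
	oldParams := map[string]string{"clusterID": "openshift-storage"}
	newParams := map[string]string{"clusterID": "openshift-storage"}

	// Check as currently written: DeepEqual is true for unchanged
	// parameters, so the update is denied.
	if reflect.DeepEqual(newParams, oldParams) {
		fmt.Println("current check: denied (parameters cannot be changed)")
	}

	// Negated check: deny only when the parameters actually differ.
	if !reflect.DeepEqual(newParams, oldParams) {
		fmt.Println("negated check: denied")
	} else {
		fmt.Println("negated check: allowed")
	}
}
```

Run as-is, this prints "denied" for the current check and "allowed" for the negated one, matching the behavior seen in the bug.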
Also observed this on IBM Z; fencing of the DR cluster was not successful.

```
- lastTransitionTime: "2023-03-28T19:56:13Z"
  message: fencing operation not successful
  observedGeneration: 5
  reason: FenceError
  status: "False"
  type: Fenced
- lastTransitionTime: "2023-03-28T19:56:13Z"
  message: fencing operation not successful
  observedGeneration: 5
  reason: FenceError
  status: "True"
  type: Clean
```

```
2023-03-28T19:58:58.950Z INFO controllers.DRCluster controllers/drcluster_controller.go:290 Nothing to update {Phase:Fencing Conditions:[{Type:Fenced Status:False ObservedGeneration:5 LastTransitionTime:2023-03-28 19:56:13 +0000 UTC Reason:FenceError Message:fencing operation not successful} {Type:Clean Status:True ObservedGeneration:5 LastTransitionTime:2023-03-28 19:56:13 +0000 UTC Reason:FenceError Message:fencing operation not successful} {Type:Validated Status:True ObservedGeneration:5 LastTransitionTime:2023-03-28 19:56:13 +0000 UTC Reason:Succeeded Message:Validated the cluster}]} {"name": "ocsm4205001", "rid": "b81d5c34-1531-4c46-a9f3-fa9ffe1aed39"}
2023-03-28T19:58:58.950Z INFO controllers.DRCluster controllers/drcluster_controller.go:149 reconcile exit {"name": "ocsm4205001", "rid": "b81d5c34-1531-4c46-a9f3-fa9ffe1aed39"}
2023-03-28T19:58:58.950Z ERROR controller/controller.go:326 Reconciler error {"controller": "drcluster", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRCluster", "DRCluster": {"name":"ocsm4205001"}, "namespace": "", "name": "ocsm4205001", "reconcileID": "64d82e9d-5b37-4ad8-965d-d519732d9bad", "error": "failed to handle cluster fencing: fencing operation result not successful"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/internal/controller/controller.go:234
```
@Sarakia, in which ODF version have you seen this issue? Is it ODF 4.13?
@mrajanna: It's ODF version v4.13.0-110.stable, the same as mentioned in the BZ.
@sbalusu, do you get the same error when you 'oc describe' the NetworkFence CR? A fix for this should be included in the next ODF build.
Fixed with 4.13.0-114
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3742