Bug 2182375
| Summary: | [MDR] Not able to fence DR clusters | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Parikshith <pbyregow> |
| Component: | csi-addons | Assignee: | Niels de Vos <ndevos> |
| Status: | CLOSED ERRATA | QA Contact: | Parikshith <pbyregow> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13 | CC: | hnallurv, kseeger, muagarwa, ndevos, ocs-bugs, odf-bz-bot, sbalusu |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | ODF 4.13.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-06-21 15:25:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
The error comes from https://github.com/csi-addons/kubernetes-csi-addons/blob/main/apis/csiaddons/v1alpha1/networkfence_webhook.go#L64-L66:

```go
if reflect.DeepEqual(n.Spec.Parameters, oldNetworkFence.Spec.Parameters) {
	allErrs = append(allErrs, field.Invalid(field.NewPath("spec").Child("parameters"), n.Spec.Parameters, "parameters cannot be changed"))
}
```

So, if reflect.DeepEqual() returns true, i.e. the parameters are unchanged, the error is returned? The condition appears to be missing a `!` (see the sketch after the log excerpt below).

Also observed this on IBM Z; fencing of the DR cluster was not successful:
```yaml
- lastTransitionTime: "2023-03-28T19:56:13Z"
  message: fencing operation not successful
  observedGeneration: 5
  reason: FenceError
  status: "False"
  type: Fenced
- lastTransitionTime: "2023-03-28T19:56:13Z"
  message: fencing operation not successful
  observedGeneration: 5
  reason: FenceError
  status: "True"
  type: Clean
```
```
2023-03-28T19:58:58.950Z INFO controllers.DRCluster controllers/drcluster_controller.go:290 Nothing to update {Phase:Fencing Conditions:[{Type:Fenced Status:False ObservedGeneration:5 LastTransitionTime:2023-03-28 19:56:13 +0000 UTC Reason:FenceError Message:fencing operation not successful} {Type:Clean Status:True ObservedGeneration:5 LastTransitionTime:2023-03-28 19:56:13 +0000 UTC Reason:FenceError Message:fencing operation not successful} {Type:Validated Status:True ObservedGeneration:5 LastTransitionTime:2023-03-28 19:56:13 +0000 UTC Reason:Succeeded Message:Validated the cluster}]} {"name": "ocsm4205001", "rid": "b81d5c34-1531-4c46-a9f3-fa9ffe1aed39"}
2023-03-28T19:58:58.950Z INFO controllers.DRCluster controllers/drcluster_controller.go:149 reconcile exit {"name": "ocsm4205001", "rid": "b81d5c34-1531-4c46-a9f3-fa9ffe1aed39"}
2023-03-28T19:58:58.950Z ERROR controller/controller.go:326 Reconciler error {"controller": "drcluster", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRCluster", "DRCluster": {"name":"ocsm4205001"}, "namespace": "", "name": "ocsm4205001", "reconcileID": "64d82e9d-5b37-4ad8-965d-d519732d9bad", "error": "failed to handle cluster fencing: fencing operation result not successful"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/internal/controller/controller.go:234
```
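Below is a minimal, runnable sketch (not the actual csi-addons code) of what the check is presumably meant to do: reject the update only when the parameters actually differ from the old object. The helper name `validateParametersUnchanged` is hypothetical; in the real webhook the fix would simply be negating the `reflect.DeepEqual` condition in the update validation quoted above.

```go
package main

import (
	"fmt"
	"reflect"
)

// validateParametersUnchanged is a hypothetical stand-in for the webhook's
// update validation of spec.parameters. The key point is the leading '!':
// an error should only be returned when the parameters actually changed.
func validateParametersUnchanged(oldParams, newParams map[string]string) error {
	if !reflect.DeepEqual(newParams, oldParams) {
		return fmt.Errorf("spec.parameters: Invalid value: %v: parameters cannot be changed", newParams)
	}
	return nil
}

func main() {
	old := map[string]string{"clusterID": "openshift-storage"}

	// Unchanged parameters (the fencing update hitting this bug) must be accepted.
	fmt.Println(validateParametersUnchanged(old, map[string]string{"clusterID": "openshift-storage"})) // <nil>

	// Genuinely changed parameters must still be rejected.
	fmt.Println(validateParametersUnchanged(old, map[string]string{"clusterID": "other-cluster"})) // error
}
```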
@Sarakia, in which ODF version have you seen this issue? Is it ODF 4.13?

@mrajanna: It is ODF version v4.13.0-110.stable, the same as mentioned in the BZ.

@sbalusu, do you get the same error when you `oc describe` the NetworkFence CR?

A fix for this should be included in the next ODF build.

Fixed with 4.13.0-114.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742
Description of problem (please be detailed as possible and provide log snippets):

On a 4.13 MDR setup, the DR clusters cannot be fenced; the cluster gets stuck in the 'Fencing' state.

```
oc describe networkfence network-fence-pbyregow-c1

Message: failed to add finalizer (csiaddons.openshift.io/network-fence) to NetworkFence resource (network-fence-pbyregow-c1): admission webhook "vnetworkfence.kb.io" denied the request: NetworkFence.csiaddons.openshift.io "network-fence-pbyregow-c1" is invalid: spec.parameters: Invalid value: map[string]string{"clusterID":"openshift-storage"}: parameters cannot be changed
```

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-03-23-204038
odf: 4.13.0-110
acm: 2.7.2
mco: 4.13.0-110

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, cannot fail over without fencing.

Is there any workaround available to the best of your knowledge?
No.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
2/2 times. First noticed on 4.13.0-108, reproduced on 4.13.0-110.

If this is a regression, please provide more details to justify this:
Yes, it works on 4.12.1 and 4.12.2 MDR configurations.

Steps to Reproduce:
1. Create a Metro-DR setup with 3 OCP clusters, i.e. hub, c1, and c2.
2. Configure the DR policy and fencing.
3. Create an application on the managed cluster c1.
4. Fence c1 (see the sketch after this description).

Steps 1-4 are done by following doc [1].

[1] https://dxp-docp-prod.apps.ext-waf.spoke.prod.us-west-2.aws.paas.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/index?lb_target=preview#configure-drclusters-for-fencing-automation

Actual results:
The c1 cluster gets stuck in the Fencing state.

Expected results:
The cluster should move to the Fenced state.

Additional info:
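For reference, step 4 above (fencing c1) is driven by setting the fencing state on the corresponding DRCluster resource on the hub, which the Ramen DRCluster controller then acts on (creating the NetworkFence CR that hits the webhook error in this bug). The snippet below is a hedged, illustrative sketch only: the resource name is a placeholder, other spec fields are omitted, and the exact field names should be checked against the installed ramendr.openshift.io DRCluster CRD.

```yaml
# Hypothetical sketch: fencing the managed cluster c1 from the hub,
# e.g. via `oc edit drcluster pbyregow-c1`. Only the relevant field is shown.
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRCluster
metadata:
  name: pbyregow-c1        # placeholder DRCluster name
spec:
  clusterFence: Fenced     # requests fencing; handled by the Ramen DRCluster controller
```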