Description of problem:
Configuring a NodeNetworkConfigurationPolicy caused "Internal error occurred" when creating a DataVolume.

Version-Release number of selected component (if applicable):
OCP 4.7
Client Version: 4.7.0-202012110053.p0-4ebfe9c
Server Version: 4.7.0-0.nightly-2020-12-14-080124
Kubernetes Version: v1.19.2+e386040
CNV 2.6

How reproducible:
Always

Steps to Reproduce:
1. Set up a fresh cluster and make sure the DV can be created normally.
2. Configure the NNCP:

---
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br3-nncp
spec:
  desiredState:
    interfaces:
    - name: br3
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens8
      ipv4:
        enabled: false
        dhcp: false
      ipv6:
        enabled: false
  nodeSelector:
    node-role.kubernetes.io/worker: ''

$ oc get nncp
NAME       STATUS
br3-nncp   SuccessfullyConfigured

3. Create the DV:

---
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: import-dv-http
spec:
  source:
    http:
      url: $URL
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
  contentType: kubevirt

Actual results:
$ oc create -f dv.yaml
Error from server (InternalError): error when creating "dv": Internal error occurred: failed calling webhook "datavolume-mutate.cdi.kubevirt.io": Post "https://cdi-api.openshift-cnv.svc:443/datavolume-mutate?timeout=30s": dial tcp 10.128.2.11:8443: connect: no route to host

Expected results:
DV is created normally with the NNCP configured.

Additional info:
Sometimes the hco-webhook or hco-operator pod crashes and keeps restarting:

hco-operator-7d97fcbd65-w6z97   0/1   CrashLoopBackOff   15
or
hco-webhook-59cdb8fbc5-47cvk    0/1   CrashLoopBackOff   9

$ oc logs -p hco-operator-7d97fcbd65-w6z97 -n openshift-cnv
{"level":"info","ts":1607995346.9719403,"logger":"cmd","msg":"Go Version: go1.14.7"}
{"level":"info","ts":1607995346.971978,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1607995346.9720407,"logger":"cmd","msg":"Found namespace","Namespace":"openshift-cnv"}
I1215 01:22:28.023021       1 request.go:621] Throttling request took 1.039523169s, request: GET:https://172.30.0.1:443/apis/local.storage.openshift.io/v1alpha1?timeout=32s

$ oc logs -p -n openshift-cnv hco-webhook-59cdb8fbc5-47cvk
{"level":"info","ts":1607944002.9428325,"logger":"cmd","msg":"Go Version: go1.14.7"}
{"level":"info","ts":1607944002.9428897,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1607944002.9429717,"logger":"cmd","msg":"Found namespace","Namespace":"openshift-cnv"}
I1214 11:06:43.994586       1 request.go:621] Throttling request took 1.03801636s, request: GET:https://172.30.0.1:443/apis/network.openshift.io/v1?timeout=32s

Events:
  Type     Reason        Age                   From                     Message
  ----     ------        ----                  ----                     -------
  Normal   ReconcileHCO  76m (x2839 over 10h)  kubevirt-hyperconverged  HCO Reconcile completed successfully
  Warning  NodeNotReady  71m                   node-controller          Node is not ready
  Normal   Created       70m (x5 over 10h)     kubelet                  Created container hyperconverged-cluster-operator
  Normal   Started       70m (x5 over 10h)     kubelet                  Started container hyperconverged-cluster-operator
  Normal   Pulled        70m (x4 over 72m)     kubelet                  Container image "registry.redhat.io/container-native-virtualization/hyperconverged-cluster-operator@sha256:51a9fc1e24056dd104838944891e830e434ec7052e76cd74f665e2bfc2845c10" already present on machine
  Warning  Unhealthy     49m (x7 over 68m)     kubelet                  Readiness probe failed: Get "http://10.128.2.10:6060/readyz": dial tcp 10.128.2.10:6060: connect: connection refused
  Warning  BackOff       5m1s (x318 over 72m)  kubelet                  Back-off restarting failed container
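For reference, the service and namespace behind the failing webhook can be read directly off the URL in the error message. A small shell sketch (the `oc` calls at the end are commented out and only illustrative; they need a configured client against the affected cluster):

```shell
#!/bin/sh
# Webhook URL copied from the error message above.
WEBHOOK_URL="https://cdi-api.openshift-cnv.svc:443/datavolume-mutate?timeout=30s"

# Strip scheme, path, and port to get the in-cluster host name.
HOST=${WEBHOOK_URL#https://}   # cdi-api.openshift-cnv.svc:443/datavolume-mutate?timeout=30s
HOST=${HOST%%/*}               # cdi-api.openshift-cnv.svc:443
HOST=${HOST%%:*}               # cdi-api.openshift-cnv.svc

# Service names follow <service>.<namespace>.svc.
SVC=${HOST%%.*}
NS=${HOST#*.}; NS=${NS%%.*}
echo "service=$SVC namespace=$NS"

# With cluster access, check whether the service still has a reachable
# endpoint and which node the backing pod is scheduled on:
# oc get endpoints "$SVC" -n "$NS" -o wide
# oc get pods -n "$NS" -o wide | grep -E 'cdi-api|hco-'
```

The "no route to host" from the kube-apiserver to 10.128.2.11:8443 suggests the node carrying the cdi-api pod lost pod-network connectivity after the bridge was configured, which matches the NodeNotReady event above.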
Yan, thanks for reporting this. Could you please share the contents of `oc get nns -o yaml` from before and after the NNCP configuration? Could you also record which nodes CDI and HCO are running on?
Created attachment 1739619 [details] pod status in openshift-cnv ns
Created attachment 1739621 [details] nns contents after configuration
Hi Petr, I attached the pod status and the nns contents from after the configuration (I have preserved the cluster for debugging if needed). For the nns contents from before the configuration, is it OK to attach the output from a freshly installed cluster?
Petr, this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1904889. I don't think we have the fix in CNV yet.
Thanks Yan, Meni. I just (re-)attached the fixed version of nmstate downstream. It should soon be available in the errata as kubernetes-nmstate-handler-container-v2.6.0-11. Could you retry once it is available and see whether this issue persists?
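In the meantime, if a cluster gets stuck in this state before the fixed handler is available, the bridge can usually be rolled back by reapplying the same policy with the interface set to absent. A sketch based on the policy from comment 0 (untested on this particular cluster):

---
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br3-nncp
spec:
  desiredState:
    interfaces:
    - name: br3
      type: linux-bridge
      state: absent
  nodeSelector:
    node-role.kubernetes.io/worker: ''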
Verified. Checked scenario:
1. Configure the NNCP (attached nncp.yaml).
2. Apply the NNCP: `oc apply -f nncp.yaml`
3. Get the status of the NNCP - Success:

[cnv-qe-jenkins@network03-9n2rz-executor bug-nncp-dv]$ oc get nncp
NAME       STATUS
br3-nncp   SuccessfullyConfigured

4. Create the DV from the attached dv.yaml: `oc create -f dv.yaml`
5. The DV was created successfully with the NNCP configured and no errors:

[cnv-qe-jenkins@network03-9n2rz-executor bug-nncp-dv]$ oc create -f dv.yaml
datavolume.cdi.kubevirt.io/import-dv-http created

6. Checked also that the hco-webhook and hco-operator pods are still running after the creation.

* Checked also that VM creation works (this came up in the original bug).
Created attachment 1744329 [details] NNCP Example for Verification
Created attachment 1744330 [details] DV Example for Verification
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0799