1907691 – [CNV] Configuring NodeNetworkConfigurationPolicy caused "Internal error occurred" for creating datavolume


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	2.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	2.6.0
Assignee:	Petr Horáček
QA Contact:	Ofir Nash
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-15 02:08 UTC by Yan Du
Modified:	2021-03-10 11:23 UTC
CC List:	3 users
Fixed In Version:	kubernetes-nmstate-handler-container-v2.6.0-14
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-03-10 11:22:41 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
pod status in openshift-cnv ns (9.49 KB, text/plain) 2020-12-16 12:30 UTC, Yan Du
nns contents after configuration (28.78 KB, text/plain) 2020-12-16 12:31 UTC, Yan Du
NNCP Example for Verification (448 bytes, text/plain) 2021-01-04 14:42 UTC, Ofir Nash
DV Example for Verification (391 bytes, text/plain) 2021-01-04 14:43 UTC, Ofir Nash

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:0799	0	None	None	None	2021-03-10 11:23:19 UTC

Description Yan Du 2020-12-15 02:08:01 UTC

Description of problem:
After a NodeNetworkConfigurationPolicy (NNCP) is configured, creating a DataVolume (DV) fails with "Internal error occurred".

Version-Release number of selected component (if applicable):
OCP4.7
Client Version: 4.7.0-202012110053.p0-4ebfe9c
Server Version: 4.7.0-0.nightly-2020-12-14-080124
Kubernetes Version: v1.19.2+e386040
CNV2.6

How reproducible:
Always

Steps to Reproduce:
1. Set up a fresh cluster and confirm that a DV can be created normally

2. Configure the NNCP
---
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br3-nncp
spec:
  desiredState:
    interfaces:
    - name: br3
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens8
      ipv4:
        enabled: false
        dhcp: false
      ipv6:
        enabled: false
  nodeSelector:
    node-role.kubernetes.io/worker: ''

$ oc get nncp
NAME       STATUS
br3-nncp   SuccessfullyConfigured

3. Create the DV

---
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: import-dv-http
spec:
  source:
    http:
      url: $URL
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
    storageClassName: ocs-storagecluster-ceph-rbd
    volumeMode: Block
  contentType: kubevirt


Actual results:
$ oc create -f dv.yaml
Error from server (InternalError): error when creating "dv": Internal error occurred: failed calling webhook "datavolume-mutate.cdi.kubevirt.io": Post "https://cdi-api.openshift-cnv.svc:443/datavolume-mutate?timeout=30s": dial tcp 10.128.2.11:8443: connect: no route to host
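
The error above names the address the API server could not reach (10.128.2.11:8443). A minimal sketch, assuming a logged-in `oc` session, for pulling that address out of the message and listing the pod that owns it. The helper names `webhook_dial_target` and `pods_with_ip` are hypothetical and chosen for this sketch; they are not part of `oc`.

```shell
# Read an error message on stdin and print the "dial tcp" target as IP:PORT.
# Prints nothing when the message contains no such target.
webhook_dial_target() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\).*/\1/p'
}

# List pods in a namespace whose pod IP equals the given address.
# The namespace defaults to openshift-cnv (an assumption based on this report).
pods_with_ip() {
  ip="$1"
  ns="${2:-openshift-cnv}"
  oc get pods -n "$ns" -o wide --field-selector "status.podIP=$ip"
}

# Example (needs a live cluster):
#   target=$(oc create -f dv.yaml 2>&1 | webhook_dial_target)
#   pods_with_ip "${target%:*}"
```

The `-o wide` output includes the node name, which shows whether the unreachable pod sits on a node that the NNCP touched.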


Expected results:
DV created normally with NNCP configured 


Additional info:
Sometimes the hco-webhook or hco-operator pod crashes and keeps restarting:

hco-operator-7d97fcbd65-w6z97                        0/1     CrashLoopBackOff 15

or 

hco-webhook-59cdb8fbc5-47cvk             0/1   CrashLoopBackOff  9  


$ oc logs -p hco-operator-7d97fcbd65-w6z97 -n openshift-cnv 
{"level":"info","ts":1607995346.9719403,"logger":"cmd","msg":"Go Version: go1.14.7"}
{"level":"info","ts":1607995346.971978,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1607995346.9720407,"logger":"cmd","msg":"Found namespace","Namespace":"openshift-cnv"}
I1215 01:22:28.023021       1 request.go:621] Throttling request took 1.039523169s, request: GET:https://172.30.0.1:443/apis/local.storage.openshift.io/v1alpha1?timeout=32s

$ oc logs -p -n openshift-cnv hco-webhook-59cdb8fbc5-47cvk
{"level":"info","ts":1607944002.9428325,"logger":"cmd","msg":"Go Version: go1.14.7"}
{"level":"info","ts":1607944002.9428897,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1607944002.9429717,"logger":"cmd","msg":"Found namespace","Namespace":"openshift-cnv"}
I1214 11:06:43.994586       1 request.go:621] Throttling request took 1.03801636s, request: GET:https://172.30.0.1:443/apis/network.openshift.io/v1?timeout=32s


Events:
  Type     Reason        Age                   From                     Message
  ----     ------        ----                  ----                     -------
  Normal   ReconcileHCO  76m (x2839 over 10h)  kubevirt-hyperconverged  HCO Reconcile completed successfully
  Warning  NodeNotReady  71m                   node-controller          Node is not ready
  Normal   Created       70m (x5 over 10h)     kubelet                  Created container hyperconverged-cluster-operator
  Normal   Started       70m (x5 over 10h)     kubelet                  Started container hyperconverged-cluster-operator
  Normal   Pulled        70m (x4 over 72m)     kubelet                  Container image "registry.redhat.io/container-native-virtualization/hyperconverged-cluster-operator@sha256:51a9fc1e24056dd104838944891e830e434ec7052e76cd74f665e2bfc2845c10" already present on machine
  Warning  Unhealthy     49m (x7 over 68m)     kubelet                  Readiness probe failed: Get "http://10.128.2.10:6060/readyz": dial tcp 10.128.2.10:6060: connect: connection refused
  Warning  BackOff       5m1s (x318 over 72m)  kubelet                  Back-off restarting failed container

Comment 1 Petr Horáček 2020-12-16 11:23:54 UTC

Yan, thanks for reporting this. Could you please share the contents of `oc get nns -o yaml` before and after the NNCP is configured?

Could you also record which nodes CDI and HCO are running on?
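
A minimal sketch of how the requested data could be collected, assuming a logged-in `oc` session. The helper names and the `nns-<label>.yaml` file naming are hypothetical, and the `cdi-`/`hco-` pod name prefixes are an assumption based on the pod names seen in this report.

```shell
# Save the NodeNetworkState of all nodes to nns-<label>.yaml,
# for example with the label "before" or "after".
snapshot_nns() {
  label="$1"
  oc get nns -o yaml > "nns-${label}.yaml"
}

# Print pod name and node for CDI and HCO pods in a namespace
# (defaults to openshift-cnv). The header row is kept for readability.
cnv_pod_nodes() {
  ns="${1:-openshift-cnv}"
  oc get pods -n "$ns" \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
    | grep -E '^(NAME|cdi-|hco-)'
}

# Example (needs a live cluster):
#   snapshot_nns before
#   oc apply -f nncp.yaml
#   snapshot_nns after
#   diff nns-before.yaml nns-after.yaml
#   cnv_pod_nodes
```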

Comment 2 Yan Du 2020-12-16 12:30:31 UTC

Created attachment 1739619
pod status in openshift-cnv ns

Comment 3 Yan Du 2020-12-16 12:31:13 UTC

Created attachment 1739621
nns contents after configuration

Comment 4 Yan Du 2020-12-16 12:31:48 UTC

Hi, Petr

I attached the pod status and the post-configuration nns contents (I have preserved the cluster in case it is needed for debugging). For the nns contents before configuration, is it OK to attach the nns from a freshly installed cluster?

Comment 5 Meni Yakove 2020-12-21 10:12:39 UTC

Petr,
this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1904889
I don't think CNV has picked up that fix yet.

Comment 6 Petr Horáček 2020-12-21 10:56:29 UTC

Thanks Yan, Meni.

I just (re-)attached the fixed version of nmstate to D/S. It should soon be available in the errata as kubernetes-nmstate-handler-container-v2.6.0-11.

Could we retry once it is available and see if this issue persists?

Comment 7 Ofir Nash 2021-01-04 14:40:47 UTC

Verified.

Checked Scenario:
1. Configure the NNCP (attached nncp.yaml).

2. Apply the NNCP - `oc apply -f nncp.yaml`

3. Get the status of the NNCP - success:
[cnv-qe-jenkins@network03-9n2rz-executor bug-nncp-dv]$ oc get nncp
NAME       STATUS
br3-nncp   SuccessfullyConfigured

4. Create dv from attached dv.yaml - `oc create -f dv.yaml`

5. DV created successfully with NNCP configured and no errors:
[cnv-qe-jenkins@network03-9n2rz-executor bug-nncp-dv]$ oc create -f dv.yaml 
datavolume.cdi.kubevirt.io/import-dv-http created

6. Also checked that the hco-webhook and hco-operator pods are running after the creation.

* Also checked that VM creation works (this came up in the original bug).
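
The checks in this scenario can be sketched as small shell helpers, assuming a logged-in `oc` session and the table layout shown by `oc get nncp` and `oc get pods` in this report. The function names and the `NNCP_POLL_SECONDS` variable are hypothetical and chosen for this sketch.

```shell
# Print the STATUS column for one NNCP, as shown by `oc get nncp`.
nncp_status() {
  oc get nncp "$1" --no-headers | awk '{print $2}'
}

# Poll until the NNCP reports SuccessfullyConfigured.
# $2 is the number of attempts (default 30); returns 1 if they run out.
wait_nncp_configured() {
  name="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    [ "$(nncp_status "$name")" = "SuccessfullyConfigured" ] && return 0
    tries=$((tries - 1))
    sleep "${NNCP_POLL_SECONDS:-5}"
  done
  return 1
}

# Print the names of pods in CrashLoopBackOff in a namespace
# (defaults to openshift-cnv). Empty output means none were found.
crashlooping_pods() {
  ns="${1:-openshift-cnv}"
  oc get pods -n "$ns" --no-headers | awk '$3 == "CrashLoopBackOff" {print $1}'
}

# Example (needs a live cluster):
#   oc apply -f nncp.yaml && wait_nncp_configured br3-nncp
#   oc create -f dv.yaml
#   [ -z "$(crashlooping_pods)" ] && echo "no crash-looping pods"
```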

Comment 8 Ofir Nash 2021-01-04 14:42:42 UTC

Created attachment 1744329
NNCP Example for Verification

Comment 9 Ofir Nash 2021-01-04 14:43:02 UTC

Created attachment 1744330
DV Example for Verification

Comment 12 errata-xmlrpc 2021-03-10 11:22:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799
