Bug 1853911 - VM with dot in network name fails to start with unclear message
Summary: VM with dot in network name fails to start with unclear message
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2.6.0
Assignee: Radim Hrazdil
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-05 14:00 UTC by Yossi Segev
Modified: 2021-03-10 11:17 UTC (History)
3 users

Fixed In Version: virt-api-container-v2.6.0-77
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 11:16:12 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
multus-vm-dot.yaml (2.24 KB, text/plain)
2020-07-05 14:02 UTC, Yossi Segev
no flags Details
multus-vm.yaml (2.24 KB, text/plain)
2020-07-05 14:04 UTC, Yossi Segev
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 4252 0 None closed Validate network interface name 2020-12-28 09:57:15 UTC
Red Hat Product Errata RHSA-2021:0799 0 None None None 2021-03-10 11:17:36 UTC

Description Yossi Segev 2020-07-05 14:00:46 UTC
Description of problem:
When adding a secondary NIC to a VM, you cannot use a network name that contains a dot. This contradicts the official Kubernetes documentation, which allows dots in object names (https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names).
In addition, the message presented as a result of this failure is unclear and irrelevant.


Version-Release number of selected component (if applicable):
CNV 2.4.0


How reproducible:
Always


Steps to Reproduce:
1.
On a cluster whose node has multiple physical NICs, create a bridge interface over one of the secondary interfaces by applying the following policy:

cat << EOF | oc apply -f -
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens8-br-nncp
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens8
      ipv6:
        enabled: false
      name: ens8-br
      state: up
      type: linux-bridge
  nodeSelector:
    kubernetes.io/hostname: host-172-16-0-34
EOF

Change the value of "kubernetes.io/hostname" to the hostname of a worker in your cluster.
Change the name of the bridge port (under interfaces[0].bridge.port[0].name) to the name of a physical secondary interface on the node. (It is also recommended to change the bridge name and NNCP name accordingly, but this is not mandatory.)

2.
Create a NetworkAttachmentDefinition for the bridge you configured:
$ cat << EOF | oc apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/ens8-br
  name: ens8-br-nad
  namespace: yoss-ns
spec:
  config: '{"cniVersion": "0.3.1", "name": "ens8-br", "plugins": [{"type":
    "cnv-bridge", "bridge": "ens8-br"}]}'
EOF

Change the bridge name according to the name you set in the previous step.
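The CNI config embedded in the NAD spec must be valid JSON. A quick local sanity check (hypothetical, not part of the original reproduction steps) can parse it and confirm the plugin type before applying:

```shell
# Hypothetical local sanity check: parse the NAD's embedded CNI config with
# python3's stdlib json module and print the first plugin's type.
config='{"cniVersion": "0.3.1", "name": "ens8-br", "plugins": [{"type": "cnv-bridge", "bridge": "ens8-br"}]}'
echo "$config" | python3 -c 'import json,sys; c=json.load(sys.stdin); print(c["plugins"][0]["type"])'
# prints: cnv-bridge
```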

3.
Create a VM using the attached spec:
$ oc apply -f multus-vm-dot.yaml
(Change the nodeSelector value to the same node you used when configuring the NNCP.)

4.
Start the VM:
$ virtctl start multus-vm

5.
Follow the progress of the VMI:
$ oc get vmi -w


Actual results:
<BUG>
1.
The VMI enters an endless scheduling loop and never reaches the Running state (in practice, a new VMI is created on each iteration):
NAME        AGE   PHASE        IP    NODENAME
multus-vm   3s    Scheduling         
multus-vm   8s    Scheduled    10.128.2.129   host-172-16-0-34
multus-vm   9s    Scheduled    10.128.2.129   host-172-16-0-34
multus-vm   9s    Scheduled    10.128.2.129   host-172-16-0-34
...
multus-vm   10s   Scheduled    10.128.2.129   host-172-16-0-34
multus-vm   10s   Scheduled    10.128.2.129   host-172-16-0-34
multus-vm   11s   Failed       10.128.2.129   host-172-16-0-34
multus-vm   11s   Failed       10.128.2.129   host-172-16-0-34
multus-vm   11s   Failed       10.128.2.129   host-172-16-0-34
multus-vm   0s                                
multus-vm   0s                                
multus-vm   0s    Pending                     
multus-vm   0s    Scheduling                  
multus-vm   1s    Scheduling                  

2.
There is no clear message indicating that the failure is due to the network name. When the VMI is gone and a new one is created, the VMI describe output shows:
$ oc describe vmi dhcp-server-vm
...
Events:
  Type     Reason            Age               From                                                         Message
  ----     ------            ----              ----                                                         -------
  Normal   SuccessfulCreate  10s               virtualmachine-controller                                    Created virtual machine pod virt-launcher-dhcp-server-vm-jt9s5
  Warning  SyncFailed        1s                virt-handler, master-2.cnvcl2.cnvqe.lab.eng.rdu2.redhat.com  unknown error encountered sending command SyncVMI: rpc error: code = Unavailable desc = transport is closing
  Warning  SyncFailed        1s (x11 over 1s)  virt-handler, master-2.cnvcl2.cnvqe.lab.eng.rdu2.redhat.com  failed to detect VMI pod: dial unix //pods/4701df26-d742-47da-85f4-b0770ec986f3/volumes/kubernetes.io~empty-dir/sockets/launcher-sock: connect: connection refused


Expected results:
1. According to https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names, a dot is a valid character in object names (and this configuration used to work properly in previous versions).
2. Upon such a failure, there must be a clear message indicating that the failure is due to an invalid network name. It took a lot of debugging time from both QE and dev just to simplify the scenario and isolate it to the bad network name.


Additional info:
1.
The problem is with the dot in the network name. Using the attached multus-vm.yaml works; the only difference between the two specs is a dot vs. a hyphen in the network name:
$ diff multus-vm*.yaml
43c43
<             name: ens8.br
---
>             name: ens8-br
55c55
<         name: ens8.br
---
>         name: ens8-br

2.
This issue also occurs when the *default* network name includes a dot (for example "d.efault").
So I assume the problem is not in multus, but elsewhere.
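For reference, the Kubernetes docs cited above define object names as RFC 1123 subdomains, which explicitly allow dots. A quick check against that pattern (a sketch, assuming the regex given in the Kubernetes documentation) confirms both failing names are legal object names:

```shell
# RFC 1123 subdomain pattern as documented for Kubernetes object names;
# dots separate labels, so "ens8.br" and "d.efault" are both valid names.
pattern='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
for name in ens8.br ens8-br d.efault; do
  echo "$name" | grep -Eq "$pattern" && echo "$name: valid k8s name" || echo "$name: invalid"
done
```

All three names pass, which is why the rejection has to come from KubeVirt's own interface-name rules rather than from Kubernetes object-name validation.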

Comment 1 Yossi Segev 2020-07-05 14:02:35 UTC
Created attachment 1699946 [details]
multus-vm-dot.yaml

Failed VM spec

Comment 2 Yossi Segev 2020-07-05 14:03:25 UTC
Comment on attachment 1699946 [details]
multus-vm-dot.yaml

Failed VM spec.

Comment 3 Yossi Segev 2020-07-05 14:04:09 UTC
Created attachment 1699947 [details]
multus-vm.yaml

Valid VM spec

Comment 4 Yossi Segev 2020-12-21 15:45:53 UTC
Radim, can you please specify what exactly the fix for this bug is? Is it fixing the error message, supporting dots in network names, or both?

Comment 5 Radim Hrazdil 2020-12-28 21:39:03 UTC
Hello Yossi, you can see the change in the linked PR (https://github.com/kubevirt/kubevirt/pull/4252/files).
The issue was fixed by not allowing dots in network interface names.
So a user should no longer be able to create a VM with a NIC that has a dot in its name.


The following error msg should be returned: "Network interface name can only contain alphabetical characters, numbers, dashes (-) or underscores (_)"
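The rule from that error message can be sketched as a regex check (a hypothetical shell rendering of the validation, not the actual Go code from the linked PR):

```shell
# Only alphabetical characters, numbers, dashes (-) and underscores (_) are
# accepted; a dotted name should now be rejected at VM creation time.
check() {
  echo "$1" | grep -Eq '^[A-Za-z0-9_-]+$' && echo "$1: accepted" || echo "$1: rejected"
}
check ens8-br   # accepted
check ens8.br   # rejected
```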

Comment 6 Yossi Segev 2020-12-29 11:36:56 UTC
Verified in:
OCP 4.7.0-fc.0
CNV 2.6.0
virt-api: registry.redhat.io/container-native-virtualization/virt-api@sha256:e5888f4b00be83a48f43737ae78ea9d7909de865fa7c45a19d069047347917a0

Verified by repeating the original scenario from the bug description; the creation of the VM was rejected with the message Radim mentioned in comment #5.

Comment 9 errata-xmlrpc 2021-03-10 11:16:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

