Bug 1727810

Summary: Starting VM fails due to bridge device annotation in Network Attachment Definition
Product: Container Native Virtualization (CNV)
Reporter: Yossi Segev <ysegev>
Component: Networking
Assignee: Dan Kenigsberg <danken>
Status: CLOSED NOTABUG
QA Contact: Meni Yakove <myakove>
Severity: unspecified
Priority: unspecified
Version: 2.0
CC: cnv-qe-bugs, ncredi, ysegev
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-07-08 10:59:42 UTC
Type: Bug
Attachments:
VM spec with secondary network interface (Flags: none)

Description Yossi Segev 2019-07-08 08:50:38 UTC
Created attachment 1588293
VM spec with secondary network interface

Description of problem:
When a VM is created with a secondary network interface, it fails to start if the Network Attachment Definition includes an annotation with the bridge device name.

Version-Release number of selected component (if applicable):
OCP 4.1/CNV 2.0

How reproducible:
Always


Steps to Reproduce:
1. Create a bridge interface named br1 on all worker (not master) nodes of the cluster:
 $ ip link add br1 type bridge
 $ ip link set br1 up

2. Create the following Network Attachment Definition, whose resourceName annotation is meant to restrict scheduling to the worker nodes that have the br1 interface:
$ cat << EOF | oc create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: a-bridge-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge-cni.network.kubevirt.io/br1
spec:
  config: '{
    "cniVersion": "0.3.0",
    "name": "a-bridge-network",
    "type": "cnv-bridge",
    "bridge": "br1",
    "isGateway": true,
    "ipam": {}
}'

3. Start a VM using the attached VM spec (vm-cirros-my-secondary-nic.yaml):
 $ oc create -f vm-cirros-my-secondary-nic.yaml
 $ virtctl start vm-cirros

Actual results:
The VMI remains stuck in "Scheduling" status with this error (collected using "oc describe vmi"):
Status:
  Conditions:
    Last Probe Time:       2019-07-08T07:52:44Z
    Last Transition Time:  2019-07-08T07:52:44Z
    Message:               0/6 nodes are available: 3 Insufficient devices.kubevirt.io/kvm, 3 Insufficient devices.kubevirt.io/tun, 3 Insufficient devices.kubevirt.io/vhost-net, 3 node(s) didn't match node selector, 6 Insufficient bridge-cni.network.kubevirt.io/br1.
    Reason:                Unschedulable
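One detail in this message is worth reading closely: the bridge resource is reported as insufficient on all 6 nodes, including the 3 workers where br1 was created, while every other reason applies to only 3 of the 6 nodes. That pattern suggests no node advertises a resource under that name at all. The sketch below is a hypothetical helper (not part of CNV, oc or virtctl) that lists the resources the scheduler flags as insufficient on every node, assuming the message has the "0/N nodes are available: ..." form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of CNV or oc): given a scheduler message of the
# form "0/N nodes are available: <count> <reason>, ...", print every resource
# that is reported as "Insufficient" on all N nodes.
insufficient_everywhere() {
  msg=$1
  # Total node count, taken from the "0/N nodes" prefix.
  total=$(printf '%s\n' "$msg" | sed -n 's|^0/\([0-9][0-9]*\) nodes.*|\1|p')
  # Drop the prefix up to the first ": " and the trailing period, split the
  # reasons on commas, and keep "<count> Insufficient <resource>" entries
  # whose count equals the total number of nodes.
  printf '%s\n' "$msg" |
    sed 's/^[^:]*: //; s/\.$//' |
    tr ',' '\n' |
    sed 's/^ *//' |
    awk -v total="$total" '$2 == "Insufficient" && $1 == total { print $3 }'
}

# The message from the "oc describe vmi" output above.
msg="0/6 nodes are available: 3 Insufficient devices.kubevirt.io/kvm, 3 Insufficient devices.kubevirt.io/tun, 3 Insufficient devices.kubevirt.io/vhost-net, 3 node(s) didn't match node selector, 6 Insufficient bridge-cni.network.kubevirt.io/br1."
insufficient_everywhere "$msg"
```

Applied to the message above, it prints only bridge-cni.network.kubevirt.io/br1, which is the resource name that turns out to be wrong in Comment 4.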


Expected results:
VM should start successfully and move to "Running" status.

Additional info:
Workaround:
Remove the annotation line from the Network Attachment Definition spec.

Comment 1 Dan Kenigsberg 2019-07-08 09:48:58 UTC
We have this running on cnv-tests. Did you try running test_bridge_marker on your cluster?
Can you verify the bridge marker is running?

Comment 2 Yossi Segev 2019-07-08 10:05:00 UTC
I have just run test_bridge_marker on that same cluster, and all 3 tests passed (no failures or skips).
As for verifying that bridge marker is running:
[cnv-qe-jenkins@cnv-executor-ysegev2 cnv-tests]$ oc get all --all-namespaces | grep marker
linux-bridge                                            pod/bridge-marker-2cfk4                                               1/1       Running     0          3d20h
linux-bridge                                            pod/bridge-marker-75bl8                                               1/1       Running     0          3d20h
linux-bridge                                            pod/bridge-marker-8qrnh                                               1/1       Running     1          3d20h
linux-bridge                                            pod/bridge-marker-97nvw                                               1/1       Running     5          3d20h
linux-bridge                                            pod/bridge-marker-hdhmz                                               1/1       Running     0          3d20h
linux-bridge                                            pod/bridge-marker-s6wpl                                               1/1       Running     1          3d20h

linux-bridge                             daemonset.apps/bridge-marker                   6         6         6         6            6           beta.kubernetes.io/arch=amd64     3d20h

This cluster has 3 workers and 3 masters, hence the 6 running pods. Either way, bridge marker appears to be functioning.

Comment 4 Yossi Segev 2019-07-08 10:59:42 UTC
It turns out that the bug is in how the annotation is defined.
It is
k8s.v1.cni.cncf.io/resourceName: bridge-cni.network.kubevirt.io/br1
while it should be
k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br1

My erroneous configuration derived from an error in the documentation.
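For reference, a minimal sketch of the Network Attachment Definition from step 2 with the corrected annotation. It writes the manifest to a local file (the file name a-bridge-network.yaml is an assumption) and stops if the wrong bridge-cni prefix is still present; all other fields are unchanged from step 2.

```shell
#!/bin/sh
# Corrected Network Attachment Definition: the resourceName annotation uses the
# bridge.network.kubevirt.io/ prefix instead of bridge-cni.network.kubevirt.io/.
# The file name a-bridge-network.yaml is arbitrary (assumed for this sketch).
cat > a-bridge-network.yaml << 'EOF'
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: a-bridge-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br1
spec:
  config: '{
    "cniVersion": "0.3.0",
    "name": "a-bridge-network",
    "type": "cnv-bridge",
    "bridge": "br1",
    "isGateway": true,
    "ipam": {}
  }'
EOF

# Sanity check before applying: fail if the old, wrong prefix is still present.
if grep -q 'bridge-cni\.network\.kubevirt\.io/' a-bridge-network.yaml; then
  echo "wrong resourceName prefix in a-bridge-network.yaml" >&2
  exit 1
fi
```

Once the check passes, the file can be applied with `oc create -f a-bridge-network.yaml`, or with `oc replace -f a-bridge-network.yaml` if the definition from step 2 already exists in the namespace.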

Comment 5 Dan Kenigsberg 2019-07-08 12:04:19 UTC
> My erroneous configuration derived from an error in the documentation.

So this should become a doc bug or an upstream issue. Can you open one of the two?

Comment 7 Dan Kenigsberg 2019-07-08 12:28:23 UTC
> Why do you suspect it is an u/s issue?

I did not know which docs had misled you.