Bug 2218671

Summary: Need clearer message when migration failed due to missing resource
Product: Container Native Virtualization (CNV) Reporter: Yossi Segev <ysegev>
Component: Virtualization Assignee: sgott
Status: NEW --- QA Contact: Kedar Bidarkar <kbidarka>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.14.0 CC: alkaplan, gouyang, hstastna, nrozen, phoracek
Target Milestone: ---   
Target Release: 4.15.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yossi Segev 2023-06-29 19:29:23 UTC
Description of problem:
When attempting to hot-plug an interface to a guest VM using the `virtctl addinterface` command without first creating a backing bridge on the node, the command fails as expected, but the message doesn't tell the user what is missing:
```
$ virtctl addinterface vm-fedora --network-attachment-definition-name hp-br-nad --name hp2 
the server could not find the requested resource
```
The VM and the NetworkAttachmentDefinition both exist (in the same namespace); what is missing is the `hp-br` bridge interface on the nodes, which the NAD refers to:

```
$ oc get vm
NAME        AGE   STATUS    READY
vm-fedora   16m   Running   True
$
$ oc get net-attach-def hp-br-nad -o yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/hp-br
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"annotations":{"k8s.v1.cni.cncf.io/resourceName":"bridge.network.kubevirt.io/hp-br"},"name":"hp-br-nad","namespace":"yoss-ns"},"spec":{"config":"{\"cniVersion\": \"0.3.1\", \"name\": \"hp-br\", \"plugins\": [{\"type\": \"cnv-bridge\", \"bridge\": \"hp-br\"}]}"}}
  creationTimestamp: "2023-06-29T18:37:35Z"
  generation: 1
  name: hp-br-nad
  namespace: yoss-ns
  resourceVersion: "313420"
  uid: 44422089-e2e0-417a-811e-ff4367a1789e
spec:
  config: '{"cniVersion": "0.3.1", "name": "hp-br", "plugins": [{"type": "cnv-bridge",
    "bridge": "hp-br"}]}'
```
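
One way to confirm this diagnosis is to compare the resource name in the NAD annotation with the resources each node reports as allocatable. The sketch below assumes the NAD and node objects were already fetched as JSON (for example with `oc get ... -o json`) and loaded into dicts. The function name and the node list are hypothetical, and it is an assumption that a bridge present on a node shows up under the node's `status.allocatable`.

```python
RESOURCE_ANNOTATION = "k8s.v1.cni.cncf.io/resourceName"


def nodes_providing_nad_resource(nad, nodes):
    """Return the resource a NAD requests and the nodes that advertise it."""
    resource = nad["metadata"]["annotations"][RESOURCE_ANNOTATION]
    providers = [
        node["metadata"]["name"]
        for node in nodes
        if resource in node.get("status", {}).get("allocatable", {})
    ]
    return resource, providers


# NAD fields copied from the output above; the node list is made-up sample data.
nad = {
    "metadata": {
        "name": "hp-br-nad",
        "annotations": {RESOURCE_ANNOTATION: "bridge.network.kubevirt.io/hp-br"},
    }
}
nodes = [
    {"metadata": {"name": "worker-0"}, "status": {"allocatable": {"cpu": "8"}}},
    {"metadata": {"name": "worker-1"}, "status": {"allocatable": {"cpu": "8"}}},
]

resource, providers = nodes_providing_nad_resource(nad, nodes)
if not providers:
    print(f"no node advertises {resource}")
```

An empty provider list would be consistent with the situation described here, where no node has the `hp-br` bridge.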


Version-Release number of selected component (if applicable):
CNV 4.14.0
virtctl version:
Client Version: version.Info{GitVersion:"v1.0.0-beta.0-188-g39dc57cad", GitCommit:"39dc57cad5ea6f882b847fd1ab312ee951d8cd9c", GitTreeState:"clean", BuildDate:"2023-06-04T04:31:39Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v1.0.0-beta.0-449-g2d9380079", GitCommit:"2d93800792c87fd0f744a13f80ce82260dd6e279", GitTreeState:"clean", BuildDate:"2023-06-28T15:10:09Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}


How reproducible:
Always


Steps to Reproduce:
1. Create and run a basic VM (no secondary NICs).
$ oc create ns yoss-ns
namespace/yoss-ns created
$ oc project yoss-ns
Now using project "yoss-ns" on server "https://api.net-ys-414o.rhos-psi.cnv-qe.rhood.us:6443".
$ oc apply -f vm-fedora.yaml 
virtualmachine.kubevirt.io/vm-fedora created
$ virtctl start vm-fedora
VM vm-fedora was scheduled to start

2. Create a NetworkAttachmentDefinition.
$ oc apply -f bridge-nad.yaml 
networkattachmentdefinition.k8s.cni.cncf.io/hp-br-nad created

4. Run the command to add the new interface to the VM:
virtctl addinterface <vm-name> --network-attachment-definition-name <net-attach-def-name> --name <interface-name>
$ virtctl addinterface vm-fedora --network-attachment-definition-name hp-br-nad --name hp2


Actual results:
The failure message doesn't specify which resource is missing, which can be quite confusing for the user:
```
the server could not find the requested resource
```


Expected results:
A clearer message that indicates which resource is missing.
In this case, for example:
```
the server could not find bridge resource 'hp-br', which is referenced in NetworkAttachmentDefinition `hp-br-nad`.
```
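
As an illustration only, the message above could be assembled from data the NAD already carries. This is a minimal sketch, not the virtctl or KubeVirt implementation: the function name is hypothetical, and it assumes the bridge name can be read from the `bridge` key of each plugin in `spec.config`, as in the NAD shown in the description.

```python
import json


def missing_bridge_message(nad):
    """Build an error message naming the bridge(s) a NAD refers to."""
    nad_name = nad["metadata"]["name"]
    # spec.config is a JSON string embedded in the NAD object.
    config = json.loads(nad["spec"]["config"])
    bridges = [p["bridge"] for p in config.get("plugins", []) if "bridge" in p]
    quoted = ", ".join(f"'{b}'" for b in bridges)
    return (
        f"the server could not find bridge resource {quoted}, "
        f"which is referenced in NetworkAttachmentDefinition `{nad_name}`."
    )


# Values taken from the hp-br-nad definition in this report.
nad = {
    "metadata": {"name": "hp-br-nad", "namespace": "yoss-ns"},
    "spec": {
        "config": '{"cniVersion": "0.3.1", "name": "hp-br", '
                  '"plugins": [{"type": "cnv-bridge", "bridge": "hp-br"}]}'
    },
}
print(missing_bridge_message(nad))
```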

Comment 6 Alona Kaplan 2023-07-18 12:33:39 UTC
Yossi, are you sure this behavior can be reproduced and is not caused by another issue in your env?
If there is no bridge, it should be created by the CNI.

Comment 7 Alona Kaplan 2023-07-18 12:46:27 UTC
My mistake, the bridge won't be created. But anyway, we don't expect to see any error.

Comment 8 Yossi Segev 2023-07-18 13:55:03 UTC
(In reply to Alona Kaplan from comment #7)
> My mistake, the bridge won't be created. But anyway, we don't expect to see
> any error.

The `addinterface` action itself doesn't fail. The migration fails (after a timeout), but without any clear message explaining that it failed because the bridge interface resource is missing.
This is what is currently shown when describing the migration:

Events:
  Type     Reason                           Age                    From                       Message
  ----     ------                           ----                   ----                       -------
  Normal   SuccessfulCreate                 8m16s                  virtualmachine-controller  Created migration target pod virt-launcher-vm-fedora-9nwxt
  Warning  migrationTargetPodUnschedulable  3m17s (x5 over 8m16s)  virtualmachine-controller  Migration target pod for VMI [yoss-ns/vm-fedora] is currently unschedulable.
  Normal   SuccessfulDelete                 3m17s                  virtualmachine-controller  unschedulable pod timeout period exceeded%!(EXTRA string=virt-launcher-vm-fedora-9nwxt)
  Warning  FailedMigration                  3m17s (x2 over 3m17s)  virtualmachine-controller  Migration target pod was removed during active migration.
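
The events above say the target pod is unschedulable but not why. A pod that cannot be scheduled normally carries a `PodScheduled` condition with status `False` and a scheduler message, so one option would be to include that message in the migration event. The sketch below is only an illustration under that assumption: both function names are hypothetical, the pod status and its message text are made-up sample data, and the condition would have to be read before the target pod is deleted.

```python
def unschedulable_reason(pod):
    """Return the scheduler's message if the pod is unschedulable, else None."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            return cond.get("message") or cond.get("reason")
    return None


def migration_event_message(vmi, pod):
    """Append the scheduler's reason to the existing 'unschedulable' event text."""
    base = f"Migration target pod for VMI [{vmi}] is currently unschedulable."
    reason = unschedulable_reason(pod)
    return base if reason is None else f"{base} Scheduler reported: {reason}"


# Hypothetical pod status; the message wording is illustrative only.
pod = {
    "metadata": {"name": "virt-launcher-vm-fedora-9nwxt"},
    "status": {
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/3 nodes are available: "
                           "3 Insufficient bridge.network.kubevirt.io/hp-br.",
            }
        ]
    },
}
print(migration_event_message("yoss-ns/vm-fedora", pod))
```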

Comment 9 Alona Kaplan 2023-07-18 14:11:56 UTC
Can you please share the `oc describe` of the VM?

Comment 12 Alona Kaplan 2023-07-18 15:26:53 UTC
An unclear migration failure message for a missing resource is not a network bug, so moving this to the compute team. Also changing the title to reflect the real issue.

Comment 13 Kedar Bidarkar 2023-07-19 12:20:17 UTC
Targeting this for CNV-4.15 given the severity, and because it is about providing a clear message rather than broken functionality.