Bug 1881930
Summary: | Failure to create tap device upon VM creation | |
---|---|---|---
Product: | Container Native Virtualization (CNV) | Reporter: | Yossi Segev <ysegev>
Component: | Networking | Assignee: | Miguel Duarte Barroso <mduarted>
Status: | CLOSED ERRATA | QA Contact: | Meni Yakove <myakove>
Severity: | urgent | Priority: | urgent
Version: | 2.5.0 | CC: | cnv-qe-bugs, danken, lbednar, ncredi, phoracek
Target Release: | 2.5.0 | Keywords: | Regression, TestBlocker
Hardware: | Unspecified | OS: | Unspecified
Fixed In Version: | virt-launcher-container-v2.5.0-56 | Doc Type: | If docs needed, set a value
Last Closed: | 2020-11-17 13:24:24 UTC | Type: | Bug
Another finding from Miguel's investigation is that this bug affects connectivity. When verifying this bug, also verify connectivity via the primary interface, in addition to points 1 (no such warning/error) and 2 (a tap device exists for every interface in the VM) from the original bug description.

For whatever reason, the upstream and downstream build processes are entirely different: upstream uses bazel, while downstream does not. As a direct consequence, downstream (d/s) we end up compiling the KubeVirt binaries without selinux support, which causes the selinux stub to be used. Using the stub makes it impossible for virt-handler to read the correct selinux context of virt-launcher, and also makes it impossible for the handler to switch context. Because the handler cannot switch context, the tap device is created with incorrect labels, which ultimately prevents libvirt from opening it.

Verified on OCP 4.6.0-fc.9 / CNV v2.5.0 by checking the 3 expected results in the bug description + comment #1:
1. The warning/error doesn't appear in the VMI description, the virt-handler log, or the virt-launcher log.
2. A tap device exists for the single interface (default eth0) on the VM (checked in the virt-launcher's domxml).
3. Valid connectivity (using ping) between 2 created VMs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:5127
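The tap-device check from the expected results (every ethernet interface in the domxml has a tap target) could be scripted roughly as follows. This is only a sketch, not part of the fix or of the QE test suite: the function name `interfaces_missing_tap` and the sample XML are hypothetical, and it assumes tap device names start with "tap", as in the `tap0` example from this report. In practice the input would be the saved output of `virsh dumpxml`.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample, modeled on the domxml fragment quoted in this report.
SAMPLE_DOMXML = """
<domain type='kvm'>
  <devices>
    <interface type='ethernet'>
      <target dev='tap0' managed='no'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""


def interfaces_missing_tap(domxml: str) -> List[int]:
    """Return positions (among all <interface> elements) of ethernet
    interfaces that have no <target dev='tap...'> child."""
    root = ET.fromstring(domxml)
    missing = []
    for index, iface in enumerate(root.iter("interface")):
        if iface.get("type") != "ethernet":
            continue  # only ethernet interfaces are expected to carry a tap device
        target = iface.find("target")
        dev = target.get("dev", "") if target is not None else ""
        # Assumption: tap devices are named tap0, tap1, ...
        if not dev.startswith("tap"):
            missing.append(index)
    return missing


if __name__ == "__main__":
    print(interfaces_missing_tap(SAMPLE_DOMXML))
```

An empty list means every ethernet interface has a tap target; any index in the result points at an interface to inspect by hand.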
Created attachment 1716019 (vm-fedora.yaml)

Description of problem:
Starting a VM results in a warning message in the VMI description about a failure to create a tap device.

Version-Release number of selected component (if applicable):
OCP version:
  Client Version: 4.6.0-202009212020.p0-0a57069
  Server Version: 4.6.0-fc.7
  Kubernetes Version: v1.19.0+b4ffb45
CNV version: 2.5.0

How reproducible:
Always

Steps to Reproduce:
1. In an OCP 4.6/CNV 2.5 cluster, create a VM:
   $ oc apply -f vm-fedora.yaml
   virtualmachine.kubevirt.io/vm-fedora created
   The Fedora VM spec yaml I used is attached.
2. Start the VM:
   $ virtctl start vm-fedora
   VM vm-fedora was scheduled to start
3. Wait for the VMI to get to Running state:
   $ oc get vmi vm-fedora -w
   NAME        AGE   PHASE        IP             NODENAME
   vm-fedora   1s    Scheduling
   vm-fedora   5s    Scheduled                   myakove-8ljbm-worker-0-xvkrt
   vm-fedora   5s    Scheduled                   myakove-8ljbm-worker-0-xvkrt
   vm-fedora   7s    Scheduled                   myakove-8ljbm-worker-0-xvkrt
   vm-fedora   8s    Running      10.128.3.126   myakove-8ljbm-worker-0-xvkrt
   vm-fedora   8s    Running      10.128.3.126   myakove-8ljbm-worker-0-xvkrt
4. Check the VM description (specifically the Events section):
   $ oc describe vmi vm-fedora

Actual results:
$ oc describe vmi vm-fedora
...
Events:
  Type     Reason            Age                   From                       Message
  ----     ------            ----                  ----                       -------
  Normal   SuccessfulCreate  3m31s                 virtualmachine-controller  Created virtual machine pod virt-launcher-vm-fedora-jskm4
  Warning  SyncFailed        3m25s                 virt-handler               server error. command SyncVMI failed: "LibvirtError(Code=38, Domain=0, Message='Unable to create tap device tap0: Permission denied')"
  Normal   Started           3m24s                 virt-handler               VirtualMachineInstance started.
  Normal   Created           48s (x11 over 3m24s)  virt-handler               VirtualMachineInstance defined.

<BUG> The warning message about failure to create tap device.

Expected results:
1. No such warning/error.
2. Tap device exists for every interface in the VM. To verify that, dump the domxml of the virt-launcher pod of the VMI:
   a. Find the virt-launcher pod:
      $ oc get pod | grep launcher
      virt-launcher-vm-fedora-n2v7p   2/2   Running   0   40m
   b. Find the Id of the VMI domain:
      [cnv-qe-jenkins@myakove-8ljbm-executor yossi]$ oc exec -it virt-launcher-vm-fedora-n2v7p -- virsh list
      Defaulting container name to compute.
      Use 'oc describe pod/virt-launcher-vm-fedora-n2v7p -n yoss-ns' to see all of the containers in this pod.
       Id   Name                State
      -----------------------------------
       2    yoss-ns_vm-fedora   running
   c. Dump the domxml for this domain (which is "2" in this example):
      [cnv-qe-jenkins@myakove-8ljbm-executor yossi]$ oc exec -it virt-launcher-vm-fedora-n2v7p -- virsh dumpxml 2
   d. Search for the ethernet entries; they should all have a tap device defined in them, for example:
      <interface type='ethernet'>
        ...
        <target dev='tap0' managed='no'/>
        <model type='virtio'/>
        ...
      </interface>

Additional info:
This error can also be found in the virt-handler and virt-launcher logs:
a. virt-handler:
   $ oc get vmi vm-fedora
   NAME        AGE   PHASE     IP             NODENAME
   vm-fedora   49m   Running   10.128.3.126   myakove-8ljbm-worker-0-xvkrt
   [cnv-qe-jenkins@myakove-8ljbm-executor yossi]$ oc get pods -n openshift-cnv -o wide | grep "virt-handler" | grep "myakove-8ljbm-worker-0-xvkrt"
   virt-handler-8xlfq   1/1   Running   0   21h   10.128.2.4   myakove-8ljbm-worker-0-xvkrt   <none>   <none>
   $ oc logs virt-handler-8xlfq -n openshift-cnv
   ...
   {"component":"virt-handler","kind":"","level":"error","msg":"Synchronizing the VirtualMachineInstance failed.","name":"oper-test-vm-1600844833.408544","namespace":"cluster-addons-operator-test-network-addons-operator","pos":"vm.go:1328","reason":"server error. command SyncVMI failed: \"LibvirtError(Code=38, Domain=0, Message='Unable to create tap device tap0: Permission denied')\"","timestamp":"2020-09-23T07:07:56.769890Z","uid":"ce26416b-ee8b-4089-9fc7-1110acf49f92"}
   ...
b. virt-launcher:
   $ oc get pods | grep "virt-launcher"
   virt-launcher-vm-fedora-n2v7p   2/2   Running   0   52m
   $ oc logs virt-launcher-vm-fedora-tpqk2 -c compute
   ...
   {"component":"virt-launcher","kind":"","level":"error","msg":"Starting the VirtualMachineInstance failed.","name":"vm-fedora","namespace":"yoss-ns","pos":"manager.go:1245","reason":"virError(Code=38, Domain=0, Message='Unable to create tap device tap0: Permission denied')","timestamp":"2020-09-23T11:22:47.393442Z","uid":"cc269f09-1d8e-489d-84bd-f03f067089ff"}
   {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"vm-fedora","namespace":"yoss-ns","pos":"server.go:161","reason":"virError(Code=38, Domain=0, Message='Unable to create tap device tap0: Permission denied')","timestamp":"2020-09-23T11:22:47.393609Z","uid":"cc269f09-1d8e-489d-84bd-f03f067089ff"}
   ...
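Because the virt-handler and virt-launcher logs above are JSON lines, checking them for this error can be automated. The following is a minimal sketch under stated assumptions: the helper name `find_tap_errors` is hypothetical, the sample reuses one virt-launcher entry from this report plus one made-up info entry, and the match relies only on the `level` and `reason` fields seen in the entries above. It would be fed the saved output of `oc logs`.

```python
import json
from typing import List

# Error text taken from the log entries in this report.
TAP_ERROR = "Unable to create tap device"

# First entry is copied from this report; the second is a made-up
# info entry, included only to show that unrelated lines are skipped.
SAMPLE_LOG = r"""
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"vm-fedora","namespace":"yoss-ns","pos":"server.go:161","reason":"virError(Code=38, Domain=0, Message='Unable to create tap device tap0: Permission denied')","timestamp":"2020-09-23T11:22:47.393609Z","uid":"cc269f09-1d8e-489d-84bd-f03f067089ff"}
{"component":"virt-launcher","level":"info","msg":"hypothetical unrelated entry"}
"""


def find_tap_errors(log_text: str) -> List[dict]:
    """Return the JSON log entries that report a tap device creation failure."""
    hits = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and any non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or malformed lines
        if entry.get("level") == "error" and TAP_ERROR in str(entry.get("reason", "")):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    for entry in find_tap_errors(SAMPLE_LOG):
        print(entry["timestamp"], entry["pos"], entry["reason"])
```

An empty result for both logs corresponds to expected result 1 (no such warning/error); it does not replace checking the VMI events.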