Description of problem:
virt-handler fails to synchronize a VMI on a cluster without nested virtualization, even though useEmulation is set to true.

Version-Release number of selected component (if applicable):
KubeVirt 0.44

How reproducible:

Steps to Reproduce:
1. Install KubeVirt 0.44 in a cluster in which nested virtualization is not enabled
2. Configure useEmulation: true
3. Create a VMI and observe that virt-handler reports an error

Actual results:
virt-handler logs this error:

{"component":"virt-handler","kind":"","level":"error","msg":"Synchronizing the VirtualMachineInstance failed.","name":"testvm","namespace":"vmsns","pos":"vm.go:1689","reason":"stat /proc/80230/root/dev/kvm: no such file or directory","timestamp":"2021-08-11T11:16:57.413693Z","uid":"dd96e05d-5d8f-457e-9742-07028b70a8ec"}

Expected results:
virt-handler should not check for the kvm device when useEmulation: true

Additional info:
We are hitting this issue in HCO's upstream tests. virt-handler logs are available here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/20950/rehearse-20950-pull-ci-kubevirt-hyperconverged-cluster-operator-main-hco-e2e-upgrade-index-aws/1425398461880078336/artifacts/hco-e2e-upgrade-index-aws/gather-extra/artifacts/pods/kubevirt-hyperconverged_virt-handler-fnktt_virt-handler.log
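As a concrete illustration of step 2, this is roughly how useEmulation can be enabled on an upstream KubeVirt deployment; a minimal sketch, assuming the default KubeVirt CR name and namespace (both "kubevirt"), which may differ per installation:

]$ kubectl patch kubevirt kubevirt -n kubevirt --type=merge \
    -p '{"spec":{"configuration":{"developerConfiguration":{"useEmulation":true}}}}'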
This is a perhaps poorly named setting. Hardware virtualization (KVM) will still be used if it's available, even if useEmulation is set to true; it's more of a fallback setting. Is this preventing you from creating a VMI at all, or is it just an error message in the logs?
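A quick way to tell which path a node will take, assuming you have a shell on the node (or a privileged pod with access to the host): check for the kvm device and the CPU virtualization flags.

]$ ls -l /dev/kvm                      # device present: KVM is used even with useEmulation: true
]$ grep -cEw 'vmx|svm' /proc/cpuinfo   # prints 0: no hardware virt, so the emulation fallback applies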
The VMI doesn't enter the Running phase. See
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/20950/rehearse-20950-pull-ci-kubevirt-hyperconverged-cluster-operator-main-hco-e2e-upgrade-index-aws/1425398461880078336/build-log.txt

You can also see the virt-launcher logs here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/20950/rehearse-20950-pull-ci-kubevirt-hyperconverged-cluster-operator-main-hco-e2e-upgrade-index-aws/1425398461880078336/artifacts/hco-e2e-upgrade-index-aws/gather-extra/artifacts/pods/vmsns_virt-launcher-testvm-kf44c_compute.log

There is another error there:

{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')","pos":"libvirt.go:500","timestamp":"2021-08-11T11:16:35.046447Z"}
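The stuck phase can also be checked directly; a sketch assuming the namespace and VMI name from the logs above:

]$ oc get vmi testvm -n vmsns -o jsonpath='{.status.phase}'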
This is the error that I think is causing it:

{"component":"virt-handler","kind":"","level":"error","msg":"Synchronizing the VirtualMachineInstance failed.","name":"testvm","namespace":"vmsns","pos":"vm.go:1689","reason":"stat /proc/53683/root/dev/kvm: no such file or directory","timestamp":"2021-08-11T11:04:56.937231Z","uid":"770d3a01-ee72-435b-8b85-004ba9d51100"}

Somewhere in virt-launcher, a check for the kvm device is failing even when emulation is enabled.
> somewhere in virt-launcher a check for the kvm device is failing even when emulation is enabled.

It's not virt-launcher; it's virt-handler attempting to unconditionally set permissions on the /dev/kvm device even when the device doesn't exist. This was introduced in vm.go by the non-root PR:
https://github.com/kubevirt/kubevirt/pull/6041

The kubevirt/kubevirt project's CI does not validate that emulation mode works. If we consider this a critical code path, then we need to add this functionality as a pre-submit condition.
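The failing check can be reproduced in isolation from the node virt-handler runs on; the pid below is the virt-launcher pid taken from the error message above and will differ on every run:

]$ stat /proc/53683/root/dev/kvm
stat: cannot stat '/proc/53683/root/dev/kvm': No such file or directory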
This PR should address the issue:
https://github.com/kubevirt/kubevirt/pull/6190
To verify, follow the steps to reproduce.
On a cluster with VMs and with no nested virtualization enabled, we can use the command below to enable emulation:

]$ oc annotate --overwrite -n openshift-cnv hyperconverged kubevirt-hyperconverged \
    kubevirt.kubevirt.io/jsonpatch='[{ "op": "add", "path": "/spec/configuration/developerConfiguration", "value": { "useEmulation": true } }]'
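As a sanity check that the patch propagated, the rendered KubeVirt CR can be inspected; a sketch assuming the operator-managed CR is named kubevirt-kubevirt-hyperconverged in openshift-cnv (the name may differ per deployment):

]$ oc get kubevirt -n openshift-cnv kubevirt-kubevirt-hyperconverged \
    -o jsonpath='{.spec.configuration.developerConfiguration.useEmulation}'
true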
Used VMs with no nested virtualization enabled. Was able to successfully create VMs with "useEmulation: true" using the command in comment 7.

As seen below, the domain type is "qemu" and not "kvm":

]$ oc rsh virt-launcher-vm2-rhel84-ocs-kz6cp
sh-4.4# virsh list
 Id   Name                     State
----------------------------------------
 1    default_vm2-rhel84-ocs   running

sh-4.4# virsh dumpxml default_vm2-rhel84-ocs
<domain type='qemu' id='1'>
  <name>default_vm2-rhel84-ocs</name>
----------------------------

The VM booted successfully and the login also was a success:

]$ virtctl console vm2-rhel84-ocs
Successfully connected to vm2-rhel84-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.4 (Ootpa)
Kernel 4.18.0-305.12.1.el8_4.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm2-rhel84-ocs login: cloud-user
Password:
Last login: Fri Aug 27 11:53:13 on ttyS0
[cloud-user@vm2-rhel84-ocs ~]$

Summary: the VM boots up, login is successful, and there is no longer any error from virt-handler as mentioned in comment 3.

Verified with: virt-operator/images/v4.9.0-30
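One additional check that the original symptom is gone: grep the virt-handler logs for the kvm stat failure (a sketch assuming the daemonset runs in openshift-cnv); no output is expected after the fix.

]$ oc logs -n openshift-cnv ds/virt-handler -c virt-handler | grep 'dev/kvm'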
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104