Description of problem
----------------------
Creation of VirtualMachines is failing due to an error with cgroups in the virt-launcher Pod:

> server error. command SyncVMI failed: "LibvirtError(Code=38, Domain=54, Message='Failed to create v1 controller cpu for group: Read-only file system')"

Version
-------
OCP: v4.9.0-0.nightly-2021-07-24-064622
RHCOS: v49.84.202107220219-0
CNV: http://cnv-version-explorer.apps.cnv.engineering.redhat.com/BundleDetails?ver=v4.9.0-57

How reproducible
----------------
100%

Steps to Reproduce
------------------
Create a basic VM:

> ---
> kind: VirtualMachine
> apiVersion: kubevirt.io/v1
> metadata:
>   name: cirros
> spec:
>   template:
>     spec:
>       domain:
>         cpu:
>           cores: 1
>         devices:
>           disks:
>           - name: rootdisk
>             disk:
>               bus: virtio
>         resources:
>           requests:
>             memory: '128Mi'
>       volumes:
>       - name: rootdisk
>         dataVolume:
>           name: cirros-rootdisk
>   running: true
>   dataVolumeTemplates:
>   - metadata:
>       name: cirros-rootdisk
>     spec:
>       source:
>         http:
>           url: http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/cirros-images/cirros-0.5.1-x86_64-disk.img
>       pvc:
>         accessModes:
>         - ReadWriteOnce
>         resources:
>           requests:
>             storage: '150Mi'

Actual results
--------------
The VirtualMachineInstance stays in the Scheduled phase because the virt-launcher Pod is running into cgroup errors:

> {"component":"virt-launcher","level":"error","msg":"Failed to create v1 controller cpu for group: Read-only file system","pos":"virCgroupV1MakeGroup:675","subcomponent":"libvirt","thread":"34","timestamp":"2021-07-24T19:34:56.099000Z"}
> {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"cirros","namespace":"openshift-cnv","pos":"manager.go:827","reason":"virError(Code=38, Domain=54, Message='Failed to create v1 controller cpu for group: Read-only file system')","timestamp":"2021-07-24T19:34:56.305333Z","uid":"d6b381b1-367b-4882-a6a4-9fc1b84745b5"}
> {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"cirros","namespace":"openshift-cnv","pos":"server.go:184","reason":"virError(Code=38, Domain=54, Message='Failed to create v1 controller cpu for group: Read-only file system')","timestamp":"2021-07-24T19:34:56.305505Z","uid":"d6b381b1-367b-4882-a6a4-9fc1b84745b5"}

Expected results
----------------
The VirtualMachine should start properly.
This bug's root cause has been found, and it is now fixed upstream by this PR: https://github.com/kubevirt/kubevirt/pull/6153. As explained in the PR itself:

Very long story short:

Background: multiple libvirtd processes can run for multiple VMs, some of them as root and some as non-root. QEMU's configuration file path is different for root and non-root VMs: for root VMs it is /etc/libvirt/qemu.conf, and for non-root VMs it is /var/run/libvirt/qemu.conf (for more info: https://libvirt.org/manpages/libvirtd.html#when-run-as-non-root). In KubeVirt, we also add the string cgroup_controllers = [ ] to the configuration file (here: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/util/libvirt_helper.go#L454).

Bug root cause: as this PR shows, the wrong configuration file (the non-root one) was being chosen for root VMs as well.

Bug outcome: deep in libvirt's code there is an if-else branch (in the virCgroupV1DetectControllers() function) that depends on the number of controllers defined in the QEMU config file. Previously this number was 0, since we had cgroup_controllers = [ ] in the config file. Because this bug makes us write to the wrong config file (the non-root one), the config file that libvirtd actually reads does not define cgroup_controllers at all, so libvirt determines the number of controllers to be -1. This change in libvirt's code path breaks KubeVirt and causes VMs to stay in the Scheduled phase until they fail. To fix this, we need to make sure the correct configuration file is set up, as this PR does.

Thanks very much to @dollierp for helping me with this bug!
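The mechanism above can be sketched in Go (the language KubeVirt is written in) under a deliberately simplified model. The helper names qemuConfPath, cgroupControllerCount and writeLauncherConf are hypothetical and are not KubeVirt's or libvirt's actual functions; the config "files" are an in-memory map; and the parser only mirrors the 0 vs -1 distinction described in the PR, not libvirt's real qemu.conf handling.

```go
package main

import (
	"fmt"
	"strings"
)

// Paths from the bug report: a root libvirtd reads /etc/libvirt/qemu.conf,
// a non-root libvirtd reads /var/run/libvirt/qemu.conf.
const (
	rootQemuConf    = "/etc/libvirt/qemu.conf"
	nonRootQemuConf = "/var/run/libvirt/qemu.conf"
)

// qemuConfPath (hypothetical) returns the qemu.conf path libvirtd reads,
// depending on whether the VM runs as root.
func qemuConfPath(isRoot bool) string {
	if isRoot {
		return rootQemuConf
	}
	return nonRootQemuConf
}

// cgroupControllerCount (hypothetical, simplified) models the value libvirt
// ends up with: -1 when cgroup_controllers is absent, otherwise the number
// of listed controllers (0 for "[ ]").
func cgroupControllerCount(conf string) int {
	for _, line := range strings.Split(conf, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "cgroup_controllers") {
			continue // also skips commented-out "#cgroup_controllers" lines
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}
		value := strings.TrimSpace(parts[1])
		value = strings.TrimSuffix(strings.TrimPrefix(value, "["), "]")
		count := 0
		for _, item := range strings.Split(value, ",") {
			if strings.TrimSpace(item) != "" {
				count++
			}
		}
		return count
	}
	return -1
}

// writeLauncherConf (hypothetical) simulates virt-launcher appending the
// empty controller list to the config file at the given path.
func writeLauncherConf(files map[string]string, path string) {
	files[path] += "cgroup_controllers = [ ]\n"
}

func main() {
	isRoot := true

	// Buggy behaviour: the non-root path is written even for a root VM,
	// so the file the root libvirtd reads has no cgroup_controllers key.
	buggy := map[string]string{}
	writeLauncherConf(buggy, nonRootQemuConf)
	fmt.Println("buggy:", cgroupControllerCount(buggy[qemuConfPath(isRoot)]))

	// Fixed behaviour: the path matches the user libvirtd runs as.
	fixed := map[string]string{}
	writeLauncherConf(fixed, qemuConfPath(isRoot))
	fmt.Println("fixed:", cgroupControllerCount(fixed[qemuConfPath(isRoot)]))
}
```

Running it prints `buggy: -1` followed by `fixed: 0`: with the bug, the root libvirtd's config never receives the empty list, so the model falls into the -1 ("not configured") branch instead of 0.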
It has been mitigated by modifying the default /etc/libvirt/qemu.conf file. Removing blocker tags.
Verified with http://cnv-version-explorer.apps.cnv.engineering.redhat.com/BundleDetails?ver=v4.9.0-79. virt-launcher no longer creates the file /var/run/libvirt/qemu.conf for root VMs and overwrites /etc/libvirt/qemu.conf instead.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104