Bug 1985670 - virt-launcher fails to create v1 controller cpu for group: Read-only file system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Itamar Holder
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-24 19:58 UTC by Denis Ollier
Modified: 2021-11-02 15:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-02 15:59:33 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 6153 0 None open Don't copy qemu.conf when running as root 2021-08-04 10:24:46 UTC
Red Hat Product Errata RHSA-2021:4104 0 None None None 2021-11-02 15:59:53 UTC

Description Denis Ollier 2021-07-24 19:58:01 UTC
Description of problem
----------------------

Creation of VirtualMachines fails due to a cgroups error in the virt-launcher Pod:

> server error. command SyncVMI failed: "LibvirtError(Code=38, Domain=54, Message='Failed to create v1 controller cpu for group: Read-only file system')"

Version
-------

OCP: v4.9.0-0.nightly-2021-07-24-064622
RHCOS: v49.84.202107220219-0
CNV: http://cnv-version-explorer.apps.cnv.engineering.redhat.com/BundleDetails?ver=v4.9.0-57

How reproducible
----------------

100%

Steps to Reproduce
------------------

Create a basic VM:

> ---
> kind: VirtualMachine
> apiVersion: kubevirt.io/v1
> metadata:
>   name: cirros
> spec:
>   template:
>     spec:
>       domain:
>         cpu:
>           cores: 1
>         devices:
>           disks:
>             - name: rootdisk
>               disk:
>                 bus: virtio
>         resources:
>           requests:
>             memory: '128Mi'
>       volumes:
>         - name: rootdisk
>           dataVolume:
>             name: cirros-rootdisk
>   running: true
>   dataVolumeTemplates:
>     - metadata:
>         name: cirros-rootdisk
>       spec:
>         source:
>           http:
>             url: http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/cirros-images/cirros-0.5.1-x86_64-disk.img
>         pvc:
>           accessModes:
>             - ReadWriteOnce
>           resources:
>             requests:
>               storage: '150Mi'

Actual results
--------------

The VirtualMachineInstance stays in the Scheduled phase because the virt-launcher Pod hits a cgroups error:

> {"component":"virt-launcher","level":"error","msg":"Failed to create v1 controller cpu for group: Read-only file system","pos":"virCgroupV1MakeGroup:675","subcomponent":"libvirt","thread":"34","timestamp":"2021-07-24T19:34:56.099000Z"}
> {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"cirros","namespace":"openshift-cnv","pos":"manager.go:827","reason":"virError(Code=38, Domain=54, Message='Failed to create v1 controller cpu for group: Read-only file system')","timestamp":"2021-07-24T19:34:56.305333Z","uid":"d6b381b1-367b-4882-a6a4-9fc1b84745b5"}
> {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"cirros","namespace":"openshift-cnv","pos":"server.go:184","reason":"virError(Code=38, Domain=54, Message='Failed to create v1 controller cpu for group: Read-only file system')","timestamp":"2021-07-24T19:34:56.305505Z","uid":"d6b381b1-367b-4882-a6a4-9fc1b84745b5"}

Expected results
----------------

The VirtualMachine should start properly.

Comment 2 Itamar Holder 2021-07-29 08:09:12 UTC
This bug's root cause has been found and is now fixed upstream with this PR: https://github.com/kubevirt/kubevirt/pull/6153.

As explained in the PR itself:


Very long story short:

Background:
Multiple libvirtd processes can run for multiple VMs; some of them run as root and some as non-root. The path of QEMU's configuration file differs between root and non-root VMs.

For root VMs it's /etc/libvirt/qemu.conf
For non-root VMs it's /var/run/libvirt/qemu.conf.
(for more info: https://libvirt.org/manpages/libvirtd.html#when-run-as-non-root)
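
The path rule above can be written as a minimal sketch. The two paths are the ones quoted in this comment; the helper name `qemu_conf_path` and the use of a numeric uid are assumptions made for illustration and are not taken from KubeVirt or libvirt code.

```python
# Paths quoted in this comment.
ROOT_QEMU_CONF = "/etc/libvirt/qemu.conf"
NON_ROOT_QEMU_CONF = "/var/run/libvirt/qemu.conf"


def qemu_conf_path(uid: int) -> str:
    """Hypothetical helper: pick the qemu.conf path for a libvirtd running as `uid`."""
    # uid 0 is root; every other uid is treated as a non-root VM.
    return ROOT_QEMU_CONF if uid == 0 else NON_ROOT_QEMU_CONF


if __name__ == "__main__":
    for uid in (0, 107):  # 107 is an arbitrary non-root uid used for the demo
        print(uid, qemu_conf_path(uid))
```

The bug described below amounts to always taking the non-root branch, regardless of the user the VM runs as.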

In KubeVirt, we also add the line cgroup_controllers = [ ] to the configuration file (here: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/util/libvirt_helper.go#L454).

Bug root cause:
As this PR shows, the bug is that the wrong configuration file (the non-root one) is also chosen for root VMs.

Bug outcome:
The outcome is this bug. Deep in libvirt's code there is an if-else branch (in the virCgroupV1DetectControllers() function) that depends on the number of controllers defined in the QEMU config file. Previously it was 0, since we had cgroup_controllers = [ ] in the config file. Because this bug causes us to write to the wrong config file (the non-root one), the config file that is actually read does not define cgroup_controllers at all, and therefore libvirt determines the number of controllers to be -1.

This change in libvirt's code path breaks KubeVirt and causes VMs to stay in the Scheduled phase until they fail.
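
The difference between "defined as an empty list" and "not defined at all" can be illustrated with a simplified sketch. This is not libvirt's configuration parser and not the code of virCgroupV1DetectControllers(); the function name `controller_count` and the line-based parsing are assumptions. It only mirrors the values described above: 0 for cgroup_controllers = [ ], -1 when the key is missing.

```python
import re


def controller_count(conf_text: str) -> int:
    """Hypothetical helper: count entries of cgroup_controllers, or -1 if the key is absent."""
    for raw_line in conf_text.splitlines():
        # Drop comments so a commented-out setting counts as "not defined".
        line = raw_line.split("#", 1)[0].strip()
        match = re.fullmatch(r"cgroup_controllers\s*=\s*\[(.*)\]", line)
        if match:
            entries = [e for e in match.group(1).split(",") if e.strip()]
            return len(entries)
    return -1  # key not defined anywhere in the file


if __name__ == "__main__":
    print(controller_count("cgroup_controllers = [ ]"))          # file KubeVirt intended to be read
    print(controller_count('#cgroup_controllers = [ "cpu" ]'))   # file without the setting
```

With the setting written to a file that the root libvirtd never reads, the effective file behaves like the second case.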

To fix this, we need to make sure the correct configuration file is set up, as this PR does.


Thanks very much to @dollierp for helping me with this bug!

Comment 3 Denis Ollier 2021-08-02 16:13:27 UTC
It has been mitigated by modifying the default /etc/libvirt/qemu.conf file.

Removing blocker tags.

Comment 4 Denis Ollier 2021-08-06 20:21:05 UTC
Verified with http://cnv-version-explorer.apps.cnv.engineering.redhat.com/BundleDetails?ver=v4.9.0-79.

virt-launcher no longer creates the file /var/run/libvirt/qemu.conf for root VMs and overrides the file /etc/libvirt/qemu.conf instead.
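
A check along these lines could be scripted roughly as follows. This is only a sketch: the function name `check_root_launcher_layout` and the `fs_root` parameter (a directory standing in for the container's filesystem root) are hypothetical, and expecting a cgroup_controllers line in /etc/libvirt/qemu.conf is an assumption based on comment 2, not something stated in this verification.

```python
import tempfile
from pathlib import Path


def check_root_launcher_layout(fs_root: Path) -> bool:
    """Hypothetical check for a root VM: no non-root qemu.conf, and the root one sets cgroup_controllers."""
    etc_conf = fs_root / "etc/libvirt/qemu.conf"
    run_conf = fs_root / "var/run/libvirt/qemu.conf"
    if run_conf.exists() or not etc_conf.is_file():
        return False
    # Assumption: the overridden file carries the cgroup_controllers setting.
    return any(
        line.strip().startswith("cgroup_controllers")
        for line in etc_conf.read_text().splitlines()
    )


if __name__ == "__main__":
    # Demo against a throwaway directory tree instead of a real virt-launcher Pod.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "etc/libvirt").mkdir(parents=True)
        (root / "etc/libvirt/qemu.conf").write_text("cgroup_controllers = [ ]\n")
        print(check_root_launcher_layout(root))
```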

Comment 7 errata-xmlrpc 2021-11-02 15:59:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104

