Bug 1842716

Summary: Failure to spawn VMs with CPU pinning
Product: Red Hat Enterprise Linux 7
Reporter: David Hill <dhill>
Component: libvirt
Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED WORKSFORME
QA Contact: jiyan <jiyan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.8
CC: berrange, jsuchane, lhuang, lmen, phrdina, xuzhang, yalzhang
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-10 17:55:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Hill 2020-06-01 22:46:31 UTC
Description of problem:
Failure to spawn VMs with CPU pinning as it fails with the following error:

2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [req-127a4dfa-026d-44d7-9e40-689e8f15726d a0eaf04e8f5e4d4a9ec64f9bde40ec28 1d6d9fd1f5624118ae7c00258327d8e0 - default default] [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e] Instance failed to spawn: libvirtError: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2dinstance\x2d00000145.scope/emulator/cpuset.cpus': Permission denied
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e] Traceback (most recent call last):
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2423, in _build_resources
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     yield resources
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2203, in _build_and_run_instance
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     block_device_info=block_device_info)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3186, in spawn
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     destroy_disks_on_failure=True)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5709, in _create_domain_and_network
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     destroy_disks_on_failure)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     self.force_reraise()
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     six.reraise(self.type_, self.value, self.tb)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5678, in _create_domain_and_network
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     post_xml_callback=post_xml_callback)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5613, in _create_domain
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     guest.launch(pause=pause)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     self._encoded_xml, errors='ignore')
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     self.force_reraise()
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     six.reraise(self.type_, self.value, self.tb)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     return self._domain.createWithFlags(flags)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     rv = execute(f, *args, **kwargs)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     six.reraise(c, e, tb)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     rv = meth(*args, **kwargs)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-05-27 12:34:37.366 8 ERROR nova.compute.manager [instance: 1d4c177b-643f-494d-8da0-5ade5d3ab26e] libvirtError: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2dinstance\x2d00000145.scope/emulator/cpuset.cpus': Permission denied



Version-Release number of selected component (if applicable):

                "name": "rhosp13/openstack-nova-libvirt",
                "release": "141",
                "summary": "Red Hat OpenStack Platform 13.0 nova-libvirt",
                "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-nova-libvirt/images/13.0-141",
                "vcs-ref": "70c66538af02d6fc12222ae62e8bac4f835ddf50",
                "vcs-type": "git",
                "vendor": "Red Hat, Inc.",
                "version": "13.0"


How reproducible:
This environment

Steps to Reproduce:
1. Boot a VM with CPU pinning
2.
3.

Actual results:
Fails to spawn

Expected results:
Succeed

Additional info:
We tried with SELinux in permissive mode.
All CPUs are online and the compute node was rebooted numerous times.

  <cputune>
    <shares>8192</shares>
    <vcpupin vcpu='0' cpuset='9'/>
    <vcpupin vcpu='1' cpuset='37'/>
    <vcpupin vcpu='2' cpuset='11'/>
    <vcpupin vcpu='3' cpuset='39'/>
    <vcpupin vcpu='4' cpuset='34'/>
    <vcpupin vcpu='5' cpuset='6'/>
    <vcpupin vcpu='6' cpuset='40'/>
    <vcpupin vcpu='7' cpuset='12'/>
    <emulatorpin cpuset='6,9,11-12,34,37,39-40'/>
  </cputune>

We've also tried removing the CPUAffinity setting in /etc/systemd/system.conf and all CPU pinning in the kernel boot arguments ...
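
For reference, these settings can be double-checked on the compute node with something like the following (my own suggested commands, not output captured from the affected host):

 $ cat /sys/devices/system/cpu/online          # confirm all CPUs are online
 $ grep -o 'isolcpus=[^ ]*' /proc/cmdline      # check for leftover CPU isolation in the boot args
 $ grep CPUAffinity /etc/systemd/system.conf   # check the systemd-wide affinity setting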

Comment 2 David Hill 2020-06-01 22:56:05 UTC
I have the same version in my lab and I wasn't able to reproduce this issue ...

Comment 3 Daniel Berrangé 2020-06-02 11:35:42 UTC
As a first step, it would be desirable to see the cpuset.cpus content for /sys/fs/cgroup/cpuset/machine.slice/ and for /sys/fs/cgroup/cpuset/ too, on the machine which exhibits this behaviour.

Comment 4 Daniel Berrangé 2020-06-02 11:51:22 UTC
In the sosreport

$ cat /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus
0,14,28,42

$ cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-55


And in the guest XML config we have

  <cputune>
    <shares>8192</shares>
    <vcpupin vcpu='0' cpuset='9'/>
    <vcpupin vcpu='1' cpuset='37'/>
    <vcpupin vcpu='2' cpuset='11'/>
    <vcpupin vcpu='3' cpuset='39'/>
    <vcpupin vcpu='4' cpuset='34'/>
    <vcpupin vcpu='5' cpuset='6'/>
    <vcpupin vcpu='6' cpuset='40'/>
    <vcpupin vcpu='7' cpuset='12'/>
    <emulatorpin cpuset='6,9,11-12,34,37,39-40'/>
  </cputune>

So the problem here is that something has configured a very restrictive CPU mask on the machine.slice cgroup. The guest is configured to use CPUs that are outside this mask, and this is likely why the writes get EPERM.

The key question is who/what has configured machine.slice  cpuset.cpus in this way, and why.
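
If that is the case, a possible interim workaround (my suggestion, not something verified on the affected host) would be to widen the machine.slice mask back to the full host range before starting the guest:

 # cat /sys/fs/cgroup/cpuset/cpuset.cpus                        # full host range, 0-55 here
 # echo 0-55 > /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus  # restore the mask on machine.slice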

Comment 5 Pavel Hrdina 2020-06-02 12:41:34 UTC
One possible reason why machine.slice cpuset.cpus was modified and no longer contains all host CPUs is a change in the online CPUs on the host. This is the main issue with the cpuset controller in cgroups v1: taking a host CPU offline removes it from cpuset.cpus, but bringing it back online does not restore it, because the kernel cannot tell whether the change was caused by the CPU going offline or by a user editing cpuset.cpus.

In cgroups v2 this is solved by having both cpuset.cpus and cpuset.cpus.effective: the user changes cpuset.cpus, while cpuset.cpus.effective reflects the actual state. It is therefore possible to set 0-20, for example, but have only 0-10 effective if the rest are disabled by the parent cgroup or by CPUs going offline on the host.
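
A minimal illustration of the cgroups v1 behaviour described above (CPU 9 is just an example; the exact mask values depend on the host):

 # cat /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus   # e.g. 0-55
 # echo 0 > /sys/devices/system/cpu/cpu9/online          # take CPU 9 offline
 # cat /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus   # 9 has been dropped from the mask
 # echo 1 > /sys/devices/system/cpu/cpu9/online          # bring it back online
 # cat /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus   # on v1 the mask is not restored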

Comment 6 Daniel Berrangé 2020-06-02 12:46:34 UTC
IIUC, while systemd has support for mounting the "cpuset" controller, it doesn't ever do anything with it itself. libvirt creates the machine.slice subdirectory for the cpuset controller, and populates cpuset.cpus by copying from the parent (root) cgroup. The root cgroup shows 0-55, so libvirt should have copied that value into machine.slice.

Supposedly this host has been rebooted multiple times and always shows the same behaviour, so any offlining of CPUs was not random historical baggage.

That said, I wonder if anything in tuned temporarily takes CPUs offline & online when configuring the host?
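
One way to check that theory (a suggested diagnostic, not something already gathered in this bug) would be to look for CPU hotplug events in the kernel log since boot:

 $ journalctl -k -b | grep -i -e offline -e smpboot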

Comment 7 Daniel Berrangé 2020-06-02 13:32:17 UTC
I think it would be useful to get a libvirt debug log from the customer's machine

In /etc/libvirt/libvirtd.conf  set

 log_filters="1:systemd 1:cgroup 1:qemu"
 log_outputs="1:file:/var/log/libvirt/libvirtd.log"

then reboot the host machine.

When it first comes up and all the services are running, but *BEFORE* starting any VMs, report:

 - Content of /sys/fs/cgroup/cpuset/cpuset.cpus
 - Whether /sys/fs/cgroup/cpuset/machine.slice exists or not, and if so
 - Content of /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus

Now try to start a VM; when it fails, again report:

 - Content of /sys/fs/cgroup/cpuset/cpuset.cpus
 - Whether /sys/fs/cgroup/cpuset/machine.slice exists or not, and if so
 - Content of /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus

And also attach the libvirtd.log file at this time.
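
The three items could be captured in one pass with something like this (a suggested helper, not part of the original request), run once right after boot and again after the failed VM start:

 $ cat /sys/fs/cgroup/cpuset/cpuset.cpus
 $ ls -d /sys/fs/cgroup/cpuset/machine.slice 2>/dev/null || echo "machine.slice does not exist"
 $ cat /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus 2>/dev/null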

Comment 8 David Hill 2020-06-02 19:08:14 UTC
It looks like the container nova_libvirt was created with "CpusetCpus" set to the non-isolated cores ... I tried finding the value in the templates/kolla/etc to no avail!  I'm wondering if it's not just because isolcpus is set in the kernel cmdline and docker hardcoded the CpusetCpus values at creation time.   I don't think running libvirtd with debugging is required here, and it would probably show the same CpusetCpus values anyway.
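
If that theory is right, the cpuset applied to the container can be confirmed directly (a suggested check, not output from the affected host):

 $ docker inspect nova_libvirt --format '{{.HostConfig.CpusetCpus}}'
 $ grep -o 'isolcpus=[^ ]*' /proc/cmdline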

Comment 9 Jaroslav Suchanek 2020-06-10 17:01:30 UTC
(In reply to David Hill from comment #8)
> It looks like the container nova_libvirt was created with "CpusetCpus" set
> to the non-isolated cores ... I tried finding the value in the
> templates/kolla/etc to no avail!  I'm wondering if it's not just because
> isolcpus is set in the kernel cmdline and docker hardcoded the CpusetCpus
> values at creation time.   I don't think running libvirtd with debugging is
> required here, and it would probably show the same CpusetCpus values anyway.

Please help me understand the situation. Is there anything libvirt should address here? Or in any other component?

Thanks.