Bug 1703661
| Field | Value |
|---|---|
| Summary | 'cannot set CPU affinity' error when starting guest |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.7 |
| Hardware | ppc64le |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Junxiang Li <junli> |
| Assignee | Andrea Bolognani <abologna> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | abologna, dzheng, jdenemar, jiyan, jomurphy, jsuchane, mdeng, mtessun, mzamazal, ngu, qzhang |
| Target Milestone | rc |
| Keywords | Automation, Regression |
| Fixed In Version | libvirt-4.5.0-20.el7 |
| Related | 1716387 (view as bug list) |
| Type | Bug |
| Last Closed | 2019-08-06 13:14:55 UTC |
Description (Junxiang Li, 2019-04-27 09:54:49 UTC)
Is it reproducible only on the ppc64le platform?

(In reply to Jaroslav Suchanek from comment #3)
> Is it reproducible only on ppc64le platform?

Yes, I could NOT reproduce it on x86_64.

I tried reproducing this on one of the POWER 8 machines I have access to, without success. Given that, and the fact that Comment 5 also points out it can only be reproduced on a single machine, I'm betting my money on that machine having a peculiar NUMA topology that confuses libvirt.

Can you please post the output of 'numactl -H'? Ideally I'd get shell access to the machine, but that output is a starting point.

Unfortunately the machine you'd given me access to seems to have gotten stuck while I was working on it, and now I'm locked out; it won't even respond to pings or offer serial console access, and since I'm not the one loaning it on Beaker I can't access the power controls :(

Before that happened, though, I managed to look around a bit, and my theory that the issue was caused by a peculiar NUMA topology seems to have been incorrect after all, as it looked pretty much the same as any other POWER 8 machine I've worked on.

I also verified that the issue does not reproduce with libvirt-4.5.0-11.el7.ppc64le but does show up after upgrading to libvirt-4.5.0-14.el7.ppc64le; looking at the differences between those two versions, I believe the problematic commits are

  commit b733703cfcc4b4e8966051ba20bed301645331d0
  Author: Michal Privoznik <mprivozn>
  Date:   Thu Apr 18 18:58:58 2019 +0200

      qemu: Set up EMULATOR thread and cpuset.mems before exec()-ing qemu

      https://bugzilla.redhat.com/show_bug.cgi?id=1695434

      It's funny how this went unnoticed for such a long time. Long story
      short, if a domain is configured with VIR_DOMAIN_NUMATUNE_MEM_STRICT,
      libvirt doesn't really honour that. This is because of 7e72ac787848,
      after which libvirt allowed qemu to allocate memory just anywhere and
      only after that used some magic involving cpuset.memory_migrate and
      cpuset.mems to move the memory to the desired NUMA nodes. This was
      done in order to work around a KVM bug where KVM would fail if there
      wasn't a DMA zone available on the NUMA node.

      Well, while the workaround might have stopped libvirt tickling the
      KVM bug, it also caused a bug on the libvirt side: if there is not
      enough memory on the configured NUMA node(s), then any attempt to
      start a domain must fail. But because of the way we play with guest
      memory, domains start just happily.

      The solution is to move the child we've just forked into the
      emulator cgroup, set up cpuset.mems, and exec() qemu only after
      that.

      This basically reverts 7e72ac787848b7434c9, which was a workaround
      for a kernel bug. That bug was apparently fixed, because I've tested
      this successfully with a recent kernel.
      Signed-off-by: Michal Privoznik <mprivozn>
      Reviewed-by: Martin Kletzander <mkletzan>
      (cherry picked from commit 0eaa4716e1b8f6eb59d77049aed3735c3b5fbdd6)
      Signed-off-by: Michal Privoznik <mprivozn>
      Message-Id: <efd9d64c94a027281c244c05f69cc9f4c31ed83b.1555606711.git.mprivozn>
      Reviewed-by: Jiri Denemark <jdenemar>

  commit eb7ef8053311d82d43912a5cc1e82d0266bb29de
  Author: Michal Privoznik <mprivozn>
  Date:   Thu Apr 18 18:58:57 2019 +0200

      qemu: Rework setting process affinity

      RHEL-7.7: https://bugzilla.redhat.com/show_bug.cgi?id=1695434
      RHEL-8.0.1: https://bugzilla.redhat.com/show_bug.cgi?id=1503284

      The way we currently start qemu, from a CPU affinity POV, is as
      follows:

        1) the child process has its affinity set to all online CPUs
           (unless some vcpu pinning was given in the domain XML)

        2) once qemu is running, the cpuset cgroup is configured taking
           memory pinning into account

      The problem is that we let qemu allocate its memory just anywhere
      in 1) and then rely on 2) being able to move the memory to the
      configured NUMA nodes. This might not always be possible (e.g. qemu
      might lock some parts of its memory) and is very suboptimal (copying
      large memory between NUMA nodes takes a significant amount of time).

      The solution is to set affinity to one of (in priority order):
        - the CPUs associated with the NUMA memory affinity mask
        - the CPUs associated with emulator pinning
        - all online host CPUs

      Later (once QEMU has allocated its memory) we then change this
      again to (again in priority order):
        - the CPUs associated with emulator pinning
        - the CPUs returned by numad
        - the CPUs associated with vCPU pinning
        - all online host CPUs

      Signed-off-by: Michal Privoznik <mprivozn>
      Reviewed-by: Daniel P. Berrangé <berrange>
      (cherry picked from commit f136b83139c63f20de0df3285d9e82df2fb97bfc)

      I had to explicitly free bitmaps, because there is no VIR_AUTOPTR
      just yet.

      Signed-off-by: Michal Privoznik <mprivozn>
      Message-Id: <a6edd347c999f999a49d1a878c74c690eb2ab619.1555606711.git.mprivozn>
      Reviewed-by: Jiri Denemark <jdenemar>

which were backported to RHEL 7.7 to fix Bug 1695434.

I'm currently trying to get access to a different POWER 8 machine on which to continue the investigation. I'll keep you posted.

Alright, I managed to get access to a different POWER 8 machine and reproduce the issue there. The original machine became accessible again in the meantime, but I don't really need it any longer, so it can safely be returned.

By comparing the avocado-vt-vm1 guest that was on the original machine with a Fedora 30 guest that I created with the same command line I would normally use for testing, I figured out why I could not initially reproduce the issue: I usually assign 8 vCPUs to guests, and in that scenario the guest will start just fine, but as soon as I change its configuration to

  <vcpu placement='auto'>2</vcpu>

then I get the error message. The fact that 8 is exactly the number of threads per core on a POWER 8 machine is almost certainly key to understanding why that value works and all others don't. I'll investigate further next week.

Milan, can you please estimate how this issue impacts RHV? Is <numatune><memory mode="strict" placement="auto" /></numatune> used in RHV anyhow? Thanks.

Looking around, I can see that:

- <numatune><memory mode="strict"/>...</numatune> can be used.
- We don't use an explicit `placement' attribute in `memory'.
- We use <vcpu> also without a `placement' attribute.

I'm not sure whether this, with any actual combination of elements and attribute values generated in RHV, can induce placement="auto".
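To make the affinity rework described in commit eb7ef8053311 above concrete: both phases amount to walking a priority list of CPU masks and applying the first one that was actually configured. The sketch below is not libvirt's code; the `bitmap` type and `pick_first()` helper are hypothetical stand-ins for libvirt's virBitmap machinery, shown only to illustrate the ordering.

```c
/* Minimal sketch of the two-phase affinity selection from commit
 * eb7ef8053311 (not libvirt's actual code; 'bitmap' and pick_first()
 * are hypothetical stand-ins for the virBitmap helpers). */
#include <stddef.h>

typedef struct bitmap bitmap; /* opaque CPU mask */

/* Return the first non-NULL mask, i.e. the highest-priority setting
 * that was actually configured. */
static bitmap *pick_first(bitmap *candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (candidates[i] != NULL)
            return candidates[i];
    return NULL;
}

/* Phase 1: affinity set on the forked child before exec()-ing qemu,
 * so memory is allocated on the right NUMA nodes from the start. */
static bitmap *preexec_affinity(bitmap *numa_memory_cpus, /* CPUs of the <numatune> nodeset */
                                bitmap *emulator_pin,
                                bitmap *all_online_cpus)
{
    bitmap *prio[] = { numa_memory_cpus, emulator_pin, all_online_cpus };
    return pick_first(prio, 3);
}

/* Phase 2: affinity applied again once QEMU has allocated its memory. */
static bitmap *postalloc_affinity(bitmap *emulator_pin,
                                  bitmap *numad_cpus,
                                  bitmap *vcpu_pin,
                                  bitmap *all_online_cpus)
{
    bitmap *prio[] = { emulator_pin, numad_cpus, vcpu_pin, all_online_cpus };
    return pick_first(prio, 4);
}
```

The bug tracked here lives in phase 1: as the fix commit quoted later explains, the mask built from the <numatune> nodeset was a map of NUMA node numbers rather than CPU numbers.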
(In reply to Milan Zamazal from comment #12)
> Looking around, I can see that:
>
> - <numatune><memory mode="strict"/>...</numatune> can be used.
> - We don't use explicit `placement' attribute in `memory'.
> - We use <vcpu> also without `placement' attribute.
>
> I'm not sure whether this, with any actual combination of elements and
> attribute values generated in RHV, can induce placement="auto".

Thanks for looking into this! :)

If I'm reading the documentation ([1] and [2]) correctly, then <vcpu> without placement is equivalent to placement='static', and <numatune><memory> will inherit the placement from <vcpu> in this scenario, which in turn makes providing a nodeset mandatory.

The above matches the results I got while testing:

  # cat test.xml
  ...
    <vcpu>2</vcpu>
    <numatune>
      <memory mode='strict' />
    </numatune>
  ...
  # virsh define test.xml
  error: Failed to define domain from test.xml
  error: unsupported configuration: nodeset for NUMA memory tuning must be set if 'placement' is 'static'
  #

So RHV must be providing the nodeset argument too, right? And either way, this bug only shows up when using placement='auto', so if RHV doesn't use that feature then it's not going to be affected.

[1] https://libvirt.org/formatdomain.html#elementsCPUAllocation
[2] https://libvirt.org/formatdomain.html#elementsNUMATuning

(In reply to Andrea Bolognani from comment #13)
> So RHV must be providing the nodeset argument too, right?

Well, looking at https://github.com/oVirt/ovirt-engine/blob/4dbfe06a726ff39c8660d177dc58fd56152830d9/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L567, it should. But on a (literally!) closer look, there is `modeset' instead of `nodeset' there, so I wonder whether anything uses that piece of code at all. Not your worry of course; your assumption should be correct.

(In reply to Milan Zamazal from comment #14)
> Well, looking at LibvirtVmXmlBuilder.java#L567, it should.

Good!

> But on a (literally!) closer look, there is `modeset' instead of
> `nodeset' there, so I wonder whether anything uses that piece of code at
> all. Not your worry of course, your assumption should be correct.

Oh boy :)

I dug more in the meantime and realized that you can also hit the bug with something like

  <vcpu>8</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

but thanks to the typo in ovirt-engine you spotted and mentioned above, RHV should still be in the clear.

I got confirmation that the <memory> element is currently not used in RHV (the `if' statement referred to in Comment 14 is a dead piece of code).

Patches posted upstream.

https://www.redhat.com/archives/libvir-list/2019-May/msg00919.html

(In reply to Milan Zamazal from comment #16)
> I got confirmation that the <memory> element is currently not used in RHV
> (the `if' statement referred in Comment 14 is a dead piece of code).

That's very good to know, thanks! :)

(In reply to Junxiang Li from comment #0)
> Steps to Reproduce:
> 1. To define a guest with:
>    <numatune><memory mode="strict" placement="auto" /></numatune>
> 2. Try to start it
>
> Actual results:
> # virsh start test1
> error: Failed to start domain test1
> error: invalid argument: Failed to parse bitmap ''
One thing that I apparently forgot to point out is that I never managed to reproduce those exact symptoms: what I got instead was along the lines of

  # virsh start guest
  error: Failed to start domain guest
  error: cannot set CPU affinity on process 40055: Invalid argument

I can hit the specific error message reported above if I edit a guest so that its configuration contains something like

  <vcpu>2</vcpu>
  <numatune>
    <memory nodeset=''/>
  </numatune>

In that case, after saving and closing the editor I get

  error: invalid argument: Failed to parse bitmap ''
  Failed. Try again? [y,n,i,f,?]: i
  error: invalid argument: Failed to parse bitmap ''
  Failed. Try again? [y,n,f,?]:

with no way to proceed, which is the expected behavior.

Can you please try reproducing the issue again and confirm that you're indeed seeing the specific error message reported above, and not the same one I'm seeing? Because if that's the case, we might need more digging :)

I finally managed to reproduce the issue reported initially (failed to parse bitmap), but since this bug has mostly been used to track work on the second issue (cannot set CPU affinity) I'm changing the title to reflect that. I've created a separate bug (Bug 1716387) for the first issue and will track it there from now on. Sorry for any confusion this might cause.

I'm also making the bug public, since all the non-public information is already relegated to private comments.

Fix merged upstream.

  commit 5f2212c062c720716b7701fa0a5511311dc6e906
  Author: Andrea Bolognani <abologna>
  Date:   Thu May 30 19:20:34 2019 +0200

      qemu: Fix qemuProcessInitCpuAffinity()

      Ever since the feature was introduced with commit 0f8e7ae33ace, it
      has contained a logic error in that it attempted to use a NUMA node
      map where a CPU map was expected. Because of that, guests using
      <numatune> might fail to start:

        # virsh start guest
        error: Failed to start domain guest
        error: cannot set CPU affinity on process 40055: Invalid argument

      This was particularly easy to trigger on POWER 8 machines, where
      secondary threads always show up as offline in the host: having

        <numatune>
          <memory mode='strict' placement='static' nodeset='1'/>
        </numatune>

      in the guest configuration, for example, would result in libvirt
      trying to set the process affinity so that it would prefer running
      on CPU 1, but since that's a secondary thread and thus shows up as
      offline, the operation would fail, and so would starting the guest.

      Use the newly introduced virNumaNodesetToCPUset() to convert the
      NUMA node map to a CPU map, which in the example above would be
      48,56,64,72,80,88 - a valid input for virProcessSetAffinity().

      https://bugzilla.redhat.com/show_bug.cgi?id=1703661

      Signed-off-by: Andrea Bolognani <abologna>
      Reviewed-by: Ján Tomko <jtomko>

  v5.4.0-45-g5f2212c062
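To illustrate what the fix's node-map-to-CPU-map conversion achieves, here is a stand-alone approximation. It is not libvirt's virNumaNodesetToCPUset() implementation; it leans on libnuma (an assumption of this example) to expand one NUMA node into that node's CPU map, which is valid input for an affinity call.

```c
/* Approximation of the conversion performed by the fix above, using
 * libnuma instead of libvirt internals. Build: gcc -o n2c n2c.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this host\n");
        return 1;
    }

    int node = 1; /* the node from <memory ... nodeset='1'/> */
    struct bitmask *cpus = numa_allocate_cpumask();

    /* Fill 'cpus' with the CPUs belonging to 'node'. */
    if (numa_node_to_cpus(node, cpus) < 0) {
        perror("numa_node_to_cpus");
        numa_free_cpumask(cpus);
        return 1;
    }

    /* On the POWER 8 host discussed above this prints the primary
     * threads of node 1, e.g. 48 56 64 72 80 88: secondary threads
     * are offline and therefore not part of the node's CPU map. */
    for (unsigned long i = 0; i < cpus->size; i++)
        if (numa_bitmask_isbitset(cpus, i))
            printf("%lu ", i);
    printf("\n");

    numa_free_cpumask(cpus);
    return 0;
}
```

The buggy code effectively skipped this expansion and handed the node numbers themselves to the affinity call, which is why "CPU 1" (an offline secondary thread on POWER 8) was targeted instead of node 1's CPUs.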
Reproduce:

env:
  # rpm -q libvirt
  libvirt-4.5.0-19.el7.ppc64le

step:
  1. Edit the guest XML with "<numatune><memory mode="strict" placement="auto" /></numatune>"
  2. Try to start the guest

result: the following error message is reported:
  error: Failed to start domain avocado-vt-vm1
  error: cannot set CPU affinity on process 156584: Invalid argument

Verify:

env:
  # rpm -q libvirt
  libvirt-4.5.0-20.el7.ppc64le

step:
  1. Edit the guest XML with "<numatune><memory mode="strict" placement="auto" /></numatune>"
  2. Try to start the guest

result: the guest started with the message:
  Domain avocado-vt-vm1 started

In summary, this problem has been fixed.

Also reproduced this bug on libvirt-4.5.0-19.el7.x86_64.

Version:
  libvirt-4.5.0-19.el7.x86_64
  kernel-3.10.0-1057.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64

Steps:
  # virsh domstate avocado-vt-vm1
  shut off

  # virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

  # cat /sys/devices/system/cpu/cpu1/online
  0

  # virsh start avocado-vt-vm1
  error: Failed to start domain avocado-vt-vm1
  error: cannot set CPU affinity on process 3470: Invalid argument

Hi, I am trying to verify this bug on x86_64, and I encountered the following error. Could you please help to have a look at it? thx :)

Version:
  kernel-3.10.0-1057.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64
  libvirt-4.5.0-23.el7.x86_64
  kernel-3.10.0-1058.el7.x86_64

Steps:
  # virsh domstate avocado-vt-vm1
  shut off

  # virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat /sys/devices/system/cpu/cpu1/online
  0

  # virsh start avocado-vt-vm1
  error: Failed to start domain avocado-vt-vm1
  error: An error occurred, but the cause is unknown

  (the same failure repeats on every subsequent start attempt)

Created attachment 1584557 [details]
vm.log
Created attachment 1584558 [details]
libvirtd.log
(In reply to jiyan from comment #29)
> Hi I am trying to verify this bug in x86_64, and I encountered the
> following err. Could you please help to have a look at it? thx :)
>
> [...]
>
> # virsh start avocado-vt-vm1
> error: Failed to start domain avocado-vt-vm1
> error: An error occurred, but the cause is unknown

Alright, this happens regardless of whether you have offlined CPU 1 before trying to start the guest, and the underlying reason is that you're asking libvirt to pin the guest to NUMA node 1 but the host only has a single NUMA node (0).

Can you please open a separate bug to track this? libvirt is doing the right thing by failing, and the only problem is that we're not reporting a good enough error message when that happens.

Version:
  libvirt-4.5.0-23.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64
  kernel-3.10.0-1058.el7.x86_64

Steps:

1. Check numactl info and set host CPU 1 offline:

  # numactl --hard
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
  node 0 size: 16362 MB
  node 0 free: 14107 MB
  node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
  node 1 size: 16384 MB
  node 1 free: 14888 MB
  node distances:
  node   0   1
    0:  10  11
    1:  11  10

  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat /sys/devices/system/cpu/cpu1/online
  0

2. Prepare a shut-off VM with the following configuration and start it:

  # virsh domstate vm1
  shut off

  # virsh dumpxml vm1 --inactive |grep "<vcpu" -A4
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

  # virsh start vm1
  Domain vm1 started

3. Downgrade libvirt and restart the VM:

  # yum downgrade libvirt* -y
  # rpm -qa libvirt qemu-kvm-rhev kernel
  kernel-3.10.0-1058.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64
  libvirt-4.5.0-19.el7.x86_64

  # virsh destroy vm1;virsh start vm1
  Domain vm1 destroyed
  error: Failed to start domain vm1
  error: cannot set CPU affinity on process 13782: Invalid argument

The test result is as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294
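As a footnote to the single-NUMA-node diagnosis a few comments above: the condition jiyan hit (nodeset='1' on a host with only node 0) can be checked for up front by comparing the configured nodeset against the highest node present on the host. A minimal sketch, again using libnuma as an assumed stand-in for whatever validation libvirt performs internally:

```c
/* Check that a configured NUMA node exists on this host. Mirrors the
 * diagnosis above: nodeset='1' can never be satisfied on a host whose
 * only node is 0. Build: gcc -o nodechk nodechk.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0)
        return 1;

    int requested = 1;              /* node from <memory ... nodeset='1'/> */
    int max_node = numa_max_node(); /* 0 on a single-node host */

    if (requested > max_node) {
        fprintf(stderr, "nodeset '%d' cannot be satisfied: host only has nodes 0-%d\n",
                requested, max_node);
        return 1;
    }
    printf("node %d exists on this host\n", requested);
    return 0;
}
```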