Bug 1254402 - libvirt should improve the way it binds CPUs when a nodeset is specified in numatune
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Martin Kletzander
QA Contact: Virtualization Bugs
Depends On:
Blocks:
 
Reported: 2015-08-17 22:38 EDT by Luyao Huang
Modified: 2016-07-07 09:11 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-07 09:11:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Luyao Huang 2015-08-17 22:38:08 EDT
Description of problem:
libvirt should improve the way it binds CPUs when a nodeset is specified in numatune; sometimes (depending on what numad returns) the current behaviour binds CPUs and memory to different nodes and wastes resources

Version-Release number of selected component (if applicable):
libvirt-1.2.17-5.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest like this on a NUMA machine:
# virsh dumpxml rhel7.0-rhel
...
  <vcpu placement='auto'>4</vcpu>
  <iothreads>2</iothreads>
  <iothreadids>
    <iothread id='1'/>
  </iothreadids>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...
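
If you need to set this configuration up from scratch, a minimal sketch (assuming the domain rhel7.0-rhel already exists; virsh edit opens the persistent XML in an editor):

# virsh edit rhel7.0-rhel
# virsh dumpxml --inactive rhel7.0-rhel | grep -E -A2 'vcpu placement|numatune'

The second command is just a quick check that the <vcpu placement='auto'> and <numatune> elements were saved as intended.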

2. Check the NUMA topology:

# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 58344 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 57864 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 
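
For a quick cross-check of which CPUs belong to which node, lscpu shows the same information (output trimmed; exact labels may differ between versions):

# lscpu | grep -i 'numa'
NUMA node(s):          2
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31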

3. Start the guest and recheck the CPU and memory bindings in the cgroup and with taskset (a cross-check sketch with virsh and taskset follows the cgget output below):

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/emulator
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/emulator:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 8-15,24-31

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/vcpu1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/vcpu1:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0                         <------ memory is bound to node 0
cpuset.cpus: 8-15,24-31                <------ but these are node 1's CPUs
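
The same mismatch can be cross-checked from libvirt's side and with taskset; a sketch (the pgrep pattern is only an assumption about how the QEMU command line looks on this host, adjust as needed):

# virsh numatune rhel7.0-rhel
# virsh vcpupin rhel7.0-rhel
# virsh emulatorpin rhel7.0-rhel
# taskset -pc $(pgrep -f 'qemu.*rhel7.0-rhel' | head -n1)

virsh numatune reports the memory mode and nodeset libvirt is enforcing, while vcpupin/emulatorpin and taskset show the CPU affinity actually applied to the vcpu and emulator threads.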

4. In libvirtd.log we can see that libvirt used numad's advice to decide where to bind the CPUs:

2015-08-18 02:07:11.033+0000: 16640: debug : virCommandRunAsync:2428 : About to run /bin/numad -w 4:19555
2015-08-18 02:07:11.035+0000: 16640: debug : virCommandRunAsync:2431 : Command result 0, with PID 16986
2015-08-18 02:07:13.042+0000: 16640: debug : virCommandRun:2279 : Result status 0, stdout: '1
' stderr: ''
2015-08-18 02:07:13.042+0000: 16640: debug : qemuProcessStart:4648 : Nodeset returned from numad: 1
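
These debug lines are only written when debug logging is enabled; one way to capture them (a sketch, assuming the default log file location; appending to the config like this is just for brevity, and libvirtd has to be restarted for the change to take effect):

# echo 'log_level = 1' >> /etc/libvirt/libvirtd.conf
# echo 'log_outputs = "1:file:/var/log/libvirt/libvirtd.log"' >> /etc/libvirt/libvirtd.conf
# systemctl restart libvirtd
# grep -E 'numad|Nodeset returned' /var/log/libvirt/libvirtd.log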


Actual results:

libvirt still calls numad to decide which nodeset to use even though the node is already specified in numatune, and then binds the emulator, vcpu and iothread threads to CPUs on a different node from the one the memory is bound to

Expected results:

Do not use numad to decide which node to use, as the node is already specified in numatune.
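
For reference, the mismatch can also be avoided in the domain XML itself; two sketches of consistent configurations (node and CPU numbers taken from the topology in step 2, not the only possible layouts). Either let numad place both CPUs and memory:

  <vcpu placement='auto'>4</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

or pin both explicitly to node 0:

  <vcpu placement='static' cpuset='0-7,16-23'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>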

Additional info:
Comment 1 Martin Kletzander 2015-11-12 06:22:18 EST
What do you mean by that?  Do you mean we should run numad with only the number of CPUs and not the memory size?  That could make sense, but using automatic vcpu placement with static strict memory binding doesn't make sense anyway.  I don't get what the use case for this kind of configuration is.
Comment 2 Luyao Huang 2015-11-13 01:59:10 EST
(In reply to Martin Kletzander from comment #1)
> What do you mean by that?  Do you mean we should run numad with only the
> number of CPUs and not the memory size?  That could make sense, but using
> automatic vcpu placement with static strict memory binding doesn't make
> sense anyway.  I don't get what the use case for this kind of configuration
> is.

I think there is no need to call numad in this case: the user has already specified the memory binding policy, and numad only gives advice about which node is good to bind to. But if we bind the memory and the CPUs to different nodes, some resources are wasted. Shouldn't libvirt ignore numad's advice in this case, or forbid this configuration?
Comment 3 Martin Kletzander 2015-11-13 03:19:20 EST
(In reply to Luyao Huang from comment #2)
We need to call numad because the user specified vcpu placement='auto'.
Comment 4 Martin Kletzander 2016-06-22 11:48:10 EDT
The users are effectively shooting themselves in the foot by doing this, and we generally allow such behaviour as long as the specification is correct for us.  We could, however, provide a warning in the logs, so I'll add that.
Comment 5 Martin Kletzander 2016-06-22 12:38:00 EDT
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2016-June/msg01537.html
Comment 6 Luyao Huang 2016-06-22 21:38:52 EDT
(In reply to Martin Kletzander from comment #4)
> The users are effectively shooting themselves in the foot by doing this, and
> we generally allow such behaviour as long as the specification is correct
> for us.  We could, however, provide a warning in the logs, so I'll add that.

Okay, that makes sense; a warning is good enough.
Comment 7 Martin Kletzander 2016-07-07 09:11:26 EDT
It looks like it is too much trouble just for a warning that is seen only in the logs.  Since this behaviour might be intentional (although very unlikely, mostly for testing purposes only), we shouldn't forbid it.  Hence closing as NOTABUG.

More info here:

https://www.redhat.com/archives/libvir-list/2016-July/msg00173.html
