Bug 1254402

Summary: libvirt should improve the way it binds cpus when a nodeset is specified in numatune
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Luyao Huang <lhuang>
Assignee: Martin Kletzander <mkletzan>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, lhuang, mkletzan, mzhan, rbalakri
Type: Bug
Doc Type: Bug Fix
Last Closed: 2016-07-07 13:11:26 UTC

Description Luyao Huang 2015-08-18 02:38:08 UTC
Description of problem:
libvirt should improve the way it binds cpus when a nodeset is specified in numatune; sometimes (depending on what numad returns) this wastes resources, because the vcpus end up running on a different node than the memory they are bound to.

Version-Release number of selected component (if applicable):
libvirt-1.2.17-5.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest like this on a NUMA machine:
# virsh dumpxml rhel7.0-rhel
...
  <vcpu placement='auto'>4</vcpu>
  <iothreads>2</iothreads>
  <iothreadids>
    <iothread id='1'/>
  </iothreadids>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...
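
(As a quick cross-check, the configured policy can also be queried with virsh numatune; for this guest it should report numa_mode strict and numa_nodeset 0.)

# virsh numatune rhel7.0-rhel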

2. Check the NUMA topology:

# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 58344 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 57864 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 
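
For later comparison, the CPU list of node 0 (the node named in the numatune nodeset) can also be read straight from sysfs:

# cat /sys/devices/system/node/node0/cpulist
0-7,16-23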

3. Start the guest and recheck the cpu and memory placement in the cgroup and with taskset:

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/emulator
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/emulator:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 8-15,24-31

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/vcpu1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/vcpu1:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0                         <------
cpuset.cpus: 8-15,24-31                <--------cpus near node1
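
The mismatch is easier to see when only the two relevant values are queried (same data as above, just filtered with cgget -r): cpuset.mems stays on node 0 as requested, while cpuset.cpus covers node 1's CPUs from step 2.

# cgget -r cpuset.cpus -r cpuset.mems /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/vcpu1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/vcpu1:
cpuset.cpus: 8-15,24-31
cpuset.mems: 0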

4. In libvirtd.log we can see that libvirt uses numad's return value to bind the cpus:

2015-08-18 02:07:11.033+0000: 16640: debug : virCommandRunAsync:2428 : About to run /bin/numad -w 4:19555
2015-08-18 02:07:11.035+0000: 16640: debug : virCommandRunAsync:2431 : Command result 0, with PID 16986
2015-08-18 02:07:13.042+0000: 16640: debug : virCommandRun:2279 : Result status 0, stdout: '1
' stderr: ''
2015-08-18 02:07:13.042+0000: 16640: debug : qemuProcessStart:4648 : Nodeset returned from numad: 1
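
The same query libvirt issued can be repeated by hand (arguments copied from the log above: 4 vcpus and the advisory memory size in MB); on this host numad recommends node 1, which is where the vcpu/emulator cpusets end up:

# /bin/numad -w 4:19555
1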


Actual results:

libvirt still uses numad to determine which nodeset to use even though the node is already specified in numatune, and then binds the emulator/vcpu/iothread threads to cpus on a different node than the one the memory is bound to.

Expected results:

Do not use numad to determine which node to use, since it is already specified in numatune.

Additional info:
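
For reference, two configurations that keep vcpus and memory on the same node (a sketch only; the cpuset value is this machine's node 0 CPU list from step 2):

  <!-- either let numad place both CPUs and memory -->
  <vcpu placement='auto'>4</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

  <!-- or pin both explicitly to node 0 -->
  <vcpu placement='static' cpuset='0-7,16-23'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>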

Comment 1 Martin Kletzander 2015-11-12 11:22:18 UTC
What do you mean by that?  Do you mean we should run numad with only the number of CPUs and not the memory size?  That could make sense, but using automatic vcpu placement with static strict memory binding doesn't make sense anyway.  I don't get what the use case for this kind of configuration is.

Comment 2 Luyao Huang 2015-11-13 06:59:10 UTC
(In reply to Martin Kletzander from comment #1)
> What do you mean by that?  Do you mean we should run numad with only the
> number of CPUs and not the memory size?  That could make sense, but using
> automatic vcpu placement with static strict memory binding doesn't make
> sense anyway.  I don't get what the use case for this kind of configuration
> is.

I think there is no need to call numad in this case; the user has already specified the memory binding policy. numad just gives advice about which node is best to bind to, but if we bind the memory and cpus to different nodes, some resources are wasted. Shouldn't libvirt ignore the numad advice in this case, or forbid this use case?

Comment 3 Martin Kletzander 2015-11-13 08:19:20 UTC
(In reply to Luyao Huang from comment #2)
We need to call numad because the user specified vcpu placement='auto'.

Comment 4 Martin Kletzander 2016-06-22 15:48:10 UTC
The users are effectively shooting themselves in the foot by doing this, and we generally allow such behaviour as long as the specification is correct for us.  We could, however, provide a warning in the logs, so I'll add that.

Comment 5 Martin Kletzander 2016-06-22 16:38:00 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2016-June/msg01537.html

Comment 6 Luyao Huang 2016-06-23 01:38:52 UTC
(In reply to Martin Kletzander from comment #4)
> The users are effectively shooting themselves in the feet by doing this and
> we generally allow such behaviour as long as the specification is correct
> for us.  Although we could provide a warning in the logs, so I'll add that.

Okay, that makes sense; a warning is good enough.

Comment 7 Martin Kletzander 2016-07-07 13:11:26 UTC
It looks like it is way too much trouble just for a warning that would only be seen in the logs.  Since this behaviour might be intentional (although very unlikely, mostly for testing purposes only), we shouldn't forbid it.  Hence closing as NOTABUG.

More info here:

https://www.redhat.com/archives/libvir-list/2016-July/msg00173.html