Bug 1360584 - Bind policy doesn't work well when numad is running [NEEDINFO]
Summary: Bind policy doesn't work well when numad is running
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: numad
Version: 7.3
Hardware: x86_64
OS: Linux
Target Milestone: rc
Assignee: Lukáš Nykrýn
QA Contact: qe-baseos-daemons
Docs Contact: Yehuda Zimmerman
Duplicates: 1361058 (view as bug list)
Depends On:
Reported: 2016-07-27 06:11 UTC by Yumei Huang
Modified: 2020-12-15 07:43 UTC (History)
CC: 8 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
numad changes QEMU memory bindings
Currently, the *numad* daemon cannot distinguish between memory bindings that *numad* sets and memory bindings set explicitly by the memory mappings of a process. As a consequence, *numad* changes QEMU memory bindings, even when the NUMA memory policy is specified in the QEMU command line. To work around this problem, if manual NUMA bindings are specified in the guest, disable *numad*. This ensures that manual bindings configured in virtual machines are not changed by *numad*.
Clone Of:
Last Closed: 2020-12-15 07:43:30 UTC
Target Upstream Version:
jsynacek: needinfo? (bgray)


Description Yumei Huang 2016-07-27 06:11:06 UTC
Description of problem:
Boot the guest with regular RAM for one NUMA node and hugepages for the other NUMA node, binding them to two different host nodes. According to numa_maps, the hugepages are bound to the wrong host node.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. prepare hugepages in host
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 16349 MB
node 0 free: 7775 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 12287 MB
node 1 free: 6788 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

# echo 3000 > /proc/sys/vm/nr_hugepages

# cat /proc/meminfo  | grep -i huge
AnonHugePages:     14336 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

# mount -t hugetlbfs none /mnt/kvm_hugepage/

# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages 

# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages 
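As a cross-check for step 2 below, the 2G hugepage-backed memory object needs 2 GiB / 2 MiB = 1024 pages out of the 3000-page pool. A small sketch of that arithmetic, plus the per-node sysfs knob (shown as a comment, not run here) that would reserve the pool directly on host node 1 instead of using the global nr_hugepages counter:

```shell
# Number of 2048 kB hugepages needed to back the 2G memory object.
backend_kb=$((2 * 1024 * 1024))   # 2 GiB expressed in kB
page_kb=2048                      # Hugepagesize from /proc/meminfo
pages_needed=$((backend_kb / page_kb))
echo "$pages_needed"              # 1024

# Per-node alternative to the global nr_hugepages knob (not run here):
#   echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
```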

2. Boot the guest with two NUMA nodes: 4G of regular RAM for node 0, bound to host node 0, and 2G of hugepages for node 1, bound to host node 1.

# /usr/libexec/qemu-kvm -name rhel73 -m 6G,slots=240,maxmem=20G -smp 4 \
-realtime mlock=off -no-user-config -nodefaults \
-drive file=/home/guest/RHEL-Server-7.3-64-virtio-scsi.qcow2,if=none,id=drive-disk,format=qcow2,cache=none -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-disk,bus=scsi0.0,id=scsi-hd0 \
-netdev tap,id=hostnet1 -device virtio-net-pci,mac=42:ce:a9:d2:4d:d8,id=idlbq7eA,netdev=hostnet1 -usb -device usb-tablet,id=input0 -vga qxl -spice port=5902,addr=,disable-ticketing,image-compression=off,seamless-migration=on -monitor stdio \
-object memory-backend-ram,id=mem0,size=4G,prealloc=yes,policy=bind,host-nodes=0 -numa node,nodeid=0,memdev=mem0 \
-object memory-backend-file,id=mem1,size=2G,mem-path=/mnt/kvm_hugepage,prealloc=yes,host-nodes=1,policy=bind -numa node,nodeid=1,memdev=mem1

3. Check the memory backends in HMP:
(qemu) info memdev

4. Check smaps and numa_maps under /proc.

Actual results:
HMP output matches the command line:
(qemu) info memdev 
memory backend: 0
  size:  2147483648
  merge: true
  dump: true
  prealloc: true
  policy: bind
  host nodes: 1
memory backend: 1
  size:  4294967296
  merge: true
  dump: true
  prealloc: true
  policy: bind
  host nodes: 0

numastat shows the guest uses host node 0's hugepages:
# numastat -p `pgrep qemu`

Per-node process memory usage (in MBs) for PID 38662 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                      2048.00            0.00         2048.00
Heap                       126.29            0.00          126.29
Stack                        2.12            0.00            2.12
Private                   4154.45            0.04         4154.48
----------------  --------------- --------------- ---------------
Total                     6330.86            0.04         6330.90

numa_maps shows the hugepages are bound to node 0:

# grep -2 `expr 2048 \* 1024`  /proc/`pgrep qemu`/smaps
VmFlags: rd wr mr mw me ac sd 
7f1593a00000-7f1613a00000 rw-p 00000000 00:29 184518                     /mnt/kvm_hugepage/qemu_back_mem._objects_mem1.3HDQU4 (deleted)
Size:            2097152 kB
Rss:                   0 kB
Pss:                   0 kB

# grep  7f1593a00000 /proc/`pgrep qemu`/numa_maps
7f1593a00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem1.3HDQU4\040(deleted) huge anon=1024 dirty=1024 N0=1024 kernelpagesize_kB=2048
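The key fields of that numa_maps entry can be pulled out mechanically. A sketch over the sample line from this report (file path shortened): the policy field itself reads bind:0, meaning the requested bind:1 policy was rewritten rather than merely ignored, and the N*= counters show every page sitting on node 0.

```shell
# Sample numa_maps entry from this report (path shortened).
line='7f1593a00000 bind:0 file=/mnt/kvm_hugepage/... huge anon=1024 dirty=1024 N0=1024 kernelpagesize_kB=2048'

# Extract the effective memory policy and the per-node page counts.
policy=$(printf '%s\n' "$line" | grep -o 'bind:[0-9]*')
placed=$(printf '%s\n' "$line" | grep -o 'N[0-9]*=[0-9]*')

echo "$policy"   # bind:0  -> expected bind:1; the policy was rewritten
echo "$placed"   # N0=1024 -> all 1024 hugepages landed on host node 0
```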

Expected results:
Hugepages should be bound to host node 1. 

Additional info:
When booting the guest with hugepages only (no regular RAM backend), the bind policy works as expected.

Comment 2 Yumei Huang 2016-07-28 08:37:52 UTC
Hit the same issue when specifying regular RAM backends for both nodes with prealloc=yes.

With prealloc=false, the issue goes away, so I am changing the summary.

Comment 3 Yumei Huang 2016-08-08 05:49:38 UTC
QE retested; the issue seems to have nothing to do with prealloc.
When numad is inactive, the bind policy works well: both memory objects are bound to the right host nodes.
When numad is running, one of the two memory objects is bound to the wrong host node.
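Given this behavior, the workaround recorded in the Doc Text is to keep numad off whenever guests use manual NUMA bindings. A minimal sketch, assuming numad runs as the `numad` systemd service on RHEL 7:

```shell
# Stop numad now and keep it from starting on boot, so it cannot
# rewrite the bind policy of QEMU memory backends.
systemctl stop numad
systemctl disable numad

# Verify: should report "inactive" before starting guests that use
# manual host-nodes=/policy=bind settings.
systemctl is-active numad
```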

Comment 5 Eduardo Habkost 2016-08-08 14:29:49 UTC
Moving to numad component.

Comment 6 Yehuda Zimmerman 2016-08-18 13:52:49 UTC
Doc Text updated for Release Notes

Comment 8 Eduardo Habkost 2016-09-19 17:48:40 UTC
*** Bug 1361058 has been marked as a duplicate of this bug. ***

Comment 13 RHEL Program Management 2020-12-15 07:43:30 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
