Bug 1360584

Summary: Bind policy doesn't work well when numad is running
Product: Red Hat Enterprise Linux 7
Reporter: Yumei Huang <yuhuang>
Component: numad
Assignee: Lukáš Nykrýn <lnykryn>
Status: CLOSED WONTFIX
QA Contact: qe-baseos-daemons
Severity: medium
Docs Contact: Yehuda Zimmerman <yzimmerm>
Priority: unspecified
Version: 7.3
CC: bgray, chayang, drjones, juzhang, knoel, qzhang, virt-maint, yzimmerm
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
numad changes QEMU memory bindings
Currently, the *numad* daemon cannot distinguish between memory bindings that *numad* sets and memory bindings set explicitly by the memory mappings of a process. As a consequence, *numad* changes QEMU memory bindings, even when the NUMA memory policy is specified in the QEMU command line. To work around this problem, if manual NUMA bindings are specified in the guest, disable *numad*. This ensures that manual bindings configured in virtual machines are not changed by *numad*.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-15 07:43:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yumei Huang 2016-07-27 06:11:06 UTC
Description of problem:
Boot a guest with regular RAM backing one NUMA node and hugepages backing the other, and bind them to two different host nodes. According to numa_maps, the hugepages are bound to the wrong host node.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-15.el7
kernel-3.10.0-478.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Prepare hugepages on the host
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 16349 MB
node 0 free: 7775 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 12287 MB
node 1 free: 6788 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

# echo 3000 > /proc/sys/vm/nr_hugepages

# cat /proc/meminfo  | grep -i huge
AnonHugePages:     14336 kB
HugePages_Total:    3000
HugePages_Free:     3000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

# mount -t hugetlbfs none /mnt/kvm_hugepage/

# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages 
1500

# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages 
1500
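
(Optional) Instead of relying on the kernel to split the global pool evenly, the per-node pool sizes can be set explicitly through the same sysfs files; a minimal sketch reproducing the 1500/1500 split shown above:

# echo 1500 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# echo 1500 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages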


2. Boot the guest with two NUMA nodes: 4G of regular RAM for node 0, bound to host node 0, and 2G of hugepages for node 1, bound to host node 1.

# /usr/libexec/qemu-kvm -name rhel73 -m 6G,slots=240,maxmem=20G -smp 4 \
-realtime mlock=off -no-user-config -nodefaults \
-drive file=/home/guest/RHEL-Server-7.3-64-virtio-scsi.qcow2,if=none,id=drive-disk,format=qcow2,cache=none -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-disk,bus=scsi0.0,id=scsi-hd0 \
-netdev tap,id=hostnet1 -device virtio-net-pci,mac=42:ce:a9:d2:4d:d8,id=idlbq7eA,netdev=hostnet1 -usb -device usb-tablet,id=input0 -vga qxl -spice port=5902,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -monitor stdio \
-object memory-backend-ram,id=mem0,size=4G,prealloc=yes,policy=bind,host-nodes=0 -numa node,nodeid=0,memdev=mem0 \
-object memory-backend-file,id=mem1,size=2G,mem-path=/mnt/kvm_hugepage,prealloc=yes,host-nodes=1,policy=bind -numa node,nodeid=1,memdev=mem1
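
As an optional sanity check before looking at HMP, the per-node free_hugepages counters (which sit in the same sysfs directories as the nr_hugepages files from step 1) show which node the preallocated 2G was actually taken from; a minimal sketch:

# grep . /sys/devices/system/node/node*/hugepages/hugepages-2048kB/free_hugepages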

3. Check memdev in HMP
(qemu) info memdev

4. Check smaps and numa_maps under /proc


Actual results:
HMP shows the same settings as the command line:
(qemu) info memdev 
memory backend: 0
  size:  2147483648
  merge: true
  dump: true
  prealloc: true
  policy: bind
  host nodes: 1
memory backend: 1
  size:  4294967296
  merge: true
  dump: true
  prealloc: true
  policy: bind
  host nodes: 0

numastat shows the guest is using host node 0's hugepages:
# numastat -p `pgrep qemu`

Per-node process memory usage (in MBs) for PID 38662 (qemu-kvm)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                      2048.00            0.00         2048.00
Heap                       126.29            0.00          126.29
Stack                        2.12            0.00            2.12
Private                   4154.45            0.04         4154.48
----------------  --------------- --------------- ---------------
Total                     6330.86            0.04         6330.90


smaps gives the address of the 2G hugepage mapping, and numa_maps shows it is bound to host node 0:

# grep -2 `expr 2048 \* 1024`  /proc/`pgrep qemu`/smaps
VmFlags: rd wr mr mw me ac sd 
7f1593a00000-7f1613a00000 rw-p 00000000 00:29 184518                     /mnt/kvm_hugepage/qemu_back_mem._objects_mem1.3HDQU4 (deleted)
Size:            2097152 kB
Rss:                   0 kB
Pss:                   0 kB

# grep  7f1593a00000 /proc/`pgrep qemu`/numa_maps
7f1593a00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem1.3HDQU4\040(deleted) huge anon=1024 dirty=1024 N0=1024 kernelpagesize_kB=2048
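
For reference, the effective policy of the hugetlbfs mapping can also be pulled out in one step (a sketch, assuming the backing file path contains "kvm_hugepage" as above; the second field is the applied policy, here bind:0 instead of the requested bind:1):

# awk '/kvm_hugepage/ {print $1, $2}' /proc/`pgrep qemu`/numa_maps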


Expected results:
Hugepages should be bound to host node 1. 

Additional info:
When the guest is booted with hugepages only (no regular RAM backend), the bind policy works as expected.

Comment 2 Yumei Huang 2016-07-28 08:37:52 UTC
Hit the same issue when specifying RAM for both nodes with prealloc=yes.

When prealloc=false is set, the issue goes away, so the summary has been updated accordingly.

Comment 3 Yumei Huang 2016-08-08 05:49:38 UTC
QE retested; it seems the issue has nothing to do with prealloc.
When numad is inactive, the bind policy works correctly and both memory objects are bound to the right host nodes.
When numad is running, one of the two memory objects is bound to the wrong host node.
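
A minimal sketch of toggling numad for such a retest (assuming numad runs as a systemd service, as it does on RHEL 7); stopping and disabling it is also the workaround described in the Doc Text above:

# systemctl is-active numad   (check whether numad is currently running)
# systemctl stop numad        (stop it for the duration of the test)
# systemctl disable numad     (keep it from starting again at boot)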

Comment 5 Eduardo Habkost 2016-08-08 14:29:49 UTC
Moving to numad component.

Comment 6 Yehuda Zimmerman 2016-08-18 13:52:49 UTC
Doc Text updated for Release Notes

Comment 8 Eduardo Habkost 2016-09-19 17:48:40 UTC
*** Bug 1361058 has been marked as a duplicate of this bug. ***

Comment 13 RHEL Program Management 2020-12-15 07:43:30 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 14 Red Hat Bugzilla 2023-09-14 03:28:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days