Bug 1076990 (qemu-complex-mem)
| Summary: | Enable complex memory requirements for virtual machines | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Stephen Gordon <sgordon> |
| Component: | qemu-kvm-rhev | Assignee: | Eduardo Habkost <ehabkost> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | ehabkost, hhuang, juzhang, knoel, michen, mrezanin, rbalakri, virt-maint, xfu |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.1.2-2.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | libvirt-complex-guest-mem | Environment: | |
| Last Closed: | 2015-03-05 09:44:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 996259 | | |
| Bug Blocks: | 1076989, 1110708 | | |
Description
Stephen Gordon
2014-03-17 00:28:20 UTC
Moving to qemu-kvm-rhev. Patches were included upstream and will be in QEMU 2.1.0.

Hi Eduardo, KVM QE would like to know how to test the two checkpoints below against qemu-kvm-rhev. Could you please provide a qemu-kvm-rhev command line for QE?

1. A virtual machine using 2 NUMA nodes, with a different number of huge pages for each NUMA node.
2. A virtual machine with a specific number of huge pages and an additional amount of memory not backed by huge pages (the latter may be oversubscribed), guaranteeing that all memory comes from the same NUMA node.

BTW, QE didn't find a related patch in qemu-2.1:

```
# rpm -qpi qemu-kvm-rhev-2.1.0-2.el7.src.rpm --changelog | grep 1076990
nothing
# rpm -qpi qemu-kvm-2.1.0-1.el7.src.rpm --changelog | grep 1076990
nothing
```

(In reply to FuXiangChun from comment #5)
> 1. A virtual machine using 2 NUMA nodes, with a different number of huge pages for each NUMA node

Use a different memdev for each NUMA node, each with a different hugetlbfs mountpoint and a different "host-nodes" option, e.g.:

```
-object memory-backend-file,host-nodes=0,id=mem-0,mem-path=/tmp/hugetlbfs1 \
-numa node,id=0,memdev=mem-0 \
-object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/tmp/hugetlbfs2 \
-numa node,id=1,memdev=mem-1 \
```

> 2. A virtual machine with a specific number of huge pages and an additional amount of memory not backed by huge pages (the latter may be oversubscribed), guaranteeing that all memory comes from the same NUMA node

Just use a different memdev for each NUMA node. Some nodes may point to memory-backend-file objects, other nodes may point to memory-backend-ram objects.

> BTW, QE didn't find a related patch in qemu-2.1.

The package was rebased and the patches are already in upstream QEMU version 2.1.0.

(In reply to Eduardo Habkost from comment #6)
> > 2. A virtual machine with a specific number of huge pages and an additional amount of memory not backed by huge pages (the latter may be oversubscribed), guaranteeing that all memory comes from the same NUMA node
>
> Just use a different memdev for each NUMA node. Some nodes may point to memory-backend-file objects, other nodes may point to memory-backend-ram objects.

Example command-line options:

```
-object memory-backend-ram,host-nodes=0,id=mem-0 \
-numa node,id=0,memdev=mem-0 \
-object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/tmp/hugetlbfs2 \
-numa node,id=1,memdev=mem-1 \
```

Tested qemu-kvm-rhev-2.1.0-3.el7.x86_64 & 3.10.0-148.el7.x86_64:

```
/usr/libexec/qemu-kvm \
  -object memory-backend-file,host-nodes=0,id=mem-0,mem-path=/mnt/kvm_hugepage/ \
  -numa node,id=0,memdev=mem-0 \
  -object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage \
  -numa node,id=1,memdev=mem-1
```

result:

```
qemu-kvm: -numa node,id=0,memdev=mem-0: Parameter 'id' expects an identifier
```

```
/usr/libexec/qemu-kvm \
  -object memory-backend-file,host-nodes=0,id=mem-0,mem-path=/mnt/kvm_hugepage/ \
  -numa node,memdev=mem-0 \
  -object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage \
  -numa node,memdev=mem-1
```

result:

```
qemu-kvm: -object memory-backend-file,host-nodes=0,id=mem-0,mem-path=/mnt/kvm_hugepage/: NUMA node binding are not supported by this QEMU
qemu-kvm: -object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage: NUMA node binding are not supported by this QEMU
```

Eduardo, based on this result, do I need to re-assign this bug?
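As background for the mem-path arguments used throughout this bug: memory-backend-file expects each path to sit on a (hugetlbfs) mount that already exists, with the huge pages reserved up front. A minimal setup sketch — the mountpoint paths, page size, and page counts here are illustrative assumptions, not values taken from this bug:

```shell
# Create and mount two separate hugetlbfs instances, one per guest node
# (paths are hypothetical examples).
mkdir -p /mnt/hugetlbfs1 /mnt/hugetlbfs2
mount -t hugetlbfs -o pagesize=2M none /mnt/hugetlbfs1
mount -t hugetlbfs -o pagesize=2M none /mnt/hugetlbfs2

# Reserve 2MB huge pages on each host NUMA node (counts are illustrative);
# read the files back afterwards to confirm the reservation succeeded.
echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 512 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
```

This requires root and a NUMA-capable host, so it is a sketch of the preconditions rather than something to run verbatim.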
(In reply to FuXiangChun from comment #8)
> Tested qemu-kvm-rhev-2.1.0-3.el7.x86_64 & 3.10.0-148.el7.x86_64
>
> result:
> qemu-kvm: -numa node,id=0,memdev=mem-0: Parameter 'id' expects an identifier

This was my mistake. The proper format is: -numa node,nodeid=X,memdev=Y.

> result:
> qemu-kvm: -object memory-backend-file,host-nodes=0,id=mem-0,mem-path=/mnt/kvm_hugepage/: NUMA node binding are not supported by this QEMU
> qemu-kvm: -object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage: NUMA node binding are not supported by this QEMU

This is a bug: QEMU should be compiled with --enable-numa. Reopening.

Re-tested this issue with the private build qemu-kvm-rhev-2.1.0-3.el7.numa.buildrequires.v1.x86_64. QE tested 2 scenarios.

S1.
```
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -name RHEL-Server-7.0-64 -m 27G -smp 4,maxcpus=160 \
  -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage/,size=1024M \
  -numa node,nodeid=0,memdev=mem-0 \
  -object memory-backend-file,policy=bind,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage2,size=1024M \
  -numa node,nodeid=1,memdev=mem-1
```

result:

```
# grep -2 1048576 smaps
2aaaaac00000-2aaaeac00000 rw-p 00000000 00:26 667856 /mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.srrTfd (deleted)
Size: 1048576 kB
Rss: 0 kB
Pss: 0 kB
--
VmFlags: rd wr mr mw me dc de ht
2aaaeac00000-2aab2ac00000 rw-p 00000000 00:27 666757 /mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.XElhru (deleted)
Size: 1048576 kB
Rss: 0 kB
Pss: 0 kB
# cat numa_maps | grep 2aaaaac00000
2aaaaac00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.srrTfd\040(deleted) huge anon=512 dirty=512 N0=512
# cat numa_maps | grep 2aaaeac00000
2aaaeac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.XElhru\040(deleted) huge anon=502 dirty=502 N1=502
```

S2.

```
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -name RHEL-Server-7.0-64 -m 2G -smp 4,maxcpus=160 \
  -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage/,size=1536M \
  -numa node,nodeid=0,memdev=mem-0 \
  -object memory-backend-file,policy=bind,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage2,size=512M \
  -numa node,nodeid=1,memdev=mem-1
```

result:

```
# grep -2 1572864 smaps
2aaaaac00000-2aab0ac00000 rw-p 00000000 00:26 669656 /mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.DPAbez (deleted)
Size: 1572864 kB
Rss: 0 kB
Pss: 0 kB
# grep -2 524288 smaps
VmFlags: rd wr mr mw me dc de ht
2aab0ac00000-2aab2ac00000 rw-p 00000000 00:27 669657 /mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.DrCFi1 (deleted)
Size: 524288 kB
Rss: 0 kB
Pss: 0 kB
# cat numa_maps | grep 2aaaaac00000
2aaaaac00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.DPAbez\040(deleted) huge anon=768 dirty=768 N0=768
# cat numa_maps | grep 2aab0ac00000
2aab0ac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.DrCFi1\040(deleted) huge anon=135 dirty=135 N1=135
```

Eduardo,
can this result verify this bug? Also, about "an additional amount of memory not backed by huge pages", can you provide a qemu-kvm CLI example to QE? QE doesn't know how to trigger it. It comes from checkpoint 2: a virtual machine with a specific number of huge pages and an additional amount of memory not backed by huge pages.

(In reply to FuXiangChun from comment #11)
> Re-tested this issue with the private build qemu-kvm-rhev-2.1.0-3.el7.numa.buildrequires.v1.x86_64. QE tested 2 scenarios.
>
> S1.
> /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -name RHEL-Server-7.0-64 -m 27G -smp 4,maxcpus=160 \
> -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage/,size=1024M \
> -numa node,nodeid=0,memdev=mem-0 \
> -object memory-backend-file,policy=bind,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage2,size=1024M \
> -numa node,nodeid=1,memdev=mem-1

I see you didn't use prealloc on mem-1.

> # cat numa_maps | grep 2aaaaac00000
> 2aaaaac00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.srrTfd\040(deleted) huge anon=512 dirty=512 N0=512

That means 512 2MB pages, or 1024MB. Looks OK.

> # cat numa_maps | grep 2aaaeac00000
> 2aaaeac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.XElhru\040(deleted) huge anon=502 dirty=502 N1=502

This one doesn't have all pages allocated because it doesn't have prealloc=yes, but they are all on node 1. Looks OK.

Note that for the above test case, you will need the hugepages to be preallocated on the right nodes. You can do that very early on boot by using:

```
# echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# echo 512 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
```

Please also check the contents of those files to ensure enough hugepages were allocated.

> S2.
> /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -name RHEL-Server-7.0-64 -m 2G -smp 4,maxcpus=160 \
> -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage/,size=1536M \
> -numa node,nodeid=0,memdev=mem-0 \
> -object memory-backend-file,policy=bind,host-nodes=1,id=mem-1,mem-path=/mnt/kvm_hugepage2,size=512M \
> -numa node,nodeid=1,memdev=mem-1
>
> # cat numa_maps | grep 2aaaaac00000
> 2aaaaac00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.DPAbez\040(deleted) huge anon=768 dirty=768 N0=768

768*2MB = 1536MB. All on N0. Looks OK.

> # cat numa_maps | grep 2aab0ac00000
> 2aab0ac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.DrCFi1\040(deleted) huge anon=135 dirty=135 N1=135

Again, you didn't use prealloc for node 1, so not all pages were allocated. But the ones that were allocated are all on node 1. Looks good.

> Eduardo,
> can this result verify this bug?
> Another, about "an additional amount of memory not backed by huge pages", can you provide a qemu-kvm cli example to QE? QE doesn't know how to trigger it.

See comment #7:

```
-object memory-backend-ram,host-nodes=0,id=mem-0 \
-numa node,id=0,memdev=mem-0 \
-object memory-backend-file,host-nodes=1,id=mem-1,mem-path=/tmp/hugetlbfs2 \
-numa node,id=1,memdev=mem-1 \
```

That will use hugepages for guest node 1, and normal pages for guest node 0.
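The "anon= pages × 2MB" size check applied in the review above can be scripted. A sketch, using one of the numa_maps lines quoted in the S1 results and assuming 2048 kB huge pages:

```shell
# Recover the huge-page count from a numa_maps line and convert it to MB.
# The sample line is copied from the S1 test results above.
line='2aaaaac00000 bind:0 file=/mnt/kvm_hugepage/qemu_back_mem._objects_mem-0.srrTfd\040(deleted) huge anon=512 dirty=512 N0=512'
pages=$(printf '%s\n' "$line" | grep -o 'anon=[0-9]*' | cut -d= -f2)
mb=$(( pages * 2048 / 1024 ))   # 2048 kB per huge page
echo "$pages pages = $mb MB"     # -> 512 pages = 1024 MB
```

In real use, `line` would come from `grep <address> /proc/<qemu-pid>/numa_maps` instead of a literal string.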
In the comment #7 example the pages can come from any host node, but you can use policy=bind,host-nodes=X to make sure they come from a specific host node (you can do that for normal pages and for hugepages). I forgot to add the "size" parameters to the memory-backend-* objects, but you can choose reasonable sizes for each one.

Fix included in qemu-kvm-rhev-2.1.2-2.el7.

Re-verified bug with 3.10.0-195.el7.x86_64 and qemu-kvm-rhev-2.1.2-5.el7.x86_64.

host info:

```
# cat /proc/buddyinfo
Node 0, zone    DMA      0    1    1    1    1    1    1    0    1    1    3
Node 0, zone  DMA32    175   96  139   68   11    5   17   15    7    4    1
Node 0, zone Normal    122   87  168   92   36    8    8   20    4    1    0
Node 1, zone Normal    890  552  301  136   69   14    6    7    9    5    3
Node 2, zone Normal    309  225  195  138  102   54   19   15    9    2    1
Node 3, zone Normal    373  303  200  148   54   32   13   10   11    3
```

S1. with the same number of huge pages for each NUMA node

1. echo 2048 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
2. echo 2048 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
3.
qemu-kvm cli:

```
/usr/libexec/qemu-kvm -m 8G \
  -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage1,size=4096M \
  -numa node,nodeid=0,memdev=mem-0 \
  -object memory-backend-file,policy=bind,host-nodes=1,id=mem-1,prealloc=yes,mem-path=/mnt/kvm_hugepage2,size=4096M \
  -numa node,nodeid=1,memdev=mem-1
```

result:

```
# grep -2 4194304 smaps
2aaaaac00000-2aabaac00000 rw-p 00000000 00:26 43472 /mnt/kvm_hugepage1/qemu_back_mem._objects_mem-0.BCQSYP (deleted)
Size: 4194304 kB
Rss: 0 kB
Pss: 0 kB
--
VmFlags: rd wr mr mw me dc de ht
2aabaac00000-2aacaac00000 rw-p 00000000 00:27 43473 /mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.pZ3s4r (deleted)
Size: 4194304 kB
Rss: 0 kB
Pss: 0 kB
# grep 2aaaaac00000 numa_maps
2aaaaac00000 bind:0 file=/mnt/kvm_hugepage1/qemu_back_mem._objects_mem-0.BCQSYP\040(deleted) huge anon=2048 dirty=2048 N0=2048
# grep 2aabaac00000 numa_maps
2aabaac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.pZ3s4r\040(deleted) huge anon=2048 dirty=2048 N1=2048
```

S2. with a different number of huge pages for each NUMA node

1.
```
/usr/libexec/qemu-kvm -m 4.5G \
  -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage1,size=4096M \
  -numa node,nodeid=0,memdev=mem-0 \
  -object memory-backend-file,policy=bind,host-nodes=1,id=mem-1,prealloc=yes,mem-path=/mnt/kvm_hugepage2,size=512M \
  -numa node,nodeid=1,memdev=mem-1
```

2.
```
# grep -2 524288 smaps
VmFlags: rd wr mr mw me dc de ht
2aabaac00000-2aabcac00000 rw-p 00000000 00:27 21992 /mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.A1Oe9q (deleted)
Size: 524288 kB
Rss: 0 kB
Pss: 0 kB
```

3.
```
# grep 2aabaac00000 numa_maps
2aabaac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.A1Oe9q\040(deleted) huge anon=256 dirty=256 N1=256
```

S3.
with normal memory and a hugepage backend for each NUMA node

```
/usr/libexec/qemu-kvm -m 4.5G \
  -object memory-backend-file,host-nodes=0,id=mem-0,policy=bind,prealloc=yes,mem-path=/mnt/kvm_hugepage1,size=4096M \
  -numa node,nodeid=0,memdev=mem-0 \
  -object memory-backend-ram,policy=bind,host-nodes=1,id=mem-1,prealloc=yes,size=512M \
  -numa node,nodeid=1,memdev=mem-1
```

result:

```
# grep -2 4194304 smaps
2aaaaac00000-2aabaac00000 rw-p 00000000 00:26 61977 /mnt/kvm_hugepage1/qemu_back_mem._objects_mem-0.HcgDMS (deleted)
Size: 4194304 kB
Rss: 0 kB
Pss: 0 kB
# grep 7f29ec200000 numa_maps
7f29ec200000 bind:1 anon=131072 dirty=131072 active=76800 N1=131072
```

Additional:
1. A RHEL 7.1 guest shows correct NUMA info via numactl -H.
2. A Windows guest works well, but QE did not find any way to check NUMA information inside the guest.

According to the test results of these 3 scenarios, this bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html
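As a closing note, the per-node placement check applied throughout the scenarios above (all allocated huge pages of a backend must land on the bound host node) can also be scripted. A sketch, using one of the numa_maps lines quoted in the S1 results:

```shell
# Compare the total allocated huge pages (anon=) against the count placed on
# the expected host node (N1=); they match when binding worked. The sample
# line is copied from the S1 results above.
line='2aabaac00000 bind:1 file=/mnt/kvm_hugepage2/qemu_back_mem._objects_mem-1.pZ3s4r\040(deleted) huge anon=2048 dirty=2048 N1=2048'
anon=$(printf '%s\n' "$line" | grep -o 'anon=[0-9]*' | cut -d= -f2)
node1=$(printf '%s\n' "$line" | grep -o 'N1=[0-9]*' | cut -d= -f2)
[ "$anon" = "$node1" ] && echo "all $anon pages on host node 1"
```

In real use, `line` would come from `grep <region-address> /proc/<qemu-pid>/numa_maps` rather than a literal string.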