Bug 1135396

Summary: Honor hugepage settings on UMA guest
Product: Red Hat Enterprise Linux 7 Reporter: Jincheng Miao <jmiao>
Component: libvirt Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1 CC: berrange, dyuan, honzhang, jdenemar, jiahu, mprivozn, mzhan, rbalakri
Target Milestone: rc Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.2.8-3.el7 Doc Type: Bug Fix
Last Closed: 2015-03-05 07:43:30 UTC Type: Bug

Description Jincheng Miao 2014-08-29 08:17:42 UTC
Description of problem:
A hugepage page size can be specified on a NUMA host.

On a UMA host the setting does not take effect, yet the configuration
remains in the domain XML, which confuses users.

Therefore libvirt should forbid specifying a hugepage size on a UMA host.

Version-Release number of selected component (if applicable):
libvirt-1.2.7-2.el7.x86_64
qemu-kvm-1.5.3-60.el7.4.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add <hugepages> with a page size to the guest:
# virsh edit r7
...
 <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
...

2. Start the guest:
# virsh start r7

3. The guest starts without hugepages:
# ps -ef | grep qemu
qemu     14013     1 90 12:17 ?        00:00:05 /usr/libexec/qemu-kvm -name r7 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d622ee07-e04a-4f1f-9c8d-4a91822e0f86 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7b.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:07:f9:ca,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on


Expected result:
In step 2, libvirt should report the error "cannot specify page size on UMA machine".

Comment 1 Daniel Berrangé 2014-08-29 08:55:18 UTC
While it is not possible to specify a node number that is greater than 0 on a UMA machine, it *is* valid to want to request a specific page size. So it is not appropriate to reject use of the <page> element. We should already have code that validates that the node number the user requests is valid.

IOW

   <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>

*is* valid on UMA machines but


   <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='1'/>
    </hugepages>
  </memoryBacking>

should raise an error about node '1' not existing.

Comment 2 Michal Privoznik 2014-09-01 14:44:59 UTC
(In reply to Daniel Berrange from comment #1)
> While it is not possible to specify a node number that is greater than 0 on
> a UMA machine, it *is* valid to want to request a specific page size. So it
> is not appropriate to reject use of the <page> element. We should already
> have code that validates that the node number the user requests is valid.
> 
> IOW
> 
>    <memoryBacking>
>     <hugepages>
>       <page size='2048' unit='KiB' nodeset='0'/>
>     </hugepages>
>   </memoryBacking>
> 
> *is* valid on UMA machines but
> 
> 
>    <memoryBacking>
>     <hugepages>
>       <page size='2048' unit='KiB' nodeset='1'/>
>     </hugepages>
>   </memoryBacking>
> 
> should raise an error about node '1' not existing.

In fact, @nodeset refers to the *guest* node, not the host one. So as long as we are okay with creating a NUMA guest on a UMA host, I think this is NOTABUG. Moreover, if a domain is configured to bind to a nonexistent node (e.g. node #1 on a UMA host), qemu will fail to start and libvirt will report the correct error:

# ls /sys/devices/system/node/node
node0/ node1/ node2/ node3/ 

virsh # start migt10
error: Failed to start domain migt10
error: internal error: process exited while connecting to monitor: 2014-09-01T14:43:36.164594Z qemu-system-x86_64: -object memory-backend-file,prealloc=yes,mem-path=/hugepages2/libvirt/qemu,size=1024M,id=ram-node1,host-nodes=1,policy=bind: unable to map backing store for hugepages: Cannot allocate memory
2014-09-01T14:43:36.379604Z qemu-system-x86_64: -object memory-backend-file,prealloc=yes,mem-path=/hugepages/libvirt/qemu,size=1024M,id=ram-node3,host-nodes=4,policy=bind: cannot bind memory to host NUMA nodes: Invalid argument
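
To illustrate the point that @nodeset names guest NUMA cells, here is a minimal Python sketch that parses a domain XML and lists the <page> elements whose nodeset names a guest cell the domain does not define. The function names are hypothetical, the nodeset parser handles only comma lists and ranges (no '^' exclusions), and cells are assumed to be numbered by their order under <cpu><numa>. It is not libvirt's own validation code, and whether such a configuration should be rejected or applied globally is exactly what this bug is about.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(text):
    """Expand a simple nodeset such as '0', '0-2' or '0,2' into a set of ints."""
    nodes = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.update(range(int(lo), int(hi) + 1))
        elif part:
            nodes.add(int(part))
    return nodes


def pages_without_guest_cell(domain_xml):
    """Return (size, unit, missing_nodes) for each <page> naming undefined guest cells."""
    root = ET.fromstring(domain_xml)
    # Assumption: guest cells are numbered 0..N-1 in document order.
    cell_count = len(root.findall("./cpu/numa/cell"))
    problems = []
    for page in root.findall("./memoryBacking/hugepages/page"):
        nodeset = page.get("nodeset")
        if nodeset is None:
            continue  # a page size without nodeset applies to the whole guest
        missing = sorted(n for n in parse_nodeset(nodeset) if n >= cell_count)
        if missing:
            problems.append((page.get("size"), page.get("unit"), missing))
    return problems
```

For a guest with no <numa> section under <cpu>, every <page> that carries a nodeset is reported, because there is no guest cell it could refer to.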

Comment 3 Jincheng Miao 2014-09-02 05:51:38 UTC
Hi Michal,

I got a different result on a UMA host:

# virsh dumpxml r7b
<domain type='kvm'>
  <name>r7b</name>
  <uuid>d622ee07-e04a-4f1f-9c8d-4a91822e0f86</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>400000</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='1'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
    <topology sockets='1' cores='2' threads='1'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/r7.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:07:f9:ca'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='none'/>
  </devices>
</domain>

# virsh start r7b
Domain r7b started

After the guest started, qemu-kvm did not use hugetlbfs at all:
# ps -ef | grep qemu
qemu     22380     1 88 13:46 ?        00:00:03 /usr/libexec/qemu-kvm -name r7b -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=2,threads=1 -uuid d622ee07-e04a-4f1f-9c8d-4a91822e0f86 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7b.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:07:f9:ca,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -msg timestamp=on
root     22389  8326  0 13:46 pts/4    00:00:00 grep --color=auto qemu

# mount | grep hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)

# cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
550

# cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages 
550


So with this configuration qemu-kvm does not use hugepages.
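
The sysfs check above can be scripted. The following Python sketch reads nr_hugepages and free_hugepages from a pool directory such as /sys/kernel/mm/hugepages/hugepages-2048kB and works out how many pages a guest of a given size would need. The helper names are made up for this example, and it is only a rough check: it compares counters and says nothing about which process holds the pages.

```python
import os


def hugepage_usage(pool_dir):
    """Return (total, free, in_use) page counts read from a hugepage pool directory."""
    def read_int(name):
        with open(os.path.join(pool_dir, name)) as f:
            return int(f.read().strip())

    total = read_int("nr_hugepages")
    free = read_int("free_hugepages")
    return total, free, total - free


def pages_needed(guest_memory_kib, page_size_kib):
    """Number of pages required to back the guest memory, rounded up."""
    return -(-guest_memory_kib // page_size_kib)
```

With the values in this comment, the 1048576 KiB guest would need 512 pages of 2048 KiB, while the pool still shows all 550 pages free, which matches the observation that the guest is not backed by hugepages.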

Comment 4 Jincheng Miao 2014-09-02 06:34:55 UTC
I think the hugepage question can be put as follows:

1. The <hugepages> element makes qemu-kvm use the host's hugepages.
But a <page> element under <hugepages> configures a guest NUMA node, and the guest then no longer uses the host's hugepages. Is that OK?

2. If no guest NUMA cell is set, does <page> under <hugepages> take effect?

Comment 5 Michal Privoznik 2014-09-02 15:12:44 UTC
(In reply to Jincheng Miao from comment #3)
> Hi Michal,
> 
> I got the different result on UMA:
> 
> # virsh dumpxml r7b
> <domain type='kvm'>
>   <name>r7b</name>
>   <uuid>d622ee07-e04a-4f1f-9c8d-4a91822e0f86</uuid>
>   <memory unit='KiB'>1048576</memory>
>   <currentMemory unit='KiB'>400000</currentMemory>
>   <memoryBacking>
>     <hugepages>
>       <page size='2048' unit='KiB' nodeset='1'/>
>     </hugepages>
>   </memoryBacking>
>   <vcpu placement='static'>1</vcpu>
>   <os>
>     <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
>     <boot dev='hd'/>
>   </os>
>   <features>
>     <acpi/>
>     <apic/>
>     <pae/>
>   </features>
>   <cpu>
>     <topology sockets='1' cores='2' threads='1'/>
>   </cpu>

A-ha! This is the part that's been missing. In that case you're right. This is a bug. I've proposed patches upstream:

https://www.redhat.com/archives/libvir-list/2014-September/msg00089.html

Moreover, I'm adjusting the bug summary to match the problem more accurately.

Comment 10 Jincheng Miao 2014-09-24 10:21:17 UTC
The expected result is that if the page element of hugepages is configured and
there is no guest NUMA setting, libvirtd falls back to the global
'-mem-path' argument, and the page element is not used.

Steps to reproduce:
1. Configure a guest with the page element of hugepages and without guest NUMA:
# virsh dumpxml r7b
...
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
...
  <cpu>
  </cpu>
...

2. Start it:
# virsh start r7b

3. Check the qemu command line; it contains '-mem-path':
# ps -ef | grep qemu
qemu     31300     1 74 18:20 ?        00:00:02 /usr/libexec/qemu-kvm -name r7b -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid d622ee07-e04a-4f1f-9c8d-4a91822e0f86 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7b.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/r7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:07:f9:ca,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -msg timestamp=on
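
The check in step 3 can be automated instead of read off the ps output by eye. Below is a minimal Python sketch that extracts every memory path from a qemu command line, covering both the global '-mem-path' form shown above and the per-node '-object memory-backend-file,...,mem-path=...' form seen in comment 2. The function name mem_paths is an assumption made for this example, and the parser only understands these two spellings.

```python
import shlex


def mem_paths(cmdline):
    """Return the mem-path values found on a qemu command line, in order."""
    args = shlex.split(cmdline)
    paths = []
    for i, arg in enumerate(args):
        if i + 1 >= len(args):
            break  # an option at the very end has no value to inspect
        if arg == "-mem-path":
            # Global backing: -mem-path /dev/hugepages/libvirt/qemu
            paths.append(args[i + 1])
        elif arg == "-object":
            # Per-node backing: -object memory-backend-file,mem-path=...,...
            props = args[i + 1].split(",")
            if props[0] == "memory-backend-file":
                for prop in props[1:]:
                    key, _, value = prop.partition("=")
                    if key == "mem-path":
                        paths.append(value)
    return paths
```

An empty result corresponds to the broken case from the original description, where the guest started with neither form of hugepage backing.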

Comment 11 Jincheng Miao 2014-11-19 09:11:47 UTC
According to comment 10 and testing with the latest libvirt-1.2.8-7.el7.x86_64, this bug is fixed, so I am changing the status to VERIFIED.

Comment 13 errata-xmlrpc 2015-03-05 07:43:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html