Bug 518099 - RFE: [LTC 5.5] libvirt control hugepages [202016]
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.5
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: beta
Target Release: 5.5
Assignee: Daniel Veillard
QA Contact: Virtualization Bugs
Blocks: 481160 533941
Reported: 2009-08-18 21:20 UTC by IBM Bug Proxy
Modified: 2018-12-09 16:41 UTC (History)
Doc Type: Enhancement
Last Closed: 2010-03-30 08:09:19 UTC

Attachments
backported patch to 0.6.3 (14.53 KB, patch)
2009-11-18 15:28 UTC, Daniel Veillard
detail information (7.10 KB, text/plain)
2010-01-05 10:39 UTC, Alex Jia
New patch adding -mem-prealloc to kvm command line if using hugepage (14.62 KB, patch)
2010-01-06 13:19 UTC, Daniel Veillard
kvm domain is successful (3.88 KB, text/plain)
2010-01-08 05:50 UTC, Alex Jia
backported patch to 0.6.3 (14.53 KB, text/plain)
2010-01-19 08:24 UTC, IBM Bug Proxy
detail information (7.10 KB, application/octet-stream)
2010-01-19 08:24 UTC, IBM Bug Proxy
New patch adding -mem-prealloc to kvm command line if using hugepage (14.62 KB, text/plain)
2010-01-19 08:24 UTC, IBM Bug Proxy
kvm domain is successful (3.88 KB, text/plain)
2010-01-19 08:24 UTC, IBM Bug Proxy


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 55536 0 None None None Never
Red Hat Product Errata RHBA-2010:0205 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2010-03-29 12:27:37 UTC

Description IBM Bug Proxy 2009-08-18 21:20:51 UTC
=Comment: #0=================================================
Emily J. Ratliff <ratliff.com> - 
1. Feature Overview:
Feature Id:	[202016]
a. Name of Feature:	libvirt control hugepages
b. Feature Description
Have libvirt provide controls over the use of hugepages to back VMs

2. Feature Details:
Sponsor:	LTC
Architectures:
x86
x86_64

Arch Specificity: Both
Delivery Mechanism: Direct from community
Category:	Xen
Request Type:	Package - Feature from Upstream
d. Upstream Acceptance:	In Progress
Sponsor Priority	2
f. Severity: Medium
IBM Confidential:	no
Code Contribution:	3rd party code
g. Component Version Target:	latest version of libvirt

3. Business Case
On systems with hardware paging support (EPT/NPT), using large pages through hugetlbfs to back the VM
can provide a significant performance improvement for some workloads.  Currently this isn't exposed
through libvirt as a guest-configurable option.  Adding support for this will allow management software
greater flexibility to control guest configuration in performance-sensitive areas.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Stephanie Glass, sglass.com

Technical contact(s):
Ryan Harper, raharper.com

IBM Manager:
Warren Grunbok II, grunbok.com

Comment 1 Daniel Berrangé 2009-08-19 08:48:41 UTC
FYI corresponding RHEL-6.0 RFE bug

https://bugzilla.redhat.com/show_bug.cgi?id=515285

and upstream patch status

http://www.redhat.com/archives/libvir-list/2009-July/msg00753.html

Comment 2 Daniel Berrangé 2009-08-26 13:06:48 UTC
An updated patch has been submitted upstream with greater capabilities

http://www.redhat.com/archives/libvir-list/2009-August/msg00480.html

Comment 3 John Jarvis 2009-10-29 15:43:44 UTC
IBM is signed up to test and provide feedback

Comment 4 John Jarvis 2009-10-29 16:28:31 UTC
This enhancement request was evaluated by the full Red Hat Enterprise Linux 
team for inclusion in a Red Hat Enterprise Linux minor release.   As a 
result of this evaluation, Red Hat has tentatively approved inclusion of 
this feature in the next Red Hat Enterprise Linux Update minor release.   
While it is a goal to include this enhancement in the next minor release 
of Red Hat Enterprise Linux, the enhancement is not yet committed for 
inclusion in the next minor release pending the next phase of actual 
code integration and successful Red Hat and partner testing.

Comment 5 Daniel Veillard 2009-11-18 15:28:42 UTC
Created attachment 370103 [details]
backported patch to 0.6.3

It's a backported patch to the RHEL-5.4 version, basically the
patch which was applied upstream but without the support for running
qemu as a different user/group than root as this wasn't available
in 0.6.3 and some massaging to get the thing to build properly on
RHEL-5.x

Comment 6 Daniel Veillard 2009-11-25 16:09:48 UTC
libvirt-0.6.3-22.el5 has been built in dist-5E-qu-candidate with
the fixes,

Daniel

Comment 8 Alex Jia 2010-01-05 10:19:53 UTC
This bug isn't fixed with libvirt-0.6.3-28.el5 on RHEL 5.5.
The Xen and KVM hypervisors give different results:

1. Xen hypervisor:
The current Xen kernel doesn't seem to support huge page assignment. Maybe I'm missing something, or do I need to recompile the Xen kernel to support it?
[root@dhcp-66-70-62 ~]# cat /boot/config-2.6.18-183.el5xen | grep HUGE
# CONFIG_HUGETLB_PAGE is not set

In addition, the following XML block can't be added to the guest configuration.
[root@dhcp-66-70-62 ~]# cat hugepage.xml
  <memoryBacking>
    <hugepages/>
  </memoryBacking>

[root@dhcp-66-70-62 ~]# virsh edit rhel5u5_x86_64_xenfv
Domain rhel5u5_x86_64_xenfv XML configuration edited.

or

[root@dhcp-66-70-62 ~]# virsh edit rhel5u5_x86_64_xenpv
Domain rhel5u5_x86_64_xenpv XML configuration edited.

Although the output above reports a successful edit, 'virsh dumpxml' doesn't show the block.

See the attachment for details.
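
To check whether the element actually survived an edit, a minimal Python sketch like the following could be used. It assumes the output of 'virsh dumpxml' has been captured as a string; the helper name has_hugepage_backing and the trimmed sample XML are illustrative only and are not part of libvirt.

```python
import xml.etree.ElementTree as ET


def has_hugepage_backing(domain_xml: str) -> bool:
    """Return True if the domain XML requests hugepage-backed memory."""
    root = ET.fromstring(domain_xml)
    # Look for <memoryBacking><hugepages/></memoryBacking> as a direct
    # child of the top-level <domain> element.
    return root.find("./memoryBacking/hugepages") is not None


# Hypothetical, trimmed-down domain definitions for illustration only.
WITH_HUGEPAGES = """
<domain type='kvm'>
  <name>rhel5u5</name>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
</domain>
"""
WITHOUT_HUGEPAGES = "<domain type='kvm'><name>rhel5u5</name></domain>"

print(has_hugepage_backing(WITH_HUGEPAGES))
print(has_hugepage_backing(WITHOUT_HUGEPAGES))
```

If the function returns False for a domain that was just edited, the block was dropped, which matches the Xen behaviour described above.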


2. KVM hypervisor:
Starting the guest with virsh raises an error:
[root@dhcp-66-70-62 ~]# virsh start rhel5u5
error: Failed to start domain rhel5u5
error: internal error unable to start guest:

Running qemu-kvm directly with the same arguments that libvirt uses produces a segfault:

[root@dhcp-66-70-62 ~]# /usr/libexec/qemu-kvm -S -M pc -no-kvm -m 511 -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us
Segmentation fault

I suspected that some arguments, such as "-no-kvm", trigger the qemu failure, so I removed some of them and the guest started successfully.

[root@dhcp-66-70-62 ~]# /usr/libexec/qemu-kvm -m 511 -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us --daemonize
char device redirected to /dev/pts/7
char device redirected to /dev/pts/8
[root@dhcp-66-70-62 ~]# vncviewer :0

VNC Viewer Free Edition 4.1.2 for X - built Jan 26 2009 11:52:08
Copyright (C) 2002-2005 RealVNC Ltd.
See http://www.realvnc.com for information on VNC.

Tue Jan  5 18:00:47 2010
 CConn:       connected to host localhost port 5900
 CConnection: Server supports RFB protocol version 3.8
 CConnection: Using RFB protocol version 3.8
 TXImage:     Using default colormap and visual, TrueColor, depth 24.
 CConn:       Using pixel format depth 6 (8bpp) rgb222
 CConn:       Using ZRLE encoding
 CConn:       Throughput 20000 kbit/s - changing to hextile encoding
 CConn:       Throughput 20000 kbit/s - changing to full colour

See the attachment for details.

Comment 9 Alex Jia 2010-01-05 10:39:49 UTC
Created attachment 381733 [details]
detail information

Comment 10 Daniel Veillard 2010-01-05 14:51:04 UTC
This patch is not for Xen; it is limited to QEMU/KVM, so the Xen problem in 1. is
normal and expected.

For text attachments please do not use the MIME type application/octet-stream,
as this is a PITA to look at; use text/plain or whatever text MIME type
you prefer. application/octet-stream is for tarballs or compressed binaries,
i.e. it forces a download to a file to look at the content, which makes your feedback
a pain to read!

Daniel

Comment 11 Daniel Veillard 2010-01-05 15:00:17 UTC
From the log:

=============================================================================

[root@dhcp-66-70-62 ~]# /usr/libexec/qemu-kvm -S -M pc -no-kvm -m 1024 -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us 
Segmentation fault

Note: remove some arguments and set memory to 511M
[root@dhcp-66-70-62 ~]# /usr/libexec/qemu-kvm -m 511 -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us --daemonize
char device redirected to /dev/pts/7
char device redirected to /dev/pts/8
[root@dhcp-66-70-62 ~]# vncviewer :0

VNC Viewer Free Edition 4.1.2 for X - built Jan 26 2009 11:52:08
Copyright (C) 2002-2005 RealVNC Ltd.
See http://www.realvnc.com for information on VNC.

Tue Jan  5 18:00:47 2010
 CConn:       connected to host localhost port 5900
 CConnection: Server supports RFB protocol version 3.8
 CConnection: Using RFB protocol version 3.8
 TXImage:     Using default colormap and visual, TrueColor, depth 24.
 CConn:       Using pixel format depth 6 (8bpp) rgb222
 CConn:       Using ZRLE encoding
 CConn:       Throughput 20000 kbit/s - changing to hextile encoding
 CConn:       Throughput 20000 kbit/s - changing to full colour
 CConn:       Using pixel format depth 24 (32bpp) little-endian rgb888
 CConn:       Using hextile encoding
.......

[root@dhcp-66-70-62 ~]# tail -n 5 /proc/meminfo 
VmallocChunk: 34359456007 kB
HugePages_Total:   512
HugePages_Free:    282
HugePages_Rsvd:     36
Hugepagesize:     2048 kB

=============================================================================

It seems that the second command indeed works, and the only change I can spot
is that you lowered the amount of RAM used by the guest. Considering that
you allocated 1GB for hugepages (512 x 2MB), it seems that you simply exhausted
the memory available for launching a 1GB guest on that machine, but that
a 512MB guest still works. Not knowing the amount of RAM on the test box, and
without further information: 1/ libvirt hugepage support seems to work; 2/ kvm seems
to crash if there isn't enough RAM available on the machine (maybe it
is not taking hugepages into account when computing the free size),
but that can't be considered a libvirt problem.

Daniel
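
The arithmetic in this comment can be reproduced with a short Python sketch (Python is used for illustration only). It parses the hugepage counters quoted in the log above and computes the pool size and the number of pages a guest of a given size would need. The helper names are hypothetical, and the sketch ignores any memory QEMU maps beyond the guest RAM, so it is a lower bound rather than an exact requirement.

```python
MEMINFO_SAMPLE = """\
HugePages_Total:   512
HugePages_Free:    282
HugePages_Rsvd:     36
Hugepagesize:     2048 kB
"""


def parse_hugepage_info(meminfo_text: str) -> dict:
    """Extract the hugepage counters from /proc/meminfo-style text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            # The first token after the colon is the number; a unit may follow.
            info[key] = int(rest.split()[0])
    return info


def pool_size_mib(info: dict) -> int:
    """Total size of the hugepage pool in MiB."""
    return info["HugePages_Total"] * info["Hugepagesize"] // 1024


def pages_needed(guest_mib: int, page_kib: int) -> int:
    """Hugepages needed to back guest RAM alone (ceiling division)."""
    return -(-(guest_mib * 1024) // page_kib)


info = parse_hugepage_info(MEMINFO_SAMPLE)
print(pool_size_mib(info))
print(pages_needed(1024, info["Hugepagesize"]))
print(pages_needed(511, info["Hugepagesize"]))
```

A 1024 MB guest needs every page in a 512 x 2MB pool for its RAM alone, which leaves no headroom, while a 511 MB guest needs about half of it.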

Comment 12 Daniel Berrangé 2010-01-05 15:28:45 UTC
Based on discussions with John Cooper, there could be an argument for passing the -mem-prealloc flag to QEMU when launching it with hugepages enabled. This ensures it tries to allocate all required memory immediately, rather than when first used. The latter mode will crash if hugepages aren't available when accessed, while the former should fall back gracefully to non-hugepages.
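
As a rough illustration of the proposal, the Python sketch below assembles the memory-related part of a qemu-kvm command line and adds -mem-prealloc only when a hugepage path is given. This is not libvirt's actual code (libvirt is written in C); the function name build_memory_args is hypothetical, and only the flags -m, -mem-path and -mem-prealloc that appear in this report are used.

```python
from typing import List, Optional


def build_memory_args(mem_mib: int, hugepage_path: Optional[str] = None) -> List[str]:
    """Return the memory-related qemu-kvm arguments for a guest."""
    args = ["-m", str(mem_mib)]
    if hugepage_path is not None:
        # Commit all guest memory at startup so that a shortage of
        # hugepages shows up immediately instead of at first access.
        args += ["-mem-prealloc", "-mem-path", hugepage_path]
    return args


print(" ".join(build_memory_args(1024, "/dev/hugepages/libvirt/qemu")))
print(" ".join(build_memory_args(1024)))
```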

Comment 13 Alex Jia 2010-01-06 03:03:54 UTC
(In reply to comment #10)

Hi Daniel, I didn't choose a MIME type; I used the default "auto-detect" mode. I'm sorry for any inconvenience, and thanks for correcting the attachment's MIME type.

Comment 14 Alex Jia 2010-01-06 05:33:43 UTC
(In reply to comment #11)

Hi Daniel, libvirt hugepage support does not work here; by contrast, kvm works well if the correct arguments are passed to qemu-kvm. The successful result above came from running qemu-kvm directly.

About the arguments:
1. It seems that the current qemu can't deal with some arguments, such as "-S -M pc -no-kvm".
2. Libvirt shouldn't assign the full hugepage pool to the guest, such as 1024M (I allocated 512 x 2MB for hugepages). If the full amount is assigned to the guest, it raises the error "alloc_mem_area: can't mmap hugetlbfs pages: Cannot allocate memory". Although I am not sure of the maximum amount of memory the guest will use, it should at least be lower than 1024M, so I think it is an error that libvirt passes 1024M of memory to qemu-kvm.

Alex

Comment 15 john cooper 2010-01-06 08:06:48 UTC
(In reply to comment #12)
> Based on discussions with John Cooper, there could be an argument for including
> the -mem-prealloc flag to QEMU when launching it with hugepages enabled. This
> ensures it tries to allocate all required memory immediately, rather than when
> first used. The latter mode will crash if hugepages aren't available when
> accessed, while the former should gracefully fallback to non-hugepages  

The preexisting behavior in this scenario was exactly
the same, namely allocation of huge page backed guest
memory during startup would succeed even if insufficient
hugepage memory existed (at the time of setup mmap(2)).
The reverse-engineered rationale being memory was still
allocated upon guest demand and the hugepage resource
was a dynamically varying quantity, possibly increasing
before a guest would demand allocation.

The "-mem-prealloc" wart was intended as a partial solution
to this situation which forces immediate memory commit
resulting in either success or failure.  While admittedly
pessimistic it does avoid the unpredictable segv guest
termination during runtime in the event of a hugepage
allocation failure.

Comment 16 Daniel Veillard 2010-01-06 13:19:36 UTC
Created attachment 381981 [details]
New patch adding -mem-prealloc to kvm command line if using hugepage

The only difference is that this patch adds the new option suggested by
John, -mem-prealloc, when we are using hugepages, to avoid the problem raised.

Daniel

Comment 17 Daniel Veillard 2010-01-06 17:33:03 UTC
libvirt-0.6.3-29.el5 has been built in dist-5E-qu-candidate with the fix,

Daniel

Comment 18 Alex Jia 2010-01-07 03:41:52 UTC
Although the memory allocation issue has been resolved, the problem still exists: it seems that the current qemu can't deal with the "-S -M pc -no-kvm" arguments.

[root@dhcp-66-70-62 virt]# virsh start rhel5u5
error: Failed to start domain rhel5u5
error: internal error unable to start guest: 

[root@dhcp-66-70-62 virt]# cat /var/log/libvirt/qemu/rhel5u5.log 
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin /usr/libexec/qemu-kvm -S -M pc -no-kvm -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us 

Note: running the qemu-kvm command directly still raises the segfault.
[root@dhcp-66-70-62 virt]# /usr/libexec/qemu-kvm -S -M pc -no-kvm -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us 
Segmentation fault

Note: remove the "-S -M pc -no-kvm" arguments and keep memory at 1024M
[root@dhcp-66-70-62 virt]# /usr/libexec/qemu-kvm -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us 
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
......

[root@dhcp-66-70-62 virt]# ps -ef|grep qemu
root      9575  9332 25 11:31 pts/3    00:00:05 /usr/libexec/qemu-kvm -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us

[root@dhcp-66-70-62 ~]# tail -n 5 /proc/meminfo 
VmallocChunk: 34359454979 kB
HugePages_Total:   512
HugePages_Free:    345
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Comment 19 john cooper 2010-01-07 04:29:10 UTC
Does the problem relate to hugepage usage in the sense the
segfault doesn't exist with "-S -M pc -no-kvm" and
without some combination of -mem-prealloc and/or -mem-path?

qemu sizes memory a bit differently in the case of -no-kvm
but I didn't see anything obviously out of kilter.  Could
you attach a gdb stacktrace from the segv if you still have
the above test at hand?

Comment 20 Alex Jia 2010-01-07 05:04:09 UTC
[root@dhcp-66-70-62 virt]# valgrind /usr/libexec/qemu-kvm -S -M pc -no-kvm -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us 
==9948== Memcheck, a memory error detector
==9948== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==9948== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==9948== Command: /usr/libexec/qemu-kvm -S -M pc -no-kvm -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -smp 1 -name rhel5u5 -uuid c0e805d9-e288-e4e8-2357-13b39d57cb6e -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//rhel5u5.pid -boot c -drive file=/var/lib/libvirt/images/rhel5u5.img,if=ide,index=0,boot=on -net none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us
==9948== 
==9948== Syscall param timer_create(evp) points to uninitialised byte(s)
==9948==    at 0x338F20492E: timer_create@@GLIBC_2.3.3 (in /lib64/librt-2.5.so)
==9948==    by 0x407EFF: ??? (in /usr/libexec/qemu-kvm)
==9948==    by 0x40CC62: ??? (in /usr/libexec/qemu-kvm)
==9948==    by 0x338DE1D993: (below main) (in /lib64/libc-2.5.so)
==9948==  Address 0x7feffedb4 is on thread 1's stack
==9948== 
==9948== Invalid read of size 4
==9948==    at 0x529F7A: ??? (in /usr/libexec/qemu-kvm)
==9948==    by 0x40D003: ??? (in /usr/libexec/qemu-kvm)
==9948==    by 0x338DE1D993: (below main) (in /lib64/libc-2.5.so)
==9948==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==9948== 
==9948== 
==9948== Process terminating with default action of signal 11 (SIGSEGV)
==9948==  Access not within mapped region at address 0x0
==9948==    at 0x529F7A: ??? (in /usr/libexec/qemu-kvm)
==9948==    by 0x40D003: ??? (in /usr/libexec/qemu-kvm)
==9948==    by 0x338DE1D993: (below main) (in /lib64/libc-2.5.so)
==9948==  If you believe this happened as a result of a stack
==9948==  overflow in your program's main thread (unlikely but
==9948==  possible), you can try to increase the size of the
==9948==  main thread stack using the --main-stacksize= flag.
==9948==  The main thread stack size used in this run was 10485760.
==9948== 
==9948== HEAP SUMMARY:
==9948==     in use at exit: 1,396 bytes in 40 blocks
==9948==   total heap usage: 67 allocs, 27 frees, 8,788 bytes allocated
==9948== 
==9948== LEAK SUMMARY:
==9948==    definitely lost: 0 bytes in 0 blocks
==9948==    indirectly lost: 0 bytes in 0 blocks
==9948==      possibly lost: 301 bytes in 10 blocks
==9948==    still reachable: 1,095 bytes in 30 blocks
==9948==         suppressed: 0 bytes in 0 blocks
==9948== Rerun with --leak-check=full to see details of leaked memory
==9948== 
==9948== For counts of detected and suppressed errors, rerun with: -v
==9948== Use --track-origins=yes to see where uninitialised values come from
==9948== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 4 from 4)
Segmentation fault

Comment 21 Alex Jia 2010-01-07 06:00:00 UTC
(In reply to comment #19)
> Does the problem relate to hugepage usage in the sense the
> segfault doesn't exist with "-S -M pc -no-kvm" and
> without some combination of -mem-prealloc and/or -mem-path?
> 
> qemu sizes memory a bit differently in the case of -no-kvm
> but I didn't see anything obviously out of kilter.  Could
> you attach a gdb stacktrace from the segv if you still have
> the above test at hand?  

Hi John, it isn't clear to me why the "-S -M pc -no-kvm" arguments raise a segfault, but when they are removed the guest does start successfully and hugepages work well.

libvirt 0.6.3-28 doesn't pass the "-mem-prealloc" argument, only -mem-path, but the segfault is still raised. So I think the combination of -mem-prealloc and/or -mem-path is not the problem in libvirt 0.6.3-29.

I couldn't get a trace with gdb, so I used valgrind instead. I hope this information is helpful to you.

Comment 22 Alex Jia 2010-01-07 09:56:29 UTC
(In reply to comment #21)

Some debug information comes from gdb:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000529f7a in snd_pcm_hw_params_set_channels_near ()
(gdb) bt
#0  0x0000000000529f7a in snd_pcm_hw_params_set_channels_near ()
#1  0x000000000040d004 in snd_pcm_hw_params_set_channels_near ()
#2  0x000000338de1d994 in __libc_start_main () from /lib64/libc.so.6
#3  0x0000000000406cc9 in snd_pcm_hw_params_set_channels_near ()
#4  0x00007fffffffe508 in ?? ()
#5  0x0000000000000000 in ?? ()

Comment 23 Daniel Veillard 2010-01-07 14:08:11 UTC
I don't understand why the problem reported in #21 and #22 stands in the way
of this patch:

qemu-kvm help states:

-no-kvm         disable KVM hardware virtualization

  clearly that should never be the case on RHEL-5 we support QEmu only for KVM
  and on platforms with hardware virtualization it's a requirement not an option

-S              freeze CPU at startup (use 'c' to start execution)

  That's just a mode of execution used by libvirt, I don't see how this
  could lead to a memory crash

-M machine      select emulated machine (-M ? for list)

  -M pc should be one of the default options, I don't see how that could
        be related to huge pages


In any case we really should not see -no-kvm in a RHEL-6 environment, that's
wrong and should not block validation of this fix, double check your domain
definition, make sure it's really a KVM domain and please retry !

Daniel

Comment 24 Daniel Berrangé 2010-01-07 15:03:08 UTC
We don't officially support -no-kvm in RHEL, but it is always present in the KVM binaries. If the user sets  type='qemu' in libvirt, then this will be used.
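
A quick way to tell which case a guest falls into is to read the type attribute of the top-level domain element. The Python sketch below is illustrative only: the helper name expects_no_kvm_flag is hypothetical, and it simply encodes the rule stated in this comment (type='qemu' leads libvirt to pass -no-kvm).

```python
import xml.etree.ElementTree as ET


def expects_no_kvm_flag(domain_xml: str) -> bool:
    """Return True if the domain is plain QEMU emulation (type='qemu')."""
    domain_type = ET.fromstring(domain_xml).get("type")
    # Per the comment above, type='qemu' results in -no-kvm being passed,
    # whereas type='kvm' uses hardware virtualization.
    return domain_type == "qemu"


# Hypothetical minimal domain definitions for illustration only.
print(expects_no_kvm_flag("<domain type='qemu'><name>rhel5u5</name></domain>"))
print(expects_no_kvm_flag("<domain type='kvm'><name>rhel5u5</name></domain>"))
```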

Comment 25 Chris Lalancette 2010-01-07 19:06:01 UTC
What DanB says in Comment #24 is correct.  Using qemu-kvm with the -no-kvm flag is not supported, and would be my guess for the source of the problem.  Hugepages were probably never tested with straight Qemu emulation.

So, if it works *without* -no-kvm, I would say that it is satisfying the request.  What's left is to add the -mem-prealloc flag to RHEL-5 libvirt so we avoid the out-of-memory situation.

Chris Lalancette

Comment 26 Alex Jia 2010-01-08 05:45:55 UTC
(In reply to comment #23)

Hi Daniel, if I use a KVM domain/guest, the test succeeds: the guest starts correctly and hugepages work well (see attachment).

In addition, I have a question about the guest type. The guest type can currently be either qemu or kvm on RHEL 5.5. If the user sets type='kvm' in libvirt, everything is OK, but with type='qemu' libvirt passes -no-kvm to qemu-kvm. If we do nothing about type='qemu', users will also hit the previous issue. Maybe we need to disable qemu-type domains or fix the issue.

Comment 27 Alex Jia 2010-01-08 05:50:33 UTC
Created attachment 382393 [details]
kvm domain is successful

Comment 28 Daniel Veillard 2010-01-08 08:26:42 UTC
Only type="kvm" is supported in RHEL (5 or 6 no matter), we really require
hardware virtualization support, fallback to just emulation is way too slow
to be something we support, and as you noted it makes the testing twice as hard.

Final evaluation of hugepages would be related to checking that the speed
improvement on the targeted workload for this feature is actually achieved,
but that's probably outside the realm of the QE team,

Daniel

Comment 29 Alex Jia 2010-01-11 08:37:53 UTC
(In reply to comment #28)
> Only type="kvm" is supported in RHEL (5 or 6 no matter), we really require
> hardware virtualization support, fallback to just emulation is way too slow
> to be something we support, and as you noted it makes the testing twice as
> hard.
> 
> Final evaluation of hugepages would be related to checking that the speed
> improvement on the targeted workload for this feature is actually achieved,
> but that's probably outside the realm of the QE team,
> 
> Daniel    

Hi Daniel, I understand: for the hugepage feature we only need to verify guests with type="kvm" (ignoring type="qemu") on RHEL 5 or 6. May I set the bug status to VERIFIED?

Comment 30 Daniel Veillard 2010-01-12 16:57:55 UTC
I think yes; the performance team will look at hugepages too, making sure
the expected improvements are showing up,

 thanks,

Daniel

Comment 32 Adam Litke 2010-01-14 16:23:51 UTC
I ran some basic functional tests against the RHEL-5.5 20091227 snapshot and can verify that the feature is working as designed.  Libvirt will successfully create a domain backed with huge pages when there are enough available.  If enough huge pages are not available, a domain backed by regular pages will be created instead.

In addition, I conducted a basic performance test using stream (available from: http://www.cs.virginia.edu/stream/FTP/Code/stream.c).  The results, as expected for this workload, show a clear positive impact from using huge pages.

Regular page backed guest:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1951.2191       0.1655       0.1640       0.1720
Scale:       1939.3944       0.1659       0.1650       0.1680
Add:         2105.2638       0.2289       0.2280       0.2300
Triad:       2000.0006       0.2421       0.2400       0.2510


Huge page backed guest:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        3555.5589       0.0979       0.0900       0.1540
Scale:       3478.2606       0.0996       0.0920       0.1530
Add:         3636.3645       0.1420       0.1320       0.2230
Triad:       2566.8461       0.1967       0.1870       0.2870
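
To put the two tables side by side, the following Python sketch computes the ratio of the huge page rate to the regular page rate for each stream kernel, using only the Rate (MB/s) figures quoted above. The function name speedups is illustrative.

```python
# Rate (MB/s) columns copied from the two stream result tables above.
REGULAR = {"Copy": 1951.2191, "Scale": 1939.3944, "Add": 2105.2638, "Triad": 2000.0006}
HUGE = {"Copy": 3555.5589, "Scale": 3478.2606, "Add": 3636.3645, "Triad": 2566.8461}


def speedups(regular: dict, huge: dict) -> dict:
    """Return huge-page rate divided by regular-page rate per kernel."""
    return {name: huge[name] / regular[name] for name in regular}


for name, ratio in speedups(REGULAR, HUGE).items():
    print(f"{name:<6} {ratio:.2f}x")
```

On these figures Copy, Scale and Add improve by roughly 1.7x to 1.8x, and Triad by roughly 1.3x.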

Comment 33 Daniel Veillard 2010-01-18 10:09:56 UTC
Excellent, thanks Adam !

Daniel

Comment 35 IBM Bug Proxy 2010-01-19 08:24:10 UTC
Created attachment 385333 [details]
backported patch to 0.6.3

Comment 36 IBM Bug Proxy 2010-01-19 08:24:20 UTC
Created attachment 385334 [details]
detail information

Comment 37 IBM Bug Proxy 2010-01-19 08:24:26 UTC
Created attachment 385335 [details]
New patch adding -mem-prealloc to kvm command line if using hugepage

Comment 38 IBM Bug Proxy 2010-01-19 08:24:33 UTC
Created attachment 385336 [details]
kvm domain is successful

Comment 39 Sanjay Rao 2010-01-25 18:03:16 UTC
Huge pages with libvirt was tested by the performance team with Oracle database workloads and performance improvement up to 30% was seen as memory pressure was increased by increasing user count. This is in line with results seen during tests run with huge pages and qemu options, so the huge pages are being used properly with libvirt.




Users   No huge pages   Libvirt – hugepages   % diff
10U     168785.27       185604.47              9.96
20U     232233.00       250334.93              7.79
40U     246727.20       291846.80             18.29
60U     244680.47       299492.93             22.40
80U     216268.40       282959.73             30.84
100U    207184.07       271678.20             31.13

The 1st column shows the user count, the 2nd column shows numbers without huge pages, the 3rd column shows numbers with huge pages using libvirt, and the 4th column shows the % difference between columns 2 and 3.
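
The % diff column can be recomputed from the two measurement columns as (with hugepages - without) / without x 100. The Python sketch below does this for the rows in the table; the function name pct_gain is illustrative.

```python
# (user count, result without huge pages, result with libvirt hugepages)
ROWS = [
    ("10U", 168785.27, 185604.47),
    ("20U", 232233.00, 250334.93),
    ("40U", 246727.20, 291846.80),
    ("60U", 244680.47, 299492.93),
    ("80U", 216268.40, 282959.73),
    ("100U", 207184.07, 271678.20),
]


def pct_gain(without_hp: float, with_hp: float) -> float:
    """Percentage improvement of the hugepage run over the baseline."""
    return (with_hp - without_hp) / without_hp * 100.0


for users, base, huge in ROWS:
    print(f"{users:<5} {pct_gain(base, huge):6.2f}")
```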

Comment 41 IBM Bug Proxy 2010-02-23 21:41:25 UTC
------- Comment From aliguori.com 2010-02-23 16:30 EDT-------
This feature is broken in snap1.  We've created another bugzilla and mirrored it.  Here's the description of that bug:

---Problem Description---
Libvirt does not back kvm guests with huge pages even though
<memoryBacking><hugepages /></memoryBacking> is specified in the domain XML.

There are 2 interesting things to note about this problem:
1.  When looking at the src.rpm, I see that the patch to implement this feature
is indeed present and specified for application.  However, the binary package
does not seem to have the support.

2.  On my system, when I type 'rpm -qa | grep libvirt', I get two matches:
libvirt-0.6.3-20.1.el5_4
libvirt-0.6.3-20.1.el5_4

Contact Information = Adam Litke <agl.com>

Comment 42 IBM Bug Proxy 2010-02-24 15:01:04 UTC
------- Comment From agl.com 2010-02-24 09:51 EDT-------
(In reply to comment #38)
> This feature is broken in snap1.  We've created another bugzilla and mirrored
> it.  Here's the description of that bug:

The previously reported bug was due to an internal error.  This feature remains verified.  Thanks.

Comment 44 errata-xmlrpc 2010-03-30 08:09:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0205.html

