Bug 674602 - [libvirt] [scale] increase ulimit depending on # of qemu running
Summary: [libvirt] [scale] increase ulimit depending on # of qemu running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 682015
 
Reported: 2011-02-02 15:43 UTC by Haim
Modified: 2014-01-13 00:48 UTC
CC List: 16 users

Fixed In Version: libvirt-0.8.7-16.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 13:26:38 UTC
Target Upstream Version:


Attachments


Links
System ID: Red Hat Product Errata RHBA-2011:0596
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2011-05-18 17:56:36 UTC

Description Haim 2011-02-02 15:43:57 UTC
Description of problem:

libvirt should increase the ulimit depending on the number of running qemu processes.
I just hit bug 674594, where the block layer crashes when the max user processes limit is reached; on my host, I had 1644 processes running, and the ulimit was set to 1024.
Please note that the host had 175 qemu processes running.

#0  0x00000030b1e329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000030b1e34185 in abort () at abort.c:92
#2  0x00000000004819dc in die2 (err=<value optimized out>, what=0x62c0de "pthread_create") at posix-aio-compat.c:80
#3  0x0000000000481d6c in thread_create (aiocb=0x7f6e2800fa40) at posix-aio-compat.c:118
#4  spawn_thread (aiocb=0x7f6e2800fa40) at posix-aio-compat.c:379
#5  qemu_paio_submit (aiocb=0x7f6e2800fa40) at posix-aio-compat.c:390
#6  0x0000000000481ecb in paio_submit (bs=<value optimized out>, fd=9, sector_num=<value optimized out>, qiov=0x7f6e2800f070, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=0x7f6e2800f010, type=2) at posix-aio-compat.c:584
#7  0x00000000004979d7 in raw_aio_submit (bs=0x135b920, sector_num=65224, qiov=0x7f6e2800f070, nb_sectors=8, cb=0x4906e0 <qcow_aio_write_cb>, opaque=<value optimized out>, type=2) at block/raw-posix.c:546
#8  0x0000000000497a50 in raw_aio_writev (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at block/raw-posix.c:562
#9  0x000000000047c641 in bdrv_aio_writev (bs=0x135b920, sector_num=65224, qiov=0x7f6e2800f070, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>) at block.c:1923
#10 0x0000000000490902 in qcow_aio_write_cb (opaque=0x7f6e2800f010, ret=0) at block/qcow2.c:657
#11 0x0000000000490a74 in qcow_aio_writev (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at block/qcow2.c:691
#12 0x000000000047c641 in bdrv_aio_writev (bs=0x135b010, sector_num=1352008, qiov=0x7f6e280606c0, nb_sectors=8, cb=<value optimized out>, opaque=<value optimized out>) at block.c:1923
#13 0x000000000047d5dc in bdrv_aio_multiwrite (bs=0x135b010, reqs=0x7f6e2e33b5f0, num_reqs=<value optimized out>) at block.c:2132
#14 0x000000000041dd9e in do_multiwrite (bs=<value optimized out>, blkreq=0x7f6e2e33b5f0, num_writes=3) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:236
#15 0x000000000041e448 in virtio_blk_handle_output (vdev=0x13da010, vq=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:363
#16 0x000000000042af59 in kvm_handle_io (env=0x138e2f0) at /usr/src/debug/qemu-kvm-0.12.1.2/kvm-all.c:538
#17 kvm_run (env=0x138e2f0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:975
#18 0x000000000042aff9 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1664
#19 0x000000000042bd2f in kvm_main_loop_cpu (_env=0x138e2f0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1932
#20 ap_main_loop (_env=0x138e2f0) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1982
#21 0x00000030b22077e1 in start_thread (arg=0x7f6e2e33c710) at pthread_create.c:301
#22 0x00000030b1ee153d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

# ps -elf | grep root | wc -l
1644

# ulimit -u
1024

libvirt-0.8.7-3.el6.x86_64
qemu-kvm-0.12.1.2-2.129.el6.x86_64
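
For context on the abort in frame #2: qemu's posix-aio-compat.c treats a failed pthread_create() as fatal. The following is a minimal standalone sketch, not from this report, showing the EAGAIN that sets this off; run it as an unprivileged user, since the kernel typically bypasses the nproc check for root:

  /* build: cc repro.c -lpthread */
  #include <errno.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/resource.h>
  #include <unistd.h>

  static void *worker(void *arg) { pause(); return arg; }

  int main(void) {
      /* Lower the per-user process/thread limit to 1; any other process
       * already owned by this user now puts us over the limit. */
      struct rlimit rl = { .rlim_cur = 1, .rlim_max = 1 };
      if (setrlimit(RLIMIT_NPROC, &rl) < 0) {
          perror("setrlimit");
          return 1;
      }
      pthread_t tid;
      int err = pthread_create(&tid, NULL, worker, NULL);
      if (err == EAGAIN)  /* same failure qemu turns into abort() */
          printf("pthread_create failed: %s\n", strerror(err));
      return 0;
  }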

Comment 1 Eric Blake 2011-03-01 15:02:35 UTC
Calling ulimit from the shell parent process is one approach; another is to have libvirt itself call setrlimit() from <sys/resource.h> at runtime.  However, if libvirt increases the limit at runtime, the change does not affect any child processes spawned before the increase; this matters if libvirt needs to pass a child qemu a file descriptor whose value is higher than the fd ulimit the child inherited earlier under the smaller limit.

Perhaps we need to add a line to libvirtd.conf that lists the ulimit to be set at libvirt startup, prior to creating any qemu children.
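
A minimal sketch of that runtime approach (raise_nproc_limit() and the value 30000 are hypothetical, not libvirt code):

  #include <stdio.h>
  #include <sys/resource.h>

  /* Hypothetical helper: bump RLIMIT_NPROC in the daemon before any
   * QEMU child is spawned, so every child inherits the larger value. */
  static int raise_nproc_limit(rlim_t max_processes) {
      struct rlimit rl;
      if (getrlimit(RLIMIT_NPROC, &rl) < 0)
          return -1;
      rl.rlim_cur = max_processes;
      if (rl.rlim_max < max_processes)
          rl.rlim_max = max_processes;  /* raising the hard limit needs root */
      return setrlimit(RLIMIT_NPROC, &rl);
  }

  int main(void) {
      if (raise_nproc_limit(30000) < 0)
          perror("raise_nproc_limit");
      return 0;
  }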

Comment 2 Jiri Denemark 2011-03-01 16:01:14 UTC
The limit cannot be increased at runtime: qemu-kvm processes that are already running when we decide to increase the limit will still run with the old limit and be checked against it, so they will not be allowed to clone() until the number of processes owned by the qemu user drops back under the old limit. (A sketch illustrating this is at the end of this comment.)

IMHO, from libvirt's point of view this should be a documentation issue.

For vdsm, the best solution is to provide and install a file under the /etc/security/limits.d/ directory which would set a limit for the qemu user (or whatever user qemu-kvm is run as).

Something like the following (you can consult /etc/security/limits.conf for the syntax):

/etc/security/limits.d/99-vdsm.conf:
qemu   soft   nproc   30000
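
The sketch mentioned above, illustrative only: each process keeps its own copy of the rlimits inherited at fork(), so raising the parent's limit afterwards does not change a child that is already running.

  #include <stdio.h>
  #include <sys/resource.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
      struct rlimit rl;
      pid_t pid = fork();
      if (pid == 0) {
          sleep(1);  /* give the parent time to change its own limit */
          getrlimit(RLIMIT_NPROC, &rl);
          /* Still prints the soft limit inherited at fork() time. */
          printf("child RLIMIT_NPROC: %llu\n",
                 (unsigned long long)rl.rlim_cur);
          _exit(0);
      }
      getrlimit(RLIMIT_NPROC, &rl);
      rl.rlim_cur = rl.rlim_max;   /* raise only the parent's soft limit */
      setrlimit(RLIMIT_NPROC, &rl);
      waitpid(pid, NULL, 0);
      return 0;
  }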

Comment 3 Haim 2011-03-10 14:13:56 UTC
(In reply to comment #2)
> For vdsm, the best solution is to provide and install a file under the
> /etc/security/limits.d/ directory which would set a limit for the qemu user
> (or whatever user qemu-kvm is run as).
> 
> /etc/security/limits.d/99-vdsm.conf:
> qemu   soft   nproc   30000

Works as suggested: I created 99-vdsm.conf under /etc/security/limits.d/ with the qemu user, restarted vdsmd/libvirtd, and managed to create 190 guests (no unexpected qemu deaths).
Also, after logging in to the system as the qemu user, ulimit -u showed 30000.

Comment 4 Daniel Berrangé 2011-03-10 14:22:23 UTC
> Works as suggested: I created 99-vdsm.conf under /etc/security/limits.d/ with
> the qemu user, restarted vdsmd/libvirtd, and managed to create 190 guests
> (no unexpected qemu deaths).

I'm struggling to see *how* this worked.  AFAICT, the only thing in Linux which ever loads the rules set in /etc/security/limits.d is the PAM limits module. Nothing in libvirt uses PAM when spawning QEMU processes, so I don't see how the QEMU process will ever see the raised limit.

> Also, after logging in to the system as the qemu user, ulimit -u showed 30000.

This is not a good confirmation of the behaviour of QEMU spawned from libvirt. When you log in as 'qemu', the login program uses PAM and thus loads the limits.

Comment 5 Daniel Berrangé 2011-03-10 14:46:00 UTC
What is actually happening is that QEMU is inheriting the ulimit settings from libvirtd, and the settings libvirt has are inherited from the process which started libvirt.

If 'init' started libvirtd at system boot, then (on my Fedora 14 system) it gets nprocesses=30732.

If the admin started libvirtd from a login shell with 'service libvirtd start', then it gets nprocesses=1024.


If we want to reliably control the limits that QEMU sees, we need to manually call setrlimit() when spawning QEMU.
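
A rough sketch of that idea, with a hypothetical spawn_with_nproc() helper standing in for libvirt's actual spawning code:

  #include <stdio.h>
  #include <sys/resource.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* Hypothetical helper: the child sets its own RLIMIT_NPROC between
   * fork() and exec(), so the spawned process runs with a known limit
   * regardless of how the daemon itself was started. */
  static pid_t spawn_with_nproc(char *const argv[], rlim_t nproc) {
      pid_t pid = fork();
      if (pid == 0) {
          /* Raising the hard limit requires root; libvirtd runs as root. */
          struct rlimit rl = { .rlim_cur = nproc, .rlim_max = nproc };
          if (setrlimit(RLIMIT_NPROC, &rl) < 0)
              perror("setrlimit");
          execvp(argv[0], argv);
          _exit(127);  /* exec failed */
      }
      return pid;
  }

  int main(void) {
      char *const argv[] =
          { "sh", "-c", "grep processes /proc/self/limits", NULL };
      pid_t pid = spawn_with_nproc(argv, 30000);
      if (pid > 0)
          waitpid(pid, NULL, 0);
      return 0;
  }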

Comment 6 Daniel Berrangé 2011-03-10 14:47:52 UTC
BTW, to see what QEMU is actually running with, you need to look in /proc:

  PID=`pgrep qemu`
  grep process /proc/$PID/limits

Comment 7 Haim 2011-03-11 13:25:12 UTC
(In reply to comment #6)
> BTW, to see what QEMU is actually running with you need to look in /proc
> 
>   PID=`pgrep qemu`
>   grep process /proc/$PID/limits

Well, you are correct; as you anticipated, it has only 1024.
So running 185 guests is probably not the real reason for qemu to exhaust its limits.

[root@rhev-i32c-01 core]# cat /proc/21678/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             515177               515177               processes
Max open files            1024                 1024                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       515177               515177               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Comment 8 Jiri Denemark 2011-04-05 14:13:11 UTC
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2011-April/msg00245.html

Comment 12 Vivian Bian 2011-04-12 10:27:36 UTC
Hi Eric,
Would you please help check the steps used to verify this bug? If they are correct, I'll move the bug status to VERIFIED.

Thanks,
Vivian


Tested with libvirt-0.8.7-16.el6.x86_64.

With libvirt-0.8.7-15.el6.x86_64, the /etc/libvirt/qemu.conf file does not contain the following:

# If max_processes is set to a positive integer, libvirt will use it to set
# maximum number of processes that can be run by qemu user. This can be used to
# override default value set by host OS.
#
# max_processes = 0

But with libvirt-0.8.7-16.el6.x86_64 we do get the "max_processes" option.

[Steps to check the function]
1. Set max_processes = 1.
2. virsh start guest (this hangs, of course):
   # cat /proc/60741/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1                    1                    processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       8247730              8247730              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us   

3. Set max_processes = 5.
4. virsh start guest (the guest could start up).

5. Leave max_processes = 0 commented out and restart libvirtd; the default of 1024 max processes is set. I got 257 guests started, and everything worked well on my test machine.

So we can manually set the ulimit depending on the number of qemu processes and make the qemu-kvm guests run successfully.

Comment 13 Eric Blake 2011-04-12 13:59:26 UTC
(In reply to comment #12)
> Hi Eric,
> Would you please help check the steps used to verify this bug? If they are
> correct, I'll move the bug status to VERIFIED.

> 1. Set max_processes = 1.
> 2. virsh start guest (this hangs, of course):
>    # cat /proc/60741/limits

> Max processes             1                    1                    processes 

> 3. Set max_processes = 5.
> 4. virsh start guest (the guest could start up).
> 
> 5. Leave max_processes = 0 commented out and restart libvirtd; the default of
> 1024 max processes is set. I got 257 guests started, and everything worked
> well on my test machine.

Looks like a good test to me.

Comment 14 Vivian Bian 2011-04-13 01:54:28 UTC
Based on comment #12 and comment #13, setting this bug's status to VERIFIED.

Comment 15 Vivian Bian 2011-04-19 07:43:24 UTC
Tested with libvirt-0.8.7-18.el6.x86_64.
In the /etc/libvirt/qemu.conf file, I set max_processes = 1 (removing the leading comment #) and restarted libvirtd; the setting took effect on the qemu process ulimit.

So keeping the VERIFIED status.

Comment 18 errata-xmlrpc 2011-05-19 13:26:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html

