Bug 771424

Summary: RFE: Resident Set Size (RSS) limits on qemu guests
Product: Red Hat Enterprise Linux 6
Reporter: Avi Kivity <avi>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 6.4
CC: acathrow, ajia, aliguori, berrange, dallan, dyasny, dyuan, eblake, juzhang, knoel, mprivozn, mzhan, rjones, rwu, zhpeng
Target Milestone: rc
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.10.0-0rc1.el6
Doc Type: Enhancement
Doc Text:
Feature: Set a reasonable RSS limit by default. Reason: The RSS limit controls how much RAM a process can use. If a process leaks memory, the limit keeps the leak from affecting other processes on the system. Result (if any): The RSS limit is estimated from the amount of RAM and video RAM configured for a domain.
Last Closed: 2013-02-21 07:07:13 UTC

Description Avi Kivity 2012-01-03 17:47:15 UTC
Description of problem:

Currently, if qemu has a memory leak, it will fill up memory and potentially push other guests into swap.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-18.el6_1.4.x86_64

How reproducible:
Difficult

Steps to Reproduce:
1. Find a guest-exploitable memory leak in qemu
2. Exploit it repeatedly
  
Actual results:
System thrashes

Expected results:
Guest killed

Additional info:

Suggest setting an RSS limit of (1+k) * (guest memory) + F, where k = 0.02 and F = 200MB.
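
For illustration only, the suggested formula can be sketched in Python. The function name and the choice of KiB as the unit are assumptions made for this sketch, and F = 200MB is read here as 200 * 1024 KiB; none of this is taken from libvirt code.

```python
K = 0.02            # proportional overhead factor from the suggestion
F_KIB = 200 * 1024  # fixed overhead F = 200MB, read here as 204800 KiB (assumption)


def suggested_rss_limit_kib(guest_memory_kib, k=K, f_kib=F_KIB):
    """Return (1 + k) * guest_memory + F in KiB (hypothetical helper)."""
    return (1 + k) * guest_memory_kib + f_kib


if __name__ == "__main__":
    # Show how much headroom the formula leaves for a few guest sizes.
    for gib in (1, 4, 16):
        guest_kib = gib * 1024 * 1024
        limit_kib = suggested_rss_limit_kib(guest_kib)
        print(f"{gib} GiB guest -> limit of about {limit_kib / 1024:.1f} MiB")
```

With these values the fixed term F dominates for small guests, while the 2% term grows with guest size.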

Comment 2 Daniel Berrangé 2012-01-03 17:54:05 UTC
We already expose 4 memory limit tunables to applications using libvirt:

 * VIR_DOMAIN_MEMORY_HARD_LIMIT: Macro for the memory tunable hard_limit: it represents the maximum memory the guest can use, as a ullong.

 * VIR_DOMAIN_MEMORY_SOFT_LIMIT: Macro for the memory tunable soft_limit: it represents the memory upper limit enforced during memory contention, as a ullong.

 * VIR_DOMAIN_MEMORY_MIN_GUARANTEE: Macro for the memory tunable min_guarantee: it represents the minimum memory guaranteed to be reserved for the guest, as a ullong.

 * VIR_DOMAIN_MEMORY_SWAP_HARD_LIMIT: Macro for the swap tunable swap_hard_limit: it represents the maximum swap plus memory the guest can use, as a ullong. This limit has to be more than VIR_DOMAIN_MEMORY_HARD_LIMIT.


We do not, however, set any memory limit by default.

Comment 3 Eric Blake 2012-01-03 17:56:06 UTC
(In reply to comment #2)
> We already expose 4 memory limit tunables to applications using libvirt:

Yes, but also we currently enforce those only through cgroups.  It might be _also_ worth enforcing VIR_DOMAIN_MEMORY_HARD_LIMIT via RSS even on systems where cgroups is not enabled.

> 
> We do not, however, set any memory limit by default.

Agreed - it is up to the guest XML to use <memtune> properly:
http://libvirt.org/formatdomain.html#elementsMemoryTuning
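
As a rough sketch of what a management application might generate, the following Python builds a <memtune> fragment. The helper name is hypothetical; the element names mirror the tunables listed in comment #2, values are taken to be in KiB, and the check on swap_hard_limit follows the constraint stated there.

```python
import xml.etree.ElementTree as ET


def build_memtune(hard_limit_kib, soft_limit_kib=None, swap_hard_limit_kib=None):
    """Return a <memtune> XML fragment as a string (hypothetical helper)."""
    # Comment #2 states the swap limit has to be more than the hard limit.
    if swap_hard_limit_kib is not None and swap_hard_limit_kib <= hard_limit_kib:
        raise ValueError("swap_hard_limit must be greater than hard_limit")

    memtune = ET.Element("memtune")
    for tag, value in (
        ("hard_limit", hard_limit_kib),
        ("soft_limit", soft_limit_kib),
        ("swap_hard_limit", swap_hard_limit_kib),
    ):
        if value is not None:
            ET.SubElement(memtune, tag).text = str(value)
    return ET.tostring(memtune, encoding="unicode")


if __name__ == "__main__":
    print(build_memtune(1283748))
```

The resulting fragment would be placed inside the <domain> element of the guest XML.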

Comment 4 Eric Blake 2012-01-03 17:58:14 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > We already expose 4 memory limit tunables to applications using libvirt:
> 
> Yes, but also we currently enforce those only through cgroups.  It might be
> _also_ worth enforcing VIR_DOMAIN_MEMORY_HARD_LIMIT via RSS even on systems
> where cgroups is not enabled.

Or, add yet another memory tunable so that the RSS and cgroups limitations can be independent.

> 
> > 
> > We do not, however, set any memory limit by default.
> 
> Agreed - it is up to the guest XML to use <memtune> properly:
> http://libvirt.org/formatdomain.html#elementsMemoryTuning

At any rate, this bug should remain open until we decide which tunable (as exposed under <memtune> in the XML) is tied to an RSS limit, but you may need to spawn another bug against any management app that isn't using <memtune>.

Comment 5 Avi Kivity 2012-01-03 18:15:12 UTC
Can we not set a reasonable default?  Better to be secure by default.

Also, we should consider doing it in qemu itself.

Comment 6 Daniel Berrangé 2012-01-03 18:17:16 UTC
> > We already expose 4 memory limit tunables to applications using libvirt:
>
> Yes, but also we currently enforce those only through cgroups.  It might be
> _also_ worth enforcing VIR_DOMAIN_MEMORY_HARD_LIMIT via RSS even on systems
> where cgroups is not enabled.

How would it be enforced if cgroups aren't available? ulimit can't be used to limit RSS on Linux:

[quote src="setrlimit(2)"]
  RLIMIT_RSS
       Specifies the limit (in pages) of the  process's  resident  set  (the
       number of virtual pages resident in RAM).  This limit only has effect
       in Linux 2.4.x, x < 30, and there only affects  calls  to  madvise(2)
       specifying MADV_WILLNEED.
[/quote]

IMHO, mandating use of cgroups for this kind of memory limit is fine.
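
To make the point concrete, a small Python sketch can read the RLIMIT_RSS value for the current process. The helper name is made up for this example. The value can be queried like any other rlimit, but per the man page excerpt above the kernel does not enforce it on current Linux, so it offers no protection against a leak.

```python
import resource


def rss_rlimit():
    """Return the (soft, hard) RLIMIT_RSS pair for this process (hypothetical helper)."""
    return resource.getrlimit(resource.RLIMIT_RSS)


def describe(value):
    # resource.RLIM_INFINITY marks "no limit".
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = rss_rlimit()
    # The values are reported, but the kernel does not act on them.
    print(f"RLIMIT_RSS soft={describe(soft)} hard={describe(hard)}")
```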

Comment 7 Daniel Berrangé 2012-01-03 18:18:44 UTC
> Can we not set a reasonable default?  Better to be secure by default.

If we're reasonably conservative, I think we could set a limit in cgroups for this, without risk of breaking stuff / hurting performance. Apps that care could easily override it with stricter limits if they desire.

Comment 10 Michal Privoznik 2012-07-17 16:46:04 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2012-July/msg00687.html

Comment 11 Michal Privoznik 2012-08-03 09:54:28 UTC
Yet another version:

https://www.redhat.com/archives/libvir-list/2012-August/msg00167.html

Obviously, we have to agree on what a 'reasonable limit' is, as a badly chosen default may hurt performance.

Comment 12 Michal Privoznik 2012-08-06 06:23:09 UTC
Pushed upstream, hence moving to POST:

commit addeb7cd0502b8d69c9f50c97e87f4563ddbe25a
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Jul 17 18:38:47 2012 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Aug 6 08:06:44 2012 +0200

    qemu: Set reasonable RSS limit on domain startup
    
    If there's a memory leak in qemu or qemu is exploited the host's
    system will sooner or later start trashing instead of killing
    the bad process. This however has impact on performance and other
    guests as well. Therefore we should set a reasonable RSS limit
    even when user hasn't set any. It's better to be secure by default.

v0.10.0-rc0-35-gaddeb7c

Comment 15 Alex Jia 2012-08-23 09:41:17 UTC
I just verified this from libvirt POV, it's okay for me:

1. WRT libvirt-0.10.0-0rc0.el6

# virsh start foo
Domain foo started

# virsh memtune foo
hard_limit     : unlimited
soft_limit     : unlimited
swap_hard_limit: unlimited

Note: 'hard_limit' is 'unlimited'.

2. WRT libvirt-0.10.0-0rc1.el6

# virsh start foo
Domain foo started

# virsh dominfo foo
Id:             2
Name:           foo
UUID:           492d9815-7eba-3d09-d857-63b1dc423ec7
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       0.5s
Max memory:     1048576 KiB
Used memory:    1048576 KiB
Persistent:     yes
Autostart:      disable
Managed save:   no
Security model: dac
Security DOI:   0
Security label: unconfined_u:system_r:svirt_t:s0:c351,c494 (enforcing)

Note: domain memory is '1048576 KiB'.

# virsh dumpxml foo
<domain type='kvm'>
  ......
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
  ......
</domain>

Note: total video memory is '9216' KiB.

# virsh memtune foo
hard_limit     : 1283748
soft_limit     : unlimited
swap_hard_limit: unlimited

Note: hard_limit = (1 + k) * (domain memory + total video memory) + F, where k = 0.02 and F = 200MB (204800 KiB). That gives 1.02 * (1048576 + 9216) + 204800 = 1283747.84 KiB, so the reported '1283748' is correct.
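
The arithmetic can be re-checked with a short Python sketch. The function name is hypothetical, and rounding to the nearest KiB is assumed only because it reproduces the value virsh reported; how libvirt itself rounds is not shown in this report.

```python
DOMAIN_MEMORY_KIB = 1048576  # "Max memory" from virsh dominfo
VIDEO_RAM_KIB = 9216         # vram attribute of the cirrus video model
K = 0.02                     # proportional overhead factor
F_KIB = 200 * 1024           # F = 200MB expressed as 204800 KiB


def default_hard_limit_kib(memory_kib, video_kib, k=K, f_kib=F_KIB):
    """Return (1 + k) * (memory + video) + F in KiB (hypothetical helper)."""
    return (1 + k) * (memory_kib + video_kib) + f_kib


raw = default_hard_limit_kib(DOMAIN_MEMORY_KIB, VIDEO_RAM_KIB)

if __name__ == "__main__":
    # Rounding to the nearest KiB is an assumption, see the note above.
    print(f"raw = {raw} KiB, rounded = {round(raw)} KiB")
```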

Michal, is this enough for you? I haven't tested it with a qemu memory leak scenario, because it's rather hard for me to find an actual qemu memory leak to verify against.

Thanks,
Alex

Comment 16 Michal Privoznik 2012-08-23 09:58:01 UTC
Alex,

yeah, that is actually the exact way I've tested this too because - you're right - it's way too hard to find usable mem leak in qemu. Anyway, if you can find the correct value in /sys/fs/cgroup/memory/libvirt/qemu/f17/memory.limit_in_bytes you can set this bug to VERIFIED as it proves limit is set. Note that actual path may change depending where you have cgroups mounted, and of course substitute f17 with actual domain name.

Comment 17 Alex Jia 2012-08-23 10:05:43 UTC
(In reply to comment #16)
> Alex,
> 
> yeah, that is actually the exact way I've tested this too because - you're
> right - it's way too hard to find usable mem leak in qemu. Anyway, if you
> can find the correct value in
> /sys/fs/cgroup/memory/libvirt/qemu/f17/memory.limit_in_bytes you can set
> this bug to VERIFIED as it proves limit is set. Note that actual path may
> change depending where you have cgroups mounted, and of course substitute
> f17 with actual domain name.

Michal, thanks for your comment, I know this :)

# cgget -nr memory.limit_in_bytes libvirt/qemu/foo
memory.limit_in_bytes: 1314557952

Note: 1283748 KiB * 1024 = 1314557952 bytes

So I'm moving this bug to VERIFIED status.
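
The same cross-check between the virsh memtune value (KiB) and the cgroup value (bytes) could be scripted. This is a minimal sketch: both helper names are made up, and the parser only assumes input shaped like the cgget output above or the bare number stored in memory.limit_in_bytes.

```python
def parse_limit_in_bytes(text):
    """Parse 'memory.limit_in_bytes: N' or a bare 'N' into an int (hypothetical helper)."""
    return int(text.split(":")[-1].strip())


def limit_matches(hard_limit_kib, cgroup_limit_bytes):
    """True if the memtune hard_limit in KiB equals the cgroup limit in bytes."""
    return hard_limit_kib * 1024 == cgroup_limit_bytes


if __name__ == "__main__":
    cgget_output = "memory.limit_in_bytes: 1314557952"
    limit_bytes = parse_limit_in_bytes(cgget_output)
    print("match" if limit_matches(1283748, limit_bytes) else "mismatch")
```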

Comment 19 errata-xmlrpc 2013-02-21 07:07:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html

Comment 20 Richard W.M. Jones 2013-05-24 12:44:16 UTC
FYI: Setting a limit has caused two bugs:
bug 903432, bug 966939