Bug 1316774 - vm failed to start with mlock
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Andrea Bolognani
QA Contact: Virtualization Bugs
Docs Contact:
Depends On:
Blocks:
Reported: 2016-03-10 22:46 EST by Wayne Sun
Modified: 2017-08-01 19:51 EDT
CC List: 8 users

See Also:
Fixed In Version: libvirt-3.1.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 13:09:12 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
vm xml with mlock (2.24 KB, text/plain)
2016-03-11 02:56 EST, Wayne Sun
vm xml with mlock (2.29 KB, text/plain)
2016-03-11 02:59 EST, Wayne Sun

Description Wayne Sun 2016-03-10 22:46:03 EST
Description of problem:
The VM fails to start when memory locking (mlock) is enabled.

Version-Release number of selected component (if applicable):
# rpm -q libvirt qemu-kvm-rhev kernel
libvirt-1.3.2-1.el7.ppc64le
qemu-kvm-rhev-2.5.0-2.el7.ppc64le
kernel-3.10.0-362.el7.ppc64le

How reproducible:
always

Steps to Reproduce:
1. Start a VM configured with memory locking:
# virsh dumpxml avocado-vt-vm1
...
  <memoryBacking>
    <locked/>
  </memoryBacking>
...

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: mlockall: Cannot allocate memory
2016-03-11T03:14:19.638234Z qemu-kvm: locking memory failed

strace log:
61177 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=61364, si_status=0, si_utime=0, si_stime=0} ---
61451 setrlimit(RLIMIT_MEMLOCK, {rlim_cur=20480*1024, rlim_max=20480*1024}) = 0
61450 +++ exited with 0 +++
61177 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=61450, si_status=0, si_utime=0, si_stime=0} ---
61451 syscall_360(0x48f168d8, 0x1, 0x3ffffd31bed4, 0x48b75b70, 0x48b760d0, 0xffffffffffffffff) = -1 (errno 38)
61460 +++ exited with 1 +++
61461 +++ exited with 1 +++

Actual results:
The VM failed to start.

Expected results:
The VM starts successfully.

Additional info:
Comment 2 Jiri Denemark 2016-03-11 02:20:02 EST
Could you waste a little more disk space and always attach the full XML of the domain? According to the strace log, it looks like you tried to set a 20 MB memory limit, which is likely not what you wanted.
Comment 3 Wayne Sun 2016-03-11 02:56 EST
Created attachment 1135133
vm xml with mlock

Attached the VM XML; it only has the mlock configuration.
Yes, the 20M limit is not expected.
Comment 4 Wayne Sun 2016-03-11 02:59 EST
Created attachment 1135134
vm xml with mlock

Sorry, I pasted the wrong one in the last comment; please check the updated one.
Comment 5 Jiri Denemark 2016-03-11 03:16:22 EST
I think the documentation is pretty clear at http://libvirt.org/formatdomain.html#elementsMemoryBacking

...
locked
    When set and supported by the hypervisor, memory pages belonging to the domain will be locked in host's memory and the host will not be allowed to swap them out. For QEMU/KVM this requires hard_limit memory tuning element to be used and set to the maximum memory configured for the domain plus any memory consumed by the QEMU process itself.

I'd say it's a user error, but I'm not sure whether we perhaps relaxed the requirement to set hard_limit for Power... Andrea?
Comment 6 Andrea Bolognani 2016-03-21 09:34:41 EDT
(In reply to Jiri Denemark from comment #5)
> I think the documentation is pretty clear at
> http://libvirt.org/formatdomain.html#elementsMemoryBacking
> 
> ...
> locked
>     When set and supported by the hypervisor, memory pages belonging to the
> domain will be locked in host's memory and the host will not be allowed to
> swap them out. For QEMU/KVM this requires hard_limit memory tuning element
> to be used and set to the maximum memory configured for the domain plus any
> memory consumed by the QEMU process itself.
> 
> I'd say it's a user error, but I'm not sure whether we perhaps relaxed the
> requirement to set hard_limit for Power... Andrea?

Not at all, the requirement is the same as x86 and as documented.

The reporter is seeing the setrlimit() call because, on
ppc64, some memory has to be locked regardless of whether the
<memoryBacking><locked> element is present or not.

20 MiB makes sense here because only some small caches and lookup
tables need to be locked all the time; the bulk of the guest
memory has no such requirement.

So I agree it's user error; on the other hand, maybe we should
take this chance to improve libvirt so that a meaningful error
message is reported whenever the user has specified
<memoryBacking><locked> but <memtune><hard_limit> is not present
in the guest configuration? With much lower severity and priority,
of course :)
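The 20 MiB figure can be checked against the strace line in the description (rlim_cur=20480*1024). The sketch below is plain arithmetic, not libvirt code; it assumes a 1 GiB guest purely for illustration (the size used in the verification XML of comment 14, not the reporter's attached XML) and shows why locking all guest RAM cannot fit under that limit. Because mlockall() locks the whole QEMU address space, guest RAM is only a lower bound on what must be locked.

```python
# Illustrative arithmetic, not libvirt code.
# The limit comes from the strace line in the description:
#   setrlimit(RLIMIT_MEMLOCK, {rlim_cur=20480*1024, rlim_max=20480*1024})
# The 1 GiB guest size is an ASSUMPTION borrowed from comment 14's XML.

KIB = 1024
MIB = 1024 * KIB

memlock_limit_bytes = 20480 * 1024      # value passed to setrlimit()
guest_memory_bytes = 1048576 * KIB      # <memory unit='KiB'>1048576</memory>


def fits_in_memlock_limit(bytes_to_lock: int, limit_bytes: int) -> bool:
    """Return True if locking bytes_to_lock stays within limit_bytes."""
    return bytes_to_lock <= limit_bytes


if __name__ == "__main__":
    print(f"RLIMIT_MEMLOCK: {memlock_limit_bytes // MIB} MiB")
    print(f"guest memory:   {guest_memory_bytes // MIB} MiB")
    # Guest RAM alone is a lower bound: mlockall() also locks QEMU's own memory.
    print("guest RAM fits under the limit:",
          fits_in_memlock_limit(guest_memory_bytes, memlock_limit_bytes))
```

It reports a 20 MiB limit against 1024 MiB of guest memory, so the check is False, consistent with the "mlockall: Cannot allocate memory" failure.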
Comment 8 Andrea Bolognani 2017-02-06 12:31:35 EST
Patch posted upstream:

  https://www.redhat.com/archives/libvir-list/2017-February/msg00180.html
Comment 9 Andrea Bolognani 2017-02-07 06:38:51 EST
v2 patch posted upstream:

  https://www.redhat.com/archives/libvir-list/2017-February/msg00214.html
Comment 10 Jaroslav Suchanek 2017-02-07 07:15:31 EST
(In reply to Andrea Bolognani from comment #9)
> v2 patch posted upstream:
> 
>   https://www.redhat.com/archives/libvir-list/2017-February/msg00214.html

Was it accepted?
Comment 11 Andrea Bolognani 2017-02-07 08:17:48 EST
Nope :)

But v3 has just been posted upstream:

  https://www.redhat.com/archives/libvir-list/2017-February/msg00227.html
Comment 12 Andrea Bolognani 2017-02-07 12:45:06 EST
The fix has been pushed upstream.

commit c2e60ad0e5124482942164e5fec088157f5e716a
Author: Andrea Bolognani <abologna@redhat.com>
Date:   Mon Feb 6 17:54:49 2017 +0100

    qemu: Forbid <memoryBacking><locked> without <memtune><hard_limit>
    
    In order for memory locking to work, the hard limit on memory
    locking (and usage) has to be set appropriately by the user.
    
    The documentation mentions the requirement already: with this
    patch, it's going to be enforced by runtime checks as well,
    by forbidding a non-compliant guest from being defined as well
    as edited and started.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1316774

v3.0.0-123-gc2e60ad
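The rule described in the commit message can be restated as a small check over the domain XML. This is a hypothetical sketch using Python's standard xml.etree.ElementTree, not libvirt's implementation; the function name and the error text are made up for illustration, and the memory sizes in the sample XML are illustrative only.

```python
import xml.etree.ElementTree as ET


def check_locked_requires_hard_limit(domain_xml: str) -> None:
    """Hypothetical restatement of the rule from commit c2e60ad.

    Raises ValueError when <memoryBacking><locked/> is present but
    <memtune><hard_limit> is missing. Not libvirt's implementation;
    the error text is made up for this sketch.
    """
    root = ET.fromstring(domain_xml)
    locked = root.find("./memoryBacking/locked") is not None
    has_hard_limit = root.find("./memtune/hard_limit") is not None
    if locked and not has_hard_limit:
        raise ValueError(
            "<memoryBacking><locked/> requires <memtune><hard_limit>")


# Locked memory without hard_limit, as in the report (sizes illustrative).
REJECTED = """<domain type='kvm'>
  <memory unit='KiB'>1048576</memory>
  <memoryBacking><locked/></memoryBacking>
</domain>"""

# Locked memory with hard_limit, like the verification XML in comment 14.
ACCEPTED = """<domain type='kvm'>
  <memory unit='KiB'>1048576</memory>
  <memtune><hard_limit unit='KiB'>2000000</hard_limit></memtune>
  <memoryBacking><locked/></memoryBacking>
</domain>"""

if __name__ == "__main__":
    check_locked_requires_hard_limit(ACCEPTED)  # passes silently
    try:
        check_locked_requires_hard_limit(REJECTED)
    except ValueError as err:
        print("rejected:", err)
```

Only the presence of hard_limit is checked, matching the commit message; it does not judge whether the value is large enough.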
Comment 14 Dan Zheng 2017-03-06 04:47:47 EST
Test packages:

libvirt-3.1.0-1.el7.ppc64le
qemu-kvm-rhev-2.8.0-5.el7.ppc64le
kernel-3.10.0-578.el7.ppc64le

Steps:
1. Configure the guest XML with the following elements:

...
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <memtune>
    <hard_limit unit='KiB'>2000000</hard_limit>
  </memtune>
  <memoryBacking>
    <locked/>
  </memoryBacking>

...
  <os>
    <type arch='ppc64le' machine='pseries-rhel7.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>

...

2. Start the guest; it starts up successfully:
# virsh start dd
Domain dd started

3. Log in to the guest; normal operations work without problems:
# virsh console dd
Connected to domain dd
Escape character is ^]
CF000012
CF000015ch
Linux ppc64le
#1 SMP Wed Oct 1
Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel 3.10.0-514.el7.ppc64le on an ppc64le

localhost login: root
Password: 
[root@localhost ~]# ls
original-ks.cfg
[root@localhost ~]# pwd
/root
[root@localhost ~]# 

# virsh destroy dd
Domain dd destroyed


So I'm marking this as passed.
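As a sanity check on the verification configuration: per the documentation quoted in comment 5, hard_limit has to cover the guest's maximum memory plus the memory used by the QEMU process itself. The sketch below is plain arithmetic (the function name is hypothetical) and computes how much of the 2000000 KiB hard_limit remains for QEMU once the 1048576 KiB of guest memory is accounted for.

```python
# Values copied from the verification XML in comment 14 (both in KiB).
MEMORY_KIB = 1048576       # <memory unit='KiB'>1048576</memory>
HARD_LIMIT_KIB = 2000000   # <hard_limit unit='KiB'>2000000</hard_limit>


def memlock_headroom_kib(memory_kib: int, hard_limit_kib: int) -> int:
    """Return how much of hard_limit is left after the guest's memory.

    Per the documentation quoted in comment 5, this headroom has to cover
    the memory consumed by the QEMU process itself. How much QEMU actually
    needs is not stated in this bug, so no threshold is checked here.
    """
    return hard_limit_kib - memory_kib


if __name__ == "__main__":
    headroom = memlock_headroom_kib(MEMORY_KIB, HARD_LIMIT_KIB)
    print(f"headroom for QEMU itself: {headroom} KiB (~{headroom / 1024:.0f} MiB)")
```

For these values it reports 951424 KiB, roughly 929 MiB. Whether that is enough for QEMU's own allocations is not quantified in this bug.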
Comment 15 errata-xmlrpc 2017-08-01 13:09:12 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
