Bug 1046833
| Summary: | Warn users against setting memory hard limit too high when used for mlock or rdma-pin-all | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Qunfang Zhang <qzhang> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | dyuan, eblake, fjin, hhuang, jdenemar, juzhang, juzhou, michen, mzhan, pbonzini, quintela, rbalakri, virt-maint, xfu, ydu, zpeng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-2.0.0-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-03 18:07:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1013055, 1138485 | | |
| Bug Blocks: | | | |
I think QEMU is being OOM-killed, which means this bug really cannot be fixed.

(In reply to Paolo Bonzini from comment #2)
> I think QEMU is being OOM-killed, which means this bug really cannot be fixed.

Could we give some prompt in advance when the guest memory is too large and the "x-rdma-pin-all" capability is set on?

(In reply to Qunfang Zhang from comment #0)
> Steps to Reproduce:
> 1. Boot a guest with v-mem and host p-mem.

Adding one step here, enable "x-rdma-pin-all":

(qemu) migrate_set_capability x-rdma-pin-all on

> 2. Boot the guest on destination host with "-incoming x-rdma:0:5800"
> 3. Migrate the guest
> (qemu) migrate -d x-rdma:192.168.1.3:5800

(In reply to Qunfang Zhang from comment #3)
> (In reply to Paolo Bonzini from comment #2)
> > I think QEMU is being OOM-killed, which means this bug really cannot be fixed.
>
> Could we give some prompt in advance when the guest memory is too large and
> the "x-rdma-pin-all" capability is set on?

From the KVM QE point of view: even if this cannot be handled at the qemu-kvm level, it should be alerted on or controlled at the upper management-tool level. Being aborted/killed without any message could cause losses for the end user.

Best Regards,
Junyi

Right, you can move this to libvirt.

(In reply to Paolo Bonzini from comment #6)
> Right, you can move this to libvirt.

Moving to the libvirt component for a friendlier solution.

Use of x-rdma-pin-all admits that the feature is experimental; libvirt refuses to drive this option. When rdma-pin-all is made non-experimental and libvirt is enhanced to drive it, we should make sure to avoid letting the user do this, but for now, there is no libvirt bug.

I'm not sure what the best resolution for this bug is. Libvirt requires a user or management application to set a memory hard limit to be able to start RDMA migration with rdma-pin-all, and we document that the limit has to be high enough to cover both guest memory and the memory consumed by QEMU itself. While setting the limit close to the host memory size does not do anything bad in general, trying to mlock QEMU's memory with such a limit may result in QEMU being killed. However, I don't think there's any way libvirt could check whether the limit is OK or already too high (the maximum usable limit would be the host memory size minus something). The only thing we could do is to document this...

Documented upstream by v2.0.0-rc2-7-g60a545f.

Comparing the docs of libvirt-docs-1.2.17-13.el7.x86_64 and libvirt-docs-2.0.0-3.el7.x86_64, a new sentence was added for the mlock case in the file formatdomain.html, line 1043: "Beware of setting the memory limit too high (and thus allowing the domain to lock most of the host's memory). Doing so may be dangerous to both the domain and the host itself since the host's kernel may run out of memory." But I can't find an equivalent sentence for the rdma-pin-all case; is it missing?

The warning is where the documentation talks about setting memory limits, which is correct. However, it looks like we don't really have any documentation specific to RDMA migration that would point to the memory limits section.

(In reply to Jiri Denemark from comment #16)
> The warning is where the documentation talks about setting memory limits,
> which is correct. However, it looks like we don't really have any
> documentation specific to RDMA migration that would point to the memory
> limits section.

So do you plan to add it?

Yes, eventually, but not in 7.3.
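For reference, the hard limit discussed above is configured through the domain's `<memtune>`/`<hard_limit>` settings. A minimal sketch of setting it with virsh, assuming the domain is named rhel7.0 and using the 4 GiB guest from the reproducer plus roughly 1 GiB of headroom for QEMU's own allocations (the exact headroom needed is precisely the part libvirt cannot compute, as noted above):

```sh
# Set the memory hard limit for domain "rhel7.0" (hypothetical name/size):
# 5 GiB = 4 GiB of guest RAM plus ~1 GiB headroom for QEMU overhead.
# Per the documentation warning, keep this well below the host's total RAM.
virsh memtune rhel7.0 --hard-limit 5242880   # value is in KiB by default
```

The equivalent persistent setting in the domain XML is `<memtune><hard_limit unit='KiB'>5242880</hard_limit></memtune>`.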
Track the issue in comments 15/16 in a separate bug: Bug 1373783 - Warn users against setting memory hard limit too high or too low when used for rdma-pin-all

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
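For completeness, once libvirt drives RDMA migration (the scenario the follow-up bug is concerned with), the flow looks roughly like this. A sketch only: the destination host name dsthost is hypothetical, and the domain must already have a memory hard limit set as discussed above.

```sh
# Live-migrate "rhel7.0" over RDMA with all guest memory pinned up front.
# This requires a memory hard limit on the domain; if the limit is set too
# high, pinning may drive the host out of memory instead of failing cleanly.
virsh migrate --live --rdma-pin-all \
    --migrateuri rdma://dsthost \
    rhel7.0 qemu+ssh://dsthost/system
```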
Description of problem:
Boot up a guest with v-mem equal to or near the host memory size. Migrate the guest with the RDMA protocol and pin all guest memory. QEMU will then be killed. We'd suggest a friendlier method of handling this situation, for example giving a prompt instead of killing QEMU.

Version-Release number of selected component (if applicable):
kernel-3.10.0-64.el7.x86_64
qemu-kvm-1.5.3-30.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot a guest with v-mem equal to the host p-mem. E.g.:

/usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 4G -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel7.0 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/rhel7.0-64-20131222.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=disk0,id=disk0,bootindex=1 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x4 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -vnc :10 -k en-us -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the destination host with "-incoming x-rdma:0:5800"

3. Migrate the guest:
(qemu) migrate -d x-rdma:192.168.1.3:5800

Actual results:
QEMU on the source side is killed:

(qemu) migrate -d x-rdma:192.168.1.3:5800
source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
(qemu) Killed

Expected results:
QEMU should not be killed; we should have a more reasonable method of keeping the guest alive.
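A quick way to confirm the OOM kill suspected in comment #2 after step 3 (a sketch, not part of the original report):

```sh
# On the source host, after QEMU dies, look for the OOM killer in the kernel log:
dmesg | grep -iE 'out of memory|oom-killer|killed process'

# Also check how much memory the QEMU process may lock (RLIMIT_MEMLOCK);
# with x-rdma-pin-all, all guest RAM must fit under this limit:
prlimit --memlock --pid "$(pgrep -f qemu-kvm | head -n1)"
```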
Additional info:

[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3495        335       3160          4          0         35
-/+ buffers/cache:         300       3195
Swap:         3855        114       3741

# cat /proc/cpuinfo (8 CPUs in total; only the last one is listed)

Host info:
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping        : 9
microcode       : 0x19
cpu MHz         : 1702.390
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 6784.56
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
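Note that in this setup the guest was given 4 GiB (-m 4G) while the host reports only 3495 MiB total, so pinning all guest memory cannot possibly succeed. A minimal pre-flight check along those lines (a hypothetical sketch, not an existing tool):

```sh
# Hypothetical helper script: refuse rdma-pin-all when the guest RAM to be
# pinned does not fit below the host's total RAM.
GUEST_MIB=4096    # from the reproducer's -m 4G
HOST_MIB=$(awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo)
if [ "$GUEST_MIB" -ge "$HOST_MIB" ]; then
    echo "refusing rdma-pin-all: guest ${GUEST_MIB} MiB >= host ${HOST_MIB} MiB" >&2
    exit 1
fi
```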