Bug 1046833
| Summary: | Warn users against setting memory hard limit too high when used for mlock or rdma-pin-all | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Qunfang Zhang <qzhang> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | dyuan, eblake, fjin, hhuang, jdenemar, juzhang, juzhou, michen, mzhan, pbonzini, quintela, rbalakri, virt-maint, xfu, ydu, zpeng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-2.0.0-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-03 18:07:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1013055, 1138485 | | |
| Bug Blocks: | | | |
I think QEMU is being OOM-killed, which means this bug really cannot be fixed.

(In reply to Paolo Bonzini from comment #2)
> I think QEMU is being OOM-killed, which means this bug really cannot be fixed.

Could we give some prompt in advance when the guest memory is too large and the "x-rdma-pin-all" capability is set on?

(In reply to Qunfang Zhang from comment #0)
> Steps to Reproduce:
> 1. Boot a guest with v-mem and host p-mem.

Adding one step here, enable "x-rdma-pin-all":

(qemu) migrate_set_capability x-rdma-pin-all on

> 2. Boot the guest on destination host with "-incoming x-rdma:0:5800"
> 3. Migrate the guest
> (qemu) migrate -d x-rdma:192.168.1.3:5800

(In reply to Qunfang Zhang from comment #3)
> (In reply to Paolo Bonzini from comment #2)
> > I think QEMU is being OOM-killed, which means this bug really cannot be fixed.
>
> Could we give some prompt in advance when the guest memory is too large and
> the "x-rdma-pin-all" capability is set on?

From the KVM QE point of view: even if this cannot be handled at the qemu-kvm level, it should be alerted on or controlled at the upper management-tool level. Being aborted/killed without any message could cause losses for the end user.

Best Regards,
Junyi

Right, you can move this to libvirt.

(In reply to Paolo Bonzini from comment #6)
> Right, you can move this to libvirt.

Moving to the libvirt component for a friendlier solution.

Use of x-rdma-pin-all admits that the feature is experimental; libvirt refuses to drive this option. When rdma-pin-all is made non-experimental and libvirt is enhanced to drive it, we should make sure to avoid letting the user do this, but for now, there is no libvirt bug.

I'm not sure what the best resolution for this bug is. Libvirt requires a user or management application to set a memory hard limit to be able to start RDMA migration with rdma-pin-all, and we document that the limit has to be high enough to cover both guest memory and the memory consumed by QEMU itself. While setting the limit close to the host memory size does not do anything bad in general, trying to mlock QEMU's memory with such a limit may result in QEMU being killed. However, I don't think there's any way libvirt could check whether the limit is OK or already too high (the maximum usable limit would be the host memory size minus something). The only thing we could do is to document this...

Documented upstream by v2.0.0-rc2-7-g60a545f.

Comparing the docs of libvirt-docs-1.2.17-13.el7.x86_64 and libvirt-docs-2.0.0-3.el7.x86_64, a new sentence was added for the mlock case in the file formatdomain.html, line 1043: "Beware of setting the memory limit too high (and thus allowing the domain to lock most of the host's memory). Doing so may be dangerous to both the domain and the host itself since the host's kernel may run out of memory." But I can't find an equivalent sentence for the rdma-pin-all case; is it missing?

The warning is where the documentation talks about setting memory limits, which is correct. However, it looks like we don't really have any documentation specific to RDMA migration that would point to the memory limits section.

(In reply to Jiri Denemark from comment #16)
> The warning is where the documentation talks about setting memory limits,
> which is correct. However, it looks like we don't really have any
> documentation specific to RDMA migration that would point to the memory
> limits section.

So do you plan to add it?

Yes, eventually, but not in 7.3.
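For reference, the hard limit discussed above is configured through the domain's `<memtune>`/`<hard_limit>` settings. A minimal sketch of setting it with virsh, assuming the domain is named rhel7.0 and using the 4 GiB guest from the reproducer plus roughly 1 GiB of headroom for QEMU's own allocations (the exact headroom needed is precisely the part libvirt cannot compute, as noted above):

```sh
# Set the memory hard limit for domain "rhel7.0" (hypothetical name/size):
# 5 GiB = 4 GiB of guest RAM plus ~1 GiB headroom for QEMU overhead.
# Per the documentation warning, keep this well below the host's total RAM.
virsh memtune rhel7.0 --hard-limit 5242880   # value is in KiB by default
```

The equivalent persistent setting in the domain XML is `<memtune><hard_limit unit='KiB'>5242880</hard_limit></memtune>`.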
Track the issue in comments 15/16 in a separate bug: Bug 1373783 - Warn users against setting memory hard limit too high or too low when used for rdma-pin-all

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
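For completeness, once libvirt drives RDMA migration (the scenario the follow-up bug is concerned with), the flow looks roughly like this. A sketch only: the destination host name dsthost is hypothetical, and the domain must already have a memory hard limit set as discussed above.

```sh
# Live-migrate "rhel7.0" over RDMA with all guest memory pinned up front.
# This requires a memory hard limit on the domain; if the limit is set too
# high, pinning may drive the host out of memory instead of failing cleanly.
virsh migrate --live --rdma-pin-all \
    --migrateuri rdma://dsthost \
    rhel7.0 qemu+ssh://dsthost/system
```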
Description of problem:
Boot up a guest with v-mem equal to or near the host memory size. Migrate the guest with the RDMA protocol and pin all guest memory. QEMU will then be killed. We'd suggest a friendlier method of handling this situation, for example giving a prompt instead of killing QEMU.

Version-Release number of selected component (if applicable):
kernel-3.10.0-64.el7.x86_64
qemu-kvm-1.5.3-30.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot a guest with v-mem equal to the host p-mem. E.g.:

/usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 4G -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name rhel7.0 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/rhel7.0-64-20131222.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=disk0,id=disk0,bootindex=1 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x4 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -vnc :10 -k en-us -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the destination host with "-incoming x-rdma:0:5800"

3. Migrate the guest:
(qemu) migrate -d x-rdma:192.168.1.3:5800

Actual results:
QEMU on the source side is killed:

(qemu) migrate -d x-rdma:192.168.1.3:5800
source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
(qemu) Killed

Expected results:
QEMU should not be killed; we should have a more reasonable method of keeping the guest alive.
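A quick way to confirm the OOM kill suspected in comment #2 after step 3 (a sketch, not part of the original report):

```sh
# On the source host, after QEMU dies, look for the OOM killer in the kernel log:
dmesg | grep -iE 'out of memory|oom-killer|killed process'

# Also check how much memory the QEMU process may lock (RLIMIT_MEMLOCK);
# with x-rdma-pin-all, all guest RAM must fit under this limit:
prlimit --memlock --pid "$(pgrep -f qemu-kvm | head -n1)"
```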
Additional info:

[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3495        335       3160          4          0         35
-/+ buffers/cache:         300       3195
Swap:         3855        114       3741

# cat /proc/cpuinfo (8 CPUs in total; only the last one is listed)

Host info:
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping        : 9
microcode       : 0x19
cpu MHz         : 1702.390
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 6784.56
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
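Note that in this setup the guest was given 4 GiB (-m 4G) while the host reports only 3495 MiB total, so pinning all guest memory cannot possibly succeed. A minimal pre-flight check along those lines (a hypothetical sketch, not an existing tool):

```sh
# Hypothetical helper script: refuse rdma-pin-all when the guest RAM to be
# pinned does not fit below the host's total RAM.
GUEST_MIB=4096    # from the reproducer's -m 4G
HOST_MIB=$(awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo)
if [ "$GUEST_MIB" -ge "$HOST_MIB" ]; then
    echo "refusing rdma-pin-all: guest ${GUEST_MIB} MiB >= host ${HOST_MIB} MiB" >&2
    exit 1
fi
```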