Bug 1043631
| Summary: | A 'Cannot allocate memory' error from qemu-kvm-rhev is not caught or reported by libvirtd to vdsmd. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Lee Yarwood <lyarwood> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.4 | CC: | acathrow, codong, dyuan, juzhang, lyarwood, pkrempa, zhwang |
| Target Milestone: | rc | Keywords: | Upstream |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-04-04 21:27:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Upstream improves the error message with commits:
commit 59e21e973fdbfc9065ff5aa421ae36dbd1c4073a
Author: Peter Krempa <pkrempa>
Date: Mon Sep 30 11:47:47 2013 +0200
qemu: process: Silence coverity warning when rewinding log file
The change in ef29de14c37d14abc546e90555a0093797facfdd that introduced
better error logging from qemu introduced a warning from coverity about
unused return value from lseek. Silence this warning and fix typo in the
corresponding error message.
Reported by: John Ferlan
commit ef29de14c37d14abc546e90555a0093797facfdd
Author: Peter Krempa <pkrempa>
Date: Wed Sep 18 16:23:14 2013 +0200
qemu: Wire up better early error reporting
The previous patches added infrastructure to report better errors from
monitor in some cases. This patch finalizes this "feature" by enabling
this enhanced error reporting on early phases of VM startup. In these
phases the possibility of qemu producing a useful error message is
really high compared to running it during the whole life cycle. After
the start up is complete, the feature is disabled to provide the usual
error messages so that users are not confused by possibly irrelevant
messages that may be in the domain log.
The original motivation to do this enhancement is to capture errors when
using VFIO device passthrough, where qemu reports errors after the
monitor is initialized and the existing error catching code couldn't
catch this producing a unhelpful message:
# virsh start test
error: Failed to start domain test
error: Unable to read from monitor: Connection reset by peer
With this change, the message is changed to:
# virsh start test
error: Failed to start domain test
error: internal error: early end of file from monitor: possible problem:
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: error, group 8 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: failed to get group 8
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
commit 90139a6236eb20a5cd2595af39ea11adae7d54de
Author: Peter Krempa <pkrempa>
Date: Wed Sep 18 16:17:39 2013 +0200
qemu: monitor: Produce better errors on monitor hangup
Change the monitor error code to add the ability to access the qemu log
file using a file descriptor so that we can dig in it for a more useful
error message. The error is now logged on monitor hangups and overwrites
a possible lesser error. A hangup on the monitor usualy means that qemu
has crashed and there's a significant chance it produced a useful error
message.
The functionality will be latent until the next patch.
commit 8519e9ecdcc26f9753dbd85e897daabbaa82dee8
Author: Peter Krempa <pkrempa>
Date: Wed Sep 18 16:12:17 2013 +0200
qemu: monitor: Add infrastructure to access VM logs for better err msgs
Early VM startup errors usually produce a better error message in the
machine log file. Currently we were accessing it only when the process
exited during certain phases of startup. This will help adding a more
comprehensive error extraction for early qemu startup phases.
This patch adds infrastructure to keep a file descriptor for the machine
log file that will be used in case an error happens.
commit 310651a5e346b23db9015061452b1887335aed67
Author: Peter Krempa <pkrempa>
Date: Wed Sep 18 14:43:52 2013 +0200
qemu_process: Make qemuProcessReadLog() more versatile and reusable
Teach the function to skip character device definitions printed by qemu
at startup in addition to libvirt log messages and make it usable from
outside of qemu_process.c. Also add documentation about the func.
I cannot reproduce this with: # rpm -qa | egrep '(libvirt|vdsm|kvm)' vdsm-4.10.2-27.0.el6ev.x86_64 vdsm-cli-4.10.2-27.0.el6ev.noarch vdsm-bootstrap-4.10.2-27.0.el6ev.noarch libvirt-java-0.4.9-1.el6.noarch vdsm-hook-faqemu-4.10.2-27.0.el6ev.noarch vdsm-tests-4.10.2-27.0.el6ev.noarch qemu-kvm-rhev-debuginfo-0.12.1.2-2.355.el6_4.7.x86_64 libvirt-snmp-0.0.2-3.el6.x86_64 vdsm-xmlrpc-4.10.2-27.0.el6ev.noarch qemu-kvm-rhev-0.12.1.2-2.355.el6_4.7.x86_64 vdsm-hook-vhostmd-4.10.2-27.0.el6ev.noarch vdsm-debuginfo-4.10.2-27.0.el6ev.x86_64 vdsm-reg-4.10.2-27.0.el6ev.noarch vdsm-python-4.10.2-27.0.el6ev.x86_64 libvirt-0.10.2-18.el6_4.9.x86_64 libvirt-lock-sanlock-0.10.2-18.el6_4.9.x86_64 libvirt-python-0.10.2-18.el6_4.9.x86_64 libvirt-debuginfo-0.10.2-18.el6_4.9.x86_64 vdsm-debug-plugin-4.10.2-27.0.el6ev.noarch libvirt-client-0.10.2-18.el6_4.9.x86_64 qemu-kvm-tools-0.12.1.2-2.355.el6.x86_64 vdsm-hook-qemucmdline-4.10.2-27.0.el6ev.noarch libvirt-devel-0.10.2-18.el6_4.9.x86_64 Steps: 1. Install a rhel guest 2. Shutdown the guest and configure the memery to 820000MB 3. Try to start the guest Result: Cannot start the guest and got error message: error: Failed to start domain rhel6u4 error: internal error Process exited while reading console log output: char device redirected to /dev/pts/6 Failed to allocate 859832320000 B: Cannot allocate memory # cat /var/log/libvirt/qemu/rhel6u4.log LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name rhel6u4 -S -M rhel6.4.0 -enable-kvm -m 820000 -smp 1,sockets=1,cores=1,threads=1 -uuid 8e05273c-2adc-d344-0390-f8ffb01aaff9 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel6u4.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/var/lib/libvirt/images/rhel6u4.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:44:a4:06,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 char device redirected to /dev/pts/6 Failed to allocate 859832320000 B: Cannot allocate memory 2014-02-25 05:51:49.230+0000: shutting down #cat /var/log/libvirt/libvirtd.log 2014-02-25 05:53:23.700+0000: 18383: error : qemuProcessReadLogOutput:1473 : internal error Process exited while reading console log output: char device redirected to /dev/pts/6 Failed to allocate 859832320000 B: Cannot allocate memory As the result, libvirt catch and report the error as qemu successfully. You still can reproduce the problem now? (In reply to CongDong from comment #4) > As the result, libvirt catch and report the error as qemu successfully. > You still can reproduce the problem now? I believe I had the same error logged in libvirtd.log, the issue was that vdsm.log only contained a qemu monitor connection error. The env in question has been upgraded now so I can't easily reproduce this. Development Management has reviewed and declined this request. You may appeal this decision by reopening this request. |
Description of problem: When starting a guest within RHEV on a host without sufficient memory qemu-kvm-rhev reports the following error : LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name tvm-lyarwood-rhevm33 -S -M rhel6.4.0 -cpu Nehalem -enable-kvm -m 8200 -smp 2,sockets=2,cores=1,threads=1 -uuid 94e463f6-52d2-401a-a27c-b4e752840c50 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=72E1C5CB-85CC-11E1-AF97-D345A24BC104_34:40:b5:9f:f9:3c,uuid=94e463f6-52d2-401a-a27c-b4e752840c50 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/tvm-lyarwood-rhevm33.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-12-16T18:05:15,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/dab6c34c-51a9-4e02-92de-4489a307ce17/7e082516-faf5-428c-bc06-c4e5ba82b6ad/images/40bf1147-1c26-4e56-8dc1-4c7c62b0fd1e/e3c5ccfb-8747-42bb-ae94-326ac469541b,if=none,id=drive-virtio-disk0,format=qcow2,serial=40bf1147-1c26-4e56-8dc1-4c7c62b0fd1e,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=36,id=hostnet0,vhost=on,vhostfd=39 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:21:08:63,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/tvm-lyarwood-rhevm33.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/tvm-lyarwood-rhevm33.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5906,tls-port=5907,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 Failed to allocate 8598323200 B: Cannot allocate memory 2013-12-16 18:05:17.585+0000: shutting down However libvirtd only reports a loss of connectivity to the underlying qemu process as the error : Thread-975231::ERROR::2013-12-16 13:05:18,336::vm::710::vm.Vm::(_startUnderlyingVm) vmId=`94e463f6-52d2-401a-a27c-b4e752840c50`::The vm start process failed Traceback (most recent call last): File "/usr/share/vdsm/vm.py", line 670, in _startUnderlyingVm self._run() File "/usr/share/vdsm/libvirtvm.py", line 1535, in _run self._connection.createXML(domxml, flags), File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 104, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self) libvirtError: Unable to read from monitor: Connection reset by peer I'm not sure if libvirt is really at fault here so feel free to move up or down the stack as you see fit. Version-Release number of selected component (if applicable): # rpm -qa | egrep '(libvirt|vdsm|kvm)' libvirt-client-0.10.2-18.el6_4.9.x86_64 libvirt-python-0.10.2-18.el6_4.9.x86_64 libvirt-lock-sanlock-0.10.2-18.el6_4.9.x86_64 libvirt-0.10.2-18.el6_4.9.x86_64 qemu-kvm-rhev-0.12.1.2-2.355.el6_4.7.x86_64 qemu-kvm-rhev-tools-0.12.1.2-2.355.el6_4.7.x86_64 vdsm-cli-4.10.2-27.0.el6ev.noarch vdsm-xmlrpc-4.10.2-27.0.el6ev.noarch vdsm-4.10.2-27.0.el6ev.x86_64 vdsm-python-4.10.2-27.0.el6ev.x86_64 How reproducible: Always in my env. Steps to Reproduce: 1. Attempt to start a guest on a host without sufficient memory. 2. 'Unable to read from monitor: Connection reset by peer' reported back to vdsmd from libvirtd. Actual results: 'Unable to read from monitor: Connection reset by peer' reported back to vdsmd from libvirtd. Expected results: 'Cannot allocate memory' reported back to vdsmd from libvirtd. Additional info: I've not had time to verify this on 6.5 as yet.