Bug 1481595
Summary: | [7.4-Alt] Unable to execute QEMU command 'dump-guest-memory': dump: failed to save memory | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | yilzhang |
Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
Status: | CLOSED ERRATA | QA Contact: | Minjia Cai <micai> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 7.4-Alt | CC: | abologna, bugproxy, coli, dgibson, gchakkar, hannsj_uhl, knoel, lvivier, michen, mrezanin, ngu, qzhang, virt-maint |
Target Milestone: | rc | Keywords: | Patch, ZStream |
Target Release: | 7.6 | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-rhev-2.12.0-1.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 1572554 | Environment: | |
Last Closed: | 2018-11-01 11:01:10 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1513404, 1528344, 1572554, 1578741 | | |
Attachments: | | | |
Description
yilzhang
2017-08-15 07:09:21 UTC
Power8 + qemu-kvm-rhev doesn't have this issue.

Host kernel: 3.10.0-693.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.3.ppc64le
Guest kernel: 3.10.0-675.el7.ppc64le

Could you provide logs from libvirt?

Created attachment 1314585 [details]
libvirt log. Time is not correct on Host

(In reply to yilzhang from comment #4)
> Created attachment 1314585 [details]
> libvirt log. Time is not correct on Host

Thank you.

Could you check you have enough space on the disk (with "df -h /var/lib/libvirt/qemu/dump/")?

As your VM is defined with 8GB of memory, you need at least 8GB of free space on the disk.

Yes, there is not enough space left:
[root@virt8 ~]# df -h /var/lib/libvirt/qemu/dump/
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhel7--pegas-root   16G   11G  5.3G  68% /

I decreased VM's memory to 4G, and successfully got the dump file just now.

I don't know how upper layer(e.g. virt-manager) handles this kind of failure.

I just want to know is the error message printed by libvirtd expected?
Probably it should print some ENOSPC message in this case, I think.

As well, the incomplete dump file disappears automatically, which is a bit confusing to me. Please help to clarify it, thank you very much.

(In reply to yilzhang from comment #6)
> Yes, there is not enough space left:
> [root@virt8 ~]# df -h /var/lib/libvirt/qemu/dump/
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/rhel7--pegas-root 16G 11G 5.3G 68% /
>
> I decreased VM's memory to 4G, and successfully got the dump file just now.
>
>
> I don't know how upper layer(e.g. virt-manager) handles this kind of failure.
>
> I just want to know is the error message printed by libvirtd expected?
> Probably it should print some ENOSPC message in this case, I think.
>
> As well, the incomplete dump file disappears automatically, which is a bit
> confusing to me. Please help to clarify it, thank you very much.

So this bug is not specific to ppc64le.

We can modify QEMU to report a more accurate error message, but will it be used by libvirt?

Andrea, any comment?

In any case this isn't especially urgent, since it's just about making an error message nicer. Deferring.
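
The free-space rule of thumb from the comments above (the dump directory needs at least as much free space as the guest has memory) can be checked before triggering a dump. The following is a minimal sketch, not part of QEMU or libvirt: the function names are hypothetical, and the guest memory size is treated as a lower bound on the dump size.

```python
import shutil

GIB = 1024 ** 3


def enough_space(free_bytes: int, guest_memory_bytes: int) -> bool:
    """Return True if the free space covers at least the guest memory size."""
    return free_bytes >= guest_memory_bytes


def has_room_for_dump(dump_dir: str, guest_memory_bytes: int) -> bool:
    """Check the filesystem holding dump_dir (hypothetical helper).

    The guest memory size is only a lower bound for the dump file size,
    so a True result does not guarantee that the dump will succeed.
    """
    free_bytes = shutil.disk_usage(dump_dir).free
    return enough_space(free_bytes, guest_memory_bytes)


if __name__ == "__main__":
    # Figures from the thread: 5.3G available, guest defined with 8GB, then 4G.
    available = int(5.3 * GIB)
    print("8 GiB guest fits:", enough_space(available, 8 * GIB))
    print("4 GiB guest fits:", enough_space(available, 4 * GIB))
```

With the numbers reported in the thread, the check fails for the 8GB guest and passes once the guest memory is reduced to 4G, which matches the observed behaviour.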

(In reply to Laurent Vivier from comment #7)
> We can modify QEMU to report a more accurate error message, but will it be
> used by libvirt?

It depends on your expectations.

If you initiate the dump manually from the host using virsh, the QEMU error will be displayed:

# sudo virsh dump guest /var/lib/libvirt/qemu/dump/guest --format elf --memory-only
error: Failed to core dump domain guest to /var/lib/libvirt/qemu/dump/guest
error: internal error: unable to execute QEMU command 'dump-guest-memory': dump: failed to save memory

However, in the situation described above there is no client connected, so the only way libvirt can report the error is through the log.

So the error message won't be any more visible to the user than it is now, but at least it will be more helpful.

Move to qemu-kvm-rhev. This fix will apply to both RHEL KVM and qemu-kvm-rhev for RHV and RHOSP. Both packages are using the same code base.

------- Comment From yasmins.com 2018-01-25 08:48 EDT-------
I am working on it.

------- Comment From yasmins.com 2018-02-09 14:40 EDT-------
Sent the patch 'dump: Show custom message for ENOSPC' to qemu-devel for review.

------- Comment From yasmins.com 2018-03-05 14:03 EDT-------
The patch has been reviewed and approved. I'll update the bug status as soon as it gets merged to master.

Yasmin,
As your patch has not been merged I've sent a new patch addressing comments given by Eric Blake:
dump: display cause of write failure
http://patchwork.ozlabs.org/patch/888783/

------- Comment From hannsj_uhl.com 2018-03-22 10:04 EDT-------
(In reply to comment #14)
> Yasmin,
> As your patch has not been merged I've sent a new patch addressing comments
> given by Eric Blake:
> dump: display cause of write failure
> http://patchwork.ozlabs.org/patch/888783/
.
... which I think is now finally upstream accepted as git commit
https://git.qemu.org/gitweb.cgi?p=qemu.git;a=commit;h=0c33659d09f4a8ab926846295538d6a67e8c2c63
("dump.c: allow fd_write_vmcore to return errno on failure")
... please correct me if I am wrong ...

(In reply to IBM Bug Proxy from comment #15)
> ------- Comment From hannsj_uhl.com 2018-03-22 10:04 EDT-------
> (In reply to comment #14)
> > Yasmin,
> > As your patch has not been merged I've sent a new patch addressing comments
> > given by Eric Blake:
> > dump: display cause of write failure
> > http://patchwork.ozlabs.org/patch/888783/
> .
> ... which I think is now finally upstream accepted as git commit
> https://git.qemu.org/gitweb.cgi?p=qemu.git;a=commit;
> h=0c33659d09f4a8ab926846295538d6a67e8c2c63
> ("dump.c: allow fd_write_vmcore to return errno on failure")
> ... please correct me if I am wrong ...

In fact, the one merged is the v3 of the patch from Yasmin, but it does the same thing. I'm going to backport it.

Move state to POST as the fix will come with the rebase on qemu v2.12.0

Reproduce:

Version-Release number of selected component (if applicable):
Host:
kernel: 3.10.0-862.el7.ppc64le
qemu-kvm-ma-2.10.0-21.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest kernel: kernel: 3.10.0-862.el7.ppc64le

How reproducible: 100%

Steps to Reproduce:
1. Define one VM and boot it up, for example:
virsh define guest.xml
[root@ibm-p8-07 micai]# cat guest.xml
<domain type='kvm'>
  <name>rhel75</name>
  <memory unit='GB'>30</memory>
  <currentMemory unit='GB'>30</currentMemory>
  <vcpu placement='static'>24</vcpu>
  <os>
    <type arch='ppc64le'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>coredump-restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/micai/rhel75.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <graphics type='vnc' port='1' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
  </devices>
</domain>
virsh start rhel75

2. Inside the guest, issue a command to make the guest crash
[Guest] # systemctl stop kdump
[Guest] # echo c > /proc/sysrq-trigger

3. Check that the crash coredump file is automatically created on the host
[root@ibm-p8-07 micai]# ls -lh /var/lib/libvirt/qemu/dump/
total 21G
-rw------- 1 root root 21G Apr 26 03:50 5-rhel75-2018-04-26-03:50:40
[root@ibm-p8-07 micai]# ls -lh /var/lib/libvirt/qemu/dump/
total 0

This is the same result as comment 0.

(In reply to Minjia Cai from comment #18)
> Reproduce:
>
> Version-Release number of selected component (if applicable):
> Host:
> kernel: 3.10.0-862.el7.ppc64le
> qemu-kvm-ma-2.10.0-21.el7.ppc64le
> SLOF-20170724-2.git89f519f.el7.noarch
>
>
> Guest kernel: kernel: 3.10.0-862.el7.ppc64le
>
> How reproducible: 100%
...
> This is the same result as comment 0.

The fix will be in qemu-kvm-rhev-2.12.0 (coming with the rebase).

For qemu-kvm-ma-2.10.0, the rhel-7.5.z must be set to + and the BZ cloned.

And it will not change the behavior: the error message is only more explicit, but I don't know if libvirt (virsh) will report it to you.
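
The fix discussed above makes the write helper hand the errno back to its caller so that the final message can name the cause. The sketch below is a Python analogue of that pattern, not QEMU's C code: `write_chunk` and `save_memory` are hypothetical names, and only the message prefix "dump: failed to save memory" is taken from this report.

```python
import errno
import os
from typing import Callable, Iterable, Optional


def write_chunk(write: Callable[[bytes], object], data: bytes) -> int:
    """Return 0 on success or a negative errno on failure (hypothetical helper)."""
    try:
        write(data)
    except OSError as exc:
        # Fall back to EIO if the exception carries no errno.
        return -(exc.errno or errno.EIO)
    return 0


def save_memory(write: Callable[[bytes], object],
                chunks: Iterable[bytes]) -> Optional[str]:
    """Write all chunks; return None on success or an explicit error message."""
    for chunk in chunks:
        ret = write_chunk(write, chunk)
        if ret < 0:
            # Append the cause instead of reporting a bare failure.
            return "dump: failed to save memory: " + os.strerror(-ret)
    return None


def full_disk(_data: bytes) -> None:
    """Simulated writer that always fails as a full filesystem would."""
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


if __name__ == "__main__":
    print(save_memory(full_disk, [b"page"]))
```

Returning a negative errno rather than a plain failure flag is what lets the caller distinguish a full disk from any other write error without changing the overall control flow.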

(In reply to Laurent Vivier from comment #19)
> (In reply to Minjia Cai from comment #18)
> > Reproduce:
> >
> > Version-Release number of selected component (if applicable):
> > Host:
> > kernel: 3.10.0-862.el7.ppc64le
> > qemu-kvm-ma-2.10.0-21.el7.ppc64le
> > SLOF-20170724-2.git89f519f.el7.noarch
> >
> >
> > Guest kernel: kernel: 3.10.0-862.el7.ppc64le
> >
> > How reproducible: 100%
> ...
> > This is the same result as comment 0.
>
> The fix will be in qemu-kvm-rhev-2.12.0 (coming with the rebase).
>
> For qemu-kvm-ma-2.10.0, the rhel-7.5.z must be set to + and the BZ cloned.
>
> And it will not change the behavior, the error message is only more explicit
> but I don't know if libvirt (virsh) will report it to you.

Sorry, you misunderstood comment 18. I just took over this feature; I plan to reproduce the bug myself first, and then verify the fix on qemu 2.12.

Version-Release number of selected component (if applicable):
Host:
kernel: 3.10.0-883.el7.ppc64le
qemu-kvm-rhev-2.12.0-1.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest kernel: kernel: 3.10.0-883.el7.ppc64le

Steps to verify:
1. Define one VM and boot it up, for example:
virsh define guest.xml
[root@ibm-p8-rhevm-13 micai]# cat guest.xml
<domain type='kvm'>
  <name>rhel75</name>
  <memory unit='GB'>45</memory>
  <currentMemory unit='GB'>30</currentMemory>
  <vcpu placement='static'>24</vcpu>
  <os>
    <type arch='ppc64le'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>coredump-restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/micai/rhel75.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <graphics type='vnc' port='1' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
  </devices>
</domain>
virsh start rhel75

2. Inside the guest, issue a command to make the guest crash
[Guest] # systemctl stop kdump
[Guest] # echo c > /proc/sysrq-trigger

3. Check that the crash coredump file is automatically created on the host
[root@ibm-p8-rhevm-13 dump]# df -h /var/lib/libvirt/qemu/dump/
Filesystem                                Size  Used Avail Use% Mounted on
/dev/mapper/rhel_ibm--p8--rhevm--13-root   50G   50G  745M  99% /
[root@ibm-p8-rhevm-13 dump]# ls -lh /var/lib/libvirt/qemu/dump/
total 44G
-rw------- 1 root root 43G May 9 05:57 8-rhel75-2018-05-09-05:53:26

Wait ten minutes.

[root@ibm-p8-rhevm-13 dump]# ls -lh /var/lib/libvirt/qemu/dump/
total 43G
-rw------- 1 root root 43G May 9 05:57 8-rhel75-2018-05-09-05:53:26
[root@ibm-p8-rhevm-13 dump]# df -h /var/lib/libvirt/qemu/dump/
Filesystem                                Size  Used Avail Use% Mounted on
/dev/mapper/rhel_ibm--p8--rhevm--13-root   50G   48G  2.8G  95% /

The coredump file is created on the host and it doesn't go away. The fix has been verified.

I use the qemu command to start the guest.
(qemu) dump-guest-memory /var/lib/libvirt/qemu/dump/test
dump: failed to save memory: No space left on device
(qemu)

This is a clear reminder. According to comment25. When using libvirt, where should I view the error message?

(In reply to Minjia Cai from comment #26)
> I use the qemu command to start the guest.
> (qemu) dump-guest-memory /var/lib/libvirt/qemu/dump/test
> dump: failed to save memory: No space left on device
> (qemu)
>
> This is a clear reminder. According to comment25. When using libvirt, where
> should I view the error message?

I think the answer is in comment 9: libvirt logs. But perhaps Andrea can give more details?

(In reply to Laurent Vivier from comment #27)
> (In reply to Minjia Cai from comment #26)
> > I use the qemu command to start the guest.
> > (qemu) dump-guest-memory /var/lib/libvirt/qemu/dump/test
> > dump: failed to save memory: No space left on device
> > (qemu)
> >
> > This is a clear reminder. According to comment25. When using libvirt, where
> > should I view the error message?
>
> I think the answer is in comment 9: libvirt logs. But perhaps Andrea can
> give more details?

I too expected the error message to be in the guest log, but it's not there.

It looks like libvirt is not able to retrieve the return value for the dump job (which is started asynchronously) correctly, despite QEMU reporting it:

# In shell #1, run
$ sudo virsh qemu-monitor-event guest --loop

# In shell #2, run
$ sudo virsh qemu-monitor-command guest '{"execute": "dump-guest-memory", "arguments": {"protocol": "file:/small/guest.dump", "paging": "false", "detach": true}}'
{"return":{},"id":"libvirt-25"}

# Back to shell #1, we now see
event DUMP_COMPLETED at 1527170776.767879 for domain guest: {"result":{"total":4294967296,"status":"failed","completed":950140928},"error":"dump: failed to save memory: No space left on device"}

I'll look into it, but QEMU is clearly reporting all the expected information at this point and libvirt not exposing it to the user is the remaining issue; see Bug 1578741 for the latter.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443
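
The DUMP_COMPLETED payload quoted in the last comment already carries everything a management layer needs to report the failure. As a rough illustration only (this is not libvirt code, and `summarize_dump_event` is a hypothetical helper), the payload can be parsed like this, using the exact JSON from the thread:

```python
import json


def summarize_dump_event(payload: str) -> str:
    """Turn a DUMP_COMPLETED event payload into a one-line summary."""
    event = json.loads(payload)
    result = event["result"]
    total = result["total"]
    completed = result["completed"]
    percent = 100.0 * completed / total if total else 0.0
    summary = f"{result['status']}: {completed}/{total} bytes ({percent:.1f}%)"
    # The "error" key is only expected when the dump did not succeed.
    error = event.get("error")
    if error:
        summary += f": {error}"
    return summary


# Payload copied from the DUMP_COMPLETED event shown in this report.
PAYLOAD = (
    '{"result":{"total":4294967296,"status":"failed","completed":950140928},'
    '"error":"dump: failed to save memory: No space left on device"}'
)

if __name__ == "__main__":
    print(summarize_dump_event(PAYLOAD))
```

The summary shows that the dump stopped after roughly a fifth of the guest memory had been written, together with the ENOSPC cause that QEMU now reports.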