Bug 1359324
Summary: qemu-system-x86 dumped core upon normal shutdown of guest

Product: Fedora
Component: qemu
Version: 24
Hardware: Unspecified
OS: Unspecified
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: unspecified
Type: Bug
Reporter: Chris Murphy <bugzilla>
Assignee: Fedora Virtualization Maintainers <virt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amit.shah, berrange, bugzilla, cfergeau, crobinso, dwmw2, itamar, pbonzini, rjones, virt-maint, zbyszek
Bug Blocks: 1359325
Last Closed: 2017-03-14 20:08:47 UTC
Attachments: gdb coredump (attachment 1183020)
Description (Chris Murphy, 2016-07-22 20:26:45 UTC)
> So I'm guessing (because you're using nocow=on) that you are using btrfs
> on the host? I would first look for btrfs problems on the host. Are there
> any messages in the system dmesg or system journal pointing to btrfs /
> host filesystem problems, I/O errors, etc.?

Host /var/lib/libvirt/images is on Btrfs. There are no Btrfs messages since mount time at the last startup, and no libata messages related to the SSD since startup. The file system passes a scrub with no errors, and also an offline btrfs check. If this is a Btrfs bug, Btrfs doesn't know about it and keeps letting me use the file system unabated.

Created attachment 1183020 [details]
gdb coredump
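For reference, the scrub and offline check mentioned above are of roughly this form; the device name below is a placeholder, not taken from this report:

    # Online scrub of the mounted Btrfs filesystem holding the images
    btrfs scrub start -B /var/lib/libvirt/images
    btrfs scrub status /var/lib/libvirt/images

    # Offline check; the filesystem must be unmounted first
    # (/dev/sdX2 is a placeholder for the actual device)
    btrfs check /dev/sdX2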
Unfortunately the core dump is truncated for some reason, so this gdb attempt is probably useless.

    BFD: Warning: /var/tmp/coredump-S2qIGu is truncated: expected core file size >= 4115308544, found: 2147483648.
2147483648 is 0x80000000, or exactly 2GiB. That's suspicious. Is a misconfiguration somewhere causing the truncation?
2GiB is the systemd-coredump default for ProcessSizeMax= and ExternalSizeMax=. Was this coredump captured by systemd-coredump?

(In reply to Zbigniew Jędrzejewski-Szmek from comment #4)
> 2GiB is the systemd-coredump default for ProcessSizeMax= and
> ExternalSizeMax=.

OK, should I change both values to something higher, like 4GiB? The VM is allocated 3GiB, but gdb expects a ~3.8GiB core dump file.

> Was this coredump captured by systemd-coredump?

Yes.

(In reply to Chris Murphy from comment #5)
> OK, should I change both values to something higher, like 4GiB? The VM is
> allocated 3GiB, but gdb expects a ~3.8GiB core dump file.

Yes. But as I wrote in the e-mail thread, I don't think truncating coredumps like that makes sense.

OK, filed bug 1359410 for the coredump file truncation. I'll raise the coredump file limits and try to reproduce this bug, to figure out why qemu crashed.

Look what I found:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe' io='threads'/>
      <source file='/var/lib/libvirt/images/uefi_opensuseleap42.2a3-1.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe' io='threads'/>
      <source file='/var/lib/libvirt/images/uefi_opensuseleap42.2a3-1.qcow2'/>
      <target dev='vdb' bus='virtio'/>
      <boot order='3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </disk>

The same qcow2 file is attached as two separate drives, and those two drives were set up inside the VM as members of an mdadm RAID 1. Well, that explains why the -2 file wasn't being written to. Neither virsh nor virt-manager warns or complains about this; both appear to permit the same file being used as backing for two virtual devices. So: a) user error, b) no warnings, c) qemu blows up well after d) totally corrupting the target qcow2, which e) results in the qcow2 becoming astronomically large.

(In reply to Chris Murphy from comment #8)
> The same qcow2 file is attached as two separate drives, and those two
> drives were set up inside the VM as members of an mdadm RAID 1. Well,
> that explains why the -2 file wasn't being written to. Neither virsh nor
> virt-manager warns or complains about this.

virt-manager would have warned if you had used the UI to attach the disk images to the VM. But it won't warn at start time, and neither will virsh/libvirt, as you say, with the default config. You can enable virtlockd and it will catch issues like this; there have been occasional discussions about enabling it by default, but that hasn't happened yet. That is the proper place to handle this type of validation.

> Both appear to permit the same file being used as backing for two virtual
> devices. So: a) user error, b) no warnings, c) qemu blows up well after
> d) totally corrupting the target qcow2, which e) results in the qcow2
> becoming astronomically large.

The interesting bit here is the qemu crash; we don't want qemu to crash even if the disk image is outrageously sized. So please update if you get a complete backtrace.

Chris, have you seen this since, or captured a backtrace?
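For reference, raising the systemd-coredump limits discussed above would look roughly like the following; the 6G value is only an illustration (it just needs to exceed the expected ~3.8GiB core), not something taken from this report:

    # /etc/systemd/coredump.conf (or a drop-in under /etc/systemd/coredump.conf.d/)
    [Coredump]
    # Both limits default to 2G, which is what truncated the qemu core here;
    # raise them above the expected core size.
    ProcessSizeMax=6G
    ExternalSizeMax=6G

The new limits only apply to the next captured crash; the already-truncated core cannot be repaired, so the crash has to be reproduced and the fresh core examined with coredumpctl gdb.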
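Similarly, a minimal sketch of the virtlockd setup mentioned above, which according to that comment would catch this kind of double use of one disk image; the paths and restart step assume a stock Fedora libvirt install:

    # /etc/libvirt/qemu.conf
    lock_manager = "lockd"

    # The defaults in /etc/libvirt/qemu-lockd.conf use direct locks on the
    # image files themselves, so no lockspace directory needs to be configured.

    # Restart libvirtd so the setting takes effect (virtlockd is socket-activated):
    systemctl restart libvirtd

With that in place, starting a domain whose writable disk image is already locked by another running domain should fail with a lock error instead of silently corrupting the qcow2.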