Description of problem:
Trying to start my guest, I got the following error message.

# virsh start knoel1
error: Failed to start domain knoel1
error: Unable to read from monitor: Connection reset by peer

Error in /var/log/libvirt/qemu/knoel1.log:

2012-03-27 15:39:28.900+0000: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 2048 -smp 4,sockets=4,cores=1,threads=1 -name knoel1 -uuid 3e1d5d47-4963-dcd8-a3f1-311d1b3c677f -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/knoel1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/knoel1.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/libvirt/images/knoel1-1.img,if=none,id=drive-virtio-disk1,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a6:c4:4e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0,password -vga cirrus -incoming fd:20 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/1
qemu: warning: error while loading state section id 3
load of migration failed
2012-03-27 15:39:31.108+0000: shutting down

"Section id 3" doesn't give me a clue what's wrong. "Load of migration" is also misleading. I never migrated the guest.

To work around the problem, I moved /var/lib/libvirt/qemu/save/knoel1.save away to another directory. I saved it for debugging.

Avi debugged and found that "section id 3" is "mem". The guess is that the .save file was truncated.
Version-Release number of selected component (if applicable):

How reproducible:
Can reproduce the error with my guest's .save file. Cannot reproduce the bad .save file.

Steps to Reproduce:
"virsh start knoel1" with bad /var/lib/libvirt/qemu/save/knoel1.save on virtlab204. Not sure how to reproduce the bad .save file. Maybe a developer can reproduce a truncated .save file? I have the above .save file on my system - virtlab204. Contact me for access.

Actual results:
Obscure error message.

Expected results:
Clear error message so the user knows how to fix the problem.

Additional info:
Hi Karen, would you please provide the qemu-kvm version? Thanks.
This happened with 6.2 as well as the latest 6.3 qemu-kvm. I was using qemu-kvm-0.12.1.2-2.265.el6.
I am moving this to libvirt, as libvirt is expected to recognize a corrupted domain state file and ignore it when starting the domain. See bug 730750.
Can you please attach the first 4k of the corrupted save file to this BZ (should just be some binary data followed by the guest XML, limiting to 4k will avoid sending any actual saved guest state, so you don't have to worry about that security aspect)? Libvirt detects corrupted state files by writing a temporary header, then doing migrate to file, then finally rewriting the header to the proper value. Also, what version of libvirt was running at the time the guest was previously subject to a managedsave operation? I assume you are using the libvirt-guests service, which does a guest managedsave on host shutdown?
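For readers unfamiliar with the header scheme described above, here is a minimal sketch of the general pattern (temporary header, payload, then final header). It is an illustration only, not libvirt's code: the marker values, function names, and file layout are all hypothetical, and the payload write merely stands in for the "migrate to file" step.

```python
import os

# Hypothetical 8-byte markers; these are NOT libvirt's real on-disk values.
MAGIC_PARTIAL = b"PARTIAL!"
MAGIC_COMPLETE = b"COMPLETE"


def write_state_file(path, payload):
    """Write a temporary header, then the payload, then finalize the header."""
    with open(path, "wb") as f:
        f.write(MAGIC_PARTIAL)       # step 1: temporary header
        f.write(payload)             # step 2: stand-in for "migrate to file"
        f.flush()
        os.fsync(f.fileno())         # push the payload to disk before finalizing
        f.seek(0)
        f.write(MAGIC_COMPLETE)      # step 3: rewrite header to the proper value
        f.flush()
        os.fsync(f.fileno())


def is_complete(path):
    """Return True only if the header was rewritten to its final value."""
    with open(path, "rb") as f:
        return f.read(len(MAGIC_COMPLETE)) == MAGIC_COMPLETE
```

A check of this kind only catches a save that was interrupted before the final header rewrite. It would not notice a file whose contents were truncated or damaged after the header had already been finalized.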
Karen, can you please provide the first 4K of the corrupted save file as requested in comment #5? Thanks!
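One possible way to produce the requested 4K excerpt is sketched below. The helper name and the output path are hypothetical; the source path is the one mentioned in the description, adjusted to wherever the file was moved.

```python
def copy_head(src, dst, limit=4096):
    """Copy at most `limit` bytes from the start of src into dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        data = fin.read(limit)   # shorter than limit only if the file is smaller
        fout.write(data)
    return len(data)


# Example call (output path is an assumption):
# copy_head("/var/lib/libvirt/qemu/save/knoel1.save", "/tmp/knoel1.save.head")
```

Only the header and guest XML region is copied, so the attachment should not contain saved guest memory.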
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Jiri, this is a dup of the fs corruption on shutdown that you worked on, right?
It looks like it could be the bug. Eric Sandeen identified and fixed an issue in writeback code, which he believes was the reason for fs corruption. See bug 818172 (the original fs corruption bug 749527 is likely a duplicate of that bug). The bug is supposed to be fixed in kernel-2.6.32-280.el6
I also think this is a dup. *** This bug has been marked as a duplicate of bug 818172 ***