Red Hat Bugzilla – Bug 542954
Guest suffers kernel panic when save snapshot then restart guest
Last modified: 2013-01-09 17:05:08 EST
Created attachment 375015 [details]
Guest kernel panic when reboot
Description of problem:
After saving a snapshot of the guest and then restarting it, the guest fails to boot and suffers a kernel panic. (Attachment will be uploaded.)
This issue only happens when the guest uses a virtio block device; an IDE block device is fine.
This issue happens on both RHEL and Windows guests.
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -smp 2 -m 2G -drive file=/root/RHEL5.4-64-64K.qcow2,media=disk,if=virtio,index=0,boot=on,snapshot=on -net nic,vlan=0,macaddr=10:1a:4a:10:20:4d,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -uuid `uuidgen` -cpu qemu64,+sse2 -boot c -balloon none -monitor stdio -vnc :10
Version-Release number of selected component (if applicable):
# rpm -qa | grep kvm
Steps to Reproduce:
1. Boot a VM with a virtio block device using the command above.
3. Restart the guest; both "system_reset" in the monitor and a reboot from inside the guest reproduce it.
Actual results:
Guest suffers kernel panic.
Expected results:
Guest should restart successfully.
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
stepping : 10
cpu MHz : 2826.231
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 5652.48
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
I think I can reproduce this one. However, I don't even need the savevm; just starting the VM with a virtio-blk disk with snapshot=on is enough. This doesn't seem to happen with a manually created external snapshot. The difference is that with snapshot=on writeback caching is used, and indeed when patching this part out, it seems to work with snapshot=on as well.
"Seems to work" obviously doesn't mean that no bug exists there, it may just be hidden, for example by timing differences, so that it's much less likely to trigger it.
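For anyone who wants to compare the two setups side by side, here is a rough sketch that only builds and prints the command lines (it does not run anything). The overlay path and the helper names are made up for illustration, the qemu-kvm command is trimmed down from the one in the description, and the qemu-img call assumes the usual "create -f qcow2 -b <base> <overlay>" form.

```python
import shlex

BASE_IMAGE = "/root/RHEL5.4-64-64K.qcow2"  # image path taken from the report
OVERLAY = "/tmp/overlay.qcow2"             # hypothetical path for the manual snapshot


def drive_option(image, snapshot):
    """Build the -drive value from the report, with or without snapshot=on."""
    opts = ["file=" + image, "media=disk", "if=virtio", "index=0", "boot=on"]
    if snapshot:
        opts.append("snapshot=on")
    return ",".join(opts)


def create_overlay_cmd(base, overlay):
    """qemu-img invocation that creates a qcow2 overlay backed by `base`."""
    return ["qemu-img", "create", "-f", "qcow2", "-b", base, overlay]


def qemu_cmd(drive):
    """Trimmed-down qemu-kvm command line; only the drive differs per variant."""
    return ["/usr/libexec/qemu-kvm", "-smp", "2", "-m", "2G",
            "-drive", drive, "-boot", "c", "-monitor", "stdio", "-vnc", ":10"]


if __name__ == "__main__":
    # Variant A: temporary snapshot handled by qemu itself (snapshot=on).
    print(shlex.join(qemu_cmd(drive_option(BASE_IMAGE, snapshot=True))))
    # Variant B: manually created external snapshot, then boot from the overlay.
    print(shlex.join(create_overlay_cmd(BASE_IMAGE, OVERLAY)))
    print(shlex.join(qemu_cmd(drive_option(OVERLAY, snapshot=False))))
```

Variant A is the case that reproduces the panic here; variant B is the manually created external snapshot that did not show it.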
I can confirm that the image metadata is correct, but data gets corrupted (possibly by writes to the wrong virtual disk offset?). I haven't identified yet when exactly the fatal write is happening, but it seems to hit the start of the partition relatively often, sometimes corrupting my superblock.
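A quick way to check for the superblock damage after a failed boot could look like the sketch below. It is only an illustration under several assumptions: the guest filesystem is ext2/ext3 (magic 0xEF53 at byte 56 of the superblock, which starts 1024 bytes into the partition), the first partition starts at sector 63, and the file being read is a raw view of the virtual disk rather than the qcow2 file itself. The function name is made up.

```python
import io
import struct

SECTOR_SIZE = 512
EXT_SUPERBLOCK_OFFSET = 1024  # superblock starts 1024 bytes into the partition
EXT_MAGIC_OFFSET = 56         # position of s_magic inside the superblock
EXT_MAGIC = 0xEF53            # ext2/ext3 magic, stored little-endian


def superblock_intact(image, partition_start_sector=63):
    """Return True if the ext2/ext3 magic is present for the given partition.

    `image` is a seekable binary file object over the raw disk contents.
    The default start sector of 63 is an assumption (classic MBR layout).
    """
    offset = (partition_start_sector * SECTOR_SIZE
              + EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET)
    image.seek(offset)
    raw = image.read(2)
    if len(raw) < 2:
        return False  # image is too short to contain the superblock
    (magic,) = struct.unpack("<H", raw)
    return magic == EXT_MAGIC


if __name__ == "__main__":
    # Self-contained demo on an in-memory "disk" instead of a real image.
    disk = bytearray(64 * SECTOR_SIZE + 2048)
    pos = 63 * SECTOR_SIZE + EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    disk[pos:pos + 2] = struct.pack("<H", EXT_MAGIC)
    print(superblock_intact(io.BytesIO(bytes(disk))))
```

This only detects a destroyed magic number; corruption elsewhere in the superblock or in the data blocks would still need fsck or a full comparison against a known-good copy.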
Verified on kvm-83-180.el5 with the same steps and command line: pass.
Sorry, I changed the status by mistake; changing it back now.
*** Bug 578869 has been marked as a duplicate of this bug. ***
I can reproduce this bug on a RHEL5.5 64-bit server with the old KVM (83-161.el5) and kernel 2.6.18-194.el5.
I added the above comments just to record the test process.
Based on comment #5, changing the status to verified.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.