Description of problem:
VDS: RHEL 5.5 (kernel 194); guest: RHEL 5.5. When a snapshot of the guest VM is created using RHEV-M, the VM becomes corrupted (kernel panic, segmentation faults). The same process done with the IDE driver works fine.
Host: Intel Xeon Core i7, 12GB

Version-Release number of selected component (if applicable):
kernel: 2.6.18-194
kvm: 83-164

How reproducible:
Always, on that system

Steps to Reproduce:
1. Create a VM from blank with RHEL 5.5
2. Create a template/snapshot from it
3. The VM (from template or after snapshot) becomes corrupted
qemu-img check went fine on problematic images.
Can you please post the panic message? Also, what exactly do you mean by "segmentation faults"? qemu dies or random processes in the VM die? If the former, a backtrace would be helpful.
All problems occur in the VM itself and not on the host. Attached is a screenshot of the kernel panic and random failures; most of the problems seen are related to disk/fs.
Created attachment 404639 [details] kernel panic screenshot
The subject line says that this is on iscsi (I missed this at first because it's not in the bug description). Is iscsi needed, or do you see the same with the image in a local file or LV?
(In reply to comment #5)
> The subject line says that this is on iscsi (I missed this at first because
> it's not in the bug description). Is iscsi needed, or do you see the same with
> the image in a local file or LV?

I have tried it with a local file and all worked well.
What about LVs? To qemu they should look the same as iscsi, I think - just a block device.
Moran, can you retest with kvm-83-179.el5? This is possibly a duplicate of bug 542954 which is fixed in this version.
(In reply to comment #8)
> Moran, can you retest with kvm-83-179.el5? This is possibly a duplicate of bug
> 542954 which is fixed in this version.

This will require a whole new kernel and stuff, which we don't have at the moment.
Lawrence, please have someone from your team take over this, if possible. Have you reproduced this?
The fix is in the userspace part, so you could just install that part and keep the old kernel. Even just extracting the binary from the new RPM should be enough.
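A minimal sketch of extracting just the binary, assuming the updated package has already been downloaded. The RPM file name and the scratch directory in the usage comment are hypothetical and do not come from this report; the path /usr/libexec/qemu-kvm is the one used in the command lines in this bug. Setting DRY_RUN only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: unpack only the qemu-kvm binary from a kvm RPM without installing it.

extract_qemu_kvm() {
    rpm_file=$1      # path to the downloaded kvm RPM
    dest_dir=$2      # scratch directory to unpack into

    # Make the RPM path absolute, because we change directory before unpacking.
    case $rpm_file in
        /*) ;;
        *)  rpm_file=$PWD/$rpm_file ;;
    esac

    if [ -n "$DRY_RUN" ]; then
        # Only show what would be executed.
        echo "mkdir -p $dest_dir"
        echo "cd $dest_dir && rpm2cpio $rpm_file | cpio -idm ./usr/libexec/qemu-kvm"
        return 0
    fi

    mkdir -p "$dest_dir" || return 1
    # rpm2cpio writes the payload as a cpio archive; cpio extracts the one
    # matching member (-i), creating directories (-d) and keeping mtimes (-m).
    ( cd "$dest_dir" && rpm2cpio "$rpm_file" | cpio -idm ./usr/libexec/qemu-kvm )
}

# Usage (hypothetical file name and directory):
#   extract_qemu_kvm kvm-83-179.el5.x86_64.rpm /var/tmp/kvm-new
#   /var/tmp/kvm-new/usr/libexec/qemu-kvm <same options as before>
```

The extracted binary can then be started by hand with the existing command line, leaving the installed package and the running kernel untouched.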
Retested this issue on kernel 2.6.18-194, kvm 83-164; cannot reproduce.

Steps:
1. Install a guest on iscsi.
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -no-kvm-pit-reinjection -startdate now -drive file=RHEL5.5-Server-20100322.0-x86_64-DVD.iso,media=cdrom -drive file=/dev/vgtest/lv-base,media=disk,format=qcow2,if=virtio,boot=on -net nic,vlan=0,macaddr=10:1a:4a:10:20:40,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -cpu qemu64,+sse2 -balloon none -vnc :10 -uuid `uuidgen` -monitor stdio -m 2G -smp 2 -boot dc

2. After installation, create the template.
# lvcreate -n lv-template -L 20G vgtest
# qemu-img create -f qcow2 /dev/vgtest/lv-template 20G
# qemu-img convert -f qcow2 /dev/vgtest/lv-base -O qcow2 /dev/vgtest/lv-template

3. Create a snapshot from the template.
# lvs
  LV          VG         Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  LogVol00    VolGroup00 -wi-ao 292.28G
  LogVol01    VolGroup00 -wi-ao   5.69G
  lv-base     vgtest     -wi-a-  20.00G
  lv-template vgtest     -wi-a-  20.00G
# lvcreate -n lv-sn1 -L 20G vgtest
# qemu-img create -f qcow2 -F qcow2 -b /dev/vgtest/lv-template /dev/vgtest/lv-sn1

4. Boot snapshot 1 (lv-sn1) with the above command line.

Result: the guest boots up successfully.

PS: I tested with virtio block the whole time and did not change the interface.

qzhang -> mgoldboi: Have you changed the guest interface? I ask because there is a bug: Bug 561221 - Snapshot of guest suffers kernel panic when installed with virtio block and boot with ide block
mgoldboi has not provided input and it works for qzhang; closing.
Repro steps and system details were provided to kwolf.
Adding the details:

Server: silver-vdsd.qa.lab.tlv.redhat.com

Template location:
/rhev/data-center/e80168ab-a912-4855-97ff-f778d5746432/8900978c-e842-4037-8f04-c9a740793a13/images/12cb47b1-3fcc-40f1-a17a-b5ccb0a17dd9

Instance location:
/rhev/data-center/e80168ab-a912-4855-97ff-f778d5746432/8900978c-e842-4037-8f04-c9a740793a13/images/d0996fd9-4f06-4583-8bb8-0339084e1e83/2b4ce82a-e3d4-4086-95c8-2512fd4bed9d

Running command:
/usr/libexec/qemu-kvm -name fst -smp 1,cores=1 -k en-us -m 1024 -boot cn -net nic,vlan=1,macaddr=00:1a:4a:16:89:0c,model=e1000 -net tap,vlan=1,ifname=e1000_13_1,script=no -drive file=/rhev/data-center/e80168ab-a912-4855-97ff-f778d5746432/8900978c-e842-4037-8f04-c9a740793a13/images/d0996fd9-4f06-4583-8bb8-0339084e1e83/2b4ce82a-e3d4-4086-95c8-2512fd4bed9d,media=disk,if=ide,cache=writeback,serial=83-8bb8-0339084e1e83,boot=on,format=qcow2,werror=stop -vnc 0:13,moran -cpu qemu64,+sse2

If I run it with if=ide it works fine, but if I change it to virtio we get the bug…
Are you sure this is the right one? It does indeed fail, but never in the way shown in the screenshot you attached. Instead it fails to mount its root device, and the very simple cause seems to be that there is no virtio-blk driver (even a copy of the base image fails this way, with no snapshots involved). At least I can't see any occurrence of "virt" in the kernel log.
So Moran provided me with a different image that actually does show the corruption issue. Thanks!

To test this, I created a new snapshot (in a file) and then just tried to boot the guest up:

# qemu-img create -f qcow2 -F qcow2 -b /rhev/data-center/e80168ab-a912-4855-97ff-f778d5746432/8900978c-e842-4037-8f04-c9a740793a13/images/7c140b58-0dc5-48af-b43f-6ac17fc3257e/../7c140b58-0dc5-48af-b43f-6ac17fc3257e/af8425d0-d63e-4d68-a1ec-2e0ca678caa1 overlay.qcow2
# /usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -startdate 2010-06-14T11:42:22 -name xxxft -smp 1,cores=1 -k en-us -m 512 -boot c -drive file=overlay.qcow2,media=disk,if=virtio,cache=writeback,serial=af-b43f-6ac17fc3257e,boot=on,format=qcow2,werror=stop -vnc 0:15 -cpu qemu64,+sse2 -M rhel5.5.0 -notify all -balloon none -k de -serial file:/tmp/serial.out

With the qemu-kvm binary of the package installed on this machine, I could reproduce the bug every time in three attempts. I tried the same three times with a binary compiled from the current rhel5/master branch and it succeeded each time. As a final test, I also created a fresh snapshot on the block device that Moran had used and ran it with the new binary, and it succeeded as well.

I therefore consider this fixed, and I strongly suspect that the fix for bug 542954 fixes this as well. Marking as a duplicate of that bug.

*** This bug has been marked as a duplicate of bug 542954 ***
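When repeating boot attempts like the ones above, the serial log written by -serial file:/tmp/serial.out can be scanned instead of watching the VNC console. The following is only a rough sketch: it assumes the guest kernel logs to the serial port (for example via console=ttyS0 on its command line, which this report does not state), and the function name and the two match strings are illustrative heuristics, not an exhaustive list of corruption symptoms.

```shell
#!/bin/sh
# Sketch: classify one guest boot from its serial log.

guest_boot_status() {
    log_file=$1

    if grep -q 'Kernel panic' "$log_file"; then
        # The Linux kernel prints "Kernel panic - not syncing: ..." on a panic.
        echo "panic"
    elif grep -q 'segfault' "$log_file"; then
        # Heuristic: some kernels log user-space crashes with this word.
        echo "segfault"
    else
        # No known failure string was found; this does not prove a good boot.
        echo "clean"
    fi
}

# Usage:
#   guest_boot_status /tmp/serial.out
```

A result of "clean" only means neither string appeared in the log, so a hung or silently corrupted guest would still need a manual check.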
Comment from Kevin. QE, please take note and make sure the suggestions made by Kevin are well covered.

=======================

Anything that uses lots of synchronous reads/writes (i.e. metadata operations). Long snapshot chains where a lot of COW happens seem to be a good candidate. It's probably enough to test intensively with one backing file format, preferably qcow2, which may issue synchronous metadata I/O again and therefore makes the scenario more complex.

For verification of the fix, you need to use virtio-blk (multiple requests running at once are required to even trigger this bug). On the other hand, only IDE can directly call the synchronous bdrv_read/write which is touched by this patch, so in order to avoid regressions some tests on IDE should be run, too.

Kevin
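One way to set up the long snapshot chain suggested above is sketched here. It reuses the qemu-img create -f qcow2 -F qcow2 -b form already shown in this bug; the function name, the paths, and the chain depth are hypothetical. The function only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: print the qemu-img commands that build a chain of qcow2 overlays.
# Each overlay uses the previous image as its backing file, so guest writes
# to the top image go through copy-on-write.

print_snapshot_chain() {
    base=$1      # existing qcow2 base image (use an absolute path)
    prefix=$2    # path prefix for the overlay files
    depth=$3     # number of overlays to create

    backing=$base
    i=1
    while [ "$i" -le "$depth" ]; do
        overlay="${prefix}-${i}.qcow2"
        echo "qemu-img create -f qcow2 -F qcow2 -b $backing $overlay"
        backing=$overlay
        i=$((i + 1))
    done
}

# Usage (hypothetical paths; review the output, then pipe it to sh):
#   print_snapshot_chain /var/tmp/base.qcow2 /var/tmp/snap 3 | sh
```

Creating the chain alone causes no copy-on-write. For the test to be meaningful the guest should be booted from the top overlay with if=virtio and do write-heavy work, ideally with more writes between successive snapshots.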
Hi all,

We can NOT reproduce this bug.

kernel: 2.6.18-194.3.1.el5
kvm: 83-164
RHEV-H: 5.5-2.2 (4.1)
host1: Intel Xeon Core i7
host2: Intel Xeon 45nm Core2
host3: AMD Opteron G2
guest OS: RHEL 5.5 32bit/64bit, RHEL 5.4 64bit

Test steps:
1. Access RHEV-M as the vdcadmin user.
2. Create a VM guest on iscsi storage with a virtio disk and the rhevm network (cow sparse on iscsi).
3. After installation, create snapshot1 for this VM.
4. Boot the VM; the VM started successfully.
5. Stop the VM, then preview and commit snapshot1.
6. Boot snapshot1; the VM started successfully.

Additional info:
1. Command line in RHEV-H:
vdsm 13034 13025 2 09:56 ? 00:00:32 /usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate 2010-06-17T02:56:08 -name rhel55-64 -smp 1,cores=1 -k en-us -m 1024 -boot cd -net nic,vlan=1,macaddr=00:1a:4a:42:41:0b,model=e1000 -net tap,vlan=1,ifname=e1000_10_1,script=no -drive file=/rhev/data-center/2e85b7a4-e36c-4a15-b3e0-e41f91fb965c/95a01a9f-4341-44db-b725-34f4d08eff11/images/8aeaca2f-04e5-4389-9956-96109dbfcbd7/c2ce9ef5-9d0e-4a53-a69f-4623a1eceab4,media=disk,if=virtio,cache=off,serial=89-9956-96109dbfcbd7,boot=on,format=qcow2,werror=stop -pidfile /var/vdsm/4f074e4f-7925-480f-97bd-e851e3adbd78.pid -vnc 0:10,password -cpu qemu64,+sse2,+cx16,+ssse3,+sse4.1,+sse4.2,+popcnt -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-4.1,serial=44454C4C-4800-1032-8033-C7C04F4D3258_00:21:9b:ff:b9:fe,uuid=4f074e4f-7925-480f-97bd-e851e3adbd78 -vmchannel di:0200,unix:/var/vdsm/4f074e4f-7925-480f-97bd-e851e3adbd78.guest.socket,server -monitor unix:/var/vdsm/4f074e4f-7925-480f-97bd-e851e3adbd78.monitor.socket,server
2. We also tested this bug on rhevm-backup.qa.lab.tlv.redhat.com, which ykaul provided. We also can NOT reproduce this bug there with the same steps.
3. We need to continue testing other scenarios for qcow2 virtual disks with iscsi storage.
We can always reproduce bug 578869 with the following environment:
Host: RHEL 5.5 Server
Kernel: 2.6.18-194.el5
KVM Version: 83-164.el5_5.6
iscsi on Solaris

Verified this bug today:
Host: RHEL 5.5 Server
Kernel: 2.6.18-194.3.1.el5
KVM Version: 83-164.el5_5.12
iscsi on Solaris

Note: We could not reproduce the bug when we used iscsi on NetBSD v1.62 before. Now this bug can always be reproduced when we use iscsi on Solaris.
Created attachment 426181 [details] QE reproduce this bug screenshot
Created attachment 426182 [details] QE reproduce this bug screenshot2