Bug 760899 - Guest aborted during ping-pong migration with ENOSPC when qcow2 is used within a LV
Summary: Guest aborted during ping-pong migration with ENOSPC when qcow2 is used with...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 807971
 
Reported: 2011-12-07 10:14 UTC by Qunfang Zhang
Modified: 2012-04-12 15:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-10 08:38:18 UTC
Target Upstream Version:
Embargoed:



Description Qunfang Zhang 2011-12-07 10:14:30 UTC
Description of problem:
Install a guest (I used win7-64) and migrate it between 2 hosts, with the image located on iSCSI storage. Create a small LV and enlarge it with the "lvextend" command when the guest stops because there is no space left to continue the installation. The guest aborts.

Version-Release number of selected component (if applicable):
kernel-2.6.18-300.el5
kvm-83-246.el5

How reproducible:
Always

Steps to Reproduce:
0. Create the LV and image file, and *make sure the image is valid and active on both hosts*.
#vgcreate vgtest-qzhang /dev/sdb1
#lvcreate -n lv-install -L 1G vgtest-qzhang
#qemu-img create -f qcow2 /dev/vgtest-qzhang/lv-install 20G

1. Boot a win7 guest on host A (from iscsi disk):

(gdb) r -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate now -name win7-64 -smp 2,cores=2 -k en-us -m 2048 -boot dc -net nic,vlan=1,macaddr=00:1a:4a:40:19:01,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=/etc/qemu-ifup,downscript=no -drive file=/dev/vgtest-qzhang/lv-install,media=disk,if=virtio,cache=off,boot=on,format=qcow2,werror=stop -drive file=/mnt/en_windows_7_ultimate_with_sp1_x64_dvd_618240.iso,media=cdrom,if=ide -cpu qemu64,+sse2 -M rhel5.6.0 -notify all -balloon none -monitor stdio -spice host=0,ic=on,port=5930,disable-ticketing -qxl 1 -fda /mnt/virtio-drivers-1.0.3-52454.vfd 

2. Boot guest with "-incoming tcp:0:5800" on host B.

3. Ping-pong migrate guest between the 2 hosts.

4. If the guest stops because there is not enough space, use the following commands to continue:
#lvextend -L +500M /dev/vgtest-qzhang/lv-install
(qemu)cont
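
For a sense of how often step 4 may need repeating: the image has a 20G virtual size on a 1G LV, and each "lvextend" adds 500M. The Python sketch below works out an upper bound for a fully written disk. It assumes LVM's binary units (G = GiB, M = MiB) and ignores qcow2 metadata overhead, so the real count could be slightly higher; the function name is only illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def extensions_needed(virtual_size, lv_size, step):
    """Number of lvextend steps until the LV is at least virtual_size bytes."""
    shortfall = max(0, virtual_size - lv_size)
    # Integer ceiling division, avoids float rounding.
    return (shortfall + step - 1) // step

# 20G virtual disk, 1G initial LV, +500M per lvextend (values from steps 0 and 4).
print(extensions_needed(20 * GIB, 1 * GIB, 500 * MIB))  # prints 39
```

A Windows 7 installation writes far less than 20G, so in practice fewer extensions should be needed.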
  
Actual results:
Guest aborted, always on the dst side.

Expected results:
No abort happens; guest migration and installation succeed.

Additional info:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x41e02940 (LWP 25836)]
0x0000003d1ba30285 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003d1ba30285 in raise () from /lib64/libc.so.6
#1  0x0000003d1ba31d30 in abort () from /lib64/libc.so.6
#2  0x000000000041a30e in raw_pread (bs=0xdd7910, offset=1633615872, buf=0x1917010 "", count=65536) at block-raw-posix.c:310
#3  0x0000000000463f3d in bdrv_pread (bs=0xdd7910, offset=1633615872, buf1=0x1917010, count1=65536) at block.c:944
#4  0x0000000000498a77 in l2_load (bs=<value optimized out>, l2_offset=1633615872, l2_table=0x80) at block-qcow2.c:655
#5  0x000000000049b423 in get_cluster_table (bs=0xdd7010, offset=2282418176, new_l2_table=0x41e01dc0, 
    new_l2_offset=0x41e01dc8, new_l2_index=0x41e01dd4) at block-qcow2.c:954
#6  0x000000000049b47c in alloc_cluster_offset (bs=0x64e2, offset=25836, n_start=120, n_end=256, num=0xec7f84, 
    m=0x101010101010101) at block-qcow2.c:1164
#7  0x000000000049c184 in qcow_aio_write_cb (opaque=0xec7f50, ret=0) at block-qcow2.c:1614
#8  0x000000000049c309 in qcow_aio_write (bs=<value optimized out>, sector_num=<value optimized out>, 
    buf=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>)
    at block-qcow2.c:1661
#9  0x0000000000462ec0 in bdrv_aio_write (bs=0xdd7010, sector_num=4457848, buf=0x1df6c00 "\001\066", nb_sectors=136, 
    cb=0x4172d0 <virtio_blk_rw_complete>, opaque=0x1decac0) at block.c:1663
#10 0x000000000041749b in virtio_blk_handle_write (req=0x1decac0)
    at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/virtio-blk.c:185
#11 0x000000000041777d in virtio_blk_handle_output (vdev=0xec2aa0, vq=0xec7010)
    at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/virtio-blk.c:253
#12 0x0000000000500d91 in kvm_outw (opaque=<value optimized out>, addr=25836, data=6)
    at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:692
#13 0x000000000052c526 in handle_io (kvm=0xdc1c30, run=0x2aaaaaab7000, vcpu=1) at libkvm.c:742
#14 0x000000000052cdd9 in kvm_run (kvm=0xdc1c30, vcpu=1, env=0xe74010) at libkvm.c:967
#15 0x00000000005014f9 in kvm_cpu_exec (env=0x6) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:206
#16 0x0000000000501783 in kvm_main_loop_cpu (_env=<value optimized out>)
    at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:402
#17 ap_main_loop (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:443
#18 0x0000003d1c60677d in start_thread () from /lib64/libpthread.so.0
#19 0x0000003d1bad495d in clone () from /lib64/libc.so.6
(gdb)
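
One thing that can be read off frame #2: the failing raw_pread asks for 65536 bytes at offset 1633615872. The Python check below shows where that read falls relative to the LV size after each "lvextend -L +500M". It assumes the LV started at 1G as in step 0 and that sizes are in binary units. The number of extensions before the abort is not recorded in this report, so this is a plausibility check, not a diagnosis.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Values taken from frame #2 of the backtrace.
READ_OFFSET = 1633615872
READ_COUNT = 65536

def lv_size(extends, initial=1 * GIB, step=500 * MIB):
    """LV size in bytes after a given number of +500M extensions (assumed sizes)."""
    return initial + extends * step

def read_fits(size, offset=READ_OFFSET, count=READ_COUNT):
    """True if the whole read lies inside a device of the given size."""
    return offset + count <= size

for n in range(3):
    print(n, lv_size(n), read_fits(lv_size(n)))
```

Under these assumptions the read only fits once the LV has been extended at least twice. A process still working with a smaller size would be reading past the end of the device as it knows it. Whether that is what triggers the abort() in raw_pread is not established in this report.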

Comment 1 Ronen Hod 2011-12-07 15:24:22 UTC
Qunfang,

Do you know if it happens in 6.2?

Comment 2 Kevin Wolf 2011-12-07 15:45:26 UTC
Do you migrate while the guest is stopped due to the ENOSPC condition, or do you make sure that it's always running when you migrate? Is there a pattern like that it always fails if the error happens while migration is in progress?

Comment 3 Qunfang Zhang 2011-12-08 04:37:03 UTC
(In reply to comment #1)
> Qunfang,
> 
> Do you know if it happens in 6.2?
Hi Ronen,
I'm reinstalling the 2 RHEL6.2 hosts and will update with the result later.


(In reply to comment #2)
> Do you migrate while the guest is stopped due to the ENOSPC condition, or do
> you make sure that it's always running when you migrate? Is there a pattern
> like that it always fails if the error happens while migration is in progress?

Kevin,
I re-tested the following scenarios:
(1) Ping-pong migration while the VM is always running. => Succeeds.

(2) Ping-pong migration while the VM is stopped due to the ENOSPC error. => Succeeds.

(3) Ping-pong migration where the VM stops during migration; I enlarge the LV with the "lvextend" command and then use the "cont" qemu command to resume it. At this moment, migration has not finished yet.
==> The segmentation fault happens on the dst side after migration finishes.

Comment 4 Qunfang Zhang 2011-12-08 06:14:07 UTC
Hi Ronen,
I tested RHEL6.2 with the same steps as in the bug description. The guest stops during migration because there is not enough space, so I enlarged the LV to 20G and used the "cont" qemu command to resume it. After migration finishes, the guest does not get a segmentation fault, but it always stops on the dst side due to "No space left on device (28)". The LV size is 20G now and the installation has only reached 26%.
Normally a 20G disk should be enough to install the win7 guest.

Comment 5 Kevin Wolf 2011-12-08 10:00:17 UTC
Can you please check if the size of the LV has been successfully updated on the destination host or if it's still the smaller old size?
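
One way to check this on each host is to ask the kernel for the device size directly. The Python sketch below would be run as root on both hosts; the helper name is made up for illustration. It reports what the kernel sees now, which may differ from what an already running qemu process believes.

```python
import os

def device_size_bytes(path):
    """Return the size in bytes of a block device or regular file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end yields the size; this works for block devices too.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)

# Example call on each host (path from step 0 of the description):
#   device_size_bytes("/dev/vgtest-qzhang/lv-install")
```

If the two hosts report different numbers, the extension has not yet become visible on one of them.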

Comment 6 Qunfang Zhang 2011-12-08 10:11:02 UTC
(In reply to comment #5)
> Can you please check if the size of the LV has been successfully updated on the
> destination host or if it's still the smaller old size?

The size of the LV has been updated on both the destination and source hosts at the same time.

Comment 8 RHEL Program Management 2012-04-02 10:41:43 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 9 Ademar Reis 2012-04-09 22:46:02 UTC
(In reply to comment #4)
> Hi, Ronon
> I tested RHEL6.2 with the same steps as bug description. Guest stops during
> migration due to no enough space, so I enlarge the lvm to 20G and using "cont"
> qemu command to proceed it. After finish migration, guest does not get
> segmentation fault. But it always stops at the dst side due to "No space left
> on device (28)". The lvm size is 20G now and the installation only finishes
> 26%. 
> Actually if I install the win7 guest with a 20G disk, it should be enough.

Please try with the current 6.3 beta and open a RHEL6 bug if you see this behavior (if you haven't opened a RHEL6 bug already).

Regarding RHEL5, I understand this is a corner case and since it was not reported by any customer, I recommend we close this bug as WONTFIX.

Comment 10 Qunfang Zhang 2012-04-10 07:33:45 UTC
(In reply to comment #9)
> (In reply to comment #4)
> > Hi, Ronon
> > I tested RHEL6.2 with the same steps as bug description. Guest stops during
> > migration due to no enough space, so I enlarge the lvm to 20G and using "cont"
> > qemu command to proceed it. After finish migration, guest does not get
> > segmentation fault. But it always stops at the dst side due to "No space left
> > on device (28)". The lvm size is 20G now and the installation only finishes
> > 26%. 
> > Actually if I install the win7 guest with a 20G disk, it should be enough.
> 
> Please try with the current 6.3 beta and open a RHEL6 bug if you see this
> behavior (if you haven't opened a RHEL6 bug already).
> 

Yes, it exists on the latest 6.3 as well. I filed a new bug 811110 to track it.

> Regarding RHEL5, I understand this is a corner case and since it was not
> reported by any customer, I recommend we close this bug as WONTFIX.

