Bug 682653

Summary: Migration progress stuck with qcow image ENOSPC during installation
Product: Red Hat Enterprise Linux 6
Reporter: Mike Cao <bcao>
Component: qemu-kvm
Assignee: Orit Wasserman <owasserm>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.1
CC: bcao, cpelland, hhuang, juzhang, kwolf, michen, mkenneth, owasserm, quintela, syeghiay, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Cloned as: 700701 (view as bug list)
Environment:
Last Closed: 2011-10-11 10:42:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 580951, 700701

Description Mike Cao 2011-03-07 06:34:38 UTC
Description of problem:
When a qcow2 image on LVM hits ENOSPC while migration is in progress, the migration gets stuck with "remaining ram: 8 kbytes".

Version-Release number of selected component (if applicable):
# uname -r
2.6.32-118.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.148.el6.x86_64


How reproducible:
2/2

Steps to Reproduce:
1. Create a 1G LV on the host:
#lvcreate -L 1G -n EIO10 mike
2. Format it as a 20GB qcow2 image:
#qemu-img create -f qcow2 /dev/mike/EIO10 20G
3. Start the VM on the source host, e.g.:
/usr/libexec/qemu-kvm -enable-kvm -m 4G -smp 4 -name rhel6U1 -uuid adcbfb49-3411-2701-3c36-6bdbc00bedb8 -rtc base=utc,clock=host,driftfix=slew -boot c -drive file=/dev/mike/EIO10,if=none,id=mike_d1,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=mike_d1,id=mike_d1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=a2:51:50:a4:c2:21 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :4 -device virtio-balloon-pci,id=ballooning -monitor stdio -kernel /mnt/pxe/vmlinuz -initrd /mnt/pxe/initrd.img -incoming tcp:0:5888
4. Start qemu-kvm listening on the incoming port on the destination host.
5. Do live migration back and forth, making sure that qcow2 ENOSPC occurs while migration is in progress.
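The ENOSPC condition in these steps is by construction: the image is 20G virtual but sits on a 1G LV, so cluster allocation fails once roughly 1 GiB has been written. A rough sketch of the arithmetic (assuming qcow2's default 64 KiB cluster size; qcow2 metadata overhead is ignored):

```shell
# Back-of-the-envelope: how many qcow2 clusters fit in the 1 GiB LV
# before allocation fails with ENOSPC (metadata overhead ignored).
lv_bytes=$((1024 * 1024 * 1024))   # 1 GiB logical volume
cluster_bytes=$((64 * 1024))       # qcow2 default cluster size
clusters=$((lv_bytes / cluster_bytes))
echo "$clusters"                   # 16384 clusters, i.e. ~1 GiB of guest data
```

So the guest's installer, which writes far more than 1 GiB, is virtually guaranteed to trigger ENOSPC mid-migration.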
  
Actual results:
After roughly 30 minutes, migration gets stuck at "remaining ram: 8 kbytes":

(qemu) info migrate 
Migration status: active
transferred ram: 4254048 kbytes
remaining ram: 8 kbytes
total ram: 4211072 kbytes
(qemu) info migrate 
Migration status: active
transferred ram: 4254048 kbytes
remaining ram: 8 kbytes
total ram: 4211072 kbytes
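The stall can be spotted mechanically by comparing successive "remaining ram" readings from `info migrate`. A minimal sketch (the `remaining_ram` helper is hypothetical and assumes the monitor output format shown above):

```shell
# Hypothetical helper: pull the "remaining ram" figure (in kbytes)
# out of `info migrate` output piped in on stdin.
remaining_ram() {
  awk '/^remaining ram:/ { print $3 }'
}

sample='Migration status: active
transferred ram: 4254048 kbytes
remaining ram: 8 kbytes
total ram: 4211072 kbytes'

# Two consecutive polls returning the same small nonzero value,
# with status still "active", suggest the stall described here.
printf '%s\n' "$sample" | remaining_ram   # prints: 8
```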

Expected results:
Migration completes.

Additional info:
1. Executing (qemu) cont on the source host works around the issue; migration then completes.
2. migrate_cancel can also be used.
3. The installation uses an HTTP tree; there is no CD-ROM in this test case.

Comment 2 Kevin Wolf 2011-03-07 13:06:55 UTC
This looks like a general migration problem. From what I can tell, migrating after ENOSPC has two special conditions: First the VM is stopped, and second the disk emulation has pending requests.

Also, I wonder if this is a regression? I seem to remember that we had the topic of migrating stopped VMs before.

Comment 3 Mike Cao 2011-03-08 04:54:42 UTC
I tried migration with qcow2 ENOSPC without guest installation.

Steps:
1. Create a 1G LV on the host:
#lvcreate -L 1G -n EIO10 mike
2. Format it as a 20GB qcow2 image:
#qemu-img create -f qcow2 /dev/mike/EIO10 20G
3. Start the VM on the source host, e.g.:
/usr/libexec/qemu-kvm -enable-kvm -m 4G -smp 4 -name rhel6U1 -uuid adcbfb49-3411-2701-3c36-6bdbc00bedb8 -rtc base=utc,clock=host,driftfix=slew -boot c -drive file=/dev/mike/RHEL6,if=none,id=mike_d1,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=mike_d1,id=mike_d1,bootindex=1 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=a2:51:50:a4:c2:21 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :4 -device virtio-balloon-pci,id=ballooning -monitor stdio -drive file=/dev/mike/EIO,if=none,id=mike_d1,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=mike_d1,id=mike_d1
4. Start qemu-kvm listening on the incoming port on the destination host.
5. Do live migration back and forth, making sure that qcow2 ENOSPC occurs while migration is in progress.
6. Extend the LV's size, and repeat step 5 in a loop.


Actual Results:
1. Migration completed successfully.
2. The /dev/mike/EIO image is corrupted:
Leaked cluster 161367 refcount=1 reference=0
Leaked cluster 161368 refcount=1 reference=0

6181 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

66304 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.

3. (Randomly occurs)
The qemu-kvm process quits with "Guest moved used index from 23823 to 3140".

Comment 4 Kevin Wolf 2011-03-08 08:48:14 UTC
2. could be the same as bug 681472. 3. looks like a virtio migration bug. Though both of them seem to be different problems from the original one.

Comment 5 Dor Laor 2011-03-17 13:42:55 UTC
Was there corruption of the image? Seems like the virtio bug might mask the ENOSPC issue. Can QE try with IDE/E1000?

Comment 6 Mike Cao 2011-03-21 07:46:36 UTC
(In reply to comment #5)
> Was there corruption of the image?
Yes, #qemu-img check reports that the image was corrupted.
> Seems like the virtio bug might mask the
> ENOSPC issue. Can QE try with IDE/E1000?

Tried with IDE/E1000:

Scenario 1:
1. Do live migration during guest installation while ENOSPC occurs.

Actual Results:
The migration process does not get stuck; can NOT hit the issue in comment #0.

Scenario 2:
1. Repeat the steps in comment #3, with IDE/e1000.

Actual Results:
#qemu-img check /dev/mike/tt2.img
Leaked cluster 24576 refcount=1 reference=0
Leaked cluster 24577 refcount=1 reference=0
Leaked cluster 24578 refcount=1 reference=0

5969 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

80 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
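The per-cluster leak lines in a saved `qemu-img check` report can be tallied mechanically; a small sketch (assuming the output format shown above, one "Leaked cluster ..." line per cluster):

```shell
# Count "Leaked cluster ..." lines in saved `qemu-img check` output on stdin.
count_leaks() {
  grep -c '^Leaked cluster'
}

report='Leaked cluster 24576 refcount=1 reference=0
Leaked cluster 24577 refcount=1 reference=0
Leaked cluster 24578 refcount=1 reference=0'

printf '%s\n' "$report" | count_leaks   # prints: 3
```

Note the real reports here are truncated, so the counts above (3 lines shown vs. 80 leaked clusters reported) differ from what a full report would give.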

Based on #qemu-img check, the image was corrupted.

Comment 12 Orit Wasserman 2011-10-10 07:29:35 UTC
I couldn't reproduce this (with both installation and load) on a single host.
Can you try to reproduce it on two host machines?
(Note: don't forget to call lvchange --refresh on the destination host after lvextend.)

Comment 13 Mike Cao 2011-10-11 06:16:32 UTC
(In reply to comment #12)
> I couldn't reproduce (both installation and load) on a single host.
> Can you try and reproduce it on two host machines ?
> (note: don't forget to call lvchange --refresh on the dest host after lvextend
> )

Hi, Orit

Tried on qemu-kvm-0.12.1.2-2.195.el6.x86_64 with the steps in comment #0 and
comment #12.
I cannot reproduce it on two host machines either.

Mike