Bug 614286

Summary: Installation fails and image errors occur when doing live migration during guest installation
Product: Red Hat Enterprise Linux 6 Reporter: Mike Cao <bcao>
Component: qemu-kvm Assignee: Juan Quintela <quintela>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.0 CC: amit.shah, ddumas, kwolf, lcapitulino, lihuang, michen, mkenneth, quintela, tburke, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-07-28 11:33:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                                Flags
Screen dumped                                              none
Do live migration in this step of installation             none
Windows 7 Guest doing live migration during installation   none
RHEL 5 Guest doing live migration during installation      none

Description Mike Cao 2010-07-14 02:58:03 UTC
Created attachment 431658
Screen dumped

Description of problem:
Migrating during a win2008R2 guest installation causes image errors, and the installation cannot finish.

Version-Release number of selected component (if applicable):
# uname -r
2.6.32-44.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.91.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the source host, start installing a Windows 2008 R2 guest.
# /usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 8192 -smp 8,sockets=8,cores=1,threads=1 -name win2k8r2 -uuid 7f6183f3-c95b-4c54-ba08-e6d5e1b6917c -nodefconfig -nodefaults -monitor stdio -rtc base=localtime -boot order=c,once=d -cdrom /mnt/win08-R2.iso -drive file=/mnt/error.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:72:a4:a8,bus=pci.0,addr=0x4 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :3 -k en-us -vga std -device AC97,id=sound0,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

2. Start qemu-kvm in listening mode on the remote host.
# <commandLine> -incoming tcp:0:5888

3. Do a live migration during the installation.
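
In step 2, `<commandLine>` stands for the full step 1 invocation. A minimal sketch of how the destination command line could be derived, assuming the same binary, options, and image paths are usable on the remote host. The helper name `dest_cmdline` is hypothetical; the port 5888 is the one used in step 2.

```shell
# Hypothetical helper: build the destination command line from the source one.
# $1 = the full source qemu-kvm command line (step 1)
# $2 = TCP port the destination should listen on for the incoming migration
dest_cmdline() {
    printf '%s -incoming tcp:0:%s\n' "$1" "$2"
}

# Shortened example; the real command line carries every option from step 1.
dest_cmdline "/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 8192" 5888
```

The destination then waits for the migration stream that the source sends in step 3.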
  
Actual results:
Windows 2008 R2 guest can NOT finish installation.
#qemu-img check /mnt/error.qcow2
ERROT cluster .... refcount=1 reference=1
22106 errors were found on the image.


Expected results:
Installation should be completed.

Additional info:
Tested both a file image and an iSCSI image; both hit this issue.

Comment 2 Amit Shah 2010-07-15 11:30:35 UTC
Can you try with virtio disk?

This also could be related to the ide save/restore bug.

Comment 3 Mike Cao 2010-07-19 05:01:57 UTC
(In reply to comment #2)
> Can you try with virtio disk?
> 
> This also could be related to the ide save/restore bug.    

Tested with a virtio disk:
# /usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 4G -smp 8,sockets=8,cores=1,threads=1 -name tt -uuid `uuidgen` -nodefconfig -nodefaults -rtc base=localtime -boot order=c,once=d -drive file=/mnt/share2.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,boot=on,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/mnt/win08-R2.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,cache=none -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/mnt/virtio-win-1.1.8-0.vfd,if=none,id=drive-fdc0-0-0,cache=none -global isa-fdc.driveA=drive-fdc0-0-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d6:4a:b8,bus=pci.0,addr=0x5 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :3 -k en-us -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -monitor stdio -qmp tcp:localhost:4444,server,nowait

Actual Results:
The image has lots of errors and the installation cannot be finished.

Additional info:
Migrating while the guest installation is at the disk-formatting step may hit this issue.

Comment 4 Mike Cao 2010-07-19 05:11:30 UTC
Created attachment 432762
Do live migration in this step of installation

In my test case, after loading the virtio driver so that the guest finds the virtual drive (see the screen dump), doing a live migration and then continuing the installation hits this issue.

Comment 5 Dor Laor 2010-07-19 09:10:09 UTC
What do you mean by lots of errors? Does the guest report that? Does qemu? Does qemu-img check? (Note that you should only run it when the guest is offline.)

Comment 6 Mike Cao 2010-07-19 09:30:34 UTC
(In reply to comment #5)
> What do you mean by lots of errors? Does the guest report that? Does qemu? Does
> qemu-img check? (Note that you should only run it when the guest is offline.)

#qemu-img check win7.qcow2
ERROR cluster 1039 refcount =1 reference=0
....
ERROR cluster 1548 refcount =1 reference=0
1545 errors were found on the image.

Additional info:
I tested with Windows 2008 R2, Windows 7 64-bit, and Windows XP 64-bit guests; all hit this issue.

Comment 7 Dor Laor 2010-07-19 12:48:36 UTC
Kevin, is the above leak or a real error?
Mike, you still did not answer the question - what happened to the guest/migration.

I just tested live migration of win2k8r2 over qcow2 within LV and it worked.

Comment 8 Kevin Wolf 2010-07-19 12:55:52 UTC
reference=0 is the real number of references, so it's a leak. Still shouldn't happen during a successful migration, of course.

The original description contains a different message that looks made up. Was it in fact the same? "ERROT cluster .... refcount=1 reference=1" is surely not qemu-img output: it has ERROR misspelt, and refcount=reference is not an error anyway.
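
The distinction between a leak and a real error comes down to comparing the two counters. A minimal sketch, assuming `refcount` is the value stored in the image and `reference` is the number of references the check actually found, as described above. The function name is hypothetical, and treating refcount < reference as corruption is an assumption about how qemu-img classifies that case.

```shell
# Hypothetical helper: classify one cluster from its two counters.
# $1 = refcount stored in the image, $2 = references actually found by the check
classify_refcount() {
    if [ "$1" -gt "$2" ]; then
        echo leak          # cluster is marked as used but nothing points to it
    elif [ "$1" -lt "$2" ]; then
        echo corruption    # cluster is in use but not accounted for (assumed)
    else
        echo consistent    # refcount=reference, not an error
    fi
}

classify_refcount 1 0   # prints: leak
classify_refcount 1 1   # prints: consistent
```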

Comment 9 Mike Cao 2010-07-20 02:52:48 UTC
Created attachment 433045
Windows 7 Guest doing live migration during installation

Comment 10 Mike Cao 2010-07-20 02:57:50 UTC
Created attachment 433048
RHEL 5 Guest doing live migration during installation

Comment 11 Mike Cao 2010-07-20 03:09:38 UTC
(In reply to comment #7)

> Mike, you still did not answer the question - what happened to the
> guest/migration.
  
When doing a live migration during guest installation, the migration itself completes successfully.
The guest installation cannot complete; see the screen dumps in the attachments. When the guest reports something like "installation cannot finish", I turn off the guest and check the image with # qemu-img check XXX,
and it reports lots of errors, as I described in comment #6.

> 
> I just tested live migration of win2k8r2 over qcow2 within LV and it worked. 

As I described in comment #4 and the second attachment, start the migration after the virtual disk has been found and before it is formatted, and make sure the migration is still in progress while the disk is being formatted during guest installation.

Additional info:

Following the same steps with raw format images does NOT hit this issue.

Comment 12 Amit Shah 2010-07-23 09:59:04 UTC
I just tried this with a qcow2 image and didn't hit the issue.

I tried it when the screen shown in attachment 2 is displayed. I also tried migration on the next screen (copying files), and both cases worked fine.

Does the error happen right after the migration? Or does it happen after some time?

Can you try a newer version of qemu-kvm?

Comment 14 Dor Laor 2010-07-26 14:12:32 UTC
Probably a dup of Bug 596903  - backport: ide save/restore current transfer fields

Comment 15 Luiz Capitulino 2010-07-26 21:03:20 UTC
Managed to reproduce the problem, got the exact same error as shown in the first attachment.

What I've tried so far:

1. The fix suggested for bug #596903 (upstream commit 42ee76f) did not fix it
2. Tried old git tags (like qemu-kvm-0.12.1.2-2.70.el6), but the bug exists there too

Some notes:

1. I've used the exact same command-line from the original report
2. My qcow2 disk is only 10G
3. Sometimes I have to do some ping-pong to reproduce it (maximum was 5 migrations during installation)
4. I migrate right after the screen "Where do you want to install Windows" changes

Comment 16 Luiz Capitulino 2010-07-26 21:18:00 UTC
(In reply to comment #12)
> I just tried this with a qcow2 image and didn't hit the issue.
> 
> I tried it when the screen shown in attachment 2 is displayed. I also tried
> migration on the next screen (copying files), and both cases worked fine.

This is where I get it. I write the migration command when windows is booting, then I hit enter right when the screen changes from "where do you want to install windows" to the next one.

> Does the error happen right after the migration? Or does it happen after some
> time?

It happens when the migrate command is completed on both sides. Doing basic inspection ("info migrate", "info status", etc) I don't see anything wrong.

> Can you try a newer version of qemu-kvm?    

Tried with:

* qemu-kvm-0.12.1.2-2.99.el6.x86_64 (package)
* current rhel6/master (commit e938546)
* current rhel6/next (commit e4879079)

And my kernel is: 2.6.32-52.el6.x86_64.

Comment 17 Luiz Capitulino 2010-07-26 21:26:53 UTC
(In reply to comment #15)

> 1. I've used the exact same command-line from the original report

Err, exaggerated a bit: I'm using 4096 for memory and the tap interfaces are set by hand (eg. ifname=tap0,script=no,downscript=no).

Comment 18 Mike Cao 2010-07-27 06:39:39 UTC
According to comment #16, I think Luiz's steps are exactly the same as mine.

Retested win2008 32-bit with a newer version:
# uname -r
2.6.32-52.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.99.el6.x86_64

CommandLine:
/usr/libexec/qemu-kvm -M rhel6.0.0 -cpu qemu64,+sse2,+x2apic -enable-kvm -m 8G -smp 8,sockets=8,cores=1,threads=1 -name Win2k8 -uuid `uuidgen` -nodefconfig -monitor stdio -qmp tcp:0:4444,server,nowait -rtc base=utc -cdrom /mnt/en_windows_server_2008_datacenter_enterprise_standard_x86_dvd_X14-26710.iso -boot dc -drive file=/mnt/win2008.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=a2:54:20:8d:62:99,bus=pci.0,addr=0x5 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :10 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

Actual Results:
The VM still cannot be installed successfully.

#qemu-img check /home/win2008.qcow2
Leaked cluster 215 refcount=1 reference=0
....
Leaked cluster 270 refcount=1 reference=0

56 leaked clusters were found on the image.
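
Since the check output has appeared in this bug with both "ERROR" and "Leaked" prefixes for the same refcount=1 reference=0 condition, a saved output can be tallied by the counters rather than by the prefix. A minimal sketch, assuming the line formats shown in comment #6 and above; the function name `summarize_check` is hypothetical, and counting refcount < reference as corruption is an assumption.

```shell
# Hypothetical helper: read saved qemu-img check output on stdin and
# print how many clusters look leaked versus corrupted.
summarize_check() {
    # Extract "<refcount> <reference>" from each matching line; tolerate
    # optional spaces around "=" as seen in the output quoted in comment #6.
    sed -n 's/.*refcount *= *\([0-9][0-9]*\) *reference *= *\([0-9][0-9]*\).*/\1 \2/p' |
    awk '$1 > $2 { leaks++ }
         $1 < $2 { bad++ }
         END { printf "leaks=%d corruptions=%d\n", leaks, bad }'
}

printf '%s\n' \
    'Leaked cluster 215 refcount=1 reference=0' \
    'ERROR cluster 1039 refcount =1 reference=0' |
    summarize_check   # prints: leaks=2 corruptions=0
```

As noted in comment #5, the check should only be run on the image while the guest is offline.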

Comment 19 Dor Laor 2010-07-27 11:05:04 UTC
In order to narrow it down, can you please retest with virtio instead of ide?
Please also run the same test separately with an -smp 2 guest.

Comment 20 Juan Quintela 2010-07-27 13:47:36 UTC
Can you tell me what NFS server you are using? Linux NFS server, a NAS, ...?

Comment 21 Luiz Capitulino 2010-07-27 14:01:16 UTC
(In reply to comment #20)
> Can you tell me what NFS server you are using? Linux NFS server, a NAS, ...?

The only image I have on NFS is the cdrom one; I can copy it locally. Will debug it further today, but it's a busy day (patch review, meetings, etc.) so it will take a bit.

Comment 22 Luiz Capitulino 2010-07-28 01:16:41 UTC
For the record:

1. Tried with virtio, it does reproduce the problem
2. Tried with -smp 1, it does reproduce the problem too
3. Reduced the command-line to:

sudo ./qemu-rhel6 -M rhel6.0.0 -enable-kvm -m 4096 -smp 1 -monitor stdio -boot d -cdrom /tmp/kvm_autotest_root/isos/windows/en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso -drive file=./disks/error.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -usb -device usb-tablet,id=input0 -vnc :1 -L /usr/share/qemu-kvm/ -vga std -incoming tcp:0:5557

4. Tested with qemu-kvm-0.12.1.2-2.103.el6.x86_64 (package) and rhel6/next (commit 490bad80)

Now, the new discovery is that I was deleting and re-creating the qcow2 image between tests, because it was damaged. However, if I do the following:

1. Boot the source guest with the command-line above
2. Cancel the installation process right after the disk is chosen
3. Quit the VM

I don't seem to reproduce the problem if I run the tests using the same image from the procedure above. I'm under the impression it's something with the block layer.

Comment 23 Mike Cao 2010-07-28 05:56:04 UTC
(In reply to comment #19)
> In order to narrow it down, can you please retest with virtio instead of ide?
> Please also run the same test separately with an -smp 2 guest.

Tested on qemu-kvm-0.12.1.2-2.99.el6

CLI:
/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 4G -smp 2 -name win2008 -uuid `uuidgen` -nodefconfig -monitor stdio -rtc base=localtime -boot dc -drive file=mike_win2k8.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2,cache=none -device virtio-blk-pci,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:72:a4:a8,bus=pci.0,addr=0x4 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :3 -k en-us -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -cdrom en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-26714.iso -fda virtio-win-1.1.8-0.vfd

Actual Results:
After migration, the VM cannot finish installation.

It hit the same issue.

Comment 24 Mike Cao 2010-07-28 05:59:30 UTC
(In reply to comment #20)
> Can you tell me what NFS server you are using? Linux NFS server, a NAS, ...?

In my case, the guest installation ISO is located on a Linux NFS server. With the qcow2 image located either on the NFS server or on LVM (block device), both hit this issue.

Comment 25 Amit Shah 2010-07-28 06:21:09 UTC
What we have so far from QE:

- qcow2: fail
- raw: pass
- ide: fail
- virtio: fail

- install iso on nfs
- qcow2 over nfs or lvm (both fail)

From Luiz:

- ide: fail
- virtio: fail
- install iso on nfs as well as local (both fail)
- guest image on nfs as well as local (both fail)

Also, Luiz mentions this in comment #22:

> 1. Boot the source guest with the command-line above
> 2. Cancel the installation process right after the disk is chosen
> 3. Quit the VM

> I don't seem to reproduce the problem if I run the tests using the same image
> from the procedure above. I'm under the impression it's something with the
> block layer.

I think it's related to qcow2 as raw works fine.

Comment 26 Kevin Wolf 2010-07-28 09:02:59 UTC
This is likely the problem that the destination opens the image too early. Juan is working on this problem by forward-porting a RHEL 5 patch, reassigning.
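
Why opening too early could matter can be shown with a toy model. This is only an illustration under stated assumptions, not qemu code: a plain temporary file stands in for image metadata, the numbers are invented, and the variable names are hypothetical.

```shell
# Toy model only: a plain file stands in for image metadata; this is not qemu code.
meta=$(mktemp)
echo 100 > "$meta"            # state of the metadata when the destination starts

dest_cached=$(cat "$meta")    # destination "opens the image" and keeps this value

echo 250 > "$meta"            # source keeps changing metadata during the migration

dest_fresh=$(cat "$meta")     # what the destination would see if it re-read at handover
echo "cached=$dest_cached fresh=$dest_fresh"   # prints: cached=100 fresh=250
rm -f "$meta"
```

A destination that continues with the cached value works from an outdated view. This would be consistent with raw images passing, if one assumes raw images have no comparable allocation metadata to go stale.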

Comment 27 Dor Laor 2010-07-28 11:33:35 UTC

*** This bug has been marked as a duplicate of bug 618601 ***

Comment 28 Luiz Capitulino 2010-07-28 18:00:17 UTC
Just to confirm: I can't reproduce the problem with a quick and dirty version of the fix for bug 618601.