Bug 816505 - [qemu-kvm] Disk consistency check sometimes happens when rebooting guest after migration.
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.0
Assigned To: Ronen Hod
QA Contact: Virtualization Bugs
Blocks: 798682
Reported: 2012-04-26 05:52 EDT by dawu
Modified: 2014-04-28 07:43 EDT
CC List: 19 users

Doc Type: Bug Fix
Last Closed: 2014-04-28 07:43:08 EDT
Type: Bug


Attachments
win2k8-32-diskchecking-1 (9.62 KB, image/png), 2012-04-26 05:54 EDT, dawu
win2k8-32-diskchecking-2 (16.41 KB, image/png), 2012-04-26 05:54 EDT, dawu
win2k8-32-diskchecking-3 (21.39 KB, image/png), 2012-04-26 05:55 EDT, dawu
2k8-32-NoBalloonDriver-1 (15.11 KB, image/png), 2012-04-27 05:14 EDT, dawu
2k8-32-noBalloonDriver-2 (16.57 KB, image/png), 2012-04-27 05:14 EDT, dawu
2k3-64-noBalloonDriver-1 (16.92 KB, image/png), 2012-04-27 05:15 EDT, dawu
2k3-64-noballoonDriver-2 (21.78 KB, image/png), 2012-04-27 05:16 EDT, dawu
ErrorLogFromKernel.txt for win2k8-32 (5.25 KB, text/plain), 2012-05-04 00:18 EDT, dawu

Description dawu 2012-04-26 05:52:04 EDT
Description of problem:
A disk consistency check sometimes runs when the guest is rebooted after being migrated with the balloon driver installed, especially when the migration follows a balloon operation such as changing the balloon size or hot-plugging/unplugging the balloon device.

Version-Release number of selected component (if applicable):
kernel-2.6.32-266.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.282.el6.x86_64
seabios: seabios-0.6.1.2-19.el6 
virtio-win-prewhql-0.1-26

How reproducible:
70%

Steps to Reproduce:
1. Boot a guest on the source host with the balloon driver:
  /usr/libexec/qemu-kvm -m 2G -smp 2 -cpu cpu64-rhel6,+x2apic -usb -device usb-tablet -drive file=win2k8-32-fun-balloon.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup0 -device e1000,netdev=hostnet0,mac=00:10:16:23:78:01,bus=pci.0,addr=0x4 -uuid a65b5920-b410-4606-8c4a-eb2eacb58f96 -rtc base=localtime -no-kvm-pit-reinjection -monitor stdio -name win2k8-32 -spice disable-ticketing,port=5931 -vga qxl -device virtio-balloon-pci,addr=0x6,bus=pci.0,id=virtio-balloon -bios /usr/share/seabios/bios-pm.bin

2. Boot a guest in incoming-migration mode on the destination host with CLI:
   /usr/libexec/qemu-kvm -m 2G -smp 2 -cpu cpu64-rhel6,+x2apic -usb -device usb-tablet -drive file=win2k8-32-fun-balloon.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup0 -device e1000,netdev=hostnet0,mac=00:10:16:23:78:01,bus=pci.0,addr=0x4 -uuid a65b5920-b410-4606-8c4a-eb2eacb58f96 -rtc base=localtime -no-kvm-pit-reinjection -monitor stdio -name win2k8-32 -spice disable-ticketing,port=5931 -vga qxl -device virtio-balloon-pci,addr=0x6,bus=pci.0,id=virtio-balloon -bios /usr/share/seabios/bios-pm.bin -incoming tcp:0:5800

3. Perform a balloon operation on the source guest, such as changing the balloon size or hot-plugging/unplugging the balloon device, or do nothing.

4. Migrate the guest from the source host to the destination host.

5. Shut down the guests on both the source and destination hosts.

6. Restart the guest on the destination host.
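The command lines in steps 1 and 2 are identical except that the destination adds `-incoming tcp:0:5800`, which makes QEMU wait for the migration stream on TCP port 5800; migration requires the destination to be started with the same device configuration as the source. As a minimal sketch, a hypothetical shell helper `dest_cmdline` (not part of QEMU or of this report) can derive the destination command from the source arguments so the two cannot drift apart:

```shell
# Hypothetical helper (not part of QEMU): print the destination command line,
# i.e. the source arguments unchanged plus "-incoming tcp:0:<port>".
# Arguments are joined with single spaces; shell quoting is not preserved.
dest_cmdline() {
    port="$1"
    shift
    printf '%s ' "$@"                        # source arguments, space-separated
    printf '%s\n' "-incoming tcp:0:$port"    # listen for the migration stream
}

# Usage (abbreviated source command line):
#   dest_cmdline 5800 /usr/libexec/qemu-kvm -m 2G -smp 2 -name win2k8-32
```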
  
Actual results:
A disk consistency check sometimes runs at boot. Screenshots of the issue are attached; please refer to "win2k8-32-diskchecking-1.png", "win2k8-32-diskchecking-2.png" and "win2k8-32-diskchecking-3.png".

Expected results:
Guest can start normally without any disk checking.

Additional info:
1. Only win2k8-32 and win2k3-64 were tested; both hit this issue.
2. The issue also occurs when migrating without any balloon operation, but it is harder to reproduce. It is easier to reproduce when migrating after a balloon operation such as changing the balloon size or hot-plugging/unplugging the balloon device.
Comment 1 dawu 2012-04-26 05:54:16 EDT
Created attachment 580417 [details]
win2k8-32-diskchecking-1
Comment 2 dawu 2012-04-26 05:54:58 EDT
Created attachment 580418 [details]
win2k8-32-diskchecking-2
Comment 3 dawu 2012-04-26 05:55:46 EDT
Created attachment 580419 [details]
win2k8-32-diskchecking-3
Comment 4 dawu 2012-04-26 05:58:35 EDT
Tested without the balloon driver on win2k3-64 three times and did not hit this issue. I'll try more runs on win2k8-32 without the balloon driver and update the results.

Thanks!
Best Regards,
Dawn
Comment 6 dawu 2012-04-27 05:12:28 EDT
(In reply to comment #4)
> Tested without balloon driver on win2k3-64 for three times, didn't hit this
> issue
> ,I'll try more times on win2k8-32 without balloon driver and update the
> results.
> 
> Thanks!
> Best Regards,
> Dawn

1. Tried more ping-pong migrations between two hosts using fresh win2k8-32 and win2k3-64 images without the balloon driver, and also hit this issue. CLI:

/usr/libexec/qemu-kvm -m 2G -smp 2 -cpu cpu64-rhel6,+x2apic -usb -device usb-tablet -drive file=win2k3-64-fun-balloon.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup0 -device e1000,netdev=hostnet0,mac=00:10:16:23:78:01,bus=pci.0,addr=0x4 -uuid a65b5920-b410-4606-8c4a-eb2eacb58f96 -rtc base=localtime -no-kvm-pit-reinjection -monitor stdio -name rhel63 -spice disable-ticketing,port=5931 -vga qxl  -bios /usr/share/seabios/bios-pm.bin

Please refer to the attachments "2k8-32-NoBalloonDriver-1.png", "2k8-32-noBalloonDriver-2.png", "2k3-64-noBalloonDriver-1.png" and "2k3-64-noballoonDriver-2.png".

Note:
The image md5sum taken before migration differs from the one taken after migration -> shutdown -> restart, since files were corrupted or missing.

2. For a RHEL 6.3 guest, I tried 5 times and have not found any issue so far.
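The checksum comparison described in the note in item 1 can be scripted. This is a minimal sketch, assuming the image lives on the shared NFS mount and that `md5sum` from GNU coreutils is available; `image_changed` is a hypothetical helper, not an existing tool. Keep in mind that any write by the guest, including an ordinary boot, also changes the checksum, so a mismatch hints at, but does not by itself prove, corruption.

```shell
# Hypothetical helper: succeed (exit status 0) if the image's current md5sum
# differs from a checksum recorded earlier, e.g. before migration.
#   $1 - md5sum recorded earlier
#   $2 - path to the guest image (assumed to be on the shared NFS mount)
image_changed() {
    before="$1"
    after=$(md5sum "$2" | cut -d' ' -f1)   # keep only the hash field
    [ "$before" != "$after" ]
}

# Usage, with an assumed image path:
#   before=$(md5sum /mnt/win2k8-32-fun-balloon.raw | cut -d' ' -f1)
#   ... migrate, shut down, restart the guest on the destination host ...
#   image_changed "$before" /mnt/win2k8-32-fun-balloon.raw && echo "image changed"
```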

Best Regards,
Dawn
Comment 7 dawu 2012-04-27 05:14:14 EDT
Created attachment 580685 [details]
2k8-32-NoBalloonDriver-1
Comment 8 dawu 2012-04-27 05:14:59 EDT
Created attachment 580686 [details]
2k8-32-noBalloonDriver-2
Comment 9 dawu 2012-04-27 05:15:44 EDT
Created attachment 580687 [details]
2k3-64-noBalloonDriver-1
Comment 10 dawu 2012-04-27 05:16:15 EDT
Created attachment 580688 [details]
2k3-64-noballoonDriver-2
Comment 11 Yan Vugenfirer 2012-05-03 04:34:56 EDT
Please check guest event log for kernel crash entries.
Comment 12 dawu 2012-05-04 00:17:36 EDT
(In reply to comment #11)
> Please check guest event log for kernel crash entries.

Hi Yan,

Please refer to the attachment "ErrorLogFromKernel.txt" for details. I collected the Kernel-General event logs from the Event Viewer; if this is not what you need, please let me know.

Thanks!

Best Regards,
Dawn
Comment 13 dawu 2012-05-04 00:18:36 EDT
Created attachment 582019 [details]
ErrorLogFromKernel.txt for win2k8-32
Comment 14 Juan Quintela 2012-06-08 07:41:39 EDT
This looks like an IDE bug.  We haven't been able to reproduce it yet.  Could you:
- try to reproduce from libvirt (although all options look right)
- take screenshots of the IDE controller properties in the
  migration destination after each migration, reboot, and only attach them
  when they get a disk check.

Just to see whether we can find any pattern there.

Thanks, Juan.
Comment 15 dawu 2012-06-11 06:26:46 EDT
(In reply to comment #14)
> This looks like one IDE bug.  We haven't been able to reproduce it yet. 
> Could you:
> - try to reproduce from libvirt (although all options look right)
Hi Juan,
I tried 6 times on win2k8-32 with libvirt and did not hit this issue.

> - take screenshots of the IDE controller properties in the
>   migration destination after each migration, reboot, and only attach them
>   when they get a disk check.
I'd like to confirm three points about the IDE controller properties:
1. Do you mean the "Intel(R) 82371SB PCI Bus Master IDE Controller" under the path
Device Manager -> IDE ATA/ATAPI controllers -> Intel(R) 82371SB PCI Bus Master IDE Controller?

2. If so, which information do you need? All tabs ("General" / "Driver" / "Details" / "Resources")? If so, I'll take a screenshot of each tab one by one. The "Details" tab has many properties, so could you tell me which ones you need so that I can capture them for you?

3. You said "reboot, and only attach them when they get a disk check." Do you mean I should take a screenshot when the disk check appears, like "win2k8-32-diskchecking-1"?

Please refer to the attachment "IDE_properties.JPG" for details.

Thanks!
Best Regards,
Dawn

> 
> Just to be sure if we find any pattern there.
> 
> Thanks, Juan.
Comment 16 Juan Quintela 2012-07-13 08:35:49 EDT
Could you test using virtio block and networking and see if the problem goes away?  My suspicion is that the problem is in the IDE code, but that would help confirm it.
Comment 17 dawu 2012-07-19 04:06:17 EDT
(In reply to comment #16)
> Could you test using virtio block and networking and see if the problem goes
> away?  Suspicion is that the problem is in ide, code, but that would help
> confirm it.

Hi Juan,

The issue still occurs when using virtio block and networking.
It is not easy to reproduce: sometimes the first migration hits it, but sometimes it only appears after many ping-pong migration loops.

Best Regards,
Dawn
Comment 18 Marcelo Tosatti 2012-07-30 20:31:14 EDT
From the event viewer file:

"{Registry Hive Recovered} Registry hive (file): '\SystemRoot\System32\Config\SOFTWARE' was corrupted and it has been recovered. Some data might have been lost."

So Windows ran chkdsk because it encountered file system corruption.

Can you describe the details of the shared storage setup?
Comment 19 dawu 2012-07-30 22:57:19 EDT
(In reply to comment #18)
> From the event viewer file:
> 
> "{Registry Hive Recovered} Registry hive (file):
> '\SystemRoot\System32\Config\SOFTWARE' was corrupted and it has been
> recovered. Some data might have been lost."
> 
> So Windows performed chkdisk because it encounters file system corruption.
> 
> Can you describe details of shared storage setup.

Hi Marcelo,

Shared storage setup steps:
On the shared host hostC:
1. vi /etc/exports
/home *(rw,no_root_squash)

2. service nfs start

On the test hosts hostA and hostB:
3. mount hostA:/home /mnt   on hostA
   mount hostB:/home /mnt   on hostB


Best Regards,
Dawn
Comment 20 Juan Quintela 2012-08-06 05:43:23 EDT
I guess you mean:

mount hostc:/home /mnt

on both hosts, right?
Comment 21 dawu 2012-08-06 21:32:49 EDT
(In reply to comment #20)
> I guess you mean:
> 
> mount hostc:/home /mnt
> 
> on both hosts, right?

Juan, 

Sorry for my typing mistake; you are right.

Best Regards,
Dawn
Comment 22 Karen Noel 2012-08-10 08:25:03 EDT
Vadim, Can you reproduce this and help figure out what's happening from within Windows?
Comment 25 Juan Quintela 2012-08-20 11:54:39 EDT
The writethrough cache option is not valid for migration.  Their examples show cache=none on all the disks.  My suspicion was that the balloon driver was also doing something strange, but I don't know either :-(  This bug is as strange as it can be.
Comment 26 Vadim Rozenfeld 2012-08-21 02:04:46 EDT
Does this mean that the problem is not reproducible if the balloon was deflated
before migration?
Comment 34 juzhang 2014-04-24 22:12:36 EDT
Hi Qian,

Can you test this and update the results?

Best Regards,
Junyi
Comment 35 Qian Guo 2014-04-28 04:08:48 EDT
(In reply to juzhang from comment #34)
> Hi Qian,
> 
> Can you have a test and update the testing result?
> 
> Best Regards,
> Junyi

Tested this bug following comment #6
on RHEL 7 hosts with a Windows 2008 32-bit guest. Tested 10 times; it could not be reproduced.

Components
# rpm -q qemu-kvm
qemu-kvm-1.5.3-60.el7.x86_64
# uname -r
3.10.0-121.el7.x86_64

For nfs server:
# cat /etc/exports
/home/  *(rw,no_root_squash)

cli:
# /usr/libexec/qemu-kvm -m 4G -smp 4 -cpu Penryn -usb -device usb-tablet -drive file=/mnt/win2008-32.qcow2,format=qcow2,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device e1000,netdev=hostnet0,mac=00:10:16:23:78:01,bus=pci.0,addr=0x4 -uuid a65b5920-b410-4606-8c4a-eb2eacb58f96 -rtc base=localtime -no-kvm-pit-reinjection -monitor stdio -name m2008 -vnc :10 -vga std  -bios /usr/share/seabios/bios.bin  -boot menu=on

Steps:
migration -> shutdown -> restart

Tested 10 times; the issue could not be reproduced.

Thanks,
Comment 36 Ronen Hod 2014-04-28 07:43:08 EDT
I do not see us doing anything with this BZ. It does not reproduce well, even less so on RHEL 7.
I will close it, and we can reopen once we have a reproducer.
