Bug 910445

Summary: Storage Live Migration of thick disk results in corrupted disk
Product: Red Hat Enterprise Virtualization Manager
Reporter: jon.benedict
Component: vdsm
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED ERRATA
QA Contact: Dafna Ron <dron>
Severity: high
Docs Contact:
Priority: high
Version: 3.1.0
CC: abaron, amureini, bazulay, derez, hateya, iheim, italkohe, jrankin, lpeer, scohen, sgrinber, ykaul, zdover
Target Milestone: ---
Keywords: ZStream
Target Release: 3.2.0
Flags: scohen: Triaged+
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: vdsm-4.10.2-14.0.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, live migrating a preallocated file-based (e.g. NFS) virtual disk to a different storage domain failed because of a problem in the preparation of the volumes at the destination. Now, preallocated file-based virtual disks can be live migrated to other storage domains normally.
Story Points: ---
Clone Of:
: 949683 (view as bug list)
Environment:
Last Closed: 2013-06-10 20:40:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 949683    
Attachments:
Description Flags
engine.log from RHEV-M server
none
logs from rhev-h node
none
logs for rhevh04
none
newer rhevm logs
none
engine log for failed Live Storage Migration w/ thick-h
none
vdsm log for failed Live Storage Migration w/thick-h
none
output of `qemu-img info` on src & dst, vdsm log, engine log, messages log
none
logs from thick-h
none
"qemu-img info" output from source and destination storage domains
none
vdsm log for latest failed storage live migration
none
engine log from latest failed storage live migration
none
vdsm, engine, and qemu-img logs/txt files
none
vdsm and engine logs from successful run of storage live migration w/ Federico's patches
none

Description jon.benedict 2013-02-12 16:42:26 UTC
Created attachment 696579 [details]
engine.log from RHEV-M server

Description of problem:
Storage live migration of a thick boot disk results in two issues: the VM is paused due to "unknown storage error" and the disk itself is corrupted.

Version-Release number of selected component (if applicable):
RHEV 3.1, RHEL 6.3 guest, NetApp Clustered ONTAP 8.1.2, RHEV-H as shipped w/ RHEV 3.1

How reproducible:
Reproducible with a thick disk; unable to reproduce with a thin disk.

Steps to Reproduce:
1. Create RHEL 6.3 guest with thickly provisioned disk
2. Allow new VM to reboot post-install
3. Initiate live migration of storage
  
Actual results:
The virtual machine is paused. The VM is not able to resume and requires a non-graceful shutdown. When the VM is restarted, the following message appears in the Spice console: "Booting from Hard, Disk Geom Error_"

Reverting to a snapshot results in a functional VM.

Expected results:
Storage live migration of a thick disk should result in a functioning VM that does not pause and does not require a restore from snapshot.

Additional info:

Comment 2 jon.benedict 2013-02-13 12:55:24 UTC
Created attachment 696794 [details]
logs from rhev-h node

Comment 3 Federico Simoncelli 2013-02-15 10:45:09 UTC
(In reply to comment #2)
> Created attachment 696794 [details]
> logs from rhev-h node

I couldn't correlate the engine logs to the vdsm/libvirt ones. Can you double-check that you provided all the relevant logs? Are the engine/rhevh machines time-synchronized? (That should make it easier for you to find them.)

The vdsm logs should contain calls for:

vmSnapshot
cloneImageStructure
vmDiskReplicateStart
syncImageData

The libvirt log should contain calls for:

__com.redhat_drive-mirror
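
For reference, a minimal, hypothetical helper (not part of vdsm or libvirt) that counts how often these call names show up in a log file, which can help confirm that the relevant flow is present in the logs being collected:

 import sys

 # Call names quoted from this comment; plain-text vdsm/libvirt log files are assumed.
 EXPECTED_CALLS = (
     "vmSnapshot",
     "cloneImageStructure",
     "vmDiskReplicateStart",
     "syncImageData",
     "__com.redhat_drive-mirror",
 )

 def scan(path):
     # Count occurrences of each expected call name in the given log file.
     hits = dict((call, 0) for call in EXPECTED_CALLS)
     with open(path) as logfile:
         for line in logfile:
             for call in EXPECTED_CALLS:
                 if call in line:
                     hits[call] += 1
     return hits

 if __name__ == "__main__":
     for path in sys.argv[1:]:
         hits = scan(path)
         print("%s:" % path)
         for call in EXPECTED_CALLS:
             print("  %-30s %d" % (call, hits[call]))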

Comment 4 jon.benedict 2013-02-15 15:25:15 UTC
Created attachment 697861 [details]
logs for rhevh04

Comment 5 jon.benedict 2013-02-15 15:26:12 UTC
Created attachment 697863 [details]
newer rhevm logs

Comment 6 jon.benedict 2013-02-15 15:28:10 UTC
Federico - I reproduced the issue again this morning for the sole purpose of getting you more complete logs. I've verified that the VM (live_mig_03) was running on thin hypervisor rhevh04.

Comment 7 Federico Simoncelli 2013-02-15 16:32:40 UTC
There are two parts to the problem. One is that the drive mirroring *on the destination* is failing:

2013-02-15 15:06:42.733+0000: 10773: debug : qemuMonitorIOWrite:432 : QEMU_MONITOR_IO_WRITE: mon=0x7f28f000e840 buf={"execute":"__com.redhat_drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"/rhev/data-center/cf0a7a60-99d1-43de-88d6-a39c8829c00e/fc3dde9a-73f4-473d-8e2b-af0a1b74e185/images/b9a983f8-8099-45cc-9476-36c4bf9256f1/14af948e-3a4b-455b-9a32-7195c29ecb89","full":false,"speed":0,"mode":"existing","format":"qcow2"},"id":"libvirt-434"}
2013-02-15 15:06:42.764+0000: 10773: debug : qemuMonitorIOProcess:327 : QEMU_MONITOR_IO_PROCESS: mon=0x7f28f000e840 buf={"return": {}, "id": "libvirt-434"}

2013-02-15 15:06:42.765+0000: 10773: debug : qemuMonitorIOProcess:327 : QEMU_MONITOR_IO_PROCESS: mon=0x7f28f000e840 buf={"timestamp": {"seconds": 1360940802, "microseconds": 765542}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 0, "speed": 0, "type": "mirror", "error": "Input/output error"}}

I don't have more information on it other than "Input/output error". Maybe you can give us more insight on this part (why do we get I/O errors on the destination?).

That said, vdsm is correctly reporting the issue at vmDiskReplicateFinish time (returning 'Drive replication error').

Thread-530894::DEBUG::2013-02-15 15:08:53,911::BindingXMLRPC::894::vds::(wrapper) client [172.20.44.185]::call vmDiskReplicateFinish with ('7bb1c906-1f9d-41d0-8f55-b022b68185bc', {'device': 'disk', 'domainID': 'e2cebd7d-14c2-421a-b482-ad9072a95eaf', 'volumeID': '14af948e-3a4b-455b-9a32-7195c29ecb89', 'poolID': 'cf0a7a60-99d1-43de-88d6-a39c8829c00e', 'imageID': 'b9a983f8-8099-45cc-9476-36c4bf9256f1'}, {'device': 'disk', 'domainID': 'fc3dde9a-73f4-473d-8e2b-af0a1b74e185', 'volumeID': '14af948e-3a4b-455b-9a32-7195c29ecb89', 'poolID': 'cf0a7a60-99d1-43de-88d6-a39c8829c00e', 'imageID': 'b9a983f8-8099-45cc-9476-36c4bf9256f1'}) {}
Thread-530894::ERROR::2013-02-15 15:08:53,918::libvirtvm::2121::vm.Vm::(diskReplicateFinish) vmId=`7bb1c906-1f9d-41d0-8f55-b022b68185bc`::Replication job not found for disk vda ({'device': 'disk', 'domainID': 'e2cebd7d-14c2-421a-b482-ad9072a95eaf', 'volumeID': '14af948e-3a4b-455b-9a32-7195c29ecb89', 'poolID': 'cf0a7a60-99d1-43de-88d6-a39c8829c00e', 'imageID': 'b9a983f8-8099-45cc-9476-36c4bf9256f1'})
Thread-530894::DEBUG::2013-02-15 15:08:53,933::BindingXMLRPC::900::vds::(wrapper) return vmDiskReplicateFinish with {'status': {'message': 'Drive replication error', 'code': 55}}

Sadly this is not handled correctly by the engine, which goes ahead and deletes the source (instead of the broken destination):

2013-02-15 10:08:53,872 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-3-thread-27) START, VmReplicateDiskFinishVDSCommand(HostName = rhevh04.stl.netapp.com, HostId = 768a6dfe-59be-11e2-8429-52540046a077, vmId=7bb1c906-1f9d-41d0-8f55-b022b68185bc), log id: 4dbd9cdd
2013-02-15 10:08:53,909 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-3-thread-27) FINISH, VmReplicateDiskFinishVDSCommand, log id: 4dbd9cdd
2013-02-15 10:08:53,922 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-3-thread-27) START, DeleteImageGroupVDSCommand( storagePoolId = cf0a7a60-99d1-43de-88d6-a39c8829c00e, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = e2cebd7d-14c2-421a-b482-ad9072a95eaf, imageGroupId = b9a983f8-8099-45cc-9476-36c4bf9256f1, postZeros = false, forceDelete = false), log id: 3a934832
2013-02-15 10:08:55,904 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-3-thread-27) FINISH, DeleteImageGroupVDSCommand, log id: 3a934832

There is probably a problem in the error handling on the engine side, but we should also try to figure out what the problem is on the destination ("Input/output error").

Comment 8 Daniel Erez 2013-02-16 21:20:49 UTC
The issue in the engine has already been solved as part of commit fbe1b4903e9b64d7eb12f264529432b9e60deca1
[VmReplicateDiskFinishTaskHandler throws an exception on VmReplicateDiskFinish failure - prevents deletion of the source disk]

Comment 9 jon.benedict 2013-02-19 14:30:42 UTC
Federico - what additional information do you need from me? If it's for your question "(why do we get io errors on the destination?)", I have no idea. It works fine with a thin guest, but breaks on a thick guest. I'm not getting any errors on the storage side of things.

Comment 10 Ayal Baron 2013-02-25 11:36:47 UTC
Reducing severity since the problematic behaviour was already solved as noted in comment 8

Comment 11 jon.benedict 2013-02-27 21:25:32 UTC
Federico - my lab is back to 100%. I've redeployed one of my hypervisors as a RHEL 6.3 thick hypervisor. The issue remains, but I'm getting different symptoms/errors now on the thick hypervisor as compared to the thin.

For example, in the original issue, the live storage migration was reported as successful, but the VM would pause due to "unknown storage error". Using the thick hypervisor, it reports that the live storage migration failed, but the disk is clearly visible in the new storage area. Additionally, it now reports that the disk is "thin provisioned" (perhaps that is expected?).

The other difference is that on the thin-h, when I attempted to restart the VM it failed with "Booting from Hard, Disk Geom Error_" appearing in the Spice console. Using the thick-h, I don't even get to open the Spice console before the VM start fails with "Exit message:'truesize'".

Let me know all of the logs and files that you would like me to attach from the thick-h and from rhev-m.

Comment 12 Federico Simoncelli 2013-02-28 11:16:42 UTC
Jon, let's try to stick to the rhel-6.3+vdsm (thick-h) setup, because you can benefit from newer packages and debugging is easier.

Can you try a live storage migration and attach the logs here (engine.log and vdsm.log)? Can you also provide the component versions (vdsm/qemu-kvm-rhev/libvirt)? Thanks.

Comment 13 jon.benedict 2013-02-28 12:39:10 UTC
Created attachment 703894 [details]
engine log for failed Live Storage Migration w/ thick-h

Comment 14 jon.benedict 2013-02-28 12:40:09 UTC
Created attachment 703895 [details]
vdsm log for failed Live Storage Migration w/thick-h

Comment 15 jon.benedict 2013-02-28 15:31:11 UTC
Created attachment 703911 [details]
output of `qemu-img info` on src & dst, vdsm log, engine log, messages log

Comment 16 jon.benedict 2013-02-28 15:36:04 UTC
Federico - not sure what happened to my notes, but I just reran the test as follows:

Created a new (thick) 20GB VM
Made the thick-h the SPM
Initiated the storage live migration
Captured output of `qemu-img info image_name` on src and destination images (saved output in txt files)
Captured vdsm, engine, and messages logs

As you'll see in the txt files and the logs, at the end it fails to find the copied image. I ran `ls` and it complained of stale file handles.

Comment 17 Federico Simoncelli 2013-02-28 16:35:47 UTC
(In reply to comment #16)
> Federico - not sure what happened to my notes, but I just reran the test as
> follows:

Thank you, Jon, for the additional information you provided; it has been critical to understanding the issue.

The involved piece of code is (image.py:~64):

 if (volParams['volFormat'] == volume.COW_FORMAT
         or volParams['prealloc'] == volume.SPARSE_VOL):
     volTmpSize = volParams['size']
 else:
     volTmpSize = TEMPORARY_VOLUME_SIZE  # in sectors (10M)

Somehow you were able to create a base disk that is raw preallocated. Did you use the REST API? (As far as I remember, it's blocked from the UI.)

You always mentioned a thick disk, but I had interpreted it as raw (sparse) rather than raw preallocated.

Now, that said, the piece of code I quoted is designed to optimize away the preallocation and just create a 10M placeholder (for other flows). This causes the snapshot on top to be created with a 10M virtual size (in the qcow header), as we can observe from your (destination) qemu-img info:

[root@rhevh01 68457151-b4c3-4f57-8b12-2bd186ca5a7e]# qemu-img info b9ad75b4-ec84-496d-b60a-01f30e03f754
image: b9ad75b4-ec84-496d-b60a-01f30e03f754
file format: qcow2
virtual size: 10M (10485760 bytes)
disk size: 140K
cluster_size: 65536

When the VM tries to mirror any data with offset > 10M it receives an IOError and the sync fails.
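
As a quick sanity check (a hypothetical helper, not part of vdsm; the paths below are placeholders), one can compare the virtual size reported by `qemu-img info` for the base volume and for the snapshot on top of it; a truncated value like the 10M above points to this placeholder problem:

 import re
 import subprocess

 def virtual_size(volume_path):
     # Parse the "virtual size: ... (NNN bytes)" line of plain qemu-img info output.
     out = subprocess.check_output(["qemu-img", "info", volume_path]).decode()
     match = re.search(r"virtual size:.*\((\d+) bytes\)", out)
     if match is None:
         raise ValueError("could not parse qemu-img info for %s" % volume_path)
     return int(match.group(1))

 base = virtual_size("/path/to/base-volume")  # placeholder path
 top = virtual_size("/path/to/top-volume")    # placeholder path
 if top < base:
     print("snapshot inherited a truncated virtual size: %d < %d bytes" % (top, base))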

I'm working on a solution for this.
Thanks again.

Comment 18 jon.benedict 2013-02-28 16:56:50 UTC
Federico - I did not use the rest api to create the VM or disk image. I only used the web admin tool.

Comment 19 Ayal Baron 2013-03-01 22:47:36 UTC
(In reply to comment #18)
> Federico - I did not use the rest api to create the VM or disk image. I only
> used the web admin tool.

Ack, raw preallocated base is fully supported...

Comment 20 Federico Simoncelli 2013-03-04 16:58:54 UTC
image: placeholder optimization on preallocation

When the destination of a copy or move command is an NFS domain
we should try to optimize the placeholder creation, avoiding
unnecessary prezeroing (only relevant case: RAW PREALLOCATED).
In such cases we previously used TEMPORARY_VOLUME_SIZE to create
a placeholder with a temporary size of 10MB.
This interferes with live storage migration, as the following
volumes in the chain inherit the temporary size inside the
qcow header (virtual disk size).
With this patch we instead force the creation of the volume
as SPARSE, as this has no side effect on the qcow header.

http://gerrit.ovirt.org/#/c/12692/
Change-Id: I32811bd45320ef02bfc593a71dfdafc0b0550c7d
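
For illustration, a minimal sketch of the sizing logic described above, reusing the names from the snippet quoted in comment 17 (a simplification under those assumptions, not the actual patch; see the gerrit change for the real implementation):

 if (volParams['volFormat'] == volume.COW_FORMAT
         or volParams['prealloc'] == volume.SPARSE_VOL):
     volTmpSize = volParams['size']
 else:
     # RAW PREALLOCATED destination on a file (e.g. NFS) domain: keep the
     # full size but create the placeholder as SPARSE, so no prezeroing is
     # needed and volumes created on top do not inherit a truncated
     # virtual size in their qcow header.
     volTmpSize = volParams['size']
     volParams['prealloc'] = volume.SPARSE_VOL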

Comment 21 jon.benedict 2013-03-08 21:15:20 UTC
Federico - I finally had the opportunity to try the new vdsm packages that you posted. Unfortunately, the storage live migration still failed.

I'll post logs and output from `qemu-img info _img_`

Jon

Comment 22 jon.benedict 2013-03-08 21:22:51 UTC
Created attachment 707223 [details]
logs from thick-h

tried storage live migration w/ 4 test vdsm packages from Federico S.

Comment 23 jon.benedict 2013-03-08 21:24:43 UTC
Created attachment 707224 [details]
"qemu-img info" output from source and destination storage domains

Comment 24 jon.benedict 2013-03-09 15:43:39 UTC
Created attachment 707417 [details]
vdsm log for latest failed storage live migration

forgot to attach this last night

Comment 25 jon.benedict 2013-03-09 15:45:52 UTC
Created attachment 707418 [details]
engine log from latest failed storage live migration

forgot to add this last night

Comment 26 Federico Simoncelli 2013-03-10 13:51:41 UTC
(In reply to comment #24)
> Created attachment 707417 [details]
> vdsm log for latest failed storage live migration
> 
> forgot to attach this last night

From the vdsm log it looks like the host (where the VM was running) was not the SPM. Since the fix actually resolves a problem in cloneImageStructure, you should double-check that you used the patched rpm on the SPM too. Thanks.

Comment 27 jon.benedict 2013-03-11 13:16:51 UTC
Created attachment 708366 [details]
vdsm, engine, and qemu-img logs/txt files

Comment 28 jon.benedict 2013-03-11 13:20:28 UTC
Federico - the latest tarball contains the vdsm log, engine log, and the qemu-img text files. I confirmed that the thick-h is both the host for the Storage Live Migration VM and the SPM.

The Storage Live Migration itself worked this time, although the event log in RHEVM says that it failed. Also, qemu-img was no longer able to get information from the disk images once the storage live migration finished.

Comment 29 jon.benedict 2013-03-11 15:52:47 UTC
The last test (comment 28) is not valid. Please disregard.

Comment 30 jon.benedict 2013-03-11 15:54:35 UTC
Created attachment 708476 [details]
vdsm and engine logs from successful run of storage live migration w/ Federico's patches.

Comment 31 jon.benedict 2013-03-11 15:56:54 UTC
The latest test w/ Federico's patches was successful. This looks to be fixed on my end. Thanks, Federico.

Comment 34 Dafna Ron 2013-04-08 08:49:34 UTC
Moved a preallocated disk on NFS storage with no failure.
Verified on sf13 with vdsm-4.10.2-14.0.el6ev.x86_64.

Comment 38 errata-xmlrpc 2013-06-10 20:40:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html