1048915 – QCOW2 disk corruption after a "thin provision" guest paused

Summary: QCOW2 disk corruption after a "thin provision" guest paused

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	3.2.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.5.0
Assignee:	Nir Soffer
QA Contact:	Aharon Canan
Docs Contact:
URL:
Whiteboard:	storage
Depends On:
Blocks:

Reported:	2014-01-06 13:19 UTC by Juan Sebastian Castro
Modified:	2017-11-14 17:00 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-02-23 08:16:33 UTC
oVirt Team:	Storage
Target Upstream Version:
Embargoed:
Flags:	tnisan: Triaged+

Description Juan Sebastian Castro 2014-01-06 13:19:24 UTC

Description of problem: A guest VM with the thin provision allocation policy got paused. After that, there was no way to recover the QCOW2 disk, and the data is now corrupt.


Version-Release number of selected component (if applicable):

VDSM: vdsm-4.10.2-27.0.el6ev
RHEV Hypervisor: 6.4 - 20131016.0.el6


How reproducible: Undetermined


Steps to Reproduce:
1. Create a guest VM with a thin provisioned disk
2. Wait for the VM to switch to the paused state with the message: "VM has been paused due to a storage IO error"


Actual results: The VM got paused


Expected results: The VM shouldn't be paused.


Additional info: Modifications to the vdsm.conf file were applied to work around the issue, without positive results (KCS: https://access.redhat.com/site/solutions/385283)

Comment 1 Ayal Baron 2014-01-06 13:25:35 UTC

In RHEV, VMs automatically move to paused if they receive EIO; this has nothing to do with thin provisioning.
Resuming the VM makes it retry the same I/O. If the original storage problem (the one that caused the EIO) persists, the VM will pause again.
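
The pause and retry behavior described above can be sketched as a small model. This is only an illustration, not vdsm, libvirt or qemu code: the `Storage` and `Guest` classes and all of their attributes are hypothetical names made up for this sketch.

```python
import errno


class Storage:
    """Hypothetical backing store; writes fail with EIO while unhealthy."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.blocks = {}

    def write(self, offset, data):
        if not self.healthy:
            raise OSError(errno.EIO, "simulated storage I/O error")
        self.blocks[offset] = data


class Guest:
    """Hypothetical guest that pauses on EIO and retries the I/O on resume."""

    def __init__(self, storage):
        self.storage = storage
        self.state = "running"
        self.pending = None  # the request that has not completed yet

    def write(self, offset, data):
        self.pending = (offset, data)
        self._submit()

    def resume(self):
        # Resuming does not drop the failed request; it is submitted again.
        self.state = "running"
        self._submit()

    def _submit(self):
        if self.pending is None:
            return
        try:
            self.storage.write(*self.pending)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise
            self.state = "paused"  # storage problem persists: pause (again)
        else:
            self.pending = None
```

In this model, calling `resume()` while `storage.healthy` is still False leaves the guest paused; once the storage is healthy again, `resume()` completes the same pending write and the guest keeps running.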

1. What makes you conclude that there is a qcow2 corruption here?
2. Please attach vdsm and libvirt logs.

Thanks.

Comment 9 Allon Mureinik 2014-01-27 13:25:21 UTC

Nir, please take a look.

Comment 10 Nir Soffer 2014-01-28 20:14:24 UTC

Please attach engine, vdsm, libvirt and qemu logs.

engine: /path/to/ovirt-engine/var/log/engine.log
vdsm: /var/log/vdsm/vdsm.log
libvirt: /var/log/libvirtd.log
qemu: /var/log/libvirt/qemu/vmname.log
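
One possible way to gather the files listed above into a single archive for attaching is sketched below. The paths are copied from the list as written, so the `/path/to/ovirt-engine` prefix and `vmname` are still placeholders to be replaced; `collect_logs` and `LOG_PATHS` are hypothetical names for this sketch, not part of any RHEV tool.

```python
import os
import tarfile

# Paths as listed above. "/path/to/ovirt-engine" and "vmname" are
# placeholders from the original list and must be filled in by the reader.
LOG_PATHS = [
    "/path/to/ovirt-engine/var/log/engine.log",
    "/var/log/vdsm/vdsm.log",
    "/var/log/libvirtd.log",
    "/var/log/libvirt/qemu/vmname.log",
]


def collect_logs(paths, archive_path):
    """Pack every existing file from `paths` into a gzip-compressed tarball.

    Returns (added, missing) so the caller can see which logs were not found.
    """
    added, missing = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path):
                tar.add(path)
                added.append(path)
            else:
                missing.append(path)
    return added, missing
```

For example, `added, missing = collect_logs(LOG_PATHS, "bug-1048915-logs.tar.gz")` writes the archive and returns the list of paths that could not be found, so they can be corrected before attaching.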

Comment 11 Nir Soffer 2014-01-28 20:54:22 UTC

Please also attach the output of "qemu-img info /path/to/disk" for all disks on this VM.
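
If the VM has several disks, the command above can be run in a loop. The following is a minimal sketch that assumes `qemu-img` is on the PATH and Python 3.7 or later is available; `qemu_img_info_cmd` and `run_info` are hypothetical helper names, and the disk paths must be supplied by the reader.

```python
import subprocess


def qemu_img_info_cmd(disk_path):
    """Build the argv for the command quoted above: qemu-img info <disk>."""
    return ["qemu-img", "info", disk_path]


def run_info(disk_paths, runner=subprocess.run):
    """Run `qemu-img info` on each disk and return {path: output text}.

    `runner` defaults to subprocess.run and is a parameter only so the
    function can be exercised without qemu-img installed.
    """
    reports = {}
    for path in disk_paths:
        proc = runner(qemu_img_info_cmd(path), capture_output=True, text=True)
        # Keep stderr on failure so the error message is also captured.
        reports[path] = proc.stdout if proc.returncode == 0 else proc.stderr
    return reports
```

The returned dictionary can be written to a text file and attached as is. If `qemu-img` is not installed, `subprocess.run` raises `FileNotFoundError`.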

Comment 12 Kevin Wolf 2014-01-29 09:04:54 UTC

What is the oldest qemu-kvm version that was used with this specific image?
Trying to make sure that it's not a duplicate of bug 974617 (copied as bug 996151
for 6.4.z), which was fixed in qemu-kvm-0.12.1.2-2.355.el6_4.7.
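
To check a version string against the fixed build named above, a rough comparison can be sketched as follows. This is a simplified approximation of RPM version ordering, not the real rpmvercmp algorithm (it ignores epochs and special characters such as "~"), and `compare_evr` and `at_least_fixed_build` are hypothetical names. It expects the version-release part only, without the `qemu-kvm-` package name prefix.

```python
import re

# Version-release of the build that contains the fix, from the comment above.
FIXED_BUILD = "0.12.1.2-2.355.el6_4.7"


def _segments(text):
    # Split into runs of digits or letters; separators such as "." are dropped.
    return re.findall(r"[0-9]+|[A-Za-z]+", text)


def _cmp_part(a, b):
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments are equal: the string with more segments is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def compare_evr(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b (version first, then release)."""
    va, _, ra = a.partition("-")
    vb, _, rb = b.partition("-")
    return _cmp_part(va, vb) or _cmp_part(ra, rb)


def at_least_fixed_build(installed):
    return compare_evr(installed, FIXED_BUILD) >= 0
```

The string to test can be taken from `rpm -q qemu-kvm` on the host. A result of True only means the string sorts at or after the fixed build under this simplified ordering; whether a build from a different release stream actually carries the fix still needs to be confirmed separately.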

Comment 14 Brian Hamrick 2014-02-17 14:38:18 UTC

This is for Juan to answer.  I am not technically driving this bug.

Comment 15 Ayal Baron 2014-02-23 08:16:33 UTC

We cannot make progress without further info.  Please reopen once it is available.
