Bug 1048915 - QCOW2 disk corruption after a "thin provision" guest paused
Summary: QCOW2 disk corruption after a "thin provision" guest paused
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: Nir Soffer
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2014-01-06 13:19 UTC by Juan Sebastian Castro
Modified: 2017-11-14 17:00 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-23 08:16:33 UTC
oVirt Team: Storage
Target Upstream Version:
tnisan: Triaged+



Description Juan Sebastian Castro 2014-01-06 13:19:24 UTC
Description of problem: A guest VM with the thin provision allocation policy was paused. After that, the QCOW2 disk could not be recovered and its data is now corrupt.


Version-Release number of selected component (if applicable):

VDSM: vdsm-4.10.2-27.0.el6ev
RHEV Hypervisor: 6.4 - 20131016.0.el6


How reproducible: Undetermined


Steps to Reproduce:
1. Create a guest VM with a thin provisioned disk
2. Wait for the VM to change its state to paused with the message: "VM has been paused due to a storage IO error"


Actual results: The VM got paused


Expected results: The VM shouldn't be paused.


Additional info: Modifications to the vdsm.conf file were applied to work around the issue, without positive results (KCS: https://access.redhat.com/site/solutions/385283)

Comment 1 Ayal Baron 2014-01-06 13:25:35 UTC
In RHEV, VMs automatically move to paused if they receive EIO; this has nothing to do with thin provisioning.
Resuming the VM would make it retry the same I/O.  If the original storage problem (that caused the EIO) persists then it will pause again.
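
To illustrate the pause and resume behaviour described above, here is a toy Python model. It is only a sketch and not vdsm or qemu code: the class SimulatedVM, the storage_ok probe and the request strings are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimulatedVM:
    """Toy model of pause-on-EIO; not how vdsm or qemu implement it."""
    storage_ok: Callable[[], bool]  # hypothetical probe, True when storage is healthy
    state: str = "running"
    pending_io: List[str] = field(default_factory=list)

    def submit_io(self, request: str) -> None:
        if self.state != "running":
            raise RuntimeError("VM is paused; resume it first")
        if self.storage_ok():
            return  # the I/O completed normally
        # The storage returned an error (EIO): remember the request and pause.
        self.pending_io.append(request)
        self.state = "paused"

    def resume(self) -> None:
        # Resuming retries the same I/O that failed before the pause.
        self.state = "running"
        retry, self.pending_io = self.pending_io, []
        for i, request in enumerate(retry):
            self.submit_io(request)
            if self.state == "paused":
                # The storage problem persists: keep the untried requests queued.
                self.pending_io.extend(retry[i + 1:])
                break
```

In this model, calling resume() while storage_ok() still returns False leaves the VM paused with the same request queued, which matches the point that the pause is a symptom of the storage error and not of the allocation policy.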

1. What makes you conclude that there is qcow2 corruption here?
2. Please attach vdsm and libvirt logs.

Thanks.

Comment 9 Allon Mureinik 2014-01-27 13:25:21 UTC
Nir, please take a look.

Comment 10 Nir Soffer 2014-01-28 20:14:24 UTC
Please attach engine, vdsm, libvirt and qemu logs.

engine: /path/to/ovirt-engine/var/log/engine.log
vdsm: /var/log/vdsm/vdsm.log
libvirt: /var/log/libvirtd.log
qemu: /var/log/libvirt/qemu/vmname.log
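
A small Python sketch for checking which of the log files listed above exist before attaching them. The helper names log_paths and missing_logs are hypothetical, and the engine install prefix is left as a parameter because it differs between setups.

```python
import os
import posixpath
from typing import Dict, List


def log_paths(vm_name: str, engine_prefix: str) -> Dict[str, str]:
    """Map each component to the log file requested in this comment.

    engine_prefix is the ovirt-engine install prefix (site specific),
    vm_name is the VM name used by libvirt for the qemu log file.
    """
    return {
        "engine": posixpath.join(engine_prefix, "var/log/engine.log"),
        "vdsm": "/var/log/vdsm/vdsm.log",
        "libvirt": "/var/log/libvirtd.log",
        "qemu": "/var/log/libvirt/qemu/%s.log" % vm_name,
    }


def missing_logs(vm_name: str, engine_prefix: str) -> List[str]:
    """Return the components whose log file is not present on this host."""
    paths = log_paths(vm_name, engine_prefix)
    return sorted(name for name, path in paths.items() if not os.path.isfile(path))
```

Note that the engine log lives on the manager machine and the other three on the hypervisor, so some entries are expected to be reported missing on any single host.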

Comment 11 Nir Soffer 2014-01-28 20:54:22 UTC
Please also attach the output of "qemu-img info /path/to/disk" for all disks on this VM.
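
If there are many disks, the command can be run in a loop. A minimal Python sketch, assuming qemu-img is in PATH on the host; the function names qemu_img_info_cmd and collect_disk_info are made up for this example, and the disk paths have to be supplied by the caller.

```python
import subprocess
from typing import Dict, Iterable, List


def qemu_img_info_cmd(disk_path: str) -> List[str]:
    """Build the argument list for 'qemu-img info' on one disk."""
    return ["qemu-img", "info", disk_path]


def collect_disk_info(disk_paths: Iterable[str]) -> Dict[str, str]:
    """Run 'qemu-img info' for every path and return the raw text per disk.

    Errors printed by qemu-img are captured too, so an unreadable image
    still shows up in the report instead of aborting the loop.
    """
    reports = {}
    for path in disk_paths:
        result = subprocess.run(
            qemu_img_info_cmd(path),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        reports[path] = result.stdout
    return reports
```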

Comment 12 Kevin Wolf 2014-01-29 09:04:54 UTC
What is the oldest qemu-kvm version that was used with this specific image?
Trying to make sure that it's not a duplicate of bug 974617 (copied as bug 996151
for 6.4.z), which was fixed in qemu-kvm-0.12.1.2-2.355.el6_4.7.
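
For checking a reported version against the fixed build, a rough Python sketch follows. It is a simplified segment comparison in the spirit of RPM version ordering, not the real rpmvercmp: it ignores epochs and special characters and compares version and release as one string. The names compare_versions and has_fix are hypothetical, and the result is only meaningful for builds from the same 6.4.z stream.

```python
import re
from itertools import zip_longest

# Fixed build named in this comment, without the "qemu-kvm-" prefix.
FIXED = "0.12.1.2-2.355.el6_4.7"

_SEGMENT = re.compile(r"\d+|[A-Za-z]+")


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 when a is older than, equal to or newer than b."""
    for x, y in zip_longest(_SEGMENT.findall(a), _SEGMENT.findall(b)):
        if x is None:
            return -1  # a ran out of segments first
        if y is None:
            return 1   # b ran out of segments first
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    return 0


def has_fix(version_release: str) -> bool:
    """True when the given qemu-kvm version-release is at least FIXED."""
    return compare_versions(version_release, FIXED) >= 0
```

The installed value would come from "rpm -q qemu-kvm" on the hypervisor, but the question here is about the oldest version ever used with the image, which the current package version alone cannot answer.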

Comment 14 Brian Hamrick 2014-02-17 14:38:18 UTC
This is for Juan to answer.  I am not technically driving this bug.

Comment 15 Ayal Baron 2014-02-23 08:16:33 UTC
We cannot make progress without further info.  Please reopen once it is available.

