Bug 1091094

Summary: VM Disk extended after snapshot until block storage domain is out of space
Product: [Retired] oVirt
Component: vdsm
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED CURRENTRELEASE
Reporter: Adam Litke <alitke>
Assignee: Adam Litke <alitke>
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Docs Contact:
CC: alitke, amureini, bazulay, bugs, eblake, fsimonce, gklein, iheim, mgoldboi, nsoffer, ogofen, rbalakri, tnisan, yeylon
Keywords: Reopened
Target Milestone: ---
Target Release: 3.5.0
Whiteboard: storage
Fixed In Version: v4.16.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:38:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1174791, 1176673, 1198128

Attachments: vdsm log, engine log

Description Adam Litke 2014-04-24 21:08:21 UTC
Description of problem:
I have an iscsi storage domain with a pool of VMs.  The VM disks are 2GB (thin provisioned).  The system behaves normally until I use ovirt-engine to create a VM snapshot.  Once the libvirt snapshot API finishes pivoting to the new volume, vdsm will extend the new leaf volume forever (until it consumes all free space on the block storage domain).
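For context, vdsm extends a thinly provisioned leaf whenever the drive's reported write watermark (allocation) nears the end of the LV, so if that allocation value is wrong the extension rule can fire on every monitoring cycle. A schematic sketch of the idea (the function, chunk size, and threshold below are illustrative placeholders, not vdsm's actual code or values):

GiB = 1024 ** 3
CHUNK = 1 * GiB               # hypothetical extension step
THRESHOLD = 512 * 1024 ** 2   # hypothetical free-space watermark

def needs_extension(allocation, physical):
    # Extend when the guest's write position nears the end of the LV.
    return physical - allocation < THRESHOLD

# Sane report: plenty of headroom, no extension requested.
print(needs_extension(allocation=1 * GiB, physical=3 * GiB))  # False
# Broken report where allocation always equals physical: the check is
# True on every cycle, so the LV keeps growing until the domain is full.
print(needs_extension(allocation=3 * GiB, physical=3 * GiB))  # True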

Version-Release number of selected component (if applicable):
Post 3.4 release (master)
vdsm: e826e76c6c7df3791964b41c7720e39a406a98f6
ovirt-engine: 6dc681ba7b1682b9a3eed56e8b986d6a4a06e3ad

How reproducible: (For me) Always


Steps to Reproduce:
1. Create a pool of VMs from a template on an iscsi sd.  The disk should be a 2G thinly provisioned volume.
2. Start a VM
3. Create a disk-only snapshot of the VM
4. Monitor the storage "Used" and "Allocated" fields from ovirt engine for the data domain being used

Actual results:
"Used" will climb continuously and surpass "Allocated" as long as the VM continues to run.  On the vdsm side, the disk apparent/physical size will continue to increase.


Expected results:
The storage domain should reflect only the allocation needed to accommodate the new snapshot volume.  "Used" should not increase dramatically (and certainly not above "Allocated").

Additional info:
See attached vdsm.log and engine.log

Comment 1 Adam Litke 2014-04-24 21:09:09 UTC
Created attachment 889443 [details]
vdsm log

Comment 2 Adam Litke 2014-04-24 21:09:41 UTC
Created attachment 889444 [details]
engine log

Comment 3 Adam Litke 2014-04-25 21:35:34 UTC
Hmm, I am seeing bad information in the output of virsh domblkinfo as well.  I think the problem lies below vdsm actually.  The symptom here is that allocation jumps up to be equal to physical in some cases.

Federico, might this be a symptom of the lvmetad problem not being solved on my host?  I have set use_lvmetad to 0 and made sure that the daemon is not running.

Comment 4 Federico Simoncelli 2014-04-28 14:54:51 UTC
(In reply to Adam Litke from comment #3)
> Hmm, I am seeing bad information in the output of virsh domblkinfo as well. 
> I think the problem lies below vdsm actually.  The symptom here is that
> allocation jumps up to be equal to physical in some cases.
> 
> Federico, might this be a symptom of the lvmetad problem not being solved on
> my host?  I have set use_lvmetad to 0 and made sure that the daemon is not
> running.

No, it's not related to lvmetad. What is the libvirt version that you're using?

Comment 5 Adam Litke 2014-04-28 17:06:36 UTC
libvirt-1.2.3 (compiled from source and manually installed on a F20 host)

Comment 7 Sandro Bonazzola 2014-05-08 13:52:13 UTC
This is an automated message.

oVirt 3.4.1 has been released.
This issue has been retargeted to 3.4.2 as it has severity high; please retarget if needed.
If this is a blocker please add it to the tracker Bug #1095370

Comment 8 Federico Simoncelli 2014-05-08 20:13:07 UTC
Adam, have you discovered whether this is caused by the libvirt build you made? Can we close this?

Comment 9 Adam Litke 2014-06-09 13:57:29 UTC
Closing since this seems to be limited to the custom libvirt build I was working with.  I'll reopen if I see it in an official libvirt release.

Comment 10 Adam Litke 2014-06-09 20:00:06 UTC
I've reproduced this and determined the root cause.  Taking bug.

Comment 11 Nir Soffer 2014-06-09 20:05:19 UTC
(In reply to Adam Litke from comment #10)
> I've reproduced this and determined the root cause.  Taking bug.

Can you share the root cause with us?

Comment 12 Adam Litke 2014-06-09 20:50:47 UTC
(In reply to Nir Soffer from comment #11)
> (In reply to Adam Litke from comment #10)
> > I've reproduced this and determined the root cause.  Taking bug.
> 
> Can you share the root cause with us?

Of course :) I was just waiting for the fix to appear in gerrit.

As explained in http://gerrit.ovirt.org/#/c/28531/1, when a snapshot XML does not specify that it is of type block, type file is assumed.  This has the side effect of converting the libvirt disk to type file.  When libvirt returns the high write watermark information for a drive, it always returns the physical file size for file disks, not the value reported by qemu.
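For illustration only (this is not the actual vdsm patch; the drive name and LV path are placeholders), the snapshot disk element has to carry an explicit type so libvirt keeps treating the drive as a block device instead of falling back to the file default:

import xml.etree.ElementTree as ET

def snapshot_disk_xml(name, path, disk_type='block'):
    # Explicit type='block' prevents libvirt from assuming type='file'.
    disk = ET.Element('disk', {'name': name,
                               'snapshot': 'external',
                               'type': disk_type})
    source = ET.SubElement(disk, 'source')
    if disk_type == 'block':
        source.set('dev', path)   # block device (LV) path
    else:
        source.set('file', path)  # plain file path
    return ET.tostring(disk).decode()

print(snapshot_disk_xml('vda', '/dev/vg-uuid/new-leaf-lv'))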

Comment 13 Allon Mureinik 2014-06-10 08:52:52 UTC
(In reply to Adam Litke from comment #12)
> (In reply to Nir Soffer from comment #11)
> > (In reply to Adam Litke from comment #10)
> > > I've reproduced this and determined the root cause.  Taking bug.
> > 
> > Can you share the root cause with us?
> 
> Of course :) I was just waiting for the fix to appear in gerrit.
> 
> As explained in http://gerrit.ovirt.org/#/c/28531/1, when a snapshot XML
> does not specify that it is of type block, type file is assumed.  This has
> the side effect of converting the libvirt disk to type file.  When libvirt
> returns the high write watermark information for a drive, it always returns
> the physical file size for file disks, not the value reported by qemu.

Just to add the info from the patch - this behavior "change" was introduced in libvirt 1.2.2.
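For reference, a minimal way to check whether a host runs an affected libvirt (the 1.2.2 threshold is taken from the comment above; the check itself is only illustrative, not logic taken from vdsm):

import libvirt

# libvirt encodes its version as major * 1000000 + minor * 1000 + release.
conn = libvirt.open('qemu:///system')
if conn.getLibVersion() >= 1002002:  # 1.2.2 or newer
    print('snapshot XML should carry an explicit disk type')
else:
    print('older libvirt: the default disk type handling is not affected')
conn.close()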

Comment 14 Allon Mureinik 2014-07-18 17:39:16 UTC
Adam - this was merged in oVirt 3.5.
Is there any sense in backporting it to 3.4.z too?

Comment 15 Adam Litke 2014-07-18 19:02:30 UTC
I guess we probably should.  I'll need to backport http://gerrit.ovirt.org/#/c/28531/1 and http://gerrit.ovirt.org/30228 (which fixes a regression introduced by the first patch).  Let me build up a test environment and get a patch for it submitted.

Comment 16 Adam Litke 2014-07-21 19:12:31 UTC
After further investigation I am reversing myself regarding a 3.4 backport.  I don't see this problem in my 3.4 environment.  I think the problem only manifests with newer versions of libvirt (i.e. the current Fedora 20 version is 1.1.3.5-2 and that works fine).  Given this, I think it only makes sense to target 3.5, where the newer version of libvirt will be used.

Comment 17 Kevin Alon Goldblatt 2014-08-17 14:33:33 UTC
I ran the scenario from above as follows:

Steps to Reproduce:
1. Created a VM with one 2 GB thinly provisioned disk on an iSCSI block device
2. Created a template of the VM
3. Created a Pool of 3 VMs from the template
4. Brought up one of the VMs 
5. Checked the Used and Available space on the Storage Domain
6. Created a snapshot of the VM that was brought up
7. Wrote to the disk of the VM (tried installing an OS larger than 2 GB) >>> The installation failed, which means the allocation is now correctly limited to the virtual size of the leaf.
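As an additional check on the vdsm host, the leaf LV's size can be compared against the 2 GB virtual size with lvs. A sketch only, with placeholder VG/LV names:

import subprocess

# The VG and LV names are hypothetical; point this at the snapshot's leaf volume.
out = subprocess.check_output(
    ['lvs', '--noheadings', '--nosuffix', '--units', 'b',
     '-o', 'lv_size', 'vg-uuid/leaf-volume-uuid'])
lv_size = int(float(out.decode().strip()))
virtual_size = 2 * 1024 ** 3
print('leaf LV: %d bytes, virtual size: %d bytes' % (lv_size, virtual_size))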


Moving to Verified

Comment 18 Sandro Bonazzola 2014-10-17 12:38:23 UTC
oVirt 3.5 has been released and should include the fix for this issue.

Comment 19 Nir Soffer 2014-12-23 08:06:47 UTC
Ori, can you explain why this bug blocks bug 1176673?

Comment 20 Ori Gofen 2014-12-23 09:21:55 UTC
Nir, I think it's the automatic bug tracker; I haven't set this bug as a dependency (even though I can perfectly understand the logic :))