Description of problem:

I have an iscsi storage domain with a pool of VMs. The VM disks are 2GB (thin provisioned). The system behaves normally until I use ovirt-engine to create a VM snapshot. Once the libvirt snapshot API finishes pivoting to the new volume, vdsm will extend the new leaf volume forever (until it consumes all free space on the block storage domain).

Version-Release number of selected component (if applicable):
Post 3.4 release (master)
vdsm: e826e76c6c7df3791964b41c7720e39a406a98f6
ovirt-engine: 6dc681ba7b1682b9a3eed56e8b986d6a4a06e3ad

How reproducible:
(For me) Always

Steps to Reproduce:
1. Create a pool of VMs from a template on an iscsi sd. The disk should be a 2G thinly provisioned volume.
2. Start a VM
3. Create a disk-only snapshot of the VM
4. Monitor the storage "Used" and "Allocated" fields from ovirt-engine for the data domain being used

Actual results:
"Used" will climb continuously and surpass "Allocated" as long as the VM continues to run. On the vdsm side, the disk apparent/physical size will continue to increase.

Expected results:
The storage domain should reflect the allocation needed to accommodate the new snapshot volume. "Used" should not increase dramatically (certainly not above "Allocated").

Additional info:
See attached vdsm.log and engine.log
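For anyone trying to reproduce this, the growth is also visible directly from libvirt on the host. Below is a minimal sketch (not the vdsm code path) using the libvirt Python bindings to poll the block info of the active layer; the connection URI, VM name ('pool-vm-1') and disk target ('vda') are assumptions and need to be adjusted.

  # Watch libvirt's view of the active layer while the VM runs.
  # In the buggy case, allocation tracks physical and both keep climbing
  # as vdsm extends the new leaf volume.
  import time
  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('pool-vm-1')   # hypothetical pool VM name

  while True:
      # blockInfo() returns [capacity, allocation, physical] in bytes
      capacity, allocation, physical = dom.blockInfo('vda')
      print('capacity=%d allocation=%d physical=%d'
            % (capacity, allocation, physical))
      time.sleep(10)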
Created attachment 889443 [details] vdsm log
Created attachment 889444 [details] engine log
Hmm, I am seeing bad information in the output of virsh domblkinfo as well. I think the problem lies below vdsm actually. The symptom here is that allocation jumps up to be equal to physical in some cases. Federico, might this be a symptom of the lvmetad problem not being solved on my host? I have set use_lvmetad to 0 and made sure that the daemon is not running.
(In reply to Adam Litke from comment #3)
> Hmm, I am seeing bad information in the output of virsh domblkinfo as well.
> I think the problem lies below vdsm actually. The symptom here is that
> allocation jumps up to be equal to physical in some cases.
>
> Federico, might this be a symptom of the lvmetad problem not being solved on
> my host? I have set use_lvmetad to 0 and made sure that the daemon is not
> running.

No, it's not related to lvmetad. What is the libvirt version that you're using?
libvirt-1.2.3 (compiled from source and manually installed on a F20 host)
This is an automated message. oVirt 3.4.1 has been released. This issue has been retargeted to 3.4.2 as it has severity high; please retarget if needed. If this is a blocker, please add it to the tracker Bug #1095370.
Adam, have you discovered whether this is caused by the libvirt build you made? Can we close this?
Closing since this seems to be limited to the custom libvirt build I was working with. I'll reopen if I see it in an official libvirt release.
I've reproduced this and determined the root cause. Taking bug.
(In reply to Adam Litke from comment #10)
> I've reproduced this and determined the root cause. Taking bug.

Can you share the root cause with us?
(In reply to Nir Soffer from comment #11)
> (In reply to Adam Litke from comment #10)
> > I've reproduced this and determined the root cause. Taking bug.
>
> Can you share the root cause with us?

Of course :) I was just waiting for the fix to appear in gerrit.

As explained in http://gerrit.ovirt.org/#/c/28531/1, when a snapshot XML does not specify that it is of type block, type file is assumed. This has the side effect of converting the libvirt disk to type file. When libvirt reports the high write watermark for a drive, it always returns the physical file size for file disks, not the value given by qemu.
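To illustrate what the snapshot XML needs to look like for a block-backed volume, here is a hedged sketch using libvirt-python; this is not the actual vdsm change, and the domain name, snapshot name, LV path and flags are placeholders.

  # Minimal sketch of a disk-only external snapshot onto a block device.
  # Without type='block' on the <disk> element, libvirt assumes type='file'
  # for the new active layer, which is what leads to the bogus
  # high-write-watermark value described above.
  import libvirt

  SNAPSHOT_XML = """
  <domainsnapshot>
    <name>snap1</name>
    <disks>
      <disk name='vda' snapshot='external' type='block'>
        <source dev='/dev/vgname/new-leaf-lv'/>
      </disk>
    </disks>
  </domainsnapshot>
  """

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('pool-vm-1')   # hypothetical VM name
  dom.snapshotCreateXML(
      SNAPSHOT_XML,
      libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
      libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
      libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)

If I read comment #13 right, the type attribute on the snapshot <disk> element only exists since libvirt 1.2.2, which is also why older libvirt was unaffected.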
(In reply to Adam Litke from comment #12)
> (In reply to Nir Soffer from comment #11)
> > (In reply to Adam Litke from comment #10)
> > > I've reproduced this and determined the root cause. Taking bug.
> >
> > Can you share the root cause with us?
>
> Of course :) I was just waiting for the fix to appear in gerrit.
>
> As explained in http://gerrit.ovirt.org/#/c/28531/1, when a snapshot XML
> does not specify that it is of type block, type file is assumed. This has
> the side effect of converting the libvirt disk to type file. When libvirt
> reports the high write watermark for a drive, it always returns the
> physical file size for file disks, not the value given by qemu.

Just to add the info from the patch: this behavior "change" was introduced in libvirt 1.2.2.
Adam - this was merged in oVirt 3.5. Is there any sense in backporting it to 3.4.z too?
I guess we probably should. I'll need to backport http://gerrit.ovirt.org/#/c/28531/1 and http://gerrit.ovirt.org/30228 (which fixes a regression introduced by the first patch). Let me build up a test environment and get a patch for it submitted.
After further investigation I am reversing myself regarding a 3.4 backport. I don't see this problem in my 3.4 environment. I think the problem only manifests with newer versions of libvirt (i.e. the current Fedora 20 version is 1.1.3.5-2, and that works fine). Given this, I think it only makes sense to target 3.5, where the newer version of libvirt will be used.
I ran the scenario from above as follows:

Steps to Reproduce:
1. Created a VM with 1 disk of 2 GB, thinly provisioned, on an iscsi block device
2. Created a template of the VM
3. Created a pool of 3 VMs from the template
4. Brought up one of the VMs
5. Checked the Used and Available space on the Storage Domain
6. Created a snapshot of the VM that was brought up
7. Wrote to the disk of the VM (tried installing an OS greater than 2GB)

>>> The installation failed, which means the size allocation is now correctly limited to the virtual size of the leaf.

Moving to Verified
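For completeness, this is roughly how the same check can be scripted against libvirt from the host. It is only a hedged sketch; the VM name, disk target and the headroom allowance are assumptions, not values from the verification run.

  # After taking the snapshot, the LV backing the new leaf should stop
  # growing once it has reasonable headroom, instead of being extended
  # forever as described in the original report.
  import time
  import libvirt

  HEADROOM = 2 * 1024 ** 3   # assumed slack for extension chunks / qcow2 metadata

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('pool-vm-1')   # hypothetical pool VM name

  for _ in range(30):                    # watch for ~5 minutes
      capacity, allocation, physical = dom.blockInfo('vda')
      assert physical <= capacity + HEADROOM, (
          'leaf volume (%d bytes) grew well past the virtual size (%d bytes)'
          % (physical, capacity))
      time.sleep(10)
  print('leaf volume size stayed bounded')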
oVirt 3.5 has been released and should include the fix for this issue.
Ori, can you explain why this bug blocks bug 1176673?
Nir, I think it's the automatic bug tracker; I haven't set this bug as a dependency (even though I can perfectly understand the logic :))