Bug 1060287 - RFE: support pools of thin lvm
Summary: RFE: support pools of thin lvm
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Duplicates: 860476 1363786 (view as bug list)
Depends On:
Blocks:
 
Reported: 2014-01-31 17:10 UTC by David Jaša
Modified: 2023-03-08 22:37 UTC (History)
CC List: 18 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-03 17:00:46 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 860476 0 unspecified CLOSED RFE: support lvcreate thin provisioning 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1190910 0 unspecified CLOSED ThinVolume usage causes libvirt deactivation of the storage pool based on the LVM Volume Group 2021-02-22 00:41:40 UTC

Internal Links: 860476 1190910

Description David Jaša 2014-01-31 17:10:25 UTC
Description of problem:
libvirt cannot use thin-provisioned LVM natively as a pool, only as individual devices. This is suboptimal given that thin-provisioned LVM provides quite attractive features such as:
  * automatic free space reclamation based on guest-issued discard commands
    (requires virtio-scsi with discard=unmap to pass discards down to LVM;
    see the illustrative disk XML after this list)
  * long and branchy snapshot chains without performance degradation
  * each snapshot visible as a raw device to qemu, eliminating all of qcow2
    overhead without sacrificing its key features
  * online snapshot operations
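
For illustration, passing guest discards through to a thin LV takes discard='unmap' on the disk driver plus a virtio-scsi controller; a minimal domain XML fragment would look like this (the LV path is hypothetical):

<controller type='scsi' model='virtio-scsi'/>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' discard='unmap'/>
  <!-- hypothetical thin LV used as the guest disk -->
  <source dev='/dev/vg_thin/guest_lv'/>
  <target dev='sda' bus='scsi'/>
</disk>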

Adding Layered Product: RHEV, as this feature will be of interest to RHEV once multi-host support is also added to thin-provisioned LVM; the feature would be beneficial to libvirt even without that consumer.

Version-Release number of selected component (if applicable):
libvirt 1.1

Comment 2 Pavel Hrdina 2015-02-17 10:39:54 UTC
Hi, are you still interested in this feature?

Comment 3 David Jaša 2015-02-17 12:31:40 UTC
Hi, this is of somewhat lower priority to me personally, as I have switched to alternative approaches.

I think however that dm-thin integration could greatly help standalone libvirt host installations because of possible performance and space efficiency gains when using dm-thin for snapshots (e.g. lengthy snapshot operations with qemu-img could be done in seconds when using native dm-thin capabilities).

Comment 4 John Ferlan 2015-10-02 14:03:41 UTC
Not sure what caused me to stumble across this again, but I did... Anyway, check out the patches I posted a while back upstream on this:

http://www.redhat.com/archives/libvir-list/2014-December/msg00705.html

In particular:

http://www.redhat.com/archives/libvir-list/2014-December/msg00706.html

which was mostly rejected because it felt like a new feature. In order to fix the bug referenced in the series, I took the simpler route of adding "--type snapshot" while keeping/using the "old" syntax.

Feel free to borrow (or otherwise use) changes from the patch I threw away. I still have it in a branch which I keep rebased on top of the current tree.

Comment 5 John Ferlan 2016-02-03 20:44:42 UTC
Since there was another foray in this area with a set of patches upstream:

http://www.redhat.com/archives/libvir-list/2016-February/msg00073.html

I figured that rather than leaving some thoughts upstream to be forgotten or to need searching for later, I'd leave them in the bz...

Building upon the thoughts left there regarding whether to create a separate 'thin-logical' volume pool, as opposed to a 'logical' volume pool that contains thin-pool LVs...

A single volume group won't be usable by two libvirt pools since the name of the libvirt pool is designed to match the name of the vg...

When using virsh pool-create-as $name, the code will search the existing volume groups for the name passed ($name == vgname) - if not found, then there's a failure (virStorageBackendLogicalMatchPoolSource).

When using virsh pool-create $file, we have a similar failure. If we try to use the --build option, then we have to provide the "<device path='%s'/>" for that pool in our XML $file. When buildPool is called, it tries to 'vgcreate $name $device' and fails if $device is already used by another VG (virStorageBackendLogicalBuildPool).

Overhauling the logic to somehow separate things will probably run into more problems. There's a reason for the relationship between pool name and vg name. It lets LVM manage aspects that we don't want to or should not manage.
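
To make that 1:1 mapping concrete, a 'logical' pool definition looks roughly like this (illustrative only, using the test VG and device from the example below) -- note that the libvirt pool <name> and the VG name in <source> are one and the same:

<pool type='logical'>
  <name>vg_test_thin</name>
  <source>
    <name>vg_test_thin</name>
    <device path='/dev/sdh'/>
    <format type='lvm2'/>
  </source>
  <target>
    <path>/dev/vg_test_thin</path>
  </target>
</pool>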

FWIW: This is what I used as an example to test intermixing of LV types in one pool (/dev/sdh is a 50M iSCSI device):

pvcreate /dev/sdh
vgcreate vg_test_thin /dev/sdh
lvcreate --name lv_test_snapshot -L 5M --type snapshot -V 20M vg_test_thin
lvcreate --type thin-pool -L 20M --thinpool thinpool_lv_test_thin vg_test_thin
lvcreate --name lv_test_thin --thin vg_test_thin/thinpool_lv_test_thin --type thin -V 40M
lvcreate --name lv_test_thick -L 5M vg_test_thin

lvs vg_test_thin -o lv_name,vg_name,attr,size
  LV                     VG            Attr        LSize
  lv_test_snapshot       vg_test_thin  swi-a-s---   8.00m
  lv_test_thin           vg_test_thin  Vwi-a-tz--  40.00m
  lv_test_thick          vg_test_thin  -wi-a-----   8.00m
  thinpool_lv_test_thin  vg_test_thin  twi-aotz--  20.00m

virsh vol-list vg_test_thin --details
 Name              Path                                Type   Capacity   Allocation
------------------------------------------------------------------------------------
 lv_test_snapshot  /dev/vg_test_thin/lv_test_snapshot  block  20.00 MiB   8.00 MiB
 lv_test_thin      /dev/vg_test_thin/lv_test_thin      block  40.00 MiB  40.00 MiB
 lv_test_thick     /dev/vg_test_thin/lv_test_thick     block   8.00 MiB   8.00 MiB

virsh pool-info vg_test_thin
Name:           vg_test_thin
UUID:           cf515fe6-5736-4548-ab68-0deac56046a5
State:          running
Persistent:     no
Autostart:      no
Capacity:       48.00 MiB
Allocation:     44.00 MiB
Available:      4.00 MiB

NB: lv_test_thin only shows up with patch 4 from the series I noted at the top of this comment...

If I add another thin lv to the thin-pool:

lvcreate --name lv_test_thin2 --thin vg_test_thin/thinpool_lv_test_thin --type thin -V 10M

virsh pool-refresh vg_test_thin
virsh vol-list vg_test_thin --details
 Name              Path                                Type    Capacity  Allocation
------------------------------------------------------------------------------------
 lv_test_snapshot  /dev/vg_test_thin/lv_test_snapshot  block  20.00 MiB    8.00 MiB
 lv_test_thin      /dev/vg_test_thin/lv_test_thin      block  40.00 MiB   40.00 MiB
 lv_test_thin2     /dev/vg_test_thin/lv_test_thin2     block  12.00 MiB   12.00 MiB
 lv_test_thick     /dev/vg_test_thin/lv_test_thick     block   8.00 MiB    8.00 MiB
virsh pool-info vg_test_thin
Name:           vg_test_thin
UUID:           cf515fe6-5736-4548-ab68-0deac56046a5
State:          running
Persistent:     no
Autostart:      no
Capacity:       48.00 MiB
Allocation:     44.00 MiB
Available:      4.00 MiB

The astute reader will notice that the sum of the capacities listed is not the 48M shown by pool-info; rather it's 80M (larger even than the 50M device). That's because virsh pool-info doesn't add up each LV; it uses "vgs -o vg_size,vg_free $pool_name".
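
Roughly, for the example VG above, that query and its output would look like (numbers approximate):

vgs --noheadings --units m -o vg_size,vg_free vg_test_thin
  48.00m  4.00m

Capacity comes from vg_size, Available from vg_free, and Allocation ends up being just their difference (48M - 4M = 44M above).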

Some more thoughts about the output. lv_test_snapshot shows a difference because the allocation as returned by LVM is 8M, while the capacity as calculated by the lseek() on the target.path of the block device is 20M. The volume starts out with 8M allocated but is allowed to grow into 20M, and libvirt isn't involved other than displaying the sizes. For lv_test_thin the allocation and capacity are the same 40M even though the thin-pool is only 20M. It's a different way of doing things, but still all managed by LVM.

The only way to get that 20M value is to query LVM, look at the thin-pool for the thin LV, and grab its size. However, what good does that do? What purpose does libvirt have for it? If the storage pool management software tells us the allocation is size N and the following block lseek() shows the same size, who really cares that there's a thin-pool behind it? That's an implementation detail of the storage management software. We would have to special-case virStorageBackendUpdateVolTargetInfo in order to handle the one oddball (so far) case where perhaps we want to get the capacity of the thin-pool instead of the lseek() to the end of the thin LV.

Personally, I think the best we can do to support a thin-pool is to add a new option/flag so that virsh vol-create[-as] could create a thin-pool and a thin LV in that pool, rather than the snapshot LV that is currently the default when capacity != allocation for the volume. There are some "extra" considerations for this, but it doesn't force the user/admin to choose whether their VG will hold only thin or only non-thin LVs. This would require some volume XML changes in order to describe the thin-pool name.

Things to consider for keeping the status quo and generating thin-pools within any VG:

1. For thin volume creation, add a new option which would require an argument (--thinpool $thinpoolname).

The allocation value would be required for at least the initial creation and would be the "-L" parameter for the thin-pool creation (lvcreate --type thin-pool -L $allocation --thinpool $thinpoolname $vgname). We would have to check whether a thin-pool of that name exists before attempting to create it. Finally, the 'allocation' supplied for subsequent thin LVs could be ignored, or we could check whether it matches and fail if not.
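
For example, the sequence libvirt might issue for such a request could look something like this (a sketch only; the new volume name lv_test_thin3 is hypothetical, and the sizes reuse the figures from the test above):

# create the named thin-pool on first use...
lvs --noheadings -o lv_name vg_test_thin | grep -qw thinpool_lv_test_thin || \
    lvcreate --type thin-pool -L 20M --thinpool thinpool_lv_test_thin vg_test_thin
# ...then carve the thin LV out of it
lvcreate --name lv_test_thin3 --type thin --thinpool thinpool_lv_test_thin -V 10M vg_test_thin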

2. For volume display (and input to vol-create via an XML file), the <volume> ... <source> ... would need to be updated with some sort of field naming the thin-pool. My suggestions were described here:

http://www.redhat.com/archives/libvir-list/2016-January/msg01253.html
http://www.redhat.com/archives/libvir-list/2016-February/msg00050.html

I did forget to update the storagevol.rng file... 

Also, some more details on the need to know which thin-pool is being used for a thin LV are here:

http://www.redhat.com/archives/libvir-list/2016-January/msg01254.html
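
As a very rough sketch (the element name here is purely illustrative, not necessarily what the posts above settled on), the volume XML might carry the thin-pool name in its <source>:

<volume>
  <name>lv_test_thin</name>
  <capacity unit='MiB'>40</capacity>
  <source>
    <!-- hypothetical element naming the containing thin-pool -->
    <pool>thinpool_lv_test_thin</pool>
  </source>
  <target>
    <path>/dev/vg_test_thin/lv_test_thin</path>
  </target>
</volume>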

3. For pool display, nothing needs to change.

4. Deletion of a thin LV would need to determine whether it was the last thin LV in the thin-pool and, if so, also delete the thin-pool. We could keep the pool, but if all the thin LVs were gone, how would one delete it? A subsequent create with a different allocation/capacity and the new argument would recreate the thin-pool.
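
A minimal sketch of that last-volume check, assuming the thin-pool name from the example above:

# count the thin LVs still backed by the thin-pool; drop the pool when none remain
remaining=$(lvs --noheadings -o lv_name,pool_lv vg_test_thin | \
    awk '$2 == "thinpool_lv_test_thin"' | wc -l)
[ "$remaining" -eq 0 ] && lvremove -f vg_test_thin/thinpool_lv_test_thin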

5. Resizing a thin-pool is tricky and would be better left to the LVM tools. The only way a resize of the thin-pool could happen through libvirt is if all the thin LVs were deleted and the next create specified a different size.
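
Left to the LVM tools, growing the example thin-pool (and, if needed, its metadata) is a one-liner each, which is part of why punting this to LVM seems reasonable:

lvextend -L +20M vg_test_thin/thinpool_lv_test_thin
lvextend --poolmetadatasize +4M vg_test_thin/thinpool_lv_test_thin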

I'm sure there will be some more odds'n'ends discovered, but these are the ones that came to mind.

With respect to the alternate solution - using/forcing a 'thin-pool' as its own volume group. Whether sharing or copying the logic from the 'logical' pool, here are some thoughts around areas that would need to be addressed:

1. Creation of a pool doesn't require one to build the pool, so whatever decisions are made in the buildPool processing cannot necessarily be treated as truisms for the pool in general. IOW: it's possible to create a 'thin logical' pool from an existing VG that already contains a thin-pool.

2. Building the pool would involve generating the 'thin-pool' LV that covers the entire physical volume set. The 'capacity' of the thin-pool would be the entire size of the PVs in the pool (i.e. all the provided "<device path='%s'/>" entries). Not sure if we have a way to calculate the PVs yet.

3. The pool refresh code would need to decide whether the current logic of getting the data from 'vgs -o vg_size,vg_free' is acceptable, or whether it needs to gather the data_percent used by all thin LVs in the pool and calculate 'available' based on that.

4. Volume creation would always use -V (--virtualsize) as the "capacity" value and would then need to decide what to do if an allocation value was provided. On the display side, the 'lvs -o' field "data_percent" helps calculate what's really allocated; otherwise, allocation will simply be the lv_size (i.e. the capacity value).
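
For instance, per-volume allocation could be derived from data_percent like this (a sketch, assuming allocation = lv_size * data_percent / 100 is the number we'd want to report):

lvs --noheadings --units b --nosuffix -o lv_name,lv_size,data_percent vg_test_thin
# e.g. a thin LV with lv_size=41943040 (40M) and data_percent=12.50 would be
# reported as 41943040 * 12.50 / 100 = 5242880 bytes (5M) allocated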

5. Since we cannot guarantee that we created the VG, finding the LVs for the pool would invert the existing logic to look for thin LVs only. Since there would be a 1-to-1 relationship between the thin-pool and the libvirt pool, only LVs whose 'pool_lv' field names that thin-pool would be selected. That is, fetch the 'pool_lv' field in the lvs output and compare it to the name of the pool (or thin-pool name).

6. Not sure if the volume allocation logic would need some adjustment to understand the over-subscription model of a thin-pool with thin LVs. There can be a thin-pool of size 20G with more than 20G of thin LVs in the pool (e.g. five 5G thin LVs). IOW: allocation could be the sum of all thin LVs, while capacity could be the size of the thin-pool LV. NOTE: not the size of all the PVs, since we cannot assume we created the pool.

7. Resizing the thin-pool will need to be handled via LVM commands... Perhaps new options could be provided, but I'm not sure.

8. Deletion of the pool does a 'vgremove' and 'pvremove' - what happens if there were other non thin-pool lv's in the vg?  From a vg we didn't create...

Again, I'm sure there's more odds'n'ends to be discovered. Besides I've typed too much already!

Comment 6 Pavel Hrdina 2016-02-04 09:23:11 UTC
Hi John, thanks for summarizing all the information. Now take a moment and read all that stuff again, and you can see what would need to be done to update the current code to properly handle thin pools and thin volumes. Just that could be a good reason to create a new pool type in libvirt rather than making a mess in the code by combining non-thin volumes together with thin volumes.

Comment 7 John Ferlan 2016-02-04 15:09:59 UTC
I think I started that list in the steps after "With respect to the alternate solution" above...

The 'thin-logical' pool (or whatever it gets called) becomes its own storage driver backend. It'll need all the same backend APIs that the logical pool has. Whether there are synergies between them is something that would be discovered during development. It'll be like peeling back all the layers of the onion. Perhaps similar to the 'scsi' and 'iscsi' pools, which share some code.

You will have to figure out a way to define/create the pool using an existing volume group. The .findPoolSources and *MatchPoolSource callbacks could be tricky. You'll probably need some sort of lvs output to find existing thin-pool segtypes; that would also be usable by buildPool to ensure we don't try to build/create a thin-pool of the same name.
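
Something along these lines would do for that discovery step (a sketch, not what libvirt runs today):

# list the LVs whose segment type is thin-pool in a given VG
lvs --noheadings -o lv_name,segtype vg_test_thin | awk '$2 == "thin-pool" {print $1}'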

The buildPool would be the way to create the thin-pool within an existing VG; if the VG doesn't exist yet, it would also be the way to create the VG from a source device and then create the thin-pool from the new VG. That is, if the VG doesn't exist, the *LogicalBuildPool logic would handle that part and then only need to create the thin-pool.

The interesting part will be the capacity comparisons. The implications of the capacity value depend on whether you create the VG or not, i.e. how large an existing VG is vs. the size of all the PVs which could be used if you create the VG. If you have an existing VG, creation of the thin-pool within that VG would fail if the capacity provided was larger than what the VG had available. If you don't have an existing VG, a similar rule applies. So you need a VG available from the PVs (devices) provided in order to then check whether the desired capacity of the thin-pool is acceptable. If it is not, failure and deletion of any created VG would have to occur.
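
The free-space check itself is cheap with the LVM tools; a sketch of what buildPool could run before attempting the thin-pool creation (the VG name and the 20M figure are just the ones from the earlier example):

# bytes actually free in the VG
free_bytes=$(vgs --noheadings --units b --nosuffix -o vg_free vg_test_thin | tr -d ' ')
requested_bytes=$((20 * 1024 * 1024))
# refuse to build a thin-pool larger than the remaining free space
[ "$requested_bytes" -le "$free_bytes" ] || echo "requested capacity exceeds VG free space" >&2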

Not quite sure how to handle the delete pool, but you need to consider a few factors... Is the thin-pool "all" that is in the VG? That is, if you lvremove the thin-pool, what's left in the VG? If there's nothing, then sure, it's an easy decision: delete the VG like the logical pool does. However, if there's something else in the VG, then I don't think we want to delete it. I would think in that case we have to assume we didn't create the VG, rather that it already existed and we used it. That may not be a fair assumption either, but it's something we can document.

I suspect when you have this done the output XML is something similar to:

<pool type='thinlogical'>
  <name>thinmints</name>
... <uuid>, <capacity>, <allocation>, <available>...
  <source>
    <name>VGname</name>
    ...
  </source>
  <target>
    <path>???</path>
  </target>
</pool>

So that ??? for the target path is an unknown in my mind. For a logical pool it would be the path to the VG (/dev/$VGname). I suppose that could be used; however, is there any libvirt code that checks for a duplicate target path? I forget. This is one of those trial-and-error type decision points.

Note that for both the 'logical' and 'thinlogical' output, any <source>... <device path='%s'/> provided in some *input* XML doesn't get displayed again. If we find it on input, then I suppose we need to make sure that the VG is using the listed devices (but that's all part of that findPoolSources and *MatchPoolSource logic).

The refreshPool and the existing *LogicalFindLVs have some synergies w/r/t the lvs output; however, you're only interested in LVs that have the same thin-pool name as the thin logical pool. Thus, use "lvs -o ...,pool_lv" and then filter out any output that doesn't have the matching pool_lv, perhaps using "(\\S+)" in the regex. I think you'll also want the "lvs -o ...,data_percent" value so that you know what percentage of the thin and thin-pool capacity is used. For the thin LVs there are no devices to parse, but the display of the volume source as the thin-pool will still need to be done.
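
Pulled together, the refresh query might look roughly like this (a sketch; the existing *LogicalFindLVs code requests a longer field list and parses it with a regex):

lvs --noheadings --units b --nosuffix --separator '#' \
    -o lv_name,origin,pool_lv,lv_size,data_percent vg_test_thin
# keep only the rows whose pool_lv column matches the pool's thin-pool name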

The createVol support would always use the --virtualsize/-V argument with the syntax "lvcreate --name <name> --thin <VGname>/<POOLname> --type thin -V <capacity>".

Unlike what I did back in Dec 2014, you won't be deleting the thin-pool, just the thin lv.

The buildVolFrom, uploadVol, and downloadVol seem to be shareable - although I never tried it. Logically it seems they would be.

The wipeVol support could perhaps only make use of the new 'trim' algorithm, as 'zero' would write zeros to the LV, effectively rendering it fully allocated (similar to the sparse snapshot LV). I hadn't thought about how to 'trim' an existing snapshot LV, but I suspect it would be similar.
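
For what it's worth, the end effect of 'trim' on a thin LV is to discard its blocks back to the thin-pool; illustrative commands, assuming a libvirt new enough to have the trim algorithm and using the volume names from comment 5:

virsh vol-wipe --pool vg_test_thin lv_test_thin --algorithm trim
# or directly against the block device:
blkdiscard /dev/vg_test_thin/lv_test_thin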

Still think it's easier to list a thin lv within an existing pool rather than all the logic necessary just to handle a special type of lv.

Comment 9 Cole Robinson 2016-04-13 21:19:01 UTC
*** Bug 860476 has been marked as a duplicate of this bug. ***

Comment 10 Ján Tomko 2016-08-03 14:52:17 UTC
*** Bug 1363786 has been marked as a duplicate of this bug. ***

Comment 13 Mike Goodwin 2017-10-09 16:42:29 UTC
It's been close to two years since anyone said anything on this topic. Is this still being worked on? I'm still somewhat in shock that the whole suite, from virt-manager down to libvirt, doesn't support thinly provisioned LVM volumes to this day. It's really the optimal way to do VM storage and snapshots outside of the image-file paradigm, and it's currently very awkward to handle outside of the proper tooling. Please take a look at this again?

Comment 14 Daniel Berrangé 2020-11-03 17:00:46 UTC
Thank you for reporting this issue to the libvirt project. Unfortunately we have been unable to resolve this issue due to insufficient maintainer capacity and it will now be closed. This is not a reflection on the possible validity of the issue, merely the lack of resources to investigate and address it, for which we apologise. If you none the less feel the issue is still important, you may choose to report it again at the new project issue tracker https://gitlab.com/libvirt/libvirt/-/issues The project also welcomes contribution from anyone who believes they can provide a solution.

