Bug 2121015 - [RFE] Output actual number of extents in test mode
Summary: [RFE] Output actual number of extents in test mode
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: LVM and device-mapper
Classification: Community
Component: lvm2
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1899134
Blocks:
 
Reported: 2022-08-24 10:05 UTC by Marius Vollmer
Modified: 2023-03-27 07:37 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-03-27 07:28:01 UTC
Embargoed:
pm-rhel: lvm-technical-solution?
pm-rhel: lvm-test-coverage?


Attachments (Terms of Use)
Proposed patch (993 bytes, patch)
2022-08-24 10:05 UTC, Marius Vollmer

Description Marius Vollmer 2022-08-24 10:05:44 UTC
Created attachment 1907349 [details]
Proposed patch

Description of problem:

In Cockpit we would like to let the user specify the size of a new logical volume with a slider control that goes from zero to the maximum possible size.

For some types, like raid5, the maximum size is complicated to compute, and it would be good to reuse the actual allocation code in lvcreate itself for that.

It could look like this:

lvcreate --test --verbose vgroup0 -n foo --type raid5 -l100%PVS
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Using default stripesize 64.00 KiB.
  Converted 100% of PVS (1394) extents into 1394 (with mimages 1 and stripes 2 for segtype raid5).
  Creating logical volume foo
  Found fewer allocatable extents for logical volume foo than requested: using 480 extents (reduced by 916).
  TEST-INFO:lv-extents: 478
  Creating logical volume foo_rimage_0
  Creating logical volume foo_rmeta_0
  Creating logical volume foo_rimage_1
  Creating logical volume foo_rmeta_1
  Creating logical volume foo_rimage_2
  Creating logical volume foo_rmeta_2
  Test mode: Skipping wiping of metadata areas.
  Test mode: Skipping archiving of volume group.
  Test mode: Skipping activation, zeroing and signature wiping.
  Logical volume "foo" created.
  Test mode: Skipping backup of volume group.

The new output is the "TEST-INFO:lv-extents: 478" line.
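(Editorial sketch, not part of the patch: how a consumer such as Cockpit might scrape that line from `lvcreate --test --verbose` output. Only the "TEST-INFO:lv-extents:" label comes from the proposed patch; the helper name is hypothetical.)

```python
import re

# Hypothetical helper: scan lvcreate --test --verbose output for the
# "TEST-INFO:lv-extents:" line proposed in the attached patch (the label
# is from the patch, not from any released lvm2).
def parse_test_info_extents(output):
    for line in output.splitlines():
        m = re.search(r"TEST-INFO:lv-extents:\s*(\d+)", line)
        if m:
            return int(m.group(1))
    return None  # line absent: patch not applied, or allocation failed

# Applied to the transcript above, this returns 478.
```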

Comment 1 Zdenek Kabelac 2022-08-25 12:35:03 UTC
The problem with this RFE is that the existing lvm2 codebase is not capable of working in a 'virtual' mode.

There is no virtual 'allocator' that would allocate up front all the LVs needed to fulfill a command; there is rather a 1-by-1 approach that might fail (in some cases with 'backtracking').

Cockpit might use '%' bars instead of 'extents', as extent precision could only be obtained by virtualizing a device set and letting the command run in such a fake environment. Users likely do not care much about these imprecise numbers anyway, since they deal with TiB-range storage.

We will be able to realize such an RFE once a better allocation engine is written for lvm2.

Comment 2 Marius Vollmer 2022-08-26 07:49:43 UTC
(In reply to Zdenek Kabelac from comment #1)
> The problem with this RFE is -  existing lvm2 codebase is not capable to
> work in the 'virtual' mode.

What is "virtual mode"?  Is it the same as "test mode", which is activated with the "--test" command line argument?

> There is not an virtual 'allocator' that would upfront allocate all the LVs
> needed to fulfill command - there is rather a 1-by-1 approach that might
> fail (in some cases with 'backtracking').

Are you saying that sometimes a single invocation of "lvcreate" results in multiple calls to "lv_extend" and thus in multiple "TEST-INFO:lv-extents" lines?  That would be fine for Cockpit.  Right now we are only interested in cases that result in a single call to "lv_extend", I think.
 
> Cockpit might use '%' bars instead of 'extents'

That would require fixes to LVM first, I am afraid:

https://github.com/storaged-project/udisks/pull/969#issuecomment-1211948334

Extract:

The experience when using "%PVS" for anything but "100%PVS" is not good: "50%PVS" does not give you something that is half the size of "100%PVS":

# lsblk /dev/sdd /dev/sde /dev/sdf
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sdd    8:48   0  500M  0 disk 
sde    8:64   0  500M  0 disk 
sdf    8:80   0  500M  0 disk 

# vgcreate vgroup0 /dev/sdd /dev/sde /dev/sdf
 Volume group "vgroup0" successfully created

# lvcreate vgroup0 -n lvol0 --type raid5 -l "100%PVS"
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.

# lvs vgroup0
  LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 vgroup0 rwi-a-r--- 984.00m                                    100.00          

# lvremove vgroup0/lvol0
  Logical volume "lvol0" successfully removed

# lvcreate vgroup0 -n lvol0 --type raid5 -l "50%PVS"
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.

# lvs vgroup0
  LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 vgroup0 rwi-a-r--- 744.00m                                    100.00          

A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs instead of the expected 490ish megs.

This is because "%PVS" doesn't work well with layouts that require much more space on PVs than they provide on the resulting LV. "100%PVS" in the case above is asking for a LV of size 1500ish megs, but lvcreate flips on a special mode and makes one that is as large as it can be, which turns out to be 980 megs. "50%PVS" asks for half of 1500 = 750, which is possible, so we get that.

End extract.
 
> We will be able to realize such RFE once a better allocation engine will be
> written for lvm2.

Ok, LVM2 RAID support in Cockpit will wait for that.
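(Editorial note: the sizes in the extract above can be reproduced with back-of-the-envelope arithmetic. The constants in this sketch — 4 MiB extents, 124 usable extents per 500 MiB PV, one rmeta extent per leg — are assumptions inferred from the transcript, not values stated in the report.)

```python
# Back-of-the-envelope model of the %PVS behavior quoted above; all
# constants are assumptions inferred from the transcript, not lvm2 code.

EXTENT_MIB = 4        # assumed default extent size
PV_EXTENTS = 124      # assumed usable extents per 500 MiB PV
N_PVS = 3
DATA_STRIPES = 2      # raid5 on 3 PVs: 2 data stripes + 1 parity

def requested_extents(percent_pvs):
    """'-l N%PVS' requests N% of the summed PV extents as the LV size."""
    return N_PVS * PV_EXTENTS * percent_pvs // 100

def max_raid5_lv_extents():
    """Largest fitting raid5 LV: per-PV extents minus 1 rmeta, times data stripes."""
    return (PV_EXTENTS - 1) * DATA_STRIPES

def resulting_lv_mib(percent_pvs):
    # A %-based request is allowed to round down to what actually fits.
    asked = requested_extents(percent_pvs)
    return min(asked, max_raid5_lv_extents()) * EXTENT_MIB

print(resulting_lv_mib(100))  # 984, matching the 984.00m in the extract
print(resulting_lv_mib(50))   # 744, matching the 744.00m in the extract
```

Under this model, "50%PVS" fits as asked, while "100%PVS" hits the cap and is reduced — which is exactly why the two results are not in a 2:1 ratio.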

Comment 3 Zdenek Kabelac 2022-08-26 09:22:11 UTC
(In reply to Marius Vollmer from comment #2)
> (In reply to Zdenek Kabelac from comment #1)
> > The problem with this RFE is -  existing lvm2 codebase is not capable to
> > work in the 'virtual' mode.
> 
> What is "virtual mode"?  Is it the same as "test mode", which is activated
> with the "--test" command line argument?

I wanted to emphasize that the current lvm2 --test mode does not go into great depth of command execution. It is basically only able to check whether some basic functionality can proceed; it is unfortunately incapable of going further without actually doing the 'real thing'. In other words, lvm2 cannot simulate the real behavior of a real target, so using --test mode to obtain 'real' final data is not a good approach here.

This is usually not a problem for targets without metadata, but as soon as the kernel DM target does its own metadata logic, the resulting outcome may even differ between kernel versions.

> > There is not an virtual 'allocator' that would upfront allocate all the LVs
> > needed to fulfill command - there is rather a 1-by-1 approach that might
> > fail (in some cases with 'backtracking').
> 
> Are you saying that sometimes a single invocation of "lvcreate" results in
> multiple calls to "lv_extend" and thus in multiple "TEST-INFO:lv-extents"
> lines?  That would be fine for Cockpit.  Right now we are only interested in
> cases that result in a single call to "lv_extend", I think.

Basically, lvm2 sometimes calls the allocator multiple times; e.g. a thin-pool on a raid data device results in separate calls to allocate the 'pmspare', the 'metadata', and the 'data as raid' volumes. There is no 'global' single allocator call resolving this as an atomic operation, which is unfortunate and prevents us from giving you a simple 'allocation' result from a single call.

There are some plans to introduce such an allocation engine, but there is no ETA, as this is seriously complicated.

>  
> > Cockpit might use  '%' bars instead of 'extents'
> 
> That would require fixes to LVM first, I am afraid:
> 
> https://github.com/storaged-project/udisks/pull/969#issuecomment-1211948334
> 
> Extract:
> 
> The experience when using "%PVS" for anything but "100%PVS" is not good:
> "50%PVS" does not give you something that is half the size of "100%PVS":
> 


The strings are basically a shortcut for specifying the LV size, i.e. the usable size of a volume. When you create a RAID volume, lvm2 does not (ATM) support specifying the 'whole/total' size of the raid volume, only the size of the user-usable volume; so the user asks for size XYZ plus a raid level, and allocations are made to fulfill the request.

If the user uses %PVS, the requested size is translated from the given number of extents, represented as the sum of extents from each listed PV. However, unlike with a 'precise' extent/size specification (-l|-L), lvm2 is here allowed to round down to fit.

So the difference is: if the user asks with -L100G, either 100G is available for the resulting LV or the command fails. If the same is asked via e.g. 100%VG and the VG already has some space in use, lvm2 scales down to give the highest possible size. So %XXX should be read as 'give me at most size X, but anything smaller fits as well'.

>   LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log
> Cpy%Sync Convert
>   lvol0 vgroup0 rwi-a-r--- 984.00m                                    100.00
> 
> # lvcreate vgroup0 -n lvol0 --type raid5 -l "50%PVS"
>   Using default stripesize 64.00 KiB.
>   Logical volume "lvol0" created.
> 
> # lvs vgroup0
>   LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log
> Cpy%Sync Convert
>   lvol0 vgroup0 rwi-a-r--- 744.00m                                    100.00
> 
> 
> A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs instead
> of the expected 490ish megs.

Which is in line with the current documented lvm2 design (aka there is no bug).

> This is because "%PVS" doesn't work well with layouts that require much more
> space on PVs than they provide on the resulting LV. "100%PVS" in the case
> above is asking for a LV of size 1500ish megs, but lvcreate flips on a
> special mode and makes one that is as large as it can be, which turns out to
> be 980 megs. "50%PVS" asks for half of 1500 = 750, which is possible, so we
> get that.

We already have related RFEs: BZ #958459, BZ #918328, and especially BZ #1899134.

But it all basically comes back to a rework of our allocator engine, so this currently hits our development capacity limits.

Comment 4 Marius Vollmer 2022-08-26 09:55:38 UTC
(In reply to Zdenek Kabelac from comment #3)

> > A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs instead
> > of the expected 490ish megs.
> 
> Which is in line with current documented lvm2 design (aka there is no bug).

At the same time there is this in lvcreate.c:

		/* For mirrors and raid with percentages based on physical extents,
		 * convert the total number of PEs into the number of logical
		 * extents per image (minimum 1). */
		/* FIXME Handle all the supported raid layouts here based on already-known segtype. */
		if ((lcp->percent != PERCENT_ORIGIN) && lp->mirrors) {
			extents /= lp->mirrors;
			if (!extents)
				extents = 1;
		}

Maybe this should be removed?

Comment 5 Marius Vollmer 2022-08-26 09:58:47 UTC
(In reply to Zdenek Kabelac from comment #3)

> I wanted to rather 'emphasize' that current lvm2 --test mode is not going
> into 'big' depth of command execution 

Would it help you to accept this patch if it would output the TEST-INFO line only in the basic cases where we know it will be correct?  This would be for the segment types "linear", "mirror", and "raid*", right?

Comment 6 Zdenek Kabelac 2022-08-26 10:39:09 UTC
(In reply to Marius Vollmer from comment #4)
> (In reply to Zdenek Kabelac from comment #3)
> 
> > > A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs instead
> > > of the expected 490ish megs.
> > 
> > Which is in line with current documented lvm2 design (aka there is no bug).
> 
> At the same time there is this in lvcreate.c:
> 
>  		/* For mirrors and raid with percentages based on physical extents,
> convert the total number of PEs 
> 		 * into the number of logical extents per image (minimum 1) */
> 		/* FIXME Handle all the supported raid layouts here based on already-known
> segtype. */
> 		if ((lcp->percent != PERCENT_ORIGIN) && lp->mirrors) {
> 			extents /= lp->mirrors;
> 			if (!extents)
> 				extents = 1;
> 		}

The comment is 'correct': the meaning of 'total' relates to the PE sum per image leg.

So as mentioned (and visible in your example as well), %PVS and %VG are simply converted to a number of extents for a single raid/mirror leg, which then determines the LV size.
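(Editorial sketch: the quoted lvcreate.c branch rendered as standalone Python for illustration. Only PERCENT_ORIGIN and mirrors are lvm2 names; the function itself is hypothetical.)

```python
# Hypothetical Python rendering of the quoted lvcreate.c branch: a
# %-of-PEs total is divided by the image count to get per-image extents,
# with a floor of 1 extent per image.
def per_image_extents(total_pe, mirrors, percent_origin=False):
    extents = total_pe
    if not percent_origin and mirrors:
        extents //= mirrors
        if extents == 0:
            extents = 1  # minimum 1 logical extent per image
    return extents
```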

Comment 7 David Teigland 2022-08-26 16:37:14 UTC
> Ok, LVM2 RAID support in Cockpit will wait for that.

LVM on mdraid is a much more common config, so you might look at that as an alternative (if it's not already done).

Comment 8 Marius Vollmer 2022-08-26 18:15:20 UTC
(In reply to David Teigland from comment #7)
> > Ok, LVM2 RAID support in Cockpit will wait for that.
> 
> LVM on mdraid is a much more common config, so you might look at that as an
> alternative (if it's not already done.)

That is done.  The request for supporting LVM RAID in Cockpit came from the LVM team, in fact. :-)

Comment 9 Zdenek Kabelac 2022-08-27 10:03:45 UTC
Let me be a bit more explicit about why a plain 'allocation' result is not as useful as you might think, and why I am talking about a 'virtual' working mode for lvm2.

The Cockpit tool would be managing a VG; the user adds a single LV, and then wants to add another LV. However, the placement of such an LV is not constrained just by size, but also by its extent locations over the PV set.

So without a 'virtual-like' working mode, you are basically building a 'cloud castle': combining sizes together creates the illusion that the user can create such LVs once they hit the 'proceed' button, but it may actually fail as soon as the 2nd LV is created, because changes in the allocation layout might make it unexecutable.

So ATM Cockpit may support only one step ahead of doing the 'real thing'; there is ATM no way the user can combine LV objects just based on their sizes.

Comment 10 Marius Vollmer 2022-08-29 06:24:33 UTC
(In reply to Zdenek Kabelac from comment #9)

> So ATM Cockpit may support only one step ahead of doing the 'real thing' -
> there is ATM no way the user can combine LV objects just based on their sizes.

This is good enough for Cockpit.

Comment 11 Marius Vollmer 2023-03-27 07:28:01 UTC
I now think we don't need to run the real LVM2 allocation algorithm to determine the maximum size of new raid LVs.  The scenarios that it can be trusted to handle are straightforward enough to calculate without it, see bug 2181573 and https://github.com/cockpit-project/cockpit/pull/17226#issuecomment-1484614466.

