Created attachment 1907349 [details]
Proposed patch

Description of problem:

In Cockpit we would like to let the user specify the size of a new logical
volume with a slider control that goes from zero to the maximum possible
size. For some types, like raid5, the maximum size is complicated to
compute, and it would be good to use the actual code in lvcreate itself
for that. It could look like this:

  # lvcreate --test --verbose vgroup0 -n foo --type raid5 -l100%PVS
    TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
    Using default stripesize 64.00 KiB.
    Converted 100% of PVS (1394) extents into 1394 (with mimages 1 and stripes 2 for segtype raid5).
    Creating logical volume foo
    Found fewer allocatable extents for logical volume foo than requested: using 480 extents (reduced by 916).
    TEST-INFO:lv-extents: 478
    Creating logical volume foo_rimage_0
    Creating logical volume foo_rmeta_0
    Creating logical volume foo_rimage_1
    Creating logical volume foo_rmeta_1
    Creating logical volume foo_rimage_2
    Creating logical volume foo_rmeta_2
    Test mode: Skipping wiping of metadata areas.
    Test mode: Skipping archiving of volume group.
    Test mode: Skipping activation, zeroing and signature wiping.
    Logical volume "foo" created.
    Test mode: Skipping backup of volume group.

The new output is the "TEST-INFO:lv-extents: 478" line.
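For context, a consumer like Cockpit could pick the proposed marker out of the command output with a trivial parser. This is a hedged sketch: the marker name and format are taken from the proposed patch above, and `parse_lv_extents` is a hypothetical helper, not part of any LVM or Cockpit API.

```python
import re
from typing import Optional

# The "TEST-INFO:lv-extents:" marker format comes from the patch proposed
# in this report; this helper is purely illustrative.
TEST_INFO_RE = re.compile(r"^TEST-INFO:lv-extents:\s*(\d+)\s*$", re.MULTILINE)

def parse_lv_extents(lvcreate_output: str) -> Optional[int]:
    """Return the extent count reported by the patched `lvcreate --test`,
    or None when the marker line is absent."""
    m = TEST_INFO_RE.search(lvcreate_output)
    return int(m.group(1)) if m else None
```

With the transcript above, this would return 478; running against an unpatched lvcreate it simply returns None, so a caller can fall back gracefully.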
The problem with this RFE is that the existing lvm2 codebase is not
capable of working in a 'virtual' mode. There is no virtual 'allocator'
that would allocate up front all the LVs needed to fulfill a command -
there is rather a 1-by-1 approach that might fail (in some cases with
'backtracking').

Cockpit might use '%' bars instead of 'extents' - extent precision could
only be obtained by virtualizing a device set and letting the command run
in such a fake environment. Users likely do not care much about these
'imprecise' numbers anyway, since they deal with TiB-range storage.

We will be able to realize this RFE once a better allocation engine is
written for lvm2.
(In reply to Zdenek Kabelac from comment #1)
> The problem with this RFE is - existing lvm2 codebase is not capable to
> work in the 'virtual' mode.

What is "virtual mode"? Is it the same as "test mode", which is activated
with the "--test" command line argument?

> There is not an virtual 'allocator' that would upfront allocate all the
> LVs needed to fulfill command - there is rather a 1-by-1 approach that
> might fail (in some cases with 'backtracking').

Are you saying that sometimes a single invocation of "lvcreate" results in
multiple calls to "lv_extend" and thus in multiple "TEST-INFO:lv-extents"
lines? That would be fine for Cockpit. Right now we are only interested
in cases that result in a single call to "lv_extend", I think.

> Cocpit might use '%' bars instead of 'extents'

That would require fixes to LVM first, I am afraid:

https://github.com/storaged-project/udisks/pull/969#issuecomment-1211948334

Extract:

The experience when using "%PVS" for anything but "100%PVS" is not good:
"50%PVS" does not give you something that is half the size of "100%PVS":

  # lsblk /dev/sdd /dev/sde /dev/sdf
  NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
  sdd    8:48   0  500M  0 disk
  sde    8:64   0  500M  0 disk
  sdf    8:80   0  500M  0 disk

  # vgcreate vgroup0 /dev/sdd /dev/sde /dev/sdf
    Volume group "vgroup0" successfully created

  # lvcreate vgroup0 -n lvol0 --type raid5 -l "100%PVS"
    Using default stripesize 64.00 KiB.
    Logical volume "lvol0" created.

  # lvs vgroup0
    LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    lvol0 vgroup0 rwi-a-r--- 984.00m                                    100.00

  # lvremove vgroup0/lvol0
    Logical volume "lvol0" successfully removed

  # lvcreate vgroup0 -n lvol0 --type raid5 -l "50%PVS"
    Using default stripesize 64.00 KiB.
    Logical volume "lvol0" created.

  # lvs vgroup0
    LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    lvol0 vgroup0 rwi-a-r--- 744.00m                                    100.00

A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs instead
of the expected 490ish megs.
This is because "%PVS" doesn't work well with layouts that require much
more space on the PVs than they provide on the resulting LV. "100%PVS" in
the case above is asking for a LV of size 1500ish megs, but lvcreate flips
on a special mode and makes one that is as large as it can be, which turns
out to be 980 megs. "50%PVS" asks for half of 1500 = 750, which is
possible, so we get that.

End extract.

> We will be able to realize such RFE once a better allocation engine will
> be written for lvm2.

Ok, LVM2 RAID support in Cockpit will wait for that.
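As a sanity check of the extract above, the 984 MiB and 744 MiB figures can be reproduced with a back-of-the-envelope model. All constants here are assumptions (4 MiB default extents, 124 allocatable extents per 500 MiB PV after the metadata area, one extent reserved per rmeta sub-LV); this is a sketch of the arithmetic, not LVM's allocator:

```python
# Assumed geometry for the 3 x 500 MiB example above (not queried from LVM):
EXTENT_MIB = 4        # default extent size
PV_EXTENTS = 124      # allocatable extents per PV after the metadata area
N_PVS = 3
DATA_STRIPES = 2      # raid5 on 3 PVs: 2 data stripes + 1 parity

def max_raid5_usable_extents(pv_extents: int, stripes: int) -> int:
    # Each rimage must fit on a single PV, minus one extent for its rmeta.
    return (pv_extents - 1) * stripes

def percent_pvs_extents(percent: int) -> int:
    # "%PVS" asks for that share of the *sum* of PV extents as the LV size.
    return N_PVS * PV_EXTENTS * percent // 100

cap = max_raid5_usable_extents(PV_EXTENTS, DATA_STRIPES)    # 246 extents
size_100 = min(percent_pvs_extents(100), cap) * EXTENT_MIB  # 984 MiB
size_50 = min(percent_pvs_extents(50), cap) * EXTENT_MIB    # 744 MiB
```

Under these assumptions "100%PVS" asks for 372 extents but is capped at 246 (984 MiB), while "50%PVS" asks for 186 extents (744 MiB), which fits as-is - matching the lvs output quoted above.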
(In reply to Marius Vollmer from comment #2)
> What is "virtual mode"? Is it the same as "test mode", which is activated
> with the "--test" command line argument?

I wanted to 'emphasize' that the current lvm2 --test mode does not go into
great depth of command execution - it is basically only able to check
whether some 'basic' functionality can proceed, but it is unfortunately
incapable of going further without actually doing some 'real thing'. I.e.
lvm2 is not able to simulate the real function of a real target - so using
--test mode to obtain some 'real' final data is not a good approach here.
This is usually not a problem for targets without metadata - but as soon
as the kernel DM target is doing its own metadata logic, the resulting
outcome may eventually even differ between kernel versions.

> Are you saying that sometimes a single invocation of "lvcreate" results
> in multiple calls to "lv_extend" and thus in multiple
> "TEST-INFO:lv-extents" lines? That would be fine for Cockpit. Right now
> we are only interested in cases that result in a single call to
> "lv_extend", I think.

Basically lvm2 sometimes calls the allocator multiple times - i.e. a
thin-pool on a raid data device gives you calls to allocate 'pmspare' +
'metadata' + 'data as raid'. There is no 'global' single allocator call
resolving this as an atomic operation - which is unfortunate and prevents
us from giving you a simple 'allocation' result from a single call. There
are some plans to introduce such a kind of allocation engine - but there
is no ETA for it, as this is seriously complicated.
> > > Cocpit might use '%' bars instead of 'extents'
>
> That would require fixes to LVM first, I am afraid:
>
> https://github.com/storaged-project/udisks/pull/969#issuecomment-1211948334
>
> Extract:
>
> The experience when using "%PVS" for anything but "100%PVS" is not good:
> "50%PVS" does not give you something that is half the size of "100%PVS":

These strings are basically a shortcut for specifying the LV size - i.e.
the usable size of a volume. When you create a RAID volume, lvm2 does not
(ATM) support specifying the 'whole/total' size of the raid volume, only
the size of the user-usable volume - so the user asks for size XYZ plus a
raid level, and allocations are made to fulfill the request.

If the user uses %PVS, the asked size is translated from the given number
of extents, represented as the sum of the extents from each listed PV.
However, unlike the 'precise' extent/size specification (-l|-L), lvm2 is
here allowed to 'round down' to fit.

So the difference is: if the user asks with -L100G, there is either 100G
available for the resulting LV or the command fails. If the same is asked
via e.g. 100%VG and the VG already has some space in use, lvm2 scales down
to give the highest possible size. So %XXX should be seen as 'give me at
most size X, but anything smaller fits as well'.

>   LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0 vgroup0 rwi-a-r--- 984.00m                                    100.00
>
> # lvcreate vgroup0 -n lvol0 --type raid5 -l "50%PVS"
>   Using default stripesize 64.00 KiB.
>   Logical volume "lvol0" created.
>
> # lvs vgroup0
>   LV    VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   lvol0 vgroup0 rwi-a-r--- 744.00m                                    100.00
>
> A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs
> instead of the expected 490ish megs.

Which is in line with the current documented lvm2 design (aka there is no
bug).

> This is because "%PVS" doesn't work well with layouts that require much
> more space on PVs than they provide on the resulting LV. "100%PVS" in
> the case above is asking for a LV of size 1500ish megs, but lvcreate
> flips on a special mode and makes one that is as large as it can be,
> which turns out to be 980 megs. "50%PVS" asks for half of 1500 = 750,
> which is possible, so we get that.

We already have these related RFEs: BZ #958459, BZ #918328, and especially
BZ #1899134. But it all basically goes back to a rework of our allocator
engine - so it is currently hitting devel capacity limitations.
(In reply to Zdenek Kabelac from comment #3)
> > A raid5 on "100%PVS" is 984 megs, but with "50%PVS" it is 744 megs
> > instead of the expected 490ish megs.
>
> Which is in line with current documented lvm2 design (aka there is no
> bug).

At the same time there is this in lvcreate.c:

  /* For mirrors and raid with percentages based on physical extents,
   * convert the total number of PEs into the number of logical extents
   * per image (minimum 1) */
  /* FIXME Handle all the supported raid layouts here based on already-known segtype. */
  if ((lcp->percent != PERCENT_ORIGIN) && lp->mirrors) {
          extents /= lp->mirrors;
          if (!extents)
                  extents = 1;
  }

Maybe this should be removed?
(In reply to Zdenek Kabelac from comment #3)
> I wanted to rather 'emphasize' that current lvm2 --test mode is not going
> into 'big' depth of command execution

Would it help you to accept this patch if it output the TEST-INFO line
only in the basic cases where we know it will be correct? This would be
for the segment types "linear", "mirror", and "raid*", right?
(In reply to Marius Vollmer from comment #4)
> At the same time there is this in lvcreate.c:
>
>   /* For mirrors and raid with percentages based on physical extents,
>    * convert the total number of PEs into the number of logical extents
>    * per image (minimum 1) */
>   /* FIXME Handle all the supported raid layouts here based on
>    * already-known segtype. */
>   if ((lcp->percent != PERCENT_ORIGIN) && lp->mirrors) {
>           extents /= lp->mirrors;
>           if (!extents)
>                   extents = 1;
>   }

The comment is 'correct' - the meaning of 'total' here relates to the PE
sum per image leg. So as mentioned (and visible in your example as well),
%PVS and %VG are simply 'converted' to a number of extents for a single
raid/mirror leg - which then represents the LV size.
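The conversion described here amounts to the following sketch, which mirrors the quoted lvcreate.c snippet in Python (same behavior, different language; not lvm2 code):

```python
def percent_extents_per_leg(total_pes: int, images: int) -> int:
    """Divide the PE total derived from "%PVS"/"%VG" by the image count
    to get logical extents per leg, with a minimum of 1 - mirroring the
    quoted lvcreate.c logic (which only runs when lp->mirrors is set)."""
    if images:
        return max(total_pes // images, 1)
    return total_pes
```

E.g. a "%PVS" total of 372 extents with 2 images yields 186 extents per leg, and the minimum-1 clamp prevents a zero-extent result for tiny totals.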
> Ok, LVM2 RAID support in Cockpit will wait for that.

LVM on mdraid is a much more common config, so you might look at that as
an alternative (if it's not already done).
(In reply to David Teigland from comment #7)
> LVM on mdraid is a much more common config, so you might look at that as
> an alternative (if it's not already done.)

That is done. The request for supporting LVM RAID in Cockpit came from
the LVM team, in fact. :-)
Let's just be a bit more explicit about why the plain 'allocation' result
is not as useful as you might think - and why I am talking about a
'virtual' working mode for lvm2.

The Cockpit tool would be managing a VG - the user tries to add a single
LV there and then wants to add another LV. However, the placement of such
an LV is not constrained 'just' by size, but also by the extent locations
of existing LVs across the PV set. So without a 'virtual-like' working
mode, you are basically building a 'cloud castle' - just combining sizes
to create the illusion that the user can create such LVs once they hit
the 'proceed' button - yet it may actually fail as soon as the 2nd LV is
asked to be created, as the changes in layout might make the allocation
unexecutable.

So ATM Cockpit can only support one step ahead of doing the 'real thing' -
there is ATM no way the user can combine LV objects just based on their
sizes.
(In reply to Zdenek Kabelac from comment #9)
> So ATM Cocpit may support only one-step ahead of doing a 'real-thing' -
> there is ATM no way user can combine LV objects just based on their
> sizes.

This is good enough for Cockpit.
I now think we don't need to run the real LVM2 allocation algorithm to
determine the maximum size of new raid LVs. The scenarios that it can be
trusted to handle are straightforward enough to calculate without it; see
bug 2181573 and
https://github.com/cockpit-project/cockpit/pull/17226#issuecomment-1484614466.
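For reference, the kind of client-side estimate this refers to could look roughly like the following. Everything here is an assumption (one extent reserved per image for rmeta, each rimage confined to a single PV, the parity counts in the table, and the function name itself); it is an estimate in the spirit of the linked discussion, not LVM's allocator:

```python
# Assumed parity overhead per raid level (illustrative only).
RAID_PARITY = {"raid0": 0, "raid4": 1, "raid5": 1, "raid6": 2}

def max_raid_usable_extents(free_extents_per_pv: list, level: str) -> int:
    """Estimate the largest LV (in extents) that fits when one rimage is
    placed on each PV, reserving one extent per image for its rmeta."""
    images = len(free_extents_per_pv)
    data_stripes = images - RAID_PARITY[level]
    per_image = min(free_extents_per_pv) - 1
    return per_image * data_stripes
```

With three PVs of 124 free extents each, this gives 246 extents for raid5 - which at 4 MiB extents matches the 984 MiB observed in the transcript earlier in this report.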