Bug 2232632 - overcloud LVM thin pool has very small spare pool metadata volume
Summary: overcloud LVM thin pool has very small spare pool metadata volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: diskimage-builder
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 17.1
Assignee: Steve Baker
QA Contact: James E. LaBarre
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-17 16:15 UTC by Pavel Cahyna
Modified: 2023-09-20 00:30 UTC
CC: 11 users

Fixed In Version: diskimage-builder-3.29.1-1.20230424091025.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-20 00:29:52 UTC
Target Upstream Version:
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit 892244 | 0 | None | MERGED | growvols: reserve space for spare metadata volume | 2023-08-30 19:15:07 UTC
Red Hat Issue Tracker OSP-27571 | 0 | None | None | None | 2023-08-17 16:17:07 UTC
Red Hat Product Errata RHBA-2023:5138 | 0 | None | None | None | 2023-09-20 00:30:28 UTC

Description Pavel Cahyna 2023-08-17 16:15:14 UTC
Description of problem:

This is a follow-up to bz2149586. While the metadata volume was extended to 1 GB, the spare pool metadata volume remains at its original size of 8 MB. This differs from a freshly created pool, where the spare pool metadata LV would be the same size as the pool metadata LV. This caused an issue that a customer hit when restoring from backup: https://bugzilla.redhat.com/show_bug.cgi?id=2222899#c58

I have verified that lvextend normally behaves properly, i.e. it extends the spare volume together with the metadata volume:

# lvs -a
  LV              VG                  Attr       LSize Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 8.00m                                                      
  pool00          rhel_kvm-08-guest24 twi-aotz-- 6.17g               47.09  23.24                           
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao---- 6.17g                                                      
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao---- 8.00m                                                      

# lvextend --poolmetadatasize +16M /dev/rhel_kvm-08-guest24/pool00
  Size of logical volume rhel_kvm-08-guest24/pool00_tmeta changed from 8.00 MiB (2 extents) to 24.00 MiB (6 extents).
  Logical volume rhel_kvm-08-guest24/pool00_tmeta successfully resized.
# lvs -a
  LV              VG                  Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 24.00m                                                      
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g               47.09  14.42                           
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g                                                      
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao---- 24.00m                                                      

Since that did not happen in this case, it must be because there was not enough space left:

# lvextend --poolmetadatasize +1G /dev/rhel_kvm-08-guest24/pool00
  Size of logical volume rhel_kvm-08-guest24/pool00_tmeta changed from 24.00 MiB (6 extents) to 1.02 GiB (262 extents).
  Insufficient free space: 256 extents needed, but only 19 available
  Logical volume rhel_kvm-08-guest24/pool00_tmeta successfully resized.
# lvs -a
  LV              VG                  Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 24.00m                                                      
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g               47.09  1.63                            
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g                                                      
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao----  1.02g                                                      

(Aren't you getting this message regularly during growvols invocations?)

The problem with backup recovery is that lvcreate behaves differently from lvextend when there is no space left for the spare: lvextend prints the message above and resizes the volume anyway, leaving the spare volume small, whereas lvcreate (called during backup recovery) aborts. The fix would be to leave enough space in the VG to extend both the metadata volume and the spare.

Version-Release number of selected component (if applicable):
diskimage-builder-3.29.1-1.20230424091024.ed9bdf8.el9ost

How reproducible:
The problem can be seen already in https://bugzilla.redhat.com/show_bug.cgi?id=2149586#c14

Steps to Reproduce:
1. examine volume groups after growvols has been executed

Actual results:
very small spare pool metadata volume 
[lvol0_pmspare]     vg ewi-------  8.00m                                                           

Expected results:
spare pool metadata volume size equal to pool metadata volume size

Additional info:

It looks like this is caused by the following code:

    if thin_pool:
        # total size available, reduced by POOL_METADATA_SIZE
        # rounded down to whole extent
        size_bytes -= POOL_METADATA_SIZE
        size_bytes -= size_bytes % PHYSICAL_EXTENT_BYTES
    dev_path = '/dev/%s' % devname

( https://review.opendev.org/c/openstack/diskimage-builder/+/868049/1/diskimage_builder/elements/growvols/static/usr/local/sbin/growvols#511 )
- it should reserve 2*POOL_METADATA_SIZE, to account for growing the spare metadata volume as well.
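
A minimal sketch of that suggested adjustment to the snippet above, for illustration only (the actual fix is the merged gerrit change 892244 linked in this bug):

    if thin_pool:
        # reserve room to grow both the pool metadata LV and the spare
        # metadata LV ([lvol0_pmspare]), which LVM keeps at the same size,
        # then round down to a whole extent
        size_bytes -= 2 * POOL_METADATA_SIZE
        size_bytes -= size_bytes % PHYSICAL_EXTENT_BYTES
    dev_path = '/dev/%s' % devname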

Comment 17 Pavel Cahyna 2023-08-31 12:24:28 UTC
Hi Steve,

Just an idea: why calculate the size of the space to extend and use --poolmetadatasize at all? Why not use "lvextend ... -l +100%FREE" to determine the space automatically? In my experiments, this uses all the available space, recalculates the pool metadata size according to the defaults, and extends it together with the spare:

[root@kvm-03-guest07 ~]# vgs
  VG                  #PV #LV #SN Attr   VSize   VFree 
  rhel_kvm-03-guest07   1   4   0 wz--n- <53.00g 10.64g
[root@kvm-03-guest07 ~]# lvs -a
  LV              VG                  Attr       LSize   Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home            rhel_kvm-03-guest07 Vwi-aotz--  12.63g pool00        4.82                                   
  [lvol0_pmspare] rhel_kvm-03-guest07 ewi-------  40.00m                                                      
  pool00          rhel_kvm-03-guest07 twi-aotz-- <38.52g               8.25   13.14                           
  [pool00_tdata]  rhel_kvm-03-guest07 Twi-ao---- <38.52g                                                      
  [pool00_tmeta]  rhel_kvm-03-guest07 ewi-ao----  40.00m                                                      
  root            rhel_kvm-03-guest07 Vwi-aotz--  25.88g pool00        9.93                                   
  swap            rhel_kvm-03-guest07 -wi-ao----   3.76g                                                      
[root@kvm-03-guest07 ~]# lsblk
NAME                                     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                                      252:0    0   54G  0 disk 
├─vda1                                   252:1    0    1G  0 part /boot
└─vda2                                   252:2    0   53G  0 part 
  ├─rhel_kvm--03--guest07-pool00_tmeta   253:0    0   40M  0 lvm  
  │ └─rhel_kvm--03--guest07-pool00-tpool 253:2    0 38.5G  0 lvm  
  │   ├─rhel_kvm--03--guest07-root       253:3    0 25.9G  0 lvm  /
  │   ├─rhel_kvm--03--guest07-pool00     253:5    0 38.5G  1 lvm  
  │   └─rhel_kvm--03--guest07-home       253:6    0 12.6G  0 lvm  /home
  ├─rhel_kvm--03--guest07-pool00_tdata   253:1    0 38.5G  0 lvm  
  │ └─rhel_kvm--03--guest07-pool00-tpool 253:2    0 38.5G  0 lvm  
  │   ├─rhel_kvm--03--guest07-root       253:3    0 25.9G  0 lvm  /
  │   ├─rhel_kvm--03--guest07-pool00     253:5    0 38.5G  1 lvm  
  │   └─rhel_kvm--03--guest07-home       253:6    0 12.6G  0 lvm  /home
  └─rhel_kvm--03--guest07-swap           253:4    0  3.8G  0 lvm  [SWAP]
[root@kvm-03-guest07 ~]# lvextend rhel_kvm-03-guest07/pool00 -l +100%FREE
  Rounding size to boundary between physical extents: 52.00 MiB.
  Size of logical volume rhel_kvm-03-guest07/pool00_tmeta changed from 40.00 MiB (10 extents) to 52.00 MiB (13 extents).
  Size of logical volume rhel_kvm-03-guest07/pool00_tdata changed from <38.52 GiB (9860 extents) to 49.13 GiB (12578 extents).
  Logical volume rhel_kvm-03-guest07/pool00 successfully resized.
[root@kvm-03-guest07 ~]# lvs -a
  LV              VG                  Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home            rhel_kvm-03-guest07 Vwi-aotz-- 12.63g pool00        4.82                                   
  [lvol0_pmspare] rhel_kvm-03-guest07 ewi------- 52.00m                                                      
  pool00          rhel_kvm-03-guest07 twi-aotz-- 49.13g               6.47   12.50                           
  [pool00_tdata]  rhel_kvm-03-guest07 Twi-ao---- 49.13g                                                      
  [pool00_tmeta]  rhel_kvm-03-guest07 ewi-ao---- 52.00m                                                      
  root            rhel_kvm-03-guest07 Vwi-aotz-- 25.88g pool00        9.93                                   
  swap            rhel_kvm-03-guest07 -wi-ao----  3.76g                                                      
[root@kvm-03-guest07 ~]# vgs
  VG                  #PV #LV #SN Attr   VSize   VFree
  rhel_kvm-03-guest07   1   4   0 wz--n- <53.00g    0 
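
For illustration only, a minimal sketch of how a script could issue this form of the command (the helper name and the use of subprocess are my assumptions, not the actual growvols code):

import subprocess

def extend_pool_with_all_free_space(vg_name, pool_name):
    # Grow the thin pool to use all remaining free space in the VG.
    # lvextend then recalculates the pool metadata size itself and grows
    # the spare metadata LV along with it, as shown in the session above.
    subprocess.run(
        ['lvextend', '%s/%s' % (vg_name, pool_name), '-l', '+100%FREE'],
        check=True,
    )

# e.g. extend_pool_with_all_free_space('rhel_kvm-03-guest07', 'pool00')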


For backporting to a stable branch, your approach is probably the best, as it is the minimal possible change; for long-term development, though, the approach above might be better thanks to its simplicity (if feasible).

Comment 22 Steve Baker 2023-09-08 05:04:28 UTC
(In reply to Pavel Cahyna from comment #17)
> Hi Steve,
> 
> just an idea - why are you calculating the size of the space to extend and
> use --poolmetadatasize at all? Why not use "lvextend ...  -l +100%FREE" to
> determine the space automatically ?

Having a metadata size of 1G is intentional; we don't want it to use all of the remaining space, as that may be needed by the end user for some purpose or by ephemeral upgrade snapshots.

Comment 27 James E. LaBarre 2023-09-11 13:15:07 UTC
The confusing thing about the diskimage-builder package is that the fix was listed as landing in diskimage-builder-3.30.1*, but the build on the 20230907.n.1 compose is diskimage-builder-3.29.1-1.20230424091025.el9ost.noarch. The changes are in place and the fix works, so I'm presuming the version was kept at 3.29.1 intentionally?

Regardless of the numbering, the fixes are in place (I compared the growvols and test_growvols.py files to be sure). Thin pools on a completed run are sized as expected (1.1g for lv_thinpool_tmeta and lvol0_pmspare).

Comment 32 errata-xmlrpc 2023-09-20 00:29:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:5138

