Bug 2149586 - overcloud LVM thin pool has very small metadata volume
Summary: overcloud LVM thin pool has very small metadata volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: diskimage-builder
Version: 17.1 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 17.1
Assignee: Steve Baker
QA Contact:
URL:
Whiteboard:
Duplicates: 2151335 2162434 2178905
Depends On:
Blocks:
 
Reported: 2022-11-30 09:52 UTC by Ricardo Diaz
Modified: 2023-08-16 01:13 UTC (History)
CC List: 8 users

Fixed In Version: diskimage-builder-3.29.1-1.20230424091024.ed9bdf8.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:12:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Manual execution of growvols commands (5.62 KB, application/zip)
2022-11-30 09:52 UTC, Ricardo Diaz
no flags


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 868049 0 None NEW Grow thin pool metadata by 1GiB 2022-12-19 00:54:09 UTC
Red Hat Issue Tracker OSP-20567 0 None None None 2022-11-30 09:53:35 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:13:40 UTC

Description Ricardo Diaz 2022-11-30 09:52:11 UTC
Created attachment 1928643 [details]
Manual execution of growvols commands

Description of problem:

During overcloud node provisioning, the execution of the /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml playbook hangs while running growvols ("/usr/local/sbin/growvols /=8GB /tmp=1GB /var/log=10GB /var/log/audit=2GB /home=1GB /var=100%").

Version-Release number of selected component (if applicable):

Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby)
Red Hat Enterprise Linux release 9.1 (Plow)

How reproducible:
100% (apparently only for HDDs (rotational) or disks larger than 1 TB)

Steps to Reproduce:
1. Run: openstack overcloud node provision --network-config --stack overcloud baremetal_deployment.yaml

Actual results:

The growvols execution hangs when extending /dev/mapper/vg-lv_thinpool with a 1.8 TB HDD. The error has been reproduced manually on two different nodes (computes).

For nodes with an SSD smaller than 1 TB, the problem does not occur (see the attached growvols_sdd.txt):

[root@compute-0 ~]# lvextend -L+954246103040B /dev/mapper/vg-lv_thinpool /dev/sda6
  Size of logical volume vg/lv_thinpool_tdata changed from <4.93 GiB (1261 extents) to <893.64 GiB (228771 extents).
  Logical volume vg/lv_thinpool successfully resized.

For nodes with a 1.8 TB HDD, the following error occurs (see the attached growvols_hdd.txt):

[root@compute-1 ~]# lvextend -L+1994446077952B /dev/mapper/vg-lv_thinpool /dev/sda6
  Size of logical volume vg/lv_thinpool_tdata changed from <4.93 GiB (1261 extents) to <1.82 TiB (476774 extents).
  device-mapper: resume ioctl on  (253:2) failed: No space left on device
  Unable to resume vg-lv_thinpool-tpool (253:2).
  Problem reactivating logical volume vg/lv_thinpool.

Apparently, in both cases the calculated size (in bytes) matches the free PE.

Expected results:

Logical volume vg/lv_thinpool is successfully resized on HDD nodes as well.


Additional info:
growvols_sdd: https://paste.opendev.org/show/817803/
growvols_hdd: https://paste.opendev.org/show/817804/

Comment 1 Steve Baker 2022-12-08 04:22:25 UTC
I've replicated this with a 2TB drive, investigating now.

Comment 2 Steve Baker 2022-12-18 22:27:02 UTC
The problem is that the thin pool metadata volume is (deliberately) small on the image, and it fills up when growing the pool on a large disk.

Advice [1] suggests that a 1 GiB metadata volume is a reasonable default; a fix will be proposed that grows the metadata volume by this amount before growing the pool.

[1] https://access.redhat.com/solutions/6318131#:~:text=Size%20of%20pool%20metadata%20LV,from%202MiB%20to%20approximately%2016GiB.
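
A minimal sketch of the intended ordering, for illustration only (not the actual diskimage-builder patch): the constant names are borrowed from the growvols script, the 4 MiB extent size is an assumption, and the pool, device and byte values come from the transcripts above.

import subprocess

POOL_METADATA_SIZE = 1 * 1024 * 1024 * 1024   # 1 GiB metadata growth
PHYSICAL_EXTENT_BYTES = 4 * 1024 * 1024       # assumed 4 MiB extents

def grow_thin_pool(pool, device, grow_bytes):
    # Grow the metadata LV first so it cannot fill up while the much
    # larger data volume is being mapped in.
    subprocess.run(['lvextend', '--poolmetadatasize', '+1G', pool], check=True)
    # Reserve the metadata growth, round down to a whole extent, then
    # extend the data volume by the remainder.
    data_bytes = grow_bytes - POOL_METADATA_SIZE
    data_bytes -= data_bytes % PHYSICAL_EXTENT_BYTES
    subprocess.run(['lvextend', '-L+%dB' % data_bytes, pool, device], check=True)

# e.g. grow_thin_pool('/dev/mapper/vg-lv_thinpool', '/dev/sda6', 1994446077952)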

Comment 3 Steve Baker 2022-12-19 20:50:59 UTC
*** Bug 2151335 has been marked as a duplicate of this bug. ***

Comment 12 Steve Baker 2023-02-13 21:35:06 UTC
*** Bug 2162434 has been marked as a duplicate of this bug. ***

Comment 13 Steve Baker 2023-03-20 19:45:32 UTC
*** Bug 2178905 has been marked as a duplicate of this bug. ***

Comment 14 Ricardo Diaz 2023-03-23 08:36:34 UTC
Fixed:

[stack@undercloud-0 ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 9.2 Beta (Plow)
[stack@undercloud-0 ~]$ cat /etc/rhosp-release 
Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby)

[root@compute-1 ~]# lvs -a
  LV                                             VG                                        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_audit                                       vg                                        Vwi-aotz-- <2.05g lv_thinpool        2.85                                   
  lv_home                                        vg                                        Vwi-aotz--  1.16g lv_thinpool        0.90                                   
  lv_log                                         vg                                        Vwi-aotz-- <9.55g lv_thinpool        39.53                                  
  lv_root                                        vg                                        Vwi-aotz-- 10.52g lv_thinpool        15.97                                  
  lv_srv                                         vg                                        Vwi-aotz-- 48.00m lv_thinpool        15.62                                  
  lv_thinpool                                    vg                                        twi-aotz-- <1.82t                    0.61   2.63                            
  [lv_thinpool_tdata]                            vg                                        Twi-ao---- <1.82t                                                           
  [lv_thinpool_tmeta]                            vg                                        ewi-ao---- <1.01g                                                           
  lv_tmp                                         vg                                        Vwi-aotz--  1.16g lv_thinpool        7.87                                   
  lv_var                                         vg                                        Vwi-aotz--  1.79t lv_thinpool        0.32                                   
  [lvol0_pmspare]                                vg                                        ewi-------  8.00m

Comment 22 Pavel Cahyna 2023-08-10 12:25:04 UTC
(In reply to Ricardo Diaz from comment #14)
> Fixed:
> 
> [stack@undercloud-0 ~]$ cat /etc/redhat-release 
> Red Hat Enterprise Linux release 9.2 Beta (Plow)
> [stack@undercloud-0 ~]$ cat /etc/rhosp-release 
> Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby)
> 
> [root@compute-1 ~]# lvs -a
>   LV                                             VG                         
> Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   [lv_thinpool_tmeta]                            vg                         
> ewi-ao---- <1.01g                                                           
>   [lvol0_pmspare]                                vg                         
> ewi-------  8.00m

While the metadata volume was extended to 1 GB, the spare pool metadata volume remains at its original size of 8 MB, which differs from a freshly created pool (where the spare pool metadata LV has the same size as the pool metadata LV). This causes an issue when restoring from backup, encountered by a customer: https://bugzilla.redhat.com/show_bug.cgi?id=2222899#c58
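
For reference, one way to spot this condition on a deployed node; this is a hypothetical helper (not part of growvols), assuming lvs JSON reporting is available and that the LV names match the output above.

import json
import subprocess

def pmspare_lags_tmeta(vg='vg'):
    # Report all LVs (including hidden ones) with sizes in bytes.
    out = subprocess.run(
        ['lvs', '-a', '--reportformat', 'json', '--units', 'b',
         '--nosuffix', '-o', 'lv_name,lv_size', vg],
        check=True, capture_output=True, text=True).stdout
    sizes = {lv['lv_name'].strip('[]'): int(float(lv['lv_size']))
             for lv in json.loads(out)['report'][0]['lv']}
    # True when the spare metadata LV is smaller than the pool metadata LV.
    return sizes.get('lvol0_pmspare', 0) < sizes.get('lv_thinpool_tmeta', 0)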

Comment 23 Pavel Cahyna 2023-08-10 16:01:15 UTC
I have verified that lvextend behaves properly, i.e. it extends the spare volume together with the metadata volume:

# lvs -a
  LV              VG                  Attr       LSize Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 8.00m                                                      
  pool00          rhel_kvm-08-guest24 twi-aotz-- 6.17g               47.09  23.24                           
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao---- 6.17g                                                      
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao---- 8.00m                                                      

# lvextend --poolmetadatasize +16M /dev/rhel_kvm-08-guest24/pool00
  Size of logical volume rhel_kvm-08-guest24/pool00_tmeta changed from 8.00 MiB (2 extents) to 24.00 MiB (6 extents).
  Logical volume rhel_kvm-08-guest24/pool00_tmeta successfully resized.
# lvs -a
  LV              VG                  Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 24.00m                                                      
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g               47.09  14.42                           
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g                                                      
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao---- 24.00m                                                      

If that has not happened here, it must be because there is not enough free space left:

# lvextend --poolmetadatasize +1G /dev/rhel_kvm-08-guest24/pool00
  Size of logical volume rhel_kvm-08-guest24/pool00_tmeta changed from 24.00 MiB (6 extents) to 1.02 GiB (262 extents).
  Insufficient free space: 256 extents needed, but only 19 available
  Logical volume rhel_kvm-08-guest24/pool00_tmeta successfully resized.
# lvs -a
  LV              VG                  Attr       LSize  Pool   Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 24.00m                                                      
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g               47.09  1.63                            
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g                                                      
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao----  1.02g                                                      

(aren't you getting this message regularly during the growvols invocation?)

and in the case of growvols, it looks like it is caused by this:

    if thin_pool:
        # total size available, reduced by POOL_METADATA_SIZE
        # rounded down to whole extent
        size_bytes -= POOL_METADATA_SIZE
        size_bytes -= size_bytes % PHYSICAL_EXTENT_BYTES
    dev_path = '/dev/%s' % devname

( https://review.opendev.org/c/openstack/diskimage-builder/+/868049/1/diskimage_builder/elements/growvols/static/usr/local/sbin/growvols#511 )
- it should reserve 2*POOL_METADATA_SIZE to account for the spare metadata volume increase.
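
A rough sketch of that suggested adjustment (illustrative, not the merged change), keeping the structure of the snippet above:

    if thin_pool:
        # total size available, reduced by 2 * POOL_METADATA_SIZE so that
        # both the pool metadata LV and the spare metadata LV
        # (lvol0_pmspare) have room to grow, rounded down to a whole extent
        size_bytes -= 2 * POOL_METADATA_SIZE
        size_bytes -= size_bytes % PHYSICAL_EXTENT_BYTES
    dev_path = '/dev/%s' % devname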

Comment 28 errata-xmlrpc 2023-08-16 01:12:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

