Created attachment 1928643 [details]
Manual execution of growvols commands

Description of problem:
When doing the overcloud node provision, the execution of the /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml playbook hangs when running growvols ("/usr/local/sbin/growvols /=8GB /tmp=1GB /var/log=10GB /var/log/audit=2GB /home=1GB /var=100%").

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby)
Red Hat Enterprise Linux release 9.1 (Plow)

How reproducible:
100% (apparently only for HDDs (rotational) or disks > 1 TB)

Steps to Reproduce:
1. Run: openstack overcloud node provision --network-config --stack overcloud baremetal_deployment.yaml

Actual results:
The growvols execution hangs when extending /dev/mapper/vg-lv_thinpool on a 1.8 TB HDD. The error has been reproduced manually on two different nodes (computes).

For nodes with an SSD < 1 TB, the problem does not occur (see attached growvols_sdd.txt):

[root@compute-0 ~]# lvextend -L+954246103040B /dev/mapper/vg-lv_thinpool /dev/sda6
  Size of logical volume vg/lv_thinpool_tdata changed from <4.93 GiB (1261 extents) to <893.64 GiB (228771 extents).
  Logical volume vg/lv_thinpool successfully resized.

For nodes with a 1.8 TB HDD, the following error occurs (see attached growvols_hdd.txt):

[root@compute-1 ~]# lvextend -L+1994446077952B /dev/mapper/vg-lv_thinpool /dev/sda6
  Size of logical volume vg/lv_thinpool_tdata changed from <4.93 GiB (1261 extents) to <1.82 TiB (476774 extents).
  device-mapper: resume ioctl on (253:2) failed: No space left on device
  Unable to resume vg-lv_thinpool-tpool (253:2).
  Problem reactivating logical volume vg/lv_thinpool.

Apparently, in both cases the calculated size (in bytes) matches the free PE count.

Expected results:
Logical volume vg/lv_thinpool successfully resized for the HDD as well.

Additional info:
growvols_sdd: https://paste.opendev.org/show/817803/
growvols_hdd: https://paste.opendev.org/show/817804/
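For reference, a quick way to cross-check that the size passed to lvextend matches the free physical extents of the volume group (a minimal sketch, assuming the volume group is named vg as in the outputs above; all fields are standard lvm2 report fields):

# report extent size, free extent count and free bytes of the volume group
vgs --units b -o vg_name,vg_extent_size,vg_free_count,vg_free vg

# show data and metadata usage of the thin pool, including the hidden tmeta/pmspare volumes
lvs -a -o lv_name,lv_size,data_percent,metadata_percent vg

The byte value passed to lvextend should equal vg_free_count multiplied by vg_extent_size.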
I've replicated this with a 2TB drive, investigating now.
The problem is that the thin pool metadata volume is (deliberately) small on the image, and it fills up when growing the pool on a large disk. Advice [1] suggests that a 1 GiB metadata volume is a reasonable default; a fix will be proposed which grows the metadata volume by this amount before growing the pool.

[1] https://access.redhat.com/solutions/6318131#:~:text=Size%20of%20pool%20metadata%20LV,from%202MiB%20to%20approximately%2016GiB.
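A minimal manual sketch of that approach on an affected node (assuming the volume group name vg from the report above; this is not the actual fix, which will live in growvols itself):

# grow the thin pool metadata volume first so it can track the much larger data volume
lvextend --poolmetadatasize +1G vg/lv_thinpool

# then grow the thin pool data volume into the remaining free space
lvextend -l +100%FREE vg/lv_thinpool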
*** Bug 2151335 has been marked as a duplicate of this bug. ***
*** Bug 2162434 has been marked as a duplicate of this bug. ***
*** Bug 2178905 has been marked as a duplicate of this bug. ***
Fixed:

[stack@undercloud-0 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 Beta (Plow)
[stack@undercloud-0 ~]$ cat /etc/rhosp-release
Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby)

[root@compute-1 ~]# lvs -a
  LV                  VG Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_audit            vg Vwi-aotz-- <2.05g lv_thinpool        2.85
  lv_home             vg Vwi-aotz--  1.16g lv_thinpool        0.90
  lv_log              vg Vwi-aotz-- <9.55g lv_thinpool        39.53
  lv_root             vg Vwi-aotz-- 10.52g lv_thinpool        15.97
  lv_srv              vg Vwi-aotz-- 48.00m lv_thinpool        15.62
  lv_thinpool         vg twi-aotz-- <1.82t                    0.61   2.63
  [lv_thinpool_tdata] vg Twi-ao---- <1.82t
  [lv_thinpool_tmeta] vg ewi-ao---- <1.01g
  lv_tmp              vg Vwi-aotz--  1.16g lv_thinpool        7.87
  lv_var              vg Vwi-aotz--  1.79t lv_thinpool        0.32
  [lvol0_pmspare]     vg ewi-------  8.00m
(In reply to Ricardo Diaz from comment #14)
> Fixed:
>
> [stack@undercloud-0 ~]$ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.2 Beta (Plow)
> [stack@undercloud-0 ~]$ cat /etc/rhosp-release
> Red Hat OpenStack Platform release 17.1.0 Beta (Wallaby)
>
> [root@compute-1 ~]# lvs -a
>   LV                  VG Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   [lv_thinpool_tmeta] vg ewi-ao---- <1.01g
>   [lvol0_pmspare]     vg ewi-------  8.00m

While the metadata volume was extended to 1 GB, the spare pool metadata volume remains at its original size of 8 MB, which differs from a freshly created pool (where the spare pool metadata LV would be the same size as the pool metadata LV). This causes an issue when restoring from backup, as encountered by a customer: https://bugzilla.redhat.com/show_bug.cgi?id=2222899#c58
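A quick way to spot the mismatch on a deployed node (assuming the volume group name vg from the output above):

# show the hidden thin pool metadata and spare metadata volumes;
# on a freshly created pool the two sizes would match
lvs -a -o lv_name,lv_size,lv_attr vg | grep -E 'tmeta|pmspare'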
I have verified that lvextend behaves properly, i.e. it extends the spare volume together with the metadata volume:

# lvs -a
  LV              VG                  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi-------  8.00m
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g             47.09  23.24
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao----  8.00m

# lvextend --poolmetadatasize +16M /dev/rhel_kvm-08-guest24/pool00
  Size of logical volume rhel_kvm-08-guest24/pool00_tmeta changed from 8.00 MiB (2 extents) to 24.00 MiB (6 extents).
  Logical volume rhel_kvm-08-guest24/pool00_tmeta successfully resized.

# lvs -a
  LV              VG                  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 24.00m
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g             47.09  14.42
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao---- 24.00m

If it has not happened, it must be because there is not enough free space left:

# lvextend --poolmetadatasize +1G /dev/rhel_kvm-08-guest24/pool00
  Size of logical volume rhel_kvm-08-guest24/pool00_tmeta changed from 24.00 MiB (6 extents) to 1.02 GiB (262 extents).
  Insufficient free space: 256 extents needed, but only 19 available
  Logical volume rhel_kvm-08-guest24/pool00_tmeta successfully resized.

# lvs -a
  LV              VG                  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [lvol0_pmspare] rhel_kvm-08-guest24 ewi------- 24.00m
  pool00          rhel_kvm-08-guest24 twi-aotz--  6.17g             47.09   1.63
  [pool00_tdata]  rhel_kvm-08-guest24 Twi-ao----  6.17g
  [pool00_tmeta]  rhel_kvm-08-guest24 ewi-ao----  1.02g

(Aren't you getting this message regularly during growvols invocation?)

In the case of growvols, it looks like it is caused by this:

    if thin_pool:
        # total size available, reduced by POOL_METADATA_SIZE
        # rounded down to whole extent
        size_bytes -= POOL_METADATA_SIZE
        size_bytes -= size_bytes % PHYSICAL_EXTENT_BYTES
    dev_path = '/dev/%s' % devname

( https://review.opendev.org/c/openstack/diskimage-builder/+/868049/1/diskimage_builder/elements/growvols/static/usr/local/sbin/growvols#511 )

It should reserve 2*POOL_METADATA_SIZE to account for the spare metadata volume increase.
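A minimal shell sketch of the suggested reservation, as a manual equivalent of the proposed growvols change (the 1 GiB metadata size, the volume group name vg and the device /dev/sda6 are assumptions taken from the earlier outputs, not the actual patch):

# reserve room for both the metadata volume and the spare metadata volume
# before computing how far the thin pool data volume can grow
POOL_METADATA_SIZE=$((1024 * 1024 * 1024))   # assumed 1 GiB, matching the metadata growth described above
EXTENT=$(vgs --noheadings --nosuffix --units b -o vg_extent_size vg | tr -d ' ')
FREE=$(vgs --noheadings --nosuffix --units b -o vg_free vg | tr -d ' ')
GROW=$((FREE - 2 * POOL_METADATA_SIZE))      # 2x: tmeta plus lvol0_pmspare
GROW=$((GROW - GROW % EXTENT))               # round down to whole extents

# grow the metadata volume (lvextend also grows pmspare when space allows),
# then grow the data volume by the reduced amount
lvextend --poolmetadatasize +1G vg/lv_thinpool
lvextend -L +${GROW}B /dev/mapper/vg-lv_thinpool /dev/sda6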
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577