Currently, the RHEL9 image for RHOSP17 has LVM enabled on the disk, partitioned as instructed in https://opendev.org/openstack/tripleo-image-elements/src/branch/master/elements/overcloud-partition-uefi/block-device-default.yaml. However, this LVM layout takes 100% of the disk, leaving no unused space at the end of the partitions. If, say, 20% of the disk were left free at the end, that would give much more flexibility and make it possible to use the LVM snapshots feature to snapshot the disk just before potentially destructive actions, such as upgrades.

The idea would be to change the current layout:

  - name: lv_root
    base: vg
    extents: 69%VG
  - name: lv_tmp
    base: vg
    extents: 4%VG
  - name: lv_var
    base: vg
    extents: 15%VG
  - name: lv_log
    base: vg
    extents: 4%VG
  - name: lv_audit
    base: vg
    extents: 3%VG
  - name: lv_home
    base: vg
    extents: 4%VG
  - name: lv_srv
    base: vg
    extents: 1%VG

to one that only adds up to 80% of the VG:

  - name: lv_root
    base: vg
    extents: 49%VG
  - name: lv_tmp
    base: vg
    extents: 4%VG
  - name: lv_var
    base: vg
    extents: 15%VG
  - name: lv_log
    base: vg
    extents: 4%VG
  - name: lv_audit
    base: vg
    extents: 3%VG
  - name: lv_home
    base: vg
    extents: 4%VG
  - name: lv_srv
    base: vg
    extents: 1%VG
Partition sizes on the built image only need to be big enough for the image's installed contents. When the image is deployed to bare metal, the volumes are expanded using the growvols[1] utility, which can be given custom grow values[2] based on a percentage of the remaining disk or on absolute values. The default growvols arguments are:

  /=8GB /tmp=1GB /var/log=10GB /var/log/audit=2GB /home=1GB /var=100%

I think we need a clearer idea of the snapshot space requirements; 20% takes a lot of space meant for /var. Maybe not that much is required if it only ever needs to hold short-lived thin snapshots for upgrade tasks. Ideally the size can be expressed in absolute terms instead of as a percentage. Making /var smaller has an impact on CI with small disks, see [3].

There is actually quite a bit of development required here; I think it will require an RFE, and getting it into 17.0 would be a big stretch. The tasks would be something like:

- create a small lv_snapshots on the image, just like lv_srv
- decide on growvols defaults to grow lv_snapshots, ideally as absolute storage, not a percentage
- ensure growvols supports growing a volume even when it doesn't have a mount point
- manage the CI breakage for cases where /var is no longer big enough

Only then would you have an empty volume which upgrade activity can replace with whatever snapshot volumes it requires.

[1] https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/growvols/README.rst
[2] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#grow-volumes-playbook
[3] https://trello.com/c/KPJoaarA/2402-cixpipelineosp17ceph-rhel9-ceph-storage-nodes-run-out-of-space-pulling-ceph-container
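For context, the custom grow values from [2] are supplied through the Grow Volumes ansible playbook in the baremetal provisioning definition. A hedged sketch of such an override, using the default values quoted above; the playbook path and the growvols_args variable name are as described in the linked docs, so verify them there before use:

  # Sketch of a baremetal provision role entry overriding growvols defaults.
  # The growvols_args values shown are the documented defaults, not tuned
  # recommendations.
  - name: Controller
    count: 3
    ansible_playbooks:
      - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml
        extra_vars:
          growvols_args: >
            /=8GB /tmp=1GB /var/log=10GB
            /var/log/audit=2GB /home=1GB /var=100%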
That seems like a plan to me. I think it might be a little optimistic trying to get this into 17.0, but it might be worth it. I will discuss it with the rest of the team to check what priority this could really be given. To get an absolute number for the lv_snapshots storage, we first need to do some tests, as for now we do not really know how much snapshot space an upgrade will take. That is not easy to find out: 16.2 does not provide LVM-enabled images, so we cannot test snapshots on a 16.2 to 17.0 upgrade, and from 17.0 we cannot yet upgrade to 17.1, which does not exist yet. Let's see how we can get to it.
We are currently working on this feature for 17.1, taking into account your suggestion of creating an lv_snapshots volume that the backup and restore procedure will delete to free the space in which to create the snapshots. I have been consulting with the rest of the Upgrades people and, even though we have no hard data, we think that 16GB should be enough space to do an upgrade. Would it be possible to add a 16GB lv_snapshots logical volume to the 17.1 hardened UEFI images?
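A hedged sketch of what such an entry could look like in block-device-default.yaml, mirroring the existing lv_srv entry; this assumes diskimage-builder's lvm block-device element accepts an absolute size for a logical volume (the other volumes in that file use extents: N%VG instead), so check the element docs before relying on it:

  # Hypothetical lv_snapshots entry for block-device-default.yaml.
  # Placeholder volume; the backup/restore procedure would delete it
  # to free pool space for the actual upgrade snapshots.
  - name: lv_snapshots
    base: vg
    size: 16GB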
The changes have now merged in master; now tracking the Wallaby backports.
*** Bug 2106154 has been marked as a duplicate of this bug. ***
This has been implemented with thin provisioning, so the VG is fully consumed by a thin pool called lv_thinpool. You can create new volumes or snapshots backed by that pool; see the RHEL 9 docs: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_logical_volumes/creating-and-managing-thin-provisioned-volumes_configuring-and-managing-logical-volumes#doc-wrapper
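For illustration, with the thin pool in place a snapshot of the root volume can be taken and later merged or discarded with standard LVM commands. The vg/lv_root and root_snapshot names below are assumptions following this bug's naming; run as root on a deployed node:

  # Create a thin snapshot of lv_root backed by the thin pool
  # (thin snapshots need no pre-allocated size).
  lvcreate --snapshot --name root_snapshot vg/lv_root

  # ... perform the potentially destructive action, e.g. the upgrade ...

  # Roll back by merging the snapshot into the origin (takes effect
  # on the next activation of the origin volume):
  lvconvert --merge vg/root_snapshot

  # Or discard the snapshot once the action has succeeded:
  lvremove vg/root_snapshot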
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577