Bug 2066349 - [RFE] Leave free space at the end of the overcloud-hardened-uefi-full image
Summary: [RFE] Leave free space at the end of the overcloud-hardened-uefi-full image
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-image-elements
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: Steve Baker
QA Contact: nlevinki
URL:
Whiteboard:
Duplicates: 2106154
Depends On:
Blocks: 2069624
 
Reported: 2022-03-21 14:57 UTC by Juan Larriba
Modified: 2023-08-16 01:11 UTC (History)
CC List: 15 users

Fixed In Version: openstack-tripleo-image-elements-13.1.3-1.20221022033813.cfc336b.el8ost
Doc Type: Enhancement
Doc Text:
With this enhancement, the LVM volumes installed by the `overcloud-hardened-uefi-full.qcow2` whole disk overcloud image are now backed by a thin pool. The volumes are still grown to consume the available physical storage, but are not over-provisioned by default.
The benefits of thin-provisioned logical volumes:
* If a volume fills to capacity, the options for manual intervention now include growing the volume to over-provision the physical storage capacity.
* The RHOSP upgrades process can now create ephemeral backup volumes in thin-provisioned environments.
Clone Of:
Environment:
Last Closed: 2023-08-16 01:11:06 UTC
Target Upstream Version:
Embargoed:
fdiazbra: needinfo-




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 840144 0 None MERGED Support LVM thin provisioning 2022-09-12 14:15:15 UTC
OpenStack gerrit 847860 0 None MERGED Do dmsetup remove device in rollback 2022-09-12 14:15:16 UTC
OpenStack gerrit 848688 0 None MERGED Add thin provisioning support to growvols 2022-09-12 14:15:17 UTC
OpenStack gerrit 855840 0 None MERGED Add a separate /boot partition 2022-10-27 08:52:30 UTC
OpenStack gerrit 855841 0 None MERGED Switch to MiB for all partition/volume sizes 2022-10-27 08:52:30 UTC
OpenStack gerrit 855842 0 None MERGED Switch to LVM thin provisioning 2022-10-27 08:52:31 UTC
Red Hat Issue Tracker OSP-14078 0 None None None 2022-03-21 15:26:16 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:11:35 UTC

Description Juan Larriba 2022-03-21 14:57:00 UTC
Currently, the RHEL9 image for RHOSP17 has LVM enabled on the disk, partitioned as described in https://opendev.org/openstack/tripleo-image-elements/src/branch/master/elements/overcloud-partition-uefi/block-device-default.yaml.

However, this LVM layout takes 100% of the disk, leaving no unused space after the partitions.

If, say, 20% of the disk were left free at the end, that would give much more flexibility and would make it possible to use the LVM snapshots feature to snapshot the disk just before potentially destructive actions, such as upgrades.

The idea would be to change the current partition layout:

        - name: lv_root
          base: vg
          extents: 69%VG
        - name: lv_tmp
          base: vg
          extents: 4%VG
        - name: lv_var
          base: vg
          extents: 15%VG
        - name: lv_log
          base: vg
          extents: 4%VG
        - name: lv_audit
          base: vg
          extents: 3%VG
        - name: lv_home
          base: vg
          extents: 4%VG
        - name: lv_srv
          base: vg
          extents: 1%VG

to one that only adds up to 80%:

        - name: lv_root
          base: vg
          extents: 49%VG
        - name: lv_tmp
          base: vg
          extents: 4%VG
        - name: lv_var
          base: vg
          extents: 15%VG
        - name: lv_log
          base: vg
          extents: 4%VG
        - name: lv_audit
          base: vg
          extents: 3%VG
        - name: lv_home
          base: vg
          extents: 4%VG
        - name: lv_srv
          base: vg
          extents: 1%VG

Comment 1 Steve Baker 2022-03-21 22:33:00 UTC
Partition sizes on the built image only need to be big enough for the image's installed contents. When the image is deployed to bare metal, the volumes are expanded using the growvols[1] utility, which can be given custom grow values[2] as a percentage of the remaining disk or as absolute values. The default growvols arguments are:
        /=8GB
        /tmp=1GB
        /var/log=10GB
        /var/log/audit=2GB
        /home=1GB
        /var=100%

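For illustration, a minimal sketch of overriding these defaults with a direct growvols invocation; the device name and option syntax below are assumptions from memory of the growvols README [1] and should be verified there:

        # Sketch only: keep the default sizes but give all remaining space to /var.
        # --device selects the disk holding the volume group (assumed here to be sda).
        growvols --device sda \
            /=8GB /tmp=1GB /var/log=10GB /var/log/audit=2GB /home=1GB /var=100%

In practice these values are usually supplied through the grow volumes playbook described in [2] rather than by running growvols manually.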
I think we need a clearer idea of the snapshot space requirements; 20% takes a lot of space meant for /var. Maybe not that much is required when it only ever needs to hold short-lived thin snapshots for upgrade tasks. Ideally the size could be expressed in absolute terms instead of as a percentage. Making /var smaller has an impact in CI with small disks, see [3].

There is actually quite a bit of development required here; I think it will need an RFE, and getting it into 17.0 would be a big stretch. The tasks would be something like:
- create a small lv_snapshots on the image, just like lv_srv (see the sketch below)
- decide on growvols defaults to grow lv_snapshots, ideally in absolute storage, not percentage
- ensure growvols supports growing a volume even when it doesn't have a mount point
- manage the CI breakage for cases where /var is no longer big enough

Only then would you have an empty volume which can be replaced by upgrade activity with whatever snapshot volumes you require.
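For illustration only, such an lv_snapshots entry could mirror the existing lv_srv definition in block-device-default.yaml; the name and size here are placeholders rather than an agreed design:

        - name: lv_snapshots
          base: vg
          extents: 1%VG

growvols would then need a corresponding rule, ideally expressed as an absolute size, to grow that volume at deploy time.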

[1] https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/growvols/README.rst
[2] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#grow-volumes-playbook
[3] https://trello.com/c/KPJoaarA/2402-cixpipelineosp17ceph-rhel9-ceph-storage-nodes-run-out-of-space-pulling-ceph-container

Comment 2 Juan Larriba 2022-03-22 15:26:48 UTC
That seems like a plan to me. I think it might be a little optimistic to try to get this into 17.0, but it might be worth it. I will discuss it with the rest of the team to check what priority this can really be given.

To get an absolute number for the lv_snapshots storage we first need to do some tests, as for now we do not really know how much snapshot space an upgrade will take. It is not easy to find out: 16.2 does not provide LVM-enabled images, so we cannot test the snapshots from 16.1 to 17.0, and from 17.0 we still cannot upgrade to 17.1, which does not exist yet. Let's see how we can get to it.

Comment 3 Juan Larriba 2022-04-20 12:21:50 UTC
We are currently working on this feature for 17.1, taking into account your suggestion of creating an lv_snapshots LV that the backup and restore procedure will delete to free up space in which to create the snapshots.

I have been consulting with the rest of the Upgrades people and, even though we have no hard data, we think that 16GB should be enough space to do an upgrade.

Would it be possible to add a 16GB lv_snapshots logical volume to the 17.1 hardened UEFI images?

Comment 11 Steve Baker 2022-09-05 01:24:16 UTC
The changes have now merged in master; now tracking the Wallaby backports.

Comment 15 Steve Baker 2022-09-26 20:01:39 UTC
*** Bug 2106154 has been marked as a duplicate of this bug. ***

Comment 29 Steve Baker 2023-03-14 20:56:15 UTC
This has been implemented with thin provisioning, so the VG is fully consumed by the thin pool called lv_thinpool. You can create new volumes or snapshots backed by that pool; see the RHEL 9 docs:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_logical_volumes/creating-and-managing-thin-provisioned-volumes_configuring-and-managing-logical-volumes#doc-wrapper
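As an example, a minimal sketch using standard LVM commands; the volume group name (vg), volume names, and sizes are illustrative assumptions based on the image layout discussed above:

        # Create a new 16G thin volume backed by the existing thin pool
        lvcreate --thin vg/lv_thinpool --virtualsize 16G --name lv_backup

        # Take a thin snapshot of an existing volume; it is backed by the same pool
        lvcreate --snapshot --name lv_root_snapshot vg/lv_root

        # Thin snapshots are flagged to skip activation by default, so activate explicitly
        lvchange --activate y --ignoreactivationskip vg/lv_root_snapshot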

Comment 40 errata-xmlrpc 2023-08-16 01:11:06 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

