Bug 1832730
| Summary: | [RFE] Leapp should estimate the needed size according to the existing partitions | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Christophe Besson <cbesson> |
| Component: | leapp-repository | Assignee: | Leapp Notifications Bot <leapp-notifications-bot> |
| Status: | CLOSED ERRATA | QA Contact: | upgrades-and-conversions |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.8 | CC: | fkrska, gscott, leapp-notifications, mmoran, mportman, pmatilai, podvody, pstodulk, rmetrich, saydas, sellis |
| Target Milestone: | rc | Keywords: | FutureFeature, Reproducer |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | leapp-repository-0.18.0-5.el7_9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-11-16 06:56:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2125195 | ||
| Bug Blocks: | 1818077, 1818088 | ||
|
Description
Christophe Besson
2020-05-07 08:08:48 UTC
Hi Christophe, thanks for the report. I will just add more context around the error message:

~~~
Disk Requirements:
   At least 1044MB more space needed on the / filesystem.
~~~

The message comes from DNF and usually occurs during the download of RPMs. You are completely right that a suggestion to set LEAPP_OVL_SIZE would help a lot. We have already planned to improve messages for issues like this. We could write a message like: "Please set the LEAPP_OVL_SIZE environment variable to a value higher than Y", where Y == current value + the missing space + some suggested reserve. Currently this is reported as a known issue in the upgrade documentation. We will probably not estimate the real needed size in any way different from what DNF does, as this is a very complex problem; e.g. we only learn the "real" needed size after these files have already been created. In this case we expect that people will run leapp again with a properly set value of the envar. Maybe we will change it in the future, but currently I prefer to go with just the changed error message.

Hi Petr, from my point of view that looks good. And I understand that giving a precise estimated size is empirical and error prone. Thanks for this improvement.

Hello, another use case hits this need. The issue appears after the leapp upgrade step, in the reboot sequence:

~~~
[  128.977515] localhost upgrade[754]: installing package nss-softokn-freebl-3.44.0-15.el8.i686 needs 176MB on the /usr filesystem
[  128.977515] localhost upgrade[754]: Error Summary
[  128.977515] localhost upgrade[754]: -------------
[  128.977515] localhost upgrade[754]: Disk Requirements:
[  128.977515] localhost upgrade[754]:    At least 176MB more space needed on the /usr filesystem.
[  128.977515] localhost upgrade[754]: Container el8userspace failed with error code 1.
~~~

As per the sosreport, there was 1.1GB remaining in this partition.
~~~
/dev/mapper/vg_root-lv_root          999320    61804   868704   7% /
/dev/mapper/vg_root-lv_usr          3030800  1723712  1133420  61% /usr
/dev/mapper/vg_root-lv_tmp          3997376    25824  3745456   1% /tmp
/dev/sda1                            499656   141016   321944  31% /boot
/dev/mapper/vg_root-lv_var         11791528  1745600  9516584  16% /var
/dev/mapper/vg_root-lv_usr--local    999320   142100   788408  16% /usr/local
/dev/mapper/vg_root-lv_home         1998672   199092  1678340  11% /home
/dev/mapper/vg_root-lv_tmp          3997376    25824  3745456   1% /var/tmp
~~~

=> Uploading the whole rdsosreport.txt as a private attachment.
=> Increasing the severity to high, as the customer is encountering this issue in the reboot step (and also hits this one: rhbz#1839785).

Hi guys, I think we can lower the severity again, as we added one additional dry run prior to the reboot phase to prevent problems during the upgrade phases. We expect that people should be completely safe from that point on with the next release. However, we should keep the ticket open, as we could still improve the solution, especially in the case of systems with XFS partitions formatted without the ftype attribute.

It's OK with me to downgrade. Could we ask the FS guys about the benefits/drawbacks of "ftype=0" compared to "ftype=1"? I'm asking because staying with "ftype=0" upon upgrade will leave us with the same issues for leapp to RHEL 9, then RHEL 10, etc. Maybe a solution would be to migrate systems from "ftype=0" to "ftype=1" if there are benefits (e.g. better performance). It's possible to do so; it requires a backup/restore from Troubleshooting mode, but it's feasible for sure and would be a one-shot operation. Renaud.

Migrating from ftype=0 to ftype=1 basically means the affected partitions have to be reformatted, which in many cases practically means that systems have to be reinstalled. I think the main benefit of ftype=1 is containerisation, as XFS without ftype is not compatible with containerisation technologies that usually rely on overlayfs.
Which is basically the reason why we have to work around that problem, as the IPU relies on overlayfs technology as well. I am curious about other benefits/drawbacks, but I think that XFS without ftype is usually present just because of the unfortunate default that was chosen for RHEL 7.0 -> 7.2. That's why providing a procedure to fix this once could be beneficial.

Actually there is no need to reinstall; there is a need to back up the filesystem in question and restore it after reformatting, keeping the exact same UUID.

Note that this is still an ongoing issue if a RHEL 7.x system is upgraded to RHEL 8 and then up to RHEL 9. We either need a way to update the XFS partitions to "ftype=1" without a backup/restore of the XFS data, or we need better feedback around the disk size leapp requires.

Steven, from all the discussions I have read around XFS, the only way is to reformat the partition completely, so it is not possible to do it without a backup & restore operation. Regarding the disk size, I have started a discussion with the RPM team, as currently any better estimate is blocked on the fact that RPM does not provide any information about the calculated needed disk size for each partition (and rpm has it). The only other workaround we could think of is to start using qcow images, which, as you can imagine, is a pretty ugly workaround.

*** Bug 2110045 has been marked as a duplicate of this bug. ***

I'm trying to wrap my head around this thing. I have zero clue about leapp internals (or otherwise, for that matter), so bear with me a bit. Currently the disk-space usage info in rpm is short-lived data inside the transaction prepare stage only, and exposing it in any meaningful manner is not an entirely trivial matter; I need to properly understand the case so that anything we come up with actually serves the cause. Based on what I've read here and private email exchanges, leapp is running a test transaction with a setup where the actual partition layout doesn't match the real layout.
Something like: the test-transaction / is actually an image inside the host /var, and on a small /var this will obviously be insufficient? And what you'd like to do is toss the test transaction to rpm to calculate disk-space requirements, take the calculated per-fs usage figures and check/adjust against what leapp knows to be the real sizes? If / is an image of some sort, what about the other partitions? Rpm calculates the needs based on actual mount points, so there needs to be a matching layout in the test transaction for the data to be worth anything.

As I've said in private exchanges, I don't really see us developing and backporting a complex new API into RHEL 8, or even 9. Once in RHEL, rpm is pretty much set in stone for stability guarantees. So while I'm not at all opposed to exposing this data upstream via a nice API once we figure out a good way to do that, we'll probably need something more minimal for RHEL. So I'm thinking maybe we could just add an API which allows callers to reserve space on a filesystem. That would have other uses, such as allowing dnf & friends to account for their estimated disk usage, plus it could probably be (ab)used for this cause: reserve all space on all available mounts, run the test transaction, and in the resulting problem objects you have the actual disk-space requirements. Or so the theory goes.

Hmm, so actually the "reserve all space" trick should be doable without any added APIs: fallocate or create a sparse file filling up all the related filesystems (or even mount them read-only). Run the test transaction and collect the biggest numbers per mountpoint from the returned problem objects, and there you have the per-fs disk usages. It may not be entirely pretty, but it should be doable with any rpm version out there right now.

Hi Panu, I am willing to explain it to you on a call, so you understand why this is not really possible to do. Each mountpoint is hidden under overlayfs, which is in /var/lib/leapp/.....
It would be awesome to be able to tell RPM what sizes are real. But it's not what we need. We need to have just something like:
{
"/mountpoint1" : <required_free_space_in_bytes>,
"/mountpoint2" : <required_free_space_in_bytes>,
...
}
or whatever information about the space. If RPM just dumps this data into /var/log/...., we are happy with that as well. Then we could check ourselves whether there is enough space on each partition.
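Such a mountpoint-to-bytes mapping would be trivial to act on. A minimal sketch of the consuming side (the function name is hypothetical, not part of leapp; it only assumes the JSON structure proposed above):

```python
import os

def check_disk_requirements(requirements):
    """Compare per-mountpoint required free space against what is available.

    `requirements` maps mountpoint -> required free bytes, mirroring the
    JSON structure proposed above. Returns the mountpoints that fall short,
    each with the number of missing bytes.
    """
    missing = {}
    for mountpoint, required in requirements.items():
        st = os.statvfs(mountpoint)
        free = st.f_bavail * st.f_frsize  # bytes available to unprivileged users
        if free < required:
            missing[mountpoint] = required - free
    return missing

# "/" certainly has more than 0 bytes free, so nothing is reported missing:
print(check_disk_requirements({"/": 0}))  # {}
```

If rpm dumped the mapping to a file, leapp could `json.load` it and refuse to continue the upgrade whenever the returned dict is non-empty, naming the exact partitions and shortfalls in the report.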
I'm not suggesting anything about telling rpm the real sizes. I'm suggesting you make whatever mountpoints you have there appear full to rpm; when you run the test transaction in that situation and walk through the returned problem objects (I mean the API, not the message string), picking up the largest number for each partition, you'll get exactly those mountpoint:required_space pairs out.

Got it. However, that would mean that we sometimes consume tens of GBs of space on /var. Consuming almost all space on /var/.. could also have a negative impact, e.g. when a database is running on the system.

I was under the impression that you were operating on disk images or something rather than the real filesystems, exactly to avoid disrupting the actual OS. At any rate, you don't need to use the real filesystems here at all, and the size of the filesystems (from a loop-back image, bind mount or whatever) doesn't matter at all; the smaller the fs, the easier it is to handle, of course. The only thing that matters is that the mount tree matches that of the target, and that the filesystems are full.

Well, this is an unexpected bit of fun. I just ran into this same problem with my HPE ProLiant 380E G8, first installed with RHEL 7.early. *All* file systems on this system are XFS with the old default ftype=0. This system lives at the center of my little world here, doing NFS for a RHV environment with several VMs. And so, I would really like to upgrade it in place instead of completely wiping and rebuilding everything with XFS filesystems with ftype=1. I'm trying to run a leapp upgrade from RHEL 8.7 to RHEL 9.0 and I hit the "out of space" problem with the bogus error message that triggered this BZ. The "Known Issues" section of the Upgrade Guide describes this problem and suggests changing the value of the LEAPP_OVL_SIZE environment variable. This KCS article, https://access.redhat.com/solutions/5057391, offers similar advice. Wonderful.
Just do this:

~~~
# LEAPP_OVL_SIZE=3072
# leapp upgrade --target 9.0
~~~

and it should work, right? Well, no. Same exact problem. Okay, what about changing /etc/leapp/leapp.conf to look like this:

~~~
# cat /etc/leapp/leapp.conf
LEAPP_OVL_SIZE=3072

[repositories]
repo_path=/etc/leapp/repos.d/

[database]
path=/var/lib/leapp/leapp.db
~~~

Nope. leapp upgrade immediately blows up because all settings apparently need to go under a section definition. I haven't found a single article anywhere that says **how** to change that environment variable, just that I need to change it. So, **how** do I set the environment variable LEAPP_OVL_SIZE=3072 to make leapp upgrade happy?

Hi Greg, this is not how the shell works. I recommend reading e.g. the following article:
* https://devconnected.com/set-environment-variable-bash-how-to/

So you can do:

~~~
# LEAPP_OVL_SIZE=3072
# export LEAPP_OVL_SIZE
~~~

or just:

~~~
# LEAPP_OVL_SIZE=3072 leapp upgrade --target 9.0
~~~

So from what you wrote above:

~~~
# LEAPP_OVL_SIZE=3072
# leapp upgrade --target 9.0
~~~

should not work, as the applications do not know anything about that environment variable; in this case it lives only in the current terminal and is not propagated to other applications. So e.g. here you can see what's happening:

~~~
$ MY_ENVAR=foo
$ bash -c 'echo $MY_ENVAR'

$ export MY_ENVAR
$ bash -c 'echo $MY_ENVAR'
foo
~~~

The mentioned modification of the configuration is invalid, as it ignores the format of the configuration file. I suggest returning it to its original state.

@Petr Stodulka - thanks. I put the config back the way it was and then looked deeper at https://access.redhat.com/solutions/5057391. It was right there, but I missed it. @Christophe Besson - I added a couple of sentences to that KCS to highlight the instruction to export LEAPP_OVL_SIZE=32.
- Greg

The original solution has been redesigned and should be fixed by upstream PR: https://github.com/oamg/leapp-repository/pull/1097

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (leapp and leapp-repository bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7230
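As a footnote to the LEAPP_OVL_SIZE discussion above: the export behaviour demonstrated with bash can be reproduced from any language that spawns child processes. A small illustrative Python sketch (unrelated to leapp's own code; the variable is only used as an example name):

```python
import os
import subprocess
import sys

# In a shell, `VAR=value` alone stays local to that shell; only an exported
# variable reaches child processes. From Python, entries in os.environ are
# inherited by children (the "exported" case), while a variable withheld
# from the child's env is invisible to it (the "unexported" case).

child = [sys.executable, "-c",
         "import os; print(os.environ.get('LEAPP_OVL_SIZE', '<unset>'))"]

# Equivalent of `export LEAPP_OVL_SIZE=3072`: the child inherits the variable.
os.environ["LEAPP_OVL_SIZE"] = "3072"
print(subprocess.run(child, capture_output=True, text=True).stdout.strip())  # 3072

# Unexported case: the child runs with an environment lacking the variable.
stripped = {k: v for k, v in os.environ.items() if k != "LEAPP_OVL_SIZE"}
print(subprocess.run(child, env=stripped,
                     capture_output=True, text=True).stdout.strip())  # <unset>
```

This is why `LEAPP_OVL_SIZE=3072` on its own line, followed by `leapp upgrade` as a separate command, has no effect, while `LEAPP_OVL_SIZE=3072 leapp upgrade ...` on one line (or an explicit `export`) works.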