Bug 1832730
| Summary: | [RFE] Leapp should estimate the needed size according to the existing partitions | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Christophe Besson <cbesson> |
| Component: | leapp-repository | Assignee: | Leapp Notifications Bot <leapp-notifications-bot> |
| Status: | CLOSED ERRATA | QA Contact: | upgrades-and-conversions |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.8 | CC: | fkrska, gscott, leapp-notifications, mmoran, mportman, pmatilai, podvody, pstodulk, rmetrich, saydas, sellis |
| Target Milestone: | rc | Keywords: | FutureFeature, Reproducer |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | leapp-repository-0.18.0-5.el7_9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-11-16 06:56:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2125195 | ||
| Bug Blocks: | 1818077, 1818088 | ||
|
Description
Christophe Besson
2020-05-07 08:08:48 UTC
Hi Christophe, thanks for the report. I will just add more context around the error message:

~~~
Disk Requirements:
   At least 1044MB more space needed on the / filesystem.
~~~

The message comes from DNF and usually occurs during the download of RPMs. You are completely right that a suggestion to set LEAPP_OVL_SIZE would help a lot. We have already planned to improve messages for issues like this. We could write a message like: "Please set the LEAPP_OVL_SIZE environment variable to a value higher than Y", where Y == current value + the missing space + some suggested reserve. Currently this is reported as a known issue in the upgrade documentation. We will probably not estimate the real needed size in any way different from what DNF does, as this is a very complex problem; e.g. we only learn the "real" needed size after these files have already been created. In this case we expect that people will run leapp again with a properly set value of the envar. Maybe we will change it in the future, but currently I prefer to go with just the changed error message.

Hi Petr, from my point of view that looks good. And I understand that giving a precise estimated size is empirical and error prone. Thanks for this improvement.

Hello, another use case hits this need. The issue appears after the leapp upgrade step, in the reboot sequence:

~~~
[  128.977515] localhost upgrade[754]: installing package nss-softokn-freebl-3.44.0-15.el8.i686 needs 176MB on the /usr filesystem
[  128.977515] localhost upgrade[754]: Error Summary
[  128.977515] localhost upgrade[754]: -------------
[  128.977515] localhost upgrade[754]: Disk Requirements:
[  128.977515] localhost upgrade[754]:    At least 176MB more space needed on the /usr filesystem.
[  128.977515] localhost upgrade[754]: Container el8userspace failed with error code 1.
~~~

As per the sosreport, there was 1.1GB remaining in this partition.
~~~
/dev/mapper/vg_root-lv_root          999320    61804   868704   7% /
/dev/mapper/vg_root-lv_usr          3030800  1723712  1133420  61% /usr
/dev/mapper/vg_root-lv_tmp          3997376    25824  3745456   1% /tmp
/dev/sda1                            499656   141016   321944  31% /boot
/dev/mapper/vg_root-lv_var         11791528  1745600  9516584  16% /var
/dev/mapper/vg_root-lv_usr--local    999320   142100   788408  16% /usr/local
/dev/mapper/vg_root-lv_home         1998672   199092  1678340  11% /home
/dev/mapper/vg_root-lv_tmp          3997376    25824  3745456   1% /var/tmp
~~~

=> Uploading the whole rdsosreport.txt as a private attachment.
=> Increasing the severity to high, as the customer is encountering this issue in the reboot step (and also hits this one: rhbz#1839785).

Hi guys, I think we can lower the severity again, as we added one additional dry run prior to the reboot phase to prevent problems during the upgrade phases. We expect that people should be completely safe from that point on with the next release. However, we should keep the ticket open, as we could still improve the solution, especially in the case of systems with XFS partitions formatted without the ftype attribute.

It's OK with me to downgrade. Could we ask the FS guys about the benefits/drawbacks of "ftype=0" compared to "ftype=1"? I'm asking because staying with "ftype=0" upon upgrade will leave us with the same issues for leapp to RHEL 9, then RHEL 10, etc. Maybe a solution would be to migrate systems from "ftype=0" to "ftype=1" if there are benefits (e.g. better performance). It's possible to do so; it requires a backup/restore from Troubleshooting mode, but it's feasible for sure and would be a one-shot operation. Renaud.

Migrating from ftype=0 to ftype=1 basically means the affected partitions have to be reformatted, which in many cases practically means that systems have to be reinstalled. I think the main benefit of ftype=1 is containerisation, as XFS without ftype is not compatible with containerisation technologies that usually rely on overlayfs.
Which is basically the reason why we have to work around that problem, as the IPU relies on overlayfs technology as well. I am curious about other benefits/drawbacks, but I think that XFS without ftype is usually present just because of the unfortunate default that was chosen for RHEL 7.0 -> 7.2. That's why providing a procedure to fix this once could be beneficial.

Actually there is no need to reinstall; there is a need to back up the filesystem in question and restore it after reformatting, keeping the exact same UUID.

Note that this is still an ongoing issue if a RHEL 7.x system is upgraded to RHEL 8 and then up to RHEL 9. We either need a way to update the XFS partitions to "ftype=1" without a backup/restore of the XFS data, or we need better feedback around the disk size leapp requires.

Steven, from all the discussions I have read around XFS, the only way is to reformat the partition completely, so it is not possible to do it without a backup & restore operation. Regarding the disk size, I have started a discussion with the RPM team, as currently any better estimate is blocked on the fact that RPM does not provide any information about the calculated needed disk size for each partition (and rpm has it). The only other workaround we could think of is to start using qcow images, which, as you can imagine, is a pretty ugly workaround.

*** Bug 2110045 has been marked as a duplicate of this bug. ***

I'm trying to wrap my head around this thing. I have zero clue about leapp internals (or otherwise, for that matter), so bear with me a bit. Currently the disk-space usage info in rpm is short-lived data inside the transaction prepare stage only, and exposing it in any meaningful manner is not an entirely trivial matter; I need to properly understand the case so that anything we come up with actually serves the cause. Based on what I've read here and private email exchanges, leapp is running a test transaction with a setup where the actual partition layout doesn't match the real layout.
Something like: the test-transaction / is actually an image inside the host /var, and on a small /var this will obviously be insufficient? And what you'd like to do is toss the test transaction to rpm to calculate disk-space requirements, take the calculated per-fs usage figures and check/adjust against what leapp knows to be the real sizes? If / is an image of some sort, what about the other partitions? Rpm calculates the needs based on actual mount points, so there needs to be a matching layout in the test transaction for the data to be worth anything.

As I've said in private exchanges, I don't really see us developing and backporting a complex new API into RHEL 8, or even 9. Once in RHEL, rpm is pretty much set in stone for stability guarantees. So while I'm not at all opposed to exposing this data upstream via a nice API once we figure out a good way to do that, we'll probably need something more minimal for RHEL. So I'm thinking maybe we could just add an API which allows callers to reserve space on a filesystem. That would have other uses, such as allowing dnf & friends to account for their estimated disk usage, plus it could probably be (ab)used for this cause: reserve all space on all available mounts, run the test transaction, and in the resulting problem objects you have the actual disk-space requirements. Or so the theory goes.

Hmm, so actually the "reserve all space" trick should be doable without any added APIs: fallocate or create a sparse file filling up all the related filesystems (or even mount them read-only). Run the test transaction and collect the biggest numbers per mountpoint from the returned problem objects, and there you have the per-fs disk usages. It may not be entirely pretty, but it should be doable with any rpm version out there right now.

Hi Panu, I am willing to explain it to you on a call, so you understand why this is not really possible to do. Each mountpoint is hidden under overlayfs, which is in /var/lib/leapp/.....
It would be awesome to be able to tell RPM what sizes are real. But it's not what we need. We need to have just something like:
{
"/mountpoint1" : <required_free_space_in_bytes>,
"/mountpoint2" : <required_free_space_in_bytes>,
...
}
or whatever information about the space. If RPM just dumps this data into /var/log/...., we are happy with that as well. Then we could check ourselves whether there is enough space on each partition.
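Such a mountpoint-to-bytes mapping would be trivial to act on. A minimal sketch of the consuming side (the function name is hypothetical, not part of leapp; it only assumes the JSON structure proposed above):

```python
import os

def check_disk_requirements(requirements):
    """Compare per-mountpoint required free space against what is available.

    `requirements` maps mountpoint -> required free bytes, mirroring the
    JSON structure proposed above. Returns the mountpoints that fall short,
    each with the number of missing bytes.
    """
    missing = {}
    for mountpoint, required in requirements.items():
        st = os.statvfs(mountpoint)
        free = st.f_bavail * st.f_frsize  # bytes available to unprivileged users
        if free < required:
            missing[mountpoint] = required - free
    return missing

# "/" certainly has more than 0 bytes free, so nothing is reported missing:
print(check_disk_requirements({"/": 0}))  # {}
```

If rpm dumped the mapping to a file, leapp could `json.load` it and refuse to continue the upgrade whenever the returned dict is non-empty, naming the exact partitions and shortfalls in the report.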
I'm not suggesting anything about telling rpm the real sizes. I'm suggesting you make whatever mountpoints you have there appear full to rpm; when you run the test transaction in that situation and walk through the returned problem objects (I mean the API, not the message string), picking up the largest number for each partition, you'll get exactly those mountpoint:required_space pairs out.

Got it. However, that would mean that we sometimes consume tens of GBs of space on /var. Consuming almost all space on /var/.. could also have a negative impact, e.g. when a database is running on the system.

I was under the impression that you were operating on disk images or something rather than the real filesystems, exactly to avoid disrupting the actual OS. At any rate, you don't need to use the real filesystems here at all, and the size of the filesystems (from a loop-back image, bind mount or whatever) doesn't matter at all; the smaller the fs, the easier it is to handle, of course. The only thing that matters is that the mount tree matches that of the target, and that the filesystems are full.

Well, this is an unexpected bit of fun. I just ran into this same problem with my HPE ProLiant 380E G8, first installed with RHEL 7.early. *All* file systems on this system are XFS with the old default ftype=0. This system lives at the center of my little world here, doing NFS for a RHV environment with several VMs. And so, I would really like to upgrade it in place instead of completely wiping and rebuilding everything with XFS filesystems with ftype=1. I'm trying to run a leapp upgrade from RHEL 8.7 to RHEL 9.0 and I hit the "out of space" problem with the bogus error message that triggered this BZ. The "Known Issues" section of the Upgrade Guide describes this problem and suggests changing the value of the LEAPP_OVL_SIZE environment variable. This KCS article, https://access.redhat.com/solutions/5057391, offers similar advice. Wonderful.
Just do this:

~~~
# LEAPP_OVL_SIZE=3072
# leapp upgrade --target 9.0
~~~

and it should work, right? Well, no. Same exact problem. Okay, what about changing /etc/leapp/leapp.conf to look like this:

~~~
# cat /etc/leapp/leapp.conf
LEAPP_OVL_SIZE=3072

[repositories]
repo_path=/etc/leapp/repos.d/

[database]
path=/var/lib/leapp/leapp.db
~~~

Nope. leapp upgrade immediately blows up because all settings apparently need to go under a section definition. I haven't found a single article anywhere that says **how** to change that environment variable, just that I need to change it. So, **how** do I set the environment variable LEAPP_OVL_SIZE=3072 to make leapp upgrade happy?

Hi Greg, this is not how the shell works. I recommend reading e.g. the following article:
* https://devconnected.com/set-environment-variable-bash-how-to/

So you can do:

~~~
# LEAPP_OVL_SIZE=3072
# export LEAPP_OVL_SIZE
~~~

or just:

~~~
# LEAPP_OVL_SIZE=3072 leapp upgrade --target 9.0
~~~

So from what you wrote above:

~~~
# LEAPP_OVL_SIZE=3072
# leapp upgrade --target 9.0
~~~

should not work, as the applications do not know anything about that environment variable; in this case it lives only in the current terminal and is not propagated to other applications. So e.g. here you can see what's happening:

~~~
$ MY_ENVAR=foo
$ bash -c 'echo $MY_ENVAR'

$ export MY_ENVAR
$ bash -c 'echo $MY_ENVAR'
foo
~~~

The mentioned modification of the configuration is invalid, as it ignores the format of the configuration file. I suggest returning it to its original state.

@Petr Stodulka - thanks. I put the config back the way it was and then looked deeper at https://access.redhat.com/solutions/5057391. It was right there, but I missed it. @Christophe Besson - I added a couple of sentences to that KCS to highlight the instruction to export LEAPP_OVL_SIZE=32.
- Greg

The original solution has been redesigned and should be fixed by upstream PR: https://github.com/oamg/leapp-repository/pull/1097

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (leapp and leapp-repository bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7230
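As a footnote to the LEAPP_OVL_SIZE discussion above: the export behaviour demonstrated with bash can be reproduced from any language that spawns child processes. A small illustrative Python sketch (unrelated to leapp's own code; the variable is only used as an example name):

```python
import os
import subprocess
import sys

# In a shell, `VAR=value` alone stays local to that shell; only an exported
# variable reaches child processes. From Python, entries in os.environ are
# inherited by children (the "exported" case), while a variable withheld
# from the child's env is invisible to it (the "unexported" case).

child = [sys.executable, "-c",
         "import os; print(os.environ.get('LEAPP_OVL_SIZE', '<unset>'))"]

# Equivalent of `export LEAPP_OVL_SIZE=3072`: the child inherits the variable.
os.environ["LEAPP_OVL_SIZE"] = "3072"
print(subprocess.run(child, capture_output=True, text=True).stdout.strip())  # 3072

# Unexported case: the child runs with an environment lacking the variable.
stripped = {k: v for k, v in os.environ.items() if k != "LEAPP_OVL_SIZE"}
print(subprocess.run(child, env=stripped,
                     capture_output=True, text=True).stdout.strip())  # <unset>
```

This is why `LEAPP_OVL_SIZE=3072` on its own line, followed by `leapp upgrade` as a separate command, has no effect, while `LEAPP_OVL_SIZE=3072 leapp upgrade ...` on one line (or an explicit `export`) works.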