aarch64 Workstation images were failing to compose.
The compose reported that it needed an additional 1.9 GiB of space, even though there was enough.
I manually ran a scratch build, went into the storage spoke there, and just hit continue; the install then completed fine. So somehow it wasn't detecting the space at first?
https://koji.fedoraproject.org/koji/taskinfo?taskID=101760920 was that scratch build.
The oz-aarch64.log there shows the failure and then success.
I have no idea what could have caused this, but it might be related to the change to the /boot/efi size?
I untagged that anaconda version and compose completed ok.
Steps to Reproduce:
1. compose workstation aarch64 raw image
2. see the compose say it needs more space
3. just restart by selecting the same packages it just failed on
4. install completes
Says it needs more space
To be clear - you're saying you kicked off a scratch build, it initially 'failed' the same way as official builds were failing (i.e. it didn't install interactively as it should but paused at the pre-install hub with a spoke error), but then you remoted into it, went through the storage spoke interactively, and then it was happy and proceeded to install?
I'm looking at the log and it's quite weird. We have this...
> 16:20:24,484 INFO anaconda:anaconda: ui.lib.space: fs space: 8.8 GiB needed: 9.7 GiB
> 16:20:24,510 ERR anaconda:anaconda: ui.tui.hubs.summary: Not enough space in file systems for the current software selection. An additional 1.13 GiB is needed.
...which is the problem. But the log line for running the check ("fs space") is not repeated; the check is not run again. Yet the installation then actually happens. So where did we actually realize there's now enough free space? That makes no sense to me.
How would I debug this? Send an updates image? Or run the compose myself?
Huh. That *is* odd. I thought it was getting run twice (search for the parameters to df - "target,avail"), but looking closely, the second time that happens is *after* the install starts.
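For reference, the "target,avail" parameters in the log suggest a GNU coreutils df invocation along these lines (an assumption about how the free-space check shells out, not a quote from the anaconda source):

```shell
# Print only the mountpoint and available-space columns, the two fields
# the log's "target,avail" parameters refer to.
df --output=target,avail
```

Searching the log for that parameter string is how you can tell when the check actually ran.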
I think to debug we can send Kevin a scratch build (not sure he can work with an updates.img) and he can run a scratch image build. I was going to start out with one that reverts c253390b23475970f1d696b4fb9d719a0c316559 , then maybe try ones with different variations on the max and min sizes to see which is important...
Agreed on that commit as being the most likely trigger to surface this. https://github.com/rhinstaller/anaconda/pull/4711/commits/c253390b23475970f1d696b4fb9d719a0c316559
However, I doubt that's the actual root problem. Apparently something calculates the space wrong.
I'm afraid that realistically, we have these options:
- debug and fix the problem - depends on team capacity being available for this, and then succeeding
- undo the Change - not wanted because the change should happen
- bump the image size - not wanted because of bloat
Just so I have these numbers handy, the summary line we get with the new anaconda with the problem is:
16:20:24,484 INFO anaconda:anaconda: ui.lib.space: fs space: 8.8 GiB needed: 9.7 GiB
the summary line we get with the old anaconda without the problem is:
08:42:46,061 INFO anaconda:anaconda: ui.lib.space: fs space: 9.93 GiB needed: 9.71 GiB
note the difference in apparent available "fs space" is 1.13 GiB, which doesn't obviously track with any of the numbers you can get from the EFI system partition change: the increase in the *minimum* size of the ESP was 300M (500M-200M), and the increase in its *maximum* size was 1.4G (2G-600M). The difference we're seeing is in the middle. So, that part's a bit odd.
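A quick sanity check of those numbers (throwaway arithmetic; all figures are taken straight from the log lines and the ESP change, converted to MiB):

```python
# "fs space" figures from the two logs, converted GiB -> MiB.
old_fs_space = 9.93 * 1024   # old anaconda, working compose
new_fs_space = 8.8 * 1024    # new anaconda, failing compose

# Size deltas from the ESP change, in MiB.
esp_min_delta = 500 - 200    # minimum ESP size grew by 300 MiB
esp_max_delta = 2048 - 600   # maximum ESP size grew by 1448 MiB

observed_delta = old_fs_space - new_fs_space  # ~1157 MiB (1.13 GiB)
print(f"observed: {observed_delta:.0f} MiB, "
      f"min delta: {esp_min_delta} MiB, max delta: {esp_max_delta} MiB")
# The observed difference sits between the two deltas,
# matching neither exactly.
```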
The logs I'm using are https://kojipkgs.fedoraproject.org//work/tasks/920/101760920/oz-aarch64.log (from Kevin's scratch build) and https://kojipkgs.fedoraproject.org//work/tasks/2490/101862490/oz-aarch64.log (from the 20230606.n.0 Rawhide compose).
> note the difference in apparent available "fs space" is 1.13G , which doesn't track obviously with any of the numbers you can get from the EFI system partition change
Vojto, could this difference be caused by blivet growing the partitions proportionally from the new minimum?
Looking closer at nirik's log, it does seem to go with a 2G /boot/efi in the end:
16:18:51,722 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:blivet:fixing size of non-existent 2 GiB partition vda1 (46) with non-existent efi filesystem mounted at /boot/efi
16:18:51,723 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:blivet:fixing size of non-existent 1024 MiB partition vda2 (53) with non-existent ext4 filesystem mounted at /boot
16:18:51,724 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:blivet:fixing size of non-existent 11 GiB partition vda3 (39) with non-existent btrfs filesystem
I'm honestly not sure whether that's what we want or not, but it seems worth thinking about - I assume the intent of setting a max of 2G for /boot/efi was for it to wind up as 2G during interactive installs on user systems with sufficient space available, I'm not sure the case of image creation was considered. But OTOH, I guess with these disk images, when you deploy them, you get the filesystems from the image, plus the 'first boot grow' mechanism for root; so maybe it *is* a good idea for /boot/efi to be 2G on these images if we think deployed systems may need it to be that big? I'm really not sure.
The other log shows:
08:41:09,386 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:blivet:fixing size of non-existent 600 MiB partition vda1 (46) with non-existent efi filesystem mounted at /boot/efi
08:41:09,387 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:blivet:fixing size of non-existent 1024 MiB partition vda2 (53) with non-existent ext4 filesystem mounted at /boot
08:41:09,388 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:blivet:fixing size of non-existent 12.41 GiB partition vda3 (39) with non-existent btrfs filesystem
Aha, I think I see what's happening: the free space calculation is fudged by 20% for safety. 8.8/11 and 9.93/12.41 are both 0.8. So it looks like the available space calculation correctly figures out how big the root filesystem is going to be, then takes 80% of that number. I guess the idea is to avoid any issues with the filesystem reserving space, or space being taken up by something outside of the package set somehow, and I guess to ensure there's at least a *bit* of free space on the filesystem after installation.
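The ratio check is easy to reproduce from the numbers quoted above (just arithmetic on the logged figures, nothing from the anaconda source):

```python
# "fs space" reported by anaconda vs. the planned root partition size,
# both in GiB, taken from the two oz-aarch64.log files.
cases = {
    "new anaconda (failing)": (8.8, 11.0),    # run with the 2 GiB ESP
    "old anaconda (working)": (9.93, 12.41),  # run with the 600 MiB ESP
}

for name, (fs_space, root_size) in cases.items():
    ratio = fs_space / root_size
    print(f"{name}: {fs_space} / {root_size} = {ratio:.3f}")
# Both ratios come out at ~0.8: the space check only counts 80% of the
# root filesystem as usable, presumably as a safety margin.
```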
So, this really does just boil down to the ESP change as we thought. So I guess the question is, in this specific case, do we want/need the ESP to be 2G on the aarch64 disk images? And more generally, do we want this 2G max size to apply to image builds? I will try and find some logs from the failed composes and see if we got 2G ESPs on e.g. the live images as well...
Doesn't seem like this affects the live images at all. There's a lot of other deliverables these days, I'll have to think which other ones might be affected.
Well, it'd be good if folks could chime in on whether or not it makes sense to have the large ESP in these disk images - CCing some ARM folks for that - but for now, I've filed a PR to bump the image sizes:
Just a note: it should be possible to vary the size of the EFI partition by arch by adding _bootloader_partition to the respective classes in platform.py, in this case Aarch64EFI and ArmEFI (below the original PR).
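A rough sketch of what that per-arch override could look like. This is a simplified stand-in, not the real platform.py code: the PartSpec class here is a hypothetical placeholder for anaconda's actual partition-spec type, and the sizes are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartSpec:
    """Hypothetical stand-in for anaconda's partition specification."""
    mountpoint: str
    size_mib: int       # requested size
    max_size_mib: int   # grow limit

class EFI:
    # Generic EFI platforms: let the ESP grow up to 2 GiB.
    _bootloader_partition = PartSpec("/boot/efi", 500, 2048)

class Aarch64EFI(EFI):
    # Override only the ESP spec for aarch64, e.g. capping growth
    # so disk images don't balloon. (600 is an illustrative value.)
    _bootloader_partition = PartSpec("/boot/efi", 500, 600)

print(Aarch64EFI._bootloader_partition.max_size_mib)  # 600
print(EFI._bootloader_partition.max_size_mib)         # 2048
```

The idea is just class-attribute shadowing: subclasses for specific arches redefine the one spec they need, and everything else is inherited.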
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.