Bug 2233840

Summary: xfsprogs 5.19 changed the log size, making /boot too small
Product: Red Hat Enterprise Linux 9
Reporter: Ondřej Budai <obudai>
Component: xfsprogs
Assignee: Eric Sandeen <esandeen>
Status: CLOSED MIGRATED
QA Contact: Zorro Lang <zlang>
Severity: high
Docs Contact: Eliane Ramos Pereira <elpereir>
Priority: unspecified
Version: 9.3
CC: elpereir, libhe, linl, preichl, vkuznets, wshi, xiliang, xzhou, ymao, yuxisun
Target Milestone: rc
Keywords: MigratedToJIRA
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Last Closed: 2023-09-23 12:06:59 UTC
Type: Bug

Description Ondřej Budai 2023-08-23 14:35:27 UTC
Description of problem:
Image Builder is responsible for creating all RHEL cloud images. In the case of /boot, it first creates a 500 MiB partition and then runs `mkfs.xfs -m uuid=3e199562-8a95-43c5-b90b-1cdebb99a29c -L boot`. This process is the same for both 9.2 and 9.3 images.

However, we found a difference in the available space on the /boot partition between 9.2 and 9.3.

9.2:

[admin@vm ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       495M  156M  340M  32% /boot
[admin@vm ~]$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   10G  0 disk 
├─sda3   8:3    0  500M  0 part /boot

9.3:

[admin@vm ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       436M  170M  267M  39% /boot
[admin@vm ~]$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   10G  0 disk 
├─sda3   8:3    0  500M  0 part /boot

Note that /boot on 9.2 is 495M, but on 9.3 it is only 436M.

-----

I investigated further and discovered this:

9.2:

$ rpm -q xfsprogs
xfsprogs-5.14.2-1.el9.x86_64
$ truncate -s 500M /tmp/file
$ mkfs.xfs /tmp/file
meta-data=/tmp/file              isize=512    agcount=4, agsize=32000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1
data     =                       bsize=4096   blocks=128000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1368, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mkdir /mnt/xfs
$ sudo mount /tmp/file /mnt/xfs
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       495M  437M   59M  89% /boot


9.3 (well, actually CentOS Stream 9):

$ rpm -q xfsprogs
xfsprogs-5.19.0-4.el9.x86_64
$ truncate -s 500M /tmp/file
$ mkfs.xfs /tmp/file
meta-data=/tmp/file              isize=512    agcount=4, agsize=32000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=128000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mkdir /mnt/xfs
$ sudo mount /tmp/file /mnt/xfs
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       495M  401M   95M  81% /boot

-----

Note that mkfs.xfs reports a different number of blocks in the log section: 9.2 shows 1368, whereas 9.3 shows 16384.

(16384 - 1368) * 4096 B is 58.6 MiB, which matches the size difference between the two df -h outputs.
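The arithmetic can be checked quickly in the shell (block counts taken from the mkfs.xfs outputs above, at the 4096-byte filesystem block size):

```shell
# Difference in internal log blocks between xfsprogs 5.14 (1368 blocks)
# and xfsprogs 5.19 (16384 blocks), converted to bytes and MiB.
delta_bytes=$(( (16384 - 1368) * 4096 ))
echo "${delta_bytes} bytes"                   # 61505536 bytes
echo "$(( delta_bytes / 1024 / 1024 )) MiB"   # 58 MiB (58.6 with fractions)
```

This is exactly the 495M - 436M delta seen in df -h.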

-----

I'm not sure whether this is an expected change, so I'm reporting it as a bug. Note that customers might see this as a bug, because we usually don't change partitioning during a major release's lifecycle. I think this leaves us with the following options:

1) The change is reverted in xfsprogs
2) Image Builder passes the right arguments to mkfs.xfs to revert the log size change.
3) We make the partition bigger.
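For option 2, a sketch of what the invocation might look like if Image Builder pinned the log size explicitly. The `-l size=10m` value is an assumption for illustration only: mkfs.xfs enforces a minimum internal log size, so whether the old ~5.3 MiB (1368-block) log is still accepted by xfsprogs 5.19, and what the smallest accepted value is, would need to be checked against mkfs.xfs(8) on the target release.

```shell
# Hypothetical sketch: fix the internal log size so the xfsprogs 5.19
# default (16384 blocks = 64 MiB) is not applied. 10m is an assumed
# value, not a verified minimum for this xfsprogs version.
truncate -s 500M /tmp/file
mkfs.xfs -l size=10m -m uuid=3e199562-8a95-43c5-b90b-1cdebb99a29c -L boot /tmp/file
```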

Comment 1 Eric Sandeen 2023-08-23 15:30:46 UTC
Yes, the log size was increased, for a combination of correctness and performance issues.

You've said that /boot is now "too small" - by what metric? Has something failed, or is this just noticing the difference? I would not have expected that /boot was sized so tightly that a 59 megabyte delta would be a critical difference, but if it is, we can explore the options you've suggested above (though probably not reverting the change, as it is an important change overall, with a better risk/benefit tradeoff for larger filesystems, I think).

Comment 3 Ondřej Budai 2023-08-23 15:43:30 UTC
@xiliang has seen a failing test (installing 3 kernels, one of them being a debug one). If this is a somewhat usual customer scenario, we might indeed need to do something about this.

Comment 4 Vitaly Kuznetsov 2023-08-23 15:44:53 UTC
Sorry for derailing the conversation a little, but for Image Builder images, does it actually make sense to create a separate /boot partition by default?

Note that as UEFI boot is now almost mandatory, you will also need a separate ESP, which is VFAT; to hold e.g. 3 UKI images it must be at least 256M, and better 512M to make it possible to install debug kernels if needed. Together with a separate /boot, that takes at least 1G of space, which is not nothing.

Comment 5 Eric Sandeen 2023-08-24 17:48:04 UTC
We could possibly work around this by shifting the log size increase to filesystems above 500M or something like that, but that would be a deviation from upstream that I'd like to avoid if possible.

But if it's a choice between changing xfsprogs in that way, or chasing down every type of provisioning to make sure /boot is still big enough, it might be a decent tradeoff ...

Comment 6 Eric Sandeen 2023-08-24 17:58:47 UTC
I'll also note that the default RHEL9 install seems to create a 1G /boot

I'd like to see more info about the failing case, though. On my system, there is one rescue initramfs that is about 150M (rounding up) and one initramfs for each kernel (about 60M each); vmlinuz is about 12M, or 23M for debug. So if we had 2 kernels plus a kernel-debug, it might look something like:

 12M vmlinuz-1
 12M vmlinuz-2
 23M vmlinuz-debug
 60M initramfs-1
 60M initramfs-2
 50M initramfs-debug
150M initramfs-rescue

That's about 370M, with the other misc files taking maybe another 20M, which I think would still fit.
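Summing the estimate above in the shell:

```shell
# Rough /boot usage estimate from the per-file sizes listed above (MiB).
total=$(( 12 + 12 + 23 + 60 + 60 + 50 + 150 ))
echo "${total}M"            # 367M, i.e. "about 370M"
echo "$(( total + 20 ))M"   # plus ~20M of misc files: 387M
```

387M would still fit even in the smaller 436M filesystem, which is consistent with the conclusion that it should fit.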

So I wonder if the failure case was a combination of something unique, plus a very tightly-chosen /boot size for these images...

Comment 7 Frank Liang 2023-08-25 04:04:48 UTC
Thanks, Eric

Here is the actual output on my system with 3 kernels installed (kernel, kernel-debug, and an earlier kernel).
The currently booted kernel is 5.14.0-356.el9.x86_64; there is no space left to store the kdump initramfs, so the kdump service fails.

I am not sure whether customers will be aware of this change in RHEL-9.3, but the missing 60M is not a small percentage of the expected 500M.

In our test scenario, we pre-install 3 kernels for the kernel fast-switch test, debug-kernel boot, and memleak checks. With the current limitation we can work around it in our tests; this is just the background of the discussion. I hope this is reviewed and reaches some conclusion before RHEL-9.3 goes out. Thanks.

[root@ip-10-116-2-31 boot]# du -sch .[!.]* * | sort -h
0	symvers-5.14.0-329.el9.x86_64.gz
0	symvers-5.14.0-356.el9.x86_64+debug.gz
0	symvers-5.14.0-356.el9.x86_64.gz
4.0K	.vmlinuz-5.14.0-329.el9.x86_64.hmac
4.0K	.vmlinuz-5.14.0-356.el9.x86_64+debug.hmac
4.0K	.vmlinuz-5.14.0-356.el9.x86_64.hmac
16K	loader
212K	config-5.14.0-329.el9.x86_64
216K	config-5.14.0-356.el9.x86_64
216K	config-5.14.0-356.el9.x86_64+debug
5.6M	grub2
5.8M	System.map-5.14.0-329.el9.x86_64
7.0M	efi
8.2M	System.map-5.14.0-356.el9.x86_64
8.9M	System.map-5.14.0-356.el9.x86_64+debug
12M	vmlinuz-5.14.0-329.el9.x86_64
13M	vmlinuz-5.14.0-356.el9.x86_64
26M	vmlinuz-5.14.0-356.el9.x86_64+debug
29M	initramfs-5.14.0-329.el9.x86_64kdump.img
83M	initramfs-5.14.0-329.el9.x86_64.img
84M	initramfs-5.14.0-356.el9.x86_64.img
107M	initramfs-5.14.0-356.el9.x86_64+debug.img
388M	total
[root@ip-10-116-2-31 boot]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           854M     0  854M   0% /dev/shm
tmpfs           342M  9.1M  333M   3% /run
/dev/nvme0n1p4  9.3G  2.8G  6.6G  30% /
/dev/nvme0n1p3  436M  409M   28M  94% /boot
/dev/nvme0n1p2  200M  7.0M  193M   4% /boot/efi
tmpfs           171M     0  171M   0% /run/user/1000
[root@ip-10-116-2-31 boot]# uname -r
5.14.0-356.el9.x86_64

[root@ip-10-116-2-31 boot]# cat /tmp/kdump 
× kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Fri 2023-08-25 02:28:19 UTC; 15min ago
    Process: 1046 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
   Main PID: 1046 (code=exited, status=1/FAILURE)
        CPU: 15.857s

Aug 25 02:28:18 ip-10-116-2-31.us-west-2.compute.internal dracut[1393]: *** Creating image file '/boot/initramfs-5.14.0-356.el9.x86_64kdump.img' ***
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal kdumpctl[4014]: cp: error writing '/boot/initramfs-5.14.0-356.el9.x86_64kdump.img': No space left on device
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal kdumpctl[1361]: dracut: dracut: creation of /boot/initramfs-5.14.0-356.el9.x86_64kdump.img failed
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal dracut[1393]: dracut: creation of /boot/initramfs-5.14.0-356.el9.x86_64kdump.img failed
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal kdumpctl[1059]: kdump: mkdumprd: failed to make kdump initrd
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal kdumpctl[1059]: kdump: Starting kdump: [FAILED]
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal systemd[1]: kdump.service: Failed with result 'exit-code'.
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal systemd[1]: Failed to start Crash recovery kernel arming.
Aug 25 02:28:19 ip-10-116-2-31.us-west-2.compute.internal systemd[1]: kdump.service: Consumed 15.857s CPU time.

Comment 9 RHEL Program Management 2023-09-23 12:05:03 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 10 RHEL Program Management 2023-09-23 12:06:59 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.