1. Please describe the problem:
Kernels on ppc64le appear to be much larger than on, say, x86_64. On Fedora CoreOS nodes on `37.20221215.20.0` I see:

- `x86_64`:
```
$ ls -lh /boot/ostree/fedora-coreos-cbe65104658d968ba9257535af887e57369292356928a2f0aa19de9183ac9e9e/
total 86M
-rw-r--r--. 1 root root 74M Dec 15 22:20 initramfs-6.0.12-300.fc37.x86_64.img
-rwxr-xr-x. 1 root root 13M Dec 15 22:20 vmlinuz-6.0.12-300.fc37.x86_64
```
- `ppc64le`:
```
$ ls -lh /boot/ostree/fedora-coreos-6ccef70b6f4af574fc3b2486258f527111f66dee29e9004b5a18fa97332c04c5/
total 113M
-rw-r--r--. 1 root root 70M Dec 15 23:25 initramfs-6.0.12-300.fc37.ppc64le.img
-rwxr-xr-x. 1 root root 43M Dec 15 23:25 vmlinuz-6.0.12-300.fc37.ppc64le
```

So the kernel is a whole `30M` larger on `ppc64le`.

2. What is the Version-Release number of the kernel:
kernel-6.0.12-300.fc37

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Not sure. This is the first time I've looked at this problem.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
Yes. Just look at the files in the RPMs.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
Yes.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
Not needed, since this can be observed easily without booting.
Mike Ellerman, are you able to help us understand why?
IIRC we also discussed this earlier with dzickus@rh: vmlinux vs. vmlinuz, and how that relates to the usefulness of the separate debuginfo for the kernel. But I might be misremembering :-)
------- Comment From ellerman.com 2023-01-13 04:46 EDT-------
I don't know about CoreOS, but I do have regular Fedora, which I assume is similar.

On x86 the "vmlinuz" is actually a bzImage, i.e. compressed.

On powerpc the "vmlinuz" is actually a vmlinux, i.e. *not* compressed.

You can see with "file", e.g.:

```
# file /boot/vmlinuz-6.0.18-200.fc36.x86_64
/boot/vmlinuz-6.0.18-200.fc36.x86_64: Linux kernel x86 boot executable bzImage, version 6.0.18-200.fc36.x86_64 (mockbuild.fedoraproject.org) #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:08:48 UTC 2023, RO-rootFS, swap_dev 0XC, Normal VGA

# file /boot/vmlinuz-6.0.12-200.fc36.ppc64le
/boot/vmlinuz-6.0.12-200.fc36.ppc64le: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), statically linked, BuildID[sha1]=448a34e76e7ef15d7cb653792501ca6628e6bb0b, stripped
```

Grub supports loading gzipped files, so you can just gzip the vmlinux in /boot and grub will still boot from it just fine, e.g.:

```
# ls -lh vmlinuz-6.0.12-200.fc36.ppc64le
-rwxr-xr-x. 1 root root 43M Dec 8 12:15 vmlinuz-6.0.12-200.fc36.ppc64le

# gzip vmlinuz-6.0.12-200.fc36.ppc64le
# mv vmlinuz-6.0.12-200.fc36.ppc64le.gz vmlinuz-6.0.12-200.fc36.ppc64le

# ls -lh vmlinuz-6.0.12-200.fc36.ppc64le
-rwxr-xr-x. 1 root root 14M Dec 8 12:15 vmlinuz-6.0.12-200.fc36.ppc64le

# reboot
...
```
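The same distinction can be checked without file(1) by reading the image's magic bytes. This is a minimal sketch, not an existing Fedora tool: the function name `kernel_image_type` is hypothetical, and it only recognises the three formats discussed here (an ELF vmlinux, a gzip stream, and an x86 bzImage, whose boot protocol places the "HdrS" signature at offset 0x202).

```shell
#!/usr/bin/env bash
# kernel_image_type FILE: print elf, gzip, bzimage or unknown.
# Hypothetical helper; only the formats discussed in this thread are handled.
kernel_image_type() {
    local img=$1 magic hdrs
    # First four bytes as lowercase hex with no separators, e.g. "7f454c46".
    magic=$(head -c 4 -- "$img" | od -An -tx1 | tr -d ' \n')
    case $magic in
        7f454c46) echo elf;  return ;;  # "\x7fELF": uncompressed vmlinux
        1f8b*)    echo gzip; return ;;  # gzip stream, e.g. a gzipped vmlinux
    esac
    # x86 boot protocol: a bzImage carries "HdrS" at offset 0x202 (514).
    # NUL bytes are stripped so bash does not warn about them.
    hdrs=$(dd if="$img" bs=1 skip=$((0x202)) count=4 2>/dev/null | tr -d '\0')
    if [ "$hdrs" = "HdrS" ]; then
        echo bzimage
    else
        echo unknown
    fi
}
```

For example, `kernel_image_type /boot/vmlinuz-$(uname -r)` would be expected to print `elf` on the ppc64le systems above and `bzimage` on x86_64.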
(In reply to IBM Bug Proxy from comment #3)
> ------- Comment From ellerman.com 2023-01-13 04:46 EDT-------
> I don't know about CoreOS, but I do have regular Fedora which I assume is
> similar.

Indeed. It uses the exact same kernel RPMs that are built for Fedora.

> On x86 the "vmlinuz" is actually a bzImage, ie. compressed.
>
> On powerpc the "vmlinuz" is actually a vmlinux, ie. *not* compressed.

Right. This is why I opened this issue. I want to understand why it's not compressed by default on this architecture, unlike the others, and whether we can change it to be compressed by default. It obviously can work (as you show below).

For full context on why I'm interested in the answer to this question see: https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1355314761

> Grub supports loading gzipped files, so you can just gzip the vmlinux in
> /boot and grub will still boot from it just fine
I was able to find the old thread which seems to be captured in the linuxppc-dev archives, see https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-June/thread.html#244234
> Grub supports loading gzipped files, so you can just gzip the vmlinux in /boot and grub will still boot from it just fine, eg:

We could probably forcibly compress the kernel (via an opt-in) in rpm-ostree builds.

> I was able to find the old thread which seems to be captured in the linuxppc-dev archives, see https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-June/thread.html#244234

That thread seems to be about having the kernel build process generate a compressed image, which apparently bypasses the build-id or other processing.

But if we only care about booting from grub, then it seems at least on FCOS and derivatives we could just switch to compressing the kernel today?
(In reply to Colin Walters from comment #6)
> We could probably forcibly compress the kernel (via an opt-in) in rpm-ostree
> builds.

Yeah. It's not ideal, but it would get us past this.

> That thread seems to be about having the kernel build process generate a
> compressed image and this apparently bypasses the build-id or other
> processing.
>
> But if we only care about booting from grub, then it seems at least on FCOS
> and derivatives we could just switch to compressing the kernel today?

So is it the kernel build scripts/tooling that are deficient here? Could we just add an extra step to the RPM build that compresses the built file after the `make` from the kernel sources finishes? I.e. we're not asking the kernel build scripts to compress the kernel (so nothing is lost, as mentioned in the mailing list thread), but compressing it after the kernel build is finished and before the RPM picks up the file.
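The "compress after `make`, before packaging" idea could look roughly like the shell function below. This is a hypothetical sketch: `compress_vmlinux_in_place` is not part of Fedora's kernel.spec or the kernel build system, and where exactly it would be called from the spec is left open.

```shell
#!/usr/bin/env bash
# compress_vmlinux_in_place FILE: replace FILE with a gzip-compressed copy,
# keeping its name and permission bits. Hypothetical post-build step.
compress_vmlinux_in_place() {
    local img=$1 tmp
    # Create the temp file next to the original so mv is a same-fs rename.
    tmp=$(mktemp "${img}.XXXXXX") || return 1
    # -n: omit original name/timestamp from the header (reproducible output)
    # -9: best compression; -c: write to stdout and leave the input untouched
    gzip -n -9 -c -- "$img" > "$tmp" || { rm -f -- "$tmp"; return 1; }
    # mktemp creates 0600 files; copy the mode of the original (e.g. 0755).
    chmod --reference="$img" -- "$tmp"
    mv -f -- "$tmp" "$img"
}
```

Keeping the `vmlinuz-*` name means grub entries need no change. As noted later in this thread, a gzipped vmlinux still would not boot via petitboot on powernv, so a step like this does not settle the question by itself.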
> So is it the kernel build scripts/tooling that are deficient here?

I had to read the linked thread a few times, but if I'm understanding correctly, the reason the ppc64le kernel isn't compressed by default is that it would break booting via OpenFirmware directly, but we always boot from grub today (again, AFAIK).
(In reply to Colin Walters from comment #8)
> I had to read the linked thread a few times but if I'm understanding
> correctly, the reason ppc64le kernel isn't compressed by default is that it
> would break booting via OpenFirmware directly, but we always boot from grub
> today (again, AFAIK).

We do not always boot from grub today. While that covers the majority of users, we also have to at least boot with petitboot.

There are not a whole lot of users for Fedora ppc, even fewer outside of IBM. How many FCOS users are there? And how much of an issue is an extra 30MB really? Is this an actual problem, or just something noticed?
> There are not a whole lot of users for Fedora ppc, even less outside of IBM. How many FCOS users are there?

Note that FCOS is the upstream of RHEL CoreOS, which is the default node OS for OpenShift 4, a Red Hat product that very definitely has users on ppc64le. And we do share code across FCOS and RHCOS... which leads to the next point:

> And how much of an issue is an extra 30MB really? It this an actual problem, or just something noticed?

Yes, we're here because for $historical reasons we chose to have a relatively small 384MB /boot by default, across both FCOS and RHCOS. And since the kernel configuration is the same in RHEL, this is causing us problems across both operating systems. (Due to general growth, we may in theory start to run out of space on other architectures at some point too, but the extra 30MB here pushes ppc64le specifically over the edge *now*.)

If it helps, we can file a corresponding RHEL bug, but I think the OP (Dusty) wanted to start upstream.
(In reply to Colin Walters from comment #10)
> Yes, we're here because for $historical reasons we chose to have a
> relatively small 384MB /boot by default - across both FCOS and RHCOS.
> And since the kernel configuration is also the same here in RHEL, this is
> causing us problems across both operating systems.

Right. We keep 2 sets of kernel/initrd on the system, and while a new one is being installed we temporarily need space for 3 sets. This is where the 30M difference adds up, because it's actually 3*30M = 90M.

You can definitely make an argument that we should change the size of our boot partition; we've considered that too and will possibly do it in the future. The goal of this ticket was mostly to find out why things are the way they are today, because no one seemed to know when I initially asked.

Of course, now that we know why (which I believe is so that firmware/bootloaders other than GRUB, which don't support a compressed kernel, can boot it), the next question is (or could be): is there a path to changing it? The answer doesn't have to be "yes", but it's worth having the discussion.
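To make the space pressure concrete, here is a back-of-the-envelope calculation using the `total` figures from the `ls -lh` output in the original report and the 384MB default /boot. It assumes nothing else occupies /boot (grub files, boot loader entries and filesystem overhead are ignored), so the real headroom is smaller; `boot_budget` is just an illustrative name.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope /boot budget in MiB. Assumption: only kernel/initramfs
# sets occupy /boot (grub files, loader entries and fs overhead are ignored).
boot_mib=384   # default FCOS/RHCOS /boot size from this thread
sets=3         # 2 deployments kept + 1 being installed

# boot_budget ARCH PER_SET_MIB: print space used and left for $sets sets.
boot_budget() {
    local arch=$1 per_set=$2 used
    used=$(( sets * per_set ))
    printf '%s used=%d free=%d\n' "$arch" "$used" "$(( boot_mib - used ))"
}

# Per-set sizes are the "total" lines of the ls -lh output in the report.
boot_budget x86_64 86     # prints: x86_64 used=258 free=126
boot_budget ppc64le 113   # prints: ppc64le used=339 free=45
```

Most of the gap between the two architectures comes from the roughly 30M-larger kernel image, counted three times.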
(In reply to Justin M. Forbes from comment #9)
<snip>
> There are not a whole lot of users for Fedora ppc, even less outside of IBM.
> How many FCOS users are there? And how much of an issue is an extra 30MB
> really? It this an actual problem, or just something noticed?

Specifically, on the question of FCOS users on ppc64le: the answer is none. I've been blocking us from releasing it to end users because of this problem [1]. We run a ppc64le build server on FCOS, and at times it can't auto-update without intervention because of this issue, so we have held back shipping it (i.e. via the website download page), though we are building it and running CI on it.

[1] https://github.com/coreos/fedora-coreos-tracker/issues/987#issuecomment-1281123396
(In reply to Dusty Mabe from comment #11)
> Of course, now that we know why (which I believe is so that firmwares other
> than GRUB can boot the kernel because they don't have support for a
> compressed kernel) the next question is (or could be): is there a path to
> changing it? The answer doesn't have to be "yes", but it's worth having the
> discussion.

I am sure there is a path. I am not horribly familiar with petitboot, but I suppose the first step toward getting things changed is to verify whether petitboot can actually boot a compressed kernel. If it can, we could probably compress it ourselves in the spec. If it can't, I believe that functionality would have to be added.

I suppose another possibility is to make something in the FCOS ecosystem compress the kernel image specifically for FCOS users, and then say that FCOS does not support petitboot.
And because petitboot uses kexec to boot the new kernel, we can reduce the problem to "does kexec support a compressed kernel?", which should be easier to test.
Just a clarification: we do support petitboot for booting RHCOS. I'm not sure what proportion of the customer base it represents, but we've definitely accommodated it in the past (e.g. https://github.com/coreos/coreos-assembler/pull/2005).
My understanding is that we will always use petitboot on Power: petitboot starts and then calls Grub. I'm not familiar with OpenFirmware, but I guess that's the default firmware for Power? Maybe other models (e.g. OPAL systems) have different firmware? In any case, if we can't boot with OpenFirmware, that sounds like a huge impact. Klaus, any chance you could help us understand the impact of compressing the kernel and how it would affect the firmware? Can you suggest some server models where we can test/validate it?
From a Fedora standpoint, we should be able to boot on any Little Endian IBM Power system, and I believe we have a number of Raptor PPC users. In fact, in terms of community users who aren't working for IBM, I would guess we have more Raptor users than IBM PPC users for Fedora.
I'm afraid I don't really see a good solution.

There is a powerpc zImage, like the x86 bzImage, but it has several downsides. When booting on IBM PowerVM LPARs (all that RHEL supports), there is a limit on how much memory is available during early boot. Using the zImage requires more memory during boot, because you need space for the zImage as well as space to decompress it. If there's not enough space, booting fails. So switching to the zImage is likely to break booting on some systems.

On powernv we boot with petitboot (not grub), and petitboot can't boot a zImage at all. Petitboot also can't boot a gzipped vmlinux. Getting petitboot updated so it can boot a gzipped vmlinux could be done, but AFAIK petitboot is mostly unmaintained these days. Then there is the problem that users would need to flash a new petitboot on their systems in order to boot newer kernels. Finally, I believe the Raptor systems ship with a fork of petitboot; I'm not sure how easy it would be to get changes into that version.

I don't know if it's possible, but one option would be to gzip the vmlinux by default, which works when booting with grub on pseries, and then have the install script ungzip it when installing on powernv.
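The last suggestion (gzip by default, decompress at install time on powernv) might be sketched as the hook below. Everything here is an assumption for illustration: `maybe_gunzip_for_powernv` is a hypothetical name, it is not an existing Fedora or kernel-install script, and it relies on powerpc's /proc/cpuinfo reporting a `platform : PowerNV` line (pseries LPARs report `pSeries`).

```shell
#!/usr/bin/env bash
# maybe_gunzip_for_powernv IMAGE [CPUINFO]: on PowerNV, decompress a gzipped
# kernel image in place so petitboot can load it; otherwise do nothing.
# Hypothetical hook; CPUINFO is a parameter only so the logic can be tested.
maybe_gunzip_for_powernv() {
    local img=$1 cpuinfo=${2:-/proc/cpuinfo} tmp
    # Assumed /proc/cpuinfo format on powerpc: "platform<TAB>: PowerNV".
    # pseries keeps the gzipped image, which grub can boot.
    grep -Eq '^platform[[:space:]]*:[[:space:]]*PowerNV' "$cpuinfo" || return 0
    # Leave already-uncompressed images alone (gzip magic is 1f 8b).
    [ "$(head -c 2 -- "$img" | od -An -tx1 | tr -d ' \n')" = "1f8b" ] || return 0
    tmp=$(mktemp "${img}.XXXXXX") || return 1
    gzip -dc < "$img" > "$tmp" || { rm -f -- "$tmp"; return 1; }
    chmod --reference="$img" -- "$tmp"   # keep the original permission bits
    mv -f -- "$tmp" "$img"
}
```

The check is idempotent, so running it on every kernel install or update is harmless; on pseries the image simply stays gzipped.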