1. Please describe the problem:

On recent rawhide kernels, loading the nf_conntrack module fails on s390x:

    # modprobe nf_conntrack
    modprobe: ERROR: could not insert 'nf_conntrack': Unknown symbol in module, or unknown parameter (see dmesg)
    modprobe: ERROR: libkmod/libkmod-module.c:990 command_do() Error running install command '/sbin/modprobe --ignore-install nf_conntrack && /sbin/sysctl --quiet --pattern 'net[.]netfilter[.]nf_conntrack.*' --system' for module nf_conntrack: retcode 1
    modprobe: ERROR: could not insert 'nf_conntrack': Invalid argument
    # dmesg
    [...]
    [ 650.842248] missing module BTF, cannot register kfuncs
    #

Other modules seem to load fine (I didn't find any other than nf_conntrack that would fail).

2. What is the Version-Release number of the kernel:

5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.s390x

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, the first broken rawhide kernel is:
5.18.0-0.rc0.20220325git34af78c4e616.7.fc37.s390x
https://koji.fedoraproject.org/koji/buildinfo?buildID=1939079

The one before didn't have the bug:
5.18.0-0.rc0.20220324gited4643521e6a.6.fc37.s390x
https://koji.fedoraproject.org/koji/buildinfo?buildID=1938423

Also, this issue doesn't seem to affect the ELN kernels.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Just run `modprobe nf_conntrack`.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Yes.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

After each failed attempt to load the module, the following appears in dmesg:

    [ 650.842248] missing module BTF, cannot register kfuncs
*** Bug 2072313 has been marked as a duplicate of this bug. ***
*** Bug 2072320 has been marked as a duplicate of this bug. ***
CC Red Hat BPF maintainers - guys, could someone from your team have a look at this?
Works for me with kernel-5.18.0-0.rc2.23.fc37.x86_64. Can you recheck?
The problem is only on the s390x arch.
OK, I see CONFIG_DEBUG_INFO_BTF_MODULES=y, but the module shipped with the kernel does not have the .BTF section:

    # llvm-readelf --sections /tmp/nf_conntrack.ko | grep BTF
    [37] .BTF_ids PROGBITS 0000000000000000 0243ad 00002c 00 A 0 0 1

while it is present both in a self-built kernel:

    # llvm-readelf --sections /lib/modules/5.17.0+/kernel/net/netfilter/nf_conntrack.ko | grep -i btf
    [39] .BTF_ids PROGBITS 0000000000000000 00015d35
    [69] .BTF PROGBITS 0000000000000000 0001aedb

and on x86:

    # llvm-readelf --sections /tmp/nf_conntrack.ko| grep -i btf
    [31] .BTF_ids PROGBITS 0000000000000000 035d3e 00002c 00 A 0 0 1
    [51] .BTF PROGBITS 0000000000000000 03df80 0572c6 00 0 0 1

Could it be a stripping problem?
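For anyone repeating the section check on many modules: a plain `grep BTF` also matches `.BTF_ids`, so it is easy to misread the result. Below is a minimal sketch that parses `readelf --sections` style output and matches the section name exactly. The function names are made up for illustration, and the sample strings are simply the s390x and x86 outputs quoted above.

```python
import re

# Matches a section-table row such as "[37] .BTF_ids PROGBITS ..."
# and captures the section name that follows the index.
_SECTION_ROW = re.compile(r"\[\s*\d+\]\s+(\S+)")


def btf_sections(readelf_output):
    """Return the set of BTF-related section names found in the output."""
    names = set()
    for line in readelf_output.splitlines():
        match = _SECTION_ROW.search(line)
        # Compare the whole name, so ".BTF_ids" is not mistaken for ".BTF".
        if match and match.group(1) in (".BTF", ".BTF_ids"):
            names.add(match.group(1))
    return names


def has_module_btf(readelf_output):
    """True only if the module carries the .BTF section itself."""
    return ".BTF" in btf_sections(readelf_output)


# Outputs copied from the comment above (rawhide s390x and x86 modules).
S390X_RAWHIDE = "[37] .BTF_ids PROGBITS 0000000000000000 0243ad 00002c 00 A 0 0 1"
X86_RAWHIDE = (
    "[31] .BTF_ids PROGBITS 0000000000000000 035d3e 00002c 00 A 0 0 1\n"
    "[51] .BTF PROGBITS 0000000000000000 03df80 0572c6 00 0 0 1\n"
)

if __name__ == "__main__":
    print("s390x has .BTF:", has_module_btf(S390X_RAWHIDE))
    print("x86 has .BTF:", has_module_btf(X86_RAWHIDE))
```

In practice the input would come from running `llvm-readelf --sections` on the .ko file and passing its output to `has_module_btf`.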
(it's triggered by the patch c446fdacb10d ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF") which came with 5.18.0-0.rc0.20220325git34af78c4e616.7.fc37)
Hm... so in the kernel.spec there is this:

    # skip BTF in kernel modules for s390x
    %ifnarch s390x
    %define with_kmod_btf --keep-section '.BTF'
    %endif

...and on s390x we do have CONFIG_DEBUG_INFO_BTF_MODULES=y, so that probably explains why on s390x the error condition is hit (it is guarded only by CONFIG_DEBUG_INFO_BTF_MODULES and has no idea that we strip out module BTF intentionally).

That snippet was added by:

    commit 6a8653ed6b346ed322084e4c38724fefbaf53f9a
    Author: Jiri Olsa <jolsa>
    Date:   Thu Oct 14 13:34:00 2021 +0200

        spec: Keep .BTF section in modules

...which explains:

    This increases size of each module. Currently there's known dedup
    issue on s390x, that makes the size of all modules double [1] so
    I'm not enabling it for s390x for the moment.
    [...]
    [1] https://lore.kernel.org/bpf/20211023120452.212885-1-jolsa@kernel.org/

So either we need to remove the 'ifnarch s390x' condition (perhaps the dedup issue is already fixed?), or somehow make it such that the missing BTF section doesn't lead to an error when it's stripped away intentionally.
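To make the mismatch concrete, here is a toy model of the guard as described above. It is not the kernel source: the function and parameter names are hypothetical, and returning -ENOENT is an assumption of this sketch, chosen only because it would be consistent with the "Unknown symbol in module" message modprobe printed.

```python
import errno


def register_module_kfuncs(module_has_btf, config_btf_modules):
    """Toy model: 0 on success, a negative errno value on failure."""
    if module_has_btf:
        # BTF is available, so registration can proceed.
        return 0
    if config_btf_modules:
        # The build config says modules carry BTF, so a missing section
        # is treated as an error. The check cannot tell that packaging
        # stripped .BTF on purpose.
        return -errno.ENOENT  # assumed error value, see the note above
    # BTF for modules is not configured at all: nothing to register.
    return 0


if __name__ == "__main__":
    # s390x rawhide: CONFIG_DEBUG_INFO_BTF_MODULES=y, .BTF stripped by the spec.
    print("s390x:", register_module_kfuncs(False, True))
    # x86 rawhide: same config, .BTF kept via --keep-section.
    print("x86:", register_module_kfuncs(True, True))
```

Under this model, both options listed above make the failing case go away: keeping the section flips the first argument, while teaching the check about intentional stripping would change the second branch.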
Yes, the fix is there: efdd3eb8015e ("libbpf: Accommodate DWARF/compiler bug with duplicated structs"). So the spec workaround can be reverted.
This is already fixed.