Bug 2071969 - Loading the nf_conntrack module fails on s390x with "missing module BTF, cannot register kfuncs"
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2072313 2072320
Depends On:
Blocks:
Reported: 2022-04-05 10:30 UTC by Ondrej Mosnacek
Modified: 2022-04-26 11:09 UTC (History)
CC List: 25 users

Fixed In Version: kernel-5.18.0-0.rc3.20220421gitb253435746d9a4a.30.fc37
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-26 11:09:38 UTC
Type: Bug
Embargoed:


Links
System: Gitlab
ID: cki-project kernel-ark merge_requests 1751
Private: 0
Priority: None
Status: opened
Summary: spec: keep .BTF section in modules for s390
Last Updated: 2022-04-13 18:01:19 UTC

Description Ondrej Mosnacek 2022-04-05 10:30:23 UTC
1. Please describe the problem:

On recent rawhide kernels, loading the nf_conntrack module fails on s390x:

# modprobe nf_conntrack
modprobe: ERROR: could not insert 'nf_conntrack': Unknown symbol in module, or unknown parameter (see dmesg)
modprobe: ERROR: libkmod/libkmod-module.c:990 command_do() Error running install command '/sbin/modprobe --ignore-install nf_conntrack  && /sbin/sysctl --quiet --pattern 'net[.]netfilter[.]nf_conntrack.*' --system' for module nf_conntrack: retcode 1
modprobe: ERROR: could not insert 'nf_conntrack': Invalid argument
# dmesg
[...]
[  650.842248] missing module BTF, cannot register kfuncs
#

Other modules seem to load fine (didn't find any other than nf_conntrack that would fail).

2. What is the Version-Release number of the kernel:

5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.s390x

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, the first broken rawhide kernel is:
5.18.0-0.rc0.20220325git34af78c4e616.7.fc37.s390x
https://koji.fedoraproject.org/koji/buildinfo?buildID=1939079

The one before didn't have the bug:
5.18.0-0.rc0.20220324gited4643521e6a.6.fc37.s390x
https://koji.fedoraproject.org/koji/buildinfo?buildID=1938423

Also, this issue doesn't seem to affect the ELN kernels.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Just run `modprobe nf_conntrack`.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

After each failed attempt to load the module, the following appears in dmesg:

[  650.842248] missing module BTF, cannot register kfuncs

Comment 1 Jianwen Ji 2022-04-06 13:25:39 UTC
*** Bug 2072313 has been marked as a duplicate of this bug. ***

Comment 2 Jianwen Ji 2022-04-06 13:26:21 UTC
*** Bug 2072320 has been marked as a duplicate of this bug. ***

Comment 3 Ondrej Mosnacek 2022-04-12 11:37:12 UTC
CC Red Hat BPF maintainers - guys, could someone from your team have a look at this?

Comment 4 Yauheni Kaliuta 2022-04-12 16:14:02 UTC
works for me with kernel-5.18.0-0.rc2.23.fc37.x86_64
Can you recheck?

Comment 5 Ondrej Mosnacek 2022-04-12 17:34:00 UTC
The problem is only on the s390x arch.

Comment 6 Yauheni Kaliuta 2022-04-12 18:10:06 UTC
Ok, I see CONFIG_DEBUG_INFO_BTF_MODULES=y, but the module shipped in the kernel package does not have the .BTF section:

# llvm-readelf --sections /tmp/nf_conntrack.ko | grep BTF
  [37] .BTF_ids          PROGBITS        0000000000000000 0243ad 00002c 00   A  0   0  1

while it is present in both a self-built kernel:

# llvm-readelf --sections /lib/modules/5.17.0+/kernel/net/netfilter/nf_conntrack.ko | grep -i btf
  [39] .BTF_ids          PROGBITS         0000000000000000  00015d35
  [69] .BTF              PROGBITS         0000000000000000  0001aedb

and x86:
# llvm-readelf --sections /tmp/nf_conntrack.ko| grep -i btf
  [31] .BTF_ids          PROGBITS        0000000000000000 035d3e 00002c 00   A  0   0  1
  [51] .BTF              PROGBITS        0000000000000000 03df80 0572c6 00      0   0  1

Can it be some stripping problem?
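
Side note on the check itself: a plain `grep BTF` also matches `.BTF_ids`, which is present even in the stripped module, so it is easy to misread the result. A minimal sketch of a stricter check follows, assuming the section listing comes from `readelf --sections` or `llvm-readelf --sections`. The helper name `has_btf_section` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: reads a section listing on stdin and succeeds only
# if a section named exactly ".BTF" is listed. ".BTF_ids" does not count.
has_btf_section() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == ".BTF") found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage (the module path is a placeholder, module assumed uncompressed):
#   llvm-readelf --sections /tmp/nf_conntrack.ko | has_btf_section \
#       && echo "module BTF present" || echo "module BTF missing"
```

Matching on whole fields rather than substrings is what separates the two section names here.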

Comment 7 Yauheni Kaliuta 2022-04-12 18:19:58 UTC
(It's triggered by the patch c446fdacb10d ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF"), which came with 5.18.0-0.rc0.20220325git34af78c4e616.7.fc37.)

Comment 8 Ondrej Mosnacek 2022-04-13 08:48:19 UTC
Hm... so in the kernel.spec there is this:

    # skip BTF in kernel modules for s390x
    %ifnarch s390x
    %define with_kmod_btf --keep-section '.BTF'
    %endif

...and on s390x we do have CONFIG_DEBUG_INFO_BTF_MODULES=y, so that probably explains why on s390x the error condition is hit (it is guarded only by CONFIG_DEBUG_INFO_BTF_MODULES and has no idea that we strip out module BTF intentionally).

That snippet was added by:

commit 6a8653ed6b346ed322084e4c38724fefbaf53f9a
Author: Jiri Olsa <jolsa>
Date:   Thu Oct 14 13:34:00 2021 +0200

    spec: Keep .BTF section in modules

...which explains:

    This increases size of each module. Currently there's known
    dedup issue on s390x, that makes the size of all modules
    double [1] so I'm not enabling it for s390x for the moment.

    [...]

    [1] https://lore.kernel.org/bpf/20211023120452.212885-1-jolsa@kernel.org/

So either we need to remove the 'ifnarch s390x' condition (perhaps the dedup issue is already fixed?), or somehow make it such that the missing BTF section doesn't lead to an error when it's stripped away intentionally.
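
The mismatch described in this comment can be sketched as a small check. This is only a model of the condition as discussed in this thread (config says module BTF is enabled, but the packaged module has had `.BTF` stripped), not the kernel's actual code. The helper names are hypothetical, and the config path in the usage note is an assumption about where the running kernel's config is installed.

```shell
#!/bin/sh
# Hypothetical helper: reads a kernel config on stdin and succeeds if
# CONFIG_DEBUG_INFO_BTF_MODULES is set to y.
btf_modules_enabled() {
    grep -q '^CONFIG_DEBUG_INFO_BTF_MODULES=y$'
}

# Hypothetical helper modelling the condition from this thread:
#   $1 = "yes" if the config enables module BTF
#   $2 = "yes" if the module file still carries a .BTF section
# Prints "fails" when the kernel expects module BTF that was stripped,
# "ok" otherwise. Only relevant for modules that register kfuncs.
kfunc_registration_verdict() {
    if [ "$1" = "yes" ] && [ "$2" != "yes" ]; then
        echo "fails"
    else
        echo "ok"
    fi
}

# Usage (config path is an assumption):
#   btf_modules_enabled < "/boot/config-$(uname -r)" && cfg=yes || cfg=no
#   kfunc_registration_verdict "$cfg" no
```

On the affected s390x builds both inputs line up with the failing case: the config option is enabled while the spec file skips `--keep-section '.BTF'`.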

Comment 9 Yauheni Kaliuta 2022-04-13 11:50:31 UTC
Yes, the fix is there: efdd3eb8015e ("libbpf: Accommodate DWARF/compiler bug with duplicated structs").
So the spec workaround can be reverted.

Comment 10 Ondrej Mosnacek 2022-04-26 11:09:38 UTC
This is already fixed.

