Bug 1920857
Summary: | in-kernel BTF is malformed
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | rawhide
Status: | CLOSED RAWHIDE
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Chris Murphy <bugzilla>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | acaringi, adscvr, agedosier, airlied, alciregi, aoliva, berrange, bskeggs, clalancette, dmalcolm, fweimer, hdegoede, itamar, jakub, jarodwilson, jeremy, jforbes, jglisse, jonathan, josef, jwakely, kernel-maint, laine, law, lgoncalv, libvirt-maint, linville, masami256, mchehab, mpolacek, mprivozn, msebor, ncarrilho, nickc, ptalbert, pwhalen, santiago, sipoyare, steved, veillard, virt-maint, vrothber
Fixed In Version: | dwarves1-1.20-1.fc34
Last Closed: | 2021-02-08 20:47:36 UTC
Type: | Bug
Description
Chris Murphy
2021-01-27 06:49:48 UTC
Created attachment 1751144: virsh dumpxml
Created attachment 1751145: journal
systemd-247.2-1.fc34.x86_64

I've managed to reproduce and found that virBPFLoadProg() logs the following message:

    in-kernel BTF is malformed\nprocessed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0\n

This error message is produced here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c#n11994

    if (IS_ERR(btf_vmlinux)) {
        /* Either gcc or pahole or kernel are broken. */
        verbose(env, "in-kernel BTF is malformed\n");
        ret = PTR_ERR(btf_vmlinux);
        goto skip_full_check;
    }

Based on the comment, I don't think that libvirt is the broken one. I think the bug should be switched over to kernel.

In fact, I was able to write a small reproducer that takes libvirt out of the picture.

Created attachment 1752813: bpf.c
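For anyone who wants a quick first look at the BTF blob itself, here is a minimal sketch (not the attached bpf.c, and not what the verifier does) that parses the fixed BTF header and checks its magic number and section bounds. It assumes a little-endian machine and the header layout from the kernel's include/uapi/linux/btf.h; the function name check_btf_header is made up for illustration. It only inspects the header, so it would not catch a bad struct member inside the type section.

```python
import struct

BTF_MAGIC = 0xEB9F  # magic value defined in include/uapi/linux/btf.h

# struct btf_header: u16 magic, u8 version, u8 flags, then five u32 fields
# (hdr_len, type_off, type_len, str_off, str_len). Little-endian is assumed.
_HEADER = struct.Struct("<HBBIIIII")


def check_btf_header(blob: bytes) -> dict:
    """Parse the fixed BTF header and run basic bounds checks (hypothetical helper)."""
    if len(blob) < _HEADER.size:
        raise ValueError("blob is shorter than a BTF header")
    (magic, version, flags, hdr_len,
     type_off, type_len, str_off, str_len) = _HEADER.unpack_from(blob)
    if magic != BTF_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}")
    # Section offsets are relative to the end of the header.
    for name, off, size in (("type", type_off, type_len),
                            ("str", str_off, str_len)):
        if hdr_len + off + size > len(blob):
            raise ValueError(f"{name} section runs past the end of the blob")
    return {
        "version": version,
        "flags": flags,
        "hdr_len": hdr_len,
        "type_len": type_len,
        "str_len": str_len,
    }
```

On a kernel built with BTF, the blob can be read from /sys/kernel/btf/vmlinux, e.g. `check_btf_header(open("/sys/kernel/btf/vmlinux", "rb").read())`.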
Seeing the same error when using a Dockerfile with podman:

    STEP 1: FROM registry.fedoraproject.org/fedora:latest
    STEP 2: RUN /usr/bin/dnf install -y httpd
    error running container: error creating container for [/bin/sh -c /usr/bin/dnf install -y httpd]: bpf create `in-kernel BTF is malformed
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    `: Invalid argument
    : exit status 1
    Error: error building at STEP "RUN /usr/bin/dnf install -y httpd": error while running runtime: exit status 1

In OpenQA - https://openqa.fedoraproject.org/tests/768948#step/podman/18

    podman-plugins-3.0.0-0.184.dev.gitfb653c4.fc34.x86_64
    podman-3.0.0-0.184.dev.gitfb653c4.fc34.x86_64
    5.11.0-0.rc6.141.fc34.x86_64

I sent an email to the upstream list: https://lore.kernel.org/bpf/CAJCQCtSQLc0VHqO4BY_-YB2OmCNNmHCS6fNdQKmMWGn2v=Jpdw@mail.gmail.com/T/#u

Also fails with Kernel 5.11.0-0.rc5.134.fc34. Downgrading to Kernel 5.11.0-0.rc4.129.fc34, podman works without error. Also seen in OpenQA - https://openqa.fedoraproject.org/tests/763612#step/podman/16

Confirmed, no error in virt-manager using 5.11.0-0.rc4.129.fc34.x86_64+debug either. The VM and its guest OS start up normally.

Created attachment 1754912: dmesg
Maybe it's related...
Feb 03 15:06:26 fmac.local kernel: BPF: sched_reset_on_fork type_id=6 bitfield_size=0 bits_offset=0
Feb 03 15:06:26 fmac.local kernel: BPF:
Feb 03 15:06:26 fmac.local kernel: BPF:Invalid member bits_offset
Feb 03 15:06:26 fmac.local kernel: BPF:
I'm trying to bisect between rc4 and rc5, but the resulting kernel is failing to start up due to many messages like this, many of which also repeat.

    Feb 03 15:05:41 kernel: failed to validate module [fuse] BTF: -22
    Feb 03 15:05:41 kernel: failed to validate module [hid_apple] BTF: -22
    Feb 03 15:05:41 kernel: failed to validate module [hid_apple] BTF: -22
    Feb 03 15:05:41 kernel: failed to validate module [video] BTF: -22
    Feb 03 15:05:41 kernel: failed to validate module [crc_itu_t] BTF: -22
    Feb 03 15:05:41 kernel: failed to validate module [hid_apple] BTF: -22
    Feb 03 15:05:41 kernel: failed to validate module [hid_appleir] BTF: -22
    ..snip...

So I'm stuck for the moment until I figure that out.

Created attachment 1754934: dmesg 5.11.0-0.rc6.134.fc34.x86_64
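Since the same modules repeat many times in the log, a small helper can collapse the noise. The following is a rough sketch, assuming the message format shown in the dmesg excerpt in this report; the function name summarize_btf_failures is hypothetical. It also maps the numeric code to its errno name: -22 is EINVAL, which matches the "Invalid argument" that podman prints.

```python
import errno
import re
from collections import Counter

# Matches lines such as:
#   kernel: failed to validate module [fuse] BTF: -22
_PATTERN = re.compile(r"failed to validate module \[([^\]]+)\] BTF: (-?\d+)")


def summarize_btf_failures(lines):
    """Count BTF validation failures per (module, errno name) pair."""
    counts = Counter()
    for line in lines:
        match = _PATTERN.search(line)
        if match is None:
            continue  # unrelated log line
        module, code = match.group(1), int(match.group(2))
        # The kernel reports negative errno values; fall back to the raw
        # number if the code is not a known errno.
        name = errno.errorcode.get(-code, str(code))
        counts[(module, name)] += 1
    return counts
```

Feeding it the output of `journalctl -k` or `dmesg`, one line per list element, gives a per-module tally instead of pages of repeats.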
Looks like this bug is gcc 11 related, doesn't happen when I recompile with gcc 10.2.1. It just seems kernel or whatever kernel uses (pahole) isn't able to deal with DWARF5.

Yep, dwarves package is currently broken wrt DWARF5 format. There are patches upstream and a new release coming real soon, meanwhile you have to build with -gdwarf-4 in CFLAGS if you need to use any of the dwarves tools like pahole. See https://bugzilla.redhat.com/show_bug.cgi?id=1919965

And to be precise, it is broken for DWARF4 already, just GCC doesn't emit that DWARF4 attribute unless -gdwarf-5 because too many consumers were broken 4 years ago.

*** Bug 1925158 has been marked as a duplicate of this bug. ***

kernel-5.11.0-0.rc6.20210203git3aaf0a27ffc2.143.fc34.x86_64 does not resolve this issue :( Still had to go back to 5.11.0-0.rc4.129.fc34

    dwarves-1.20-1.fc34.x86_64
    libdwarves1-1.20-1.fc34.x86_64

Fixes both "failed to validate module [?????] BTF: -22" type errors, and "in-kernel BTF is malformed" with qemu-kvm and libvirt.

OK maybe the second problem is fixed with gcc-11.0.0-0.18.fc34.x86_64 which results in (GCC) 11.0.0 20210130 attached to the kernel build I just did. I'm not sure what else it could have been.

As Nuno reports, this kernel still has the problem, but it also is built with GCC 11.0.0 20210130 so... that's not it? Bah! Anyway, I've also updated the upstream thread.

    Feb 04 18:36:31 fmac.local kernel: Linux version 5.11.0-0.rc6.20210203git3aaf0a27ffc2.143.fc34.x86_64 (mockbuild.fedoraproject.org) (gcc (GCC) 11.0.0 20210130 (Red Hat 11.0.0-0), GNU ld version 2.35.1-26.fc34) #1 SMP Wed Feb 3 19:07:40 UTC 2021

Hi Chris, I believe the dwarves-1.20-1.fc34.x86_64 and libdwarves1-1.20-1.fc34.x86_64 packages have nothing to do with this issue. On my server neither is installed.
Thanks

From https://kojipkgs.fedoraproject.org//packages/kernel/5.11.0/0.rc6.20210203git3aaf0a27ffc2.143.fc34/data/logs/x86_64/root.log:

    DEBUG util.py:446: dwarves x86_64 1.19-2.fc34 build 121 k
    DEBUG util.py:446: libdwarves1 x86_64 1.19-2.fc34 build 186 k

So the .143. kernel was still built using the old dwarves version, so I think the failure is expected?

Podman is working again with Kernel 5.11.0-0.rc7.149.fc34.aarch64, which was built with libdwarves1-1.20-1.fc34

And also fixed in 5.11.0-0.rc7.149.fc34, et al.
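The root.log check above can be scripted for other kernel builds. This is only a sketch under assumptions: it expects package lines in the same shape as the two DEBUG lines quoted from root.log in this report, it treats 1.20 as the first fixed dwarves version (as reported in this bug), and the function names dwarves_versions and built_with_fixed_pahole are made up for illustration.

```python
import re

# Matches the package table rows in a mock root.log, for example:
#   DEBUG util.py:446:  dwarves x86_64 1.19-2.fc34 build 121 k
_PKG_LINE = re.compile(r"\b(dwarves|libdwarves1)\s+\S+\s+(\d+(?:\.\d+)*)-\S+")


def dwarves_versions(log_text):
    """Return {package name: version tuple} for dwarves packages in the log."""
    return {
        name: tuple(int(part) for part in version.split("."))
        for name, version in _PKG_LINE.findall(log_text)
    }


def built_with_fixed_pahole(log_text, minimum=(1, 20)):
    """True if the build root had dwarves packages, all at least `minimum`."""
    versions = dwarves_versions(log_text)
    # An empty result means no dwarves package was found, so nothing
    # can be confirmed and the answer is False.
    return bool(versions) and all(v >= minimum for v in versions.values())
```

Versions are compared as integer tuples rather than strings so that, for example, 1.9 sorts before 1.20.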