1. Please describe the problem:
Attempting to boot F33 on armhfp causes a kernel panic (perhaps due to a crash in systemd):

[ 180.374848] audit: type=1701 audit(1587127162.868:73): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:kernel_t:s0 pid=1 comm="systemd" exe="/usr/lib/systemd/systemd" sig=11 res=1
[ 181.252086] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 181.259898] CPU: 1 PID: 1 Comm: systemd Not tainted 5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.armv7hl #1
[ 181.269621] Hardware name: BCM2835
[ 181.273116] [<c0310fc8>] (unwind_backtrace) from [<c030b23c>] (show_stack+0x18/0x1c)
[ 181.281003] [<c030b23c>] (show_stack) from [<c0768924>] (dump_stack+0xd0/0x104)
[ 181.288450] [<c0768924>] (dump_stack) from [<c034d9d4>] (panic+0x104/0x350)
[ 181.295543] [<c034d9d4>] (panic) from [<c0352d14>] (do_exit+0x1bc/0xb58)
[ 181.302368] [<c0352d14>] (do_exit) from [<c0353750>] (do_group_exit+0x64/0xc4)
[ 181.309724] [<c0353750>] (do_group_exit) from [<c035ff84>] (get_signal+0x1d0/0x764)
[ 181.317520] [<c035ff84>] (get_signal) from [<c030aa7c>] (do_work_pending+0xe4/0x424)
[ 181.325404] [<c030aa7c>] (do_work_pending) from [<c03000f4>] (slow_work_pending+0xc/0x20)
[ 181.333719] Exception stack(0xef12dfb0 to 0xef12dff8)
[ 181.338865] dfa0: be898ba4 be898ba4 b6f1ee30 00000000
[ 181.347187] dfc0: 00000000 00000001 00000000 00000000 00639260 005c651c be898fa1 00637de8
[ 181.355507] dfe0: b6dcbdb0 be898b78 b6bd3674 b6e62798 a00f0010 ffffffff
[ 181.362897] CPU0: stopping
[ 181.366050] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.armv7hl #1
[ 181.375953] Hardware name: BCM2835
[ 181.379440] [<c0310fc8>] (unwind_backtrace) from [<c030b23c>] (show_stack+0x18/0x1c)
[ 181.387326] [<c030b23c>] (show_stack) from [<c0768924>] (dump_stack+0xd0/0x104)
[ 181.394774] [<c0768924>] (dump_stack) from [<c030eb88>] (handle_IPI+0x1c0/0x394)
[ 181.402309] [<c030eb88>] (handle_IPI) from [<c0300b70>] (__irq_svc+0x70/0x98)
[ 181.409568] Exception stack(0xc1401f18 to 0xc1401f60)
[ 181.414712] 1f00: 00000001 00000006
[ 181.423036] 1f20: c140d5c0 00000000 00000000 00000000 c1400000 c1401f70 00000000 c14090b0
[ 181.431361] 1f40: efffcd80 00000111 0000002a c1401f68 c0307bf8 c0307bfc 60010013 ffffffff
[ 181.439690] [<c0300b70>] (__irq_svc) from [<c0307bfc>] (arch_cpu_idle+0x28/0x58)
[ 181.447226] [<c0307bfc>] (arch_cpu_idle) from [<c0383460>] (do_idle+0xe8/0x268)
[ 181.454671] [<c0383460>] (do_idle) from [<c0383968>] (cpu_startup_entry+0x20/0x28)
[ 181.462384] [<c0383968>] (cpu_startup_entry) from [<c120129c>] (start_kernel+0x6c0/0x7a4)
[ 181.470708] CPU2: stopping
[ 181.473481] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.armv7hl #1
[ 181.483381] Hardware name: BCM2835
[ 181.486862] [<c0310fc8>] (unwind_backtrace) from [<c030b23c>] (show_stack+0x18/0x1c)
[ 181.494748] [<c030b23c>] (show_stack) from [<c0768924>] (dump_stack+0xd0/0x104)
[ 181.502194] [<c0768924>] (dump_stack) from [<c030eb88>] (handle_IPI+0x1c0/0x394)
[ 181.509727] [<c030eb88>] (handle_IPI) from [<c0300b70>] (__irq_svc+0x70/0x98)
[ 181.516986] Exception stack(0xef171f68 to 0xef171fb0)
[ 181.522134] 1f60: 00000001 00000006 ef16ad00 00000000 00000000 00000000
[ 181.530458] 1f80: ef170000 ef171fc0 00000002 c14090b0 00000000 00000000 0000002a ef171fb8
[ 181.538777] 1fa0: c0307bf8 c0307bfc 60070113 ffffffff
[ 181.543931] [<c0300b70>] (__irq_svc) from [<c0307bfc>] (arch_cpu_idle+0x28/0x58)
[ 181.551466] [<c0307bfc>] (arch_cpu_idle) from [<c0383460>] (do_idle+0xe8/0x268)
[ 181.558912] [<c0383460>] (do_idle) from [<c0383968>] (cpu_startup_entry+0x20/0x28)
[ 181.566622] [<c0383968>] (cpu_startup_entry) from [<003017cc>] (0x3017cc)
[ 181.573535] CPU3: stopping
[ 181.576306] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.armv7hl #1
[ 181.586208] Hardware name: BCM2835
[ 181.589691] [<c0310fc8>] (unwind_backtrace) from [<c030b23c>] (show_stack+0x18/0x1c)
[ 181.597577] [<c030b23c>] (show_stack) from [<c0768924>] (dump_stack+0xd0/0x104)
[ 181.605023] [<c0768924>] (dump_stack) from [<c030eb88>] (handle_IPI+0x1c0/0x394)
[ 181.612556] [<c030eb88>] (handle_IPI) from [<c0300b70>] (__irq_svc+0x70/0x98)
[ 181.619814] Exception stack(0xef173f68 to 0xef173fb0)
[ 181.624964] 3f60: 00000001 00000006 ef168000 00000000 00000000 00000000
[ 181.633288] 3f80: ef172000 ef173fc0 00000003 c14090b0 00000000 00000000 ffffe000 ef173fb8
[ 181.641607] 3fa0: c0307bf8 c0307bfc 60070013 ffffffff
[ 181.646761] [<c0300b70>] (__irq_svc) from [<c0307bfc>] (arch_cpu_idle+0x28/0x58)
[ 181.654297] [<c0307bfc>] (arch_cpu_idle) from [<c0383460>] (do_idle+0xe8/0x268)
[ 181.661744] [<c0383460>] (do_idle) from [<c0383968>] (cpu_startup_entry+0x20/0x28)
[ 181.669453] [<c0383968>] (cpu_startup_entry) from [<003017cc>] (0x3017cc)
[ 181.676415] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b ]---

2. What is the Version-Release number of the kernel:
5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.armv7hl
systemd-245.5-2.fc33

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
Attempt to boot F33 on armhfp, either on hardware or in qemu.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
Yes, today's compose includes 5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``.
If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

Booting with selinux disabled (selinux=0):

..
[ OK ] Stopped Hardware RNG Entropy Gatherer Daemon.
[ OK ] Finished Cleanup udevd DB.
[ OK ] Reached target Switch Root.
Starting Switch Root...
[ 123.144677] systemd-journald[305]: Received SIGTERM from PID 1 (systemd).
[ 125.670556] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 125.673324] CPU: 0 PID: 1 Comm: systemd Not tainted 5.7.0-0.rc3.20200501gitc45e8bccecaf.1.fc33.armv7hl #1
[ 125.676519] Hardware name: Generic DT based system
[ 125.678189] [<c0310fc8>] (unwind_backtrace) from [<c030b23c>] (show_stack+0x18/0x1c)
[ 125.680920] [<c030b23c>] (show_stack) from [<c0768924>] (dump_stack+0xd0/0x104)
[ 125.683424] [<c0768924>] (dump_stack) from [<c034d9d4>] (panic+0x104/0x350)
[ 125.685794] [<c034d9d4>] (panic) from [<c0352d14>] (do_exit+0x1bc/0xb58)
[ 125.688062] [<c0352d14>] (do_exit) from [<c0353750>] (do_group_exit+0x64/0xc4)
[ 125.690594] [<c0353750>] (do_group_exit) from [<c035ff84>] (get_signal+0x1d0/0x764)
[ 125.693254] [<c035ff84>] (get_signal) from [<c030aa7c>] (do_work_pending+0xe4/0x424)
[ 125.695872] [<c030aa7c>] (do_work_pending) from [<c03000f4>] (slow_work_pending+0xc/0x20)
[ 125.698638] Exception stack(0xee113fb0 to 0xee113ff8)
[ 125.700424] 3fa0: beef1b94 beef1b94 b6ef0e30 00000000
[ 125.703215] 3fc0: 00000000 00000001 00000000 00000000 0061d260 005aa51c beef1f9a 0061bde8
[ 125.705966] 3fe0: b6d9ddb0 beef1b68 b6ba5674 b6e34798 a0010010 ffffffff
[ 125.708333] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b ]---
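The exitcode=0x0000008b in the panic line uses the same encoding as a wait() status word: the low 7 bits hold the terminating signal and bit 0x80 is the "core dumped" flag. Decoded, that is signal 11 (SIGSEGV) with the core flag set, which agrees with sig=11 in the audit record. Below is a minimal sketch that decodes such a value with the standard <sys/wait.h> macros; struct exit_info and decode_exit_code() are hypothetical names used only for illustration.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* WCOREDUMP is a common extension rather than strict POSIX; fall back to
 * the conventional 0x80 "core dumped" bit if the header does not expose it. */
#ifndef WCOREDUMP
#define WCOREDUMP(status) ((status) & 0x80)
#endif

/* Hypothetical summary of what a wait-style status word says. */
struct exit_info {
    bool signaled;  /* true if terminated by a signal */
    int signal;     /* terminating signal number, or 0 */
    bool core_flag; /* true if the 0x80 core-dump bit is set */
    int status;     /* exit status for a normal exit, or -1 */
};

static struct exit_info decode_exit_code(int code)
{
    struct exit_info info = { false, 0, false, -1 };

    if (WIFSIGNALED(code)) {
        info.signaled = true;
        info.signal = WTERMSIG(code);
        info.core_flag = WCOREDUMP(code) != 0;
    } else if (WIFEXITED(code)) {
        info.status = WEXITSTATUS(code);
    }
    return info;
}

/* Usage: decode_exit_code(0x8b) reports signal 11 (SIGSEGV on Linux) with
 * the core-dump bit set; decode_exit_code(0x100) reports exit status 1. */
```

The same helper applies to the second panic above, which prints the identical exit code.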
Created attachment 1684930: Boot with 'selinux=0 systemd.log_level=debug systemd.log_target=console'
I used dnf to update an F32 system to F33; booting it with the 5.6 kernel still hits this. The 5.7 kernel boots fine with F32 userspace. Moving to systemd.
Compose Fedora-Rawhide-20200421.n.0, which included systemd-245.5-1.fc33, looks the same:

[ OK ] Reached target Switch Root.
initrd-switch-root.service: ConditionPathExists=/etc/initrd-release succeeded.
Failed to read pids.max attribute of cgroup root, ignoring: No such file or directory
initrd-switch-root.service: Passing 0 fds to service
initrd-switch-root.service: About to execute: /usr/bin/systemctl --no-block switch-root /sysroot
initrd-switch-root.service: Forked /usr/bin/systemctl as 1363
initrd-switch-root.service: Changed dead -> start
Starting Switch Root...
initrd-switch-root.service: Executing: /usr/bin/systemctl --no-block switch-root /sysroot
systemd-journald.service: Got notification message from PID 313 (FDSTORE=1)
systemd-journald.service: Added fd 28 (n/a) to fd store.
Bus private-bus-connection: changing state UNSET → OPENING
Bus private-bus-connection: changing state OPENING → AUTHENTICATING
Accepted new private connection.
Bus private-bus-connection: changing state AUTHENTICATING → RUNNING
Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=SwitchRoot cookie=1 reply_cookie=0 signature=ss error-name=n/a error-message=n/a
Sent message type=method_return sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=n/a error-name=n/a error-message=n/a
Serializing systemd-state to memfd.
Bus private-bus-connection: changing state RUNNING → CLOSING
Failed to send reloading signal: Connection reset by peer
Switching root.
systemd-journald.service: Releasing resources.
systemd-journald.service: Releasing all stored fds
Bus private-bus-connection: changing state CLOSING → CLOSED
[ 107.621174] kauditd_printk_skb: 20 callbacks suppressed
[ 107.621179] audit: type=1701 audit(1588699446.745:76): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1 comm="systemd" exe="/usr/lib/systemd/systemd" sig=11 res=1
[ 107.794106] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
Created attachment 1685323: Fedora-Rawhide-20200421.n.0 compose boot logs
Proposing as an automatic blocker for F33-Beta, armhfp fails to boot.
The armhfp boot breaks when updating to glibc-2.31.9000-1.fc33+. Moving to glibc.
Hmmm, is this something to do with annobin again? The libraries from a local fedpkg build appear to work, but if they are extracted from the RPM, they fail (and the md5sums differ too).
(In reply to Jeremy Linton from comment #7)
> Hmmm is this something to do with annobin again? the fedpkg local build
> libraries appear to work, if they are extracted from the rpm, they fail
> (differing md5sums too).

Thanks for starting to debug this, Jeremy.

Can you try building in a mock chroot, please? That will give you the most accurate buildroot environment (short of a scratch build in koji), e.g.:

    mock -r fedora-rawhide-armhfp --init
    fedpkg srpm
    mock -r fedora-rawhide-armhfp --no-clean --rebuild ./path/to/glibc.src.rpm

Then try to use those RPMs for testing?
Yes, it turns out I was having build/tree issues. In the meantime, I've discovered that the current Fedora package has commits from the master/2.32 branch, rather than being the 2.31 release it appears to be listed as.

So, bisecting the master branch, the first problem I'm hitting is at ec935dea "elf: Implement __libc_early_init", on my ARM VM running an F32 1.6 image.

That commit is in the Fedora glibc, and it definitely causes crashes when I build with it.
(In reply to Jeremy Linton from comment #9)
> Yes, it turns out I was having build/tree issues. In the meantime, I've
> discovered that the current fedora package has commits from the master/2.32
> branch, rather than the release 2.31 version it appears to listed as.

Fedora Rawhide tracks the upstream glibc master branch, which is called 2.31.9000 and is the development branch for the forthcoming glibc 2.32, which freezes in July for release on August 1st.

> So bisecting the master branch, the first problem i'm having is at ec935dea
> "elf: Implement __libc_early_init" on my arm vm running a f32 1.6 image.
>
> That commit is in the fedora glibc, and for sure is causing crashes when I
> build with it.

Are you able to get a core dump?
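The naming convention described above (a master-branch glibc reports X.Y.9000 while it is the development branch for X.(Y+1)) can be checked mechanically. Below is a small sketch, assuming the version string comes from glibc's gnu_get_libc_version() in <gnu/libc-version.h>; dev_snapshot_target() and running_glibc_is_dev_snapshot() are hypothetical helpers, not part of glibc.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Hypothetical helper. Returns 1 and stores the upcoming release X.(Y+1)
 * for a development version "X.Y.9000", returns 0 (storing X.Y) for an
 * ordinary release string, or -1 if the string does not start with "X.Y". */
static int dev_snapshot_target(const char *version, int *major, int *minor)
{
    int maj, min, patch = 0;
    int n = sscanf(version, "%d.%d.%d", &maj, &min, &patch);

    if (n < 2)
        return -1;
    *major = maj;
    if (n == 3 && patch == 9000) {
        *minor = min + 1; /* e.g. 2.31.9000 is heading for 2.32 */
        return 1;
    }
    *minor = min;
    return 0;
}

/* Applies the check to the glibc this process is actually running with. */
static int running_glibc_is_dev_snapshot(void)
{
    int maj, min;
    return dev_snapshot_target(gnu_get_libc_version(), &maj, &min) == 1;
}
```

For example, dev_snapshot_target("2.31.9000", &maj, &min) returns 1 with maj == 2 and min == 32.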
(In reply to Jeremy Linton from comment #9)
> Yes, it turns out I was having build/tree issues. In the meantime, I've
> discovered that the current fedora package has commits from the master/2.32
> branch, rather than the release 2.31 version it appears to listed as.
>
> So bisecting the master branch, the first problem i'm having is at ec935dea
> "elf: Implement __libc_early_init" on my arm vm running a f32 1.6 image.
>
> That commit is in the fedora glibc, and for sure is causing crashes when I
> build with it.

Would you please attach libc.so.6 from the broken image to this bug report? Thanks.
To be clear, I suspect that you have a mismatch between ld-linux-armhf.so.3 and libc.so.6. It would totally explain a crash if the ld is newer than libc.
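One way to check for the suspected mismatch is to look at which loader and which libc a running process actually has mapped. Below is a minimal, Linux-only sketch that scans /proc/self/maps; find_mapping() is a hypothetical helper, and matching on substrings such as "ld-linux" or "libc.so" is an assumption about how the files are named on the system being inspected.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, Linux-specific helper: scan /proc/self/maps for the first
 * file-backed mapping whose path contains `needle` (e.g. "ld-linux" or
 * "libc.so") and copy that path into `out`. Returns 1 if found, else 0.
 *
 * Usage:
 *   char ld[4096], libc[4096];
 *   if (find_mapping("ld-linux", ld, sizeof ld) &&
 *       find_mapping("libc.so", libc, sizeof libc))
 *       printf("loader: %s\nlibc:   %s\n", ld, libc);
 */
static int find_mapping(const char *needle, char *out, size_t outlen)
{
    char line[4096];
    int found = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, maps) != NULL) {
        /* The pathname is the last column and the only one containing '/'. */
        char *path = strchr(line, '/');

        if (path == NULL || strstr(path, needle) == NULL)
            continue;
        path[strcspn(path, "\n")] = '\0';
        snprintf(out, outlen, "%s", path);
        found = 1;
    }
    fclose(maps);
    return found;
}
```

Comparing the two paths (and, from a shell, the files they point to) shows whether the process picked up a loader from a different tree than its libc.so.6.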
Yeah, it's possible I've had mismatches in the past. The current test cycle is: build it in the VM, install it to /usr/local/test, then copy the .so files (with "9000" embedded in their names) to /lib, then rerun ldconfig so the links point at the "9000" .so files. I haven't found a way to break it once this is in place. But when the machine is rebooted, it hangs further into the boot process, after switching root (I haven't seen a core dump doing it this way). I then remount the disk on the host machine and re-adjust the /lib symlinks, after which it can be restarted. So I guess there could be some kind of mismatch between the "stale" ld-linux on the initrd and the one on the rootfs, but I'm not sure how to fix that without making the recovery sequence worse.
Jeremy, have you been able to make any progress on this issue? Unfortunately, the Fedora glibc team is not able to work on this issue, but if you can produce a core dump or a more detailed analysis, we would be more than happy to review it.
Any status update on this? This is the only bug we've seen reporting crashes like this.
Well, this is an odd one, and it's been an interesting experience trying to pin down what it's doing.

Anyway, it's not possible out of the box to get a core dump from it, because the kernel panics as soon as init exits; at least, not without some kernel hacking I've been doing recently. But of course getting the core requires me to follow systemd's re-execs, which has further problems of its own at the moment.

Weirdly, whatever the bug is, it only seems to affect systemd during switch-root/re-exec. I've got the latest bleeding-edge glibc running on the machine and it's fine, at least well enough to rebuild itself and do various other normal operations via a chroot to /sysroot when booted off a functional initrd.

So at the moment I'm trying to follow systemd's re-exec at the point where execve() goes into the loader. At this point, if there is a loader "mismatch" issue, it's because the "stale" loader on the initrd I use to bootstrap the system (the one linked to the systemd image called as /init) doesn't match the one in the rootfs/lib/ld-*. That shouldn't be a problem AFAIK, and I don't see anything in systemd's parameter serialization that might cause the problem. So would that mean the kernel is somehow remapping the stale loader rather than using the correct one? If there is a bug there, simply swapping the new libraries into the initrd should be enough to fix the boot problem (I haven't tried that; I'm trying to isolate what is actually happening instead).

It's probably a good idea for someone a bit more familiar with the loader intricacies to also look at this, since I'm in pretty deep. Worst case, I can run it again now that I've got a more robust boot/test system running.
I abandoned the proper build-and-A/B-test plan and just debugged it through the switch-root black hole, hot-patching systemd etc. on the fly by hand.

(gdb) bt
#0  0xb6e901d8 in settimeofday () from /lib/libc.so.6
#1  0xb6bff994 in clock_reset_timewarp () from /usr/lib/systemd/libsystemd-shared-245.so
#2  0x0043a4d8 in ?? ()
(gdb) disassemble settimeofday
Dump of assembler code for function settimeofday:
   0x00094768 <+0>:   push {lr} ; (str lr, [sp, #-4]!)
   0x0009476c <+4>:   ldr r2, [pc, #152] ; 0x9480c <settimeofday+164>
   0x00094770 <+8>:   sub sp, sp, #36 ; 0x24
   0x00094774 <+12>:  ldr r3, [pc, #148] ; 0x94810 <settimeofday+168>
   0x00094778 <+16>:  str r0, [sp, #4]
   0x0009477c <+20>:  add r2, pc, r2
   0x00094780 <+24>:  ldr r3, [r2, r3]
   0x00094784 <+28>:  ldr r3, [r3]
   0x00094788 <+32>:  str r3, [sp, #28]
   0x0009478c <+36>:  mov r3, #0
   0x00094790 <+40>:  mov r3, r0
   0x00094794 <+44>:  subs r0, r1, #0
   0x00094798 <+48>:  ldrd r2, [r3]
   0x0009479c <+52>:  bne 0x947ec <settimeofday+132>
   0x000947a0 <+56>:  rsb r1, r3, r3, lsl #5
   0x000947a4 <+60>:  add r3, r3, r1, lsl #2
   0x000947a8 <+64>:  asr r12, r2, #31
   0x000947ac <+68>:  lsl r3, r3, #3
   0x000947b0 <+72>:  add r1, sp, #8
   0x000947b4 <+76>:  str r2, [sp, #8]
   0x000947b8 <+80>:  str r12, [sp, #12]
   0x000947bc <+84>:  str r3, [sp, #16]
   0x000947c0 <+88>:  bl 0xa0a14 <__clock_settime64>
   0x000947c4 <+92>:  ldr r2, [pc, #72] ; 0x94814 <settimeofday+172>
   0x000947c8 <+96>:  ldr r3, [pc, #64] ; 0x94810 <settimeofday+168>
   0x000947cc <+100>: add r2, pc, r2
   0x000947d0 <+104>: ldr r3, [r2, r3]
   0x000947d4 <+108>: ldr r2, [r3]
   0x000947d8 <+112>: ldr r3, [sp, #28]
   0x000947dc <+116>: eors r2, r3, r2
   0x000947e0 <+120>: bne 0x94808 <settimeofday+160>
   0x000947e4 <+124>: add sp, sp, #36 ; 0x24
   0x000947e8 <+128>: pop {pc} ; (ldr pc, [sp], #4)
   0x000947ec <+132>: mrc 15, 0, r2, cr13, cr0, {3}
   0x000947f0 <+136>: mov r1, #22
   0x000947f4 <+140>: ldr r3, [pc, #28] ; 0x94818 <settimeofday+176>
   0x000947f8 <+144>: mvn r0, #0
   0x000947fc <+148>: ldr r3, [pc, r3]
   0x00094800 <+152>: str r1, [r2, r3]
   0x00094804 <+156>: b 0x947c4 <settimeofday+92>
   0x00094808 <+160>: bl 0xf02e0 <__stack_chk_fail>
   0x0009480c <+164>: andeq r12, r11, r12, lsr #13
   0x00094810 <+168>: andeq r0, r0, r8, asr r1
   0x00094814 <+172>: andeq r12, r11, r12, asr r6
   0x00094818 <+176>: andeq r12, r11, r8, lsl #14
(gdb) info registers
r0     0xbee1dbb4  3202472884
r1     0xbee1dbb4  3202472884
r2     0xb6f4ce28  3069496872
r3     0x0         0
r4     0x0         0
r5     0x1         1
r6     0x0         0
r7     0x0         0
r8     0x583238    5780024
r9     0x50f9bc    5306812
r10    0xbee1dfae  3202473902
r11    0x581de0    5774816
r12    0xb6df8db4  3068104116
sp     0xbee1db88  0xbee1db88
lr     0xb6bff994  -1228932716
pc     0xb6e901d8  0xb6e901d8 <settimeofday+48>
cpsr   0xa0010010  -1610547184
fpscr  0x0         0

Dropping a NOP in at 0x00094798 like this:

(gdb) set {int}0x00094798 = 0xe1a00000
(gdb) c

and the machine boots:

[root@localhost ~]# uname -a
Linux localhost.localdomain 5.8.0-0.rc2.20200625git8be3a53e18e0.1.fc33.armv7hl #1 SMP Thu Jun 25 19:10:12 UTC 2020 armv7l armv7l armv7l GNU/Linux
[root@localhost ~]# cat /etc/os-release
NAME=Fedora
VERSION="33 (Server Edition Prerelease)"
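The word written over `ldrd r2, [r3]` (the instruction that faults, since r3 is 0) is 0xe1a00000, which is the A32 encoding of `mov r0, r0`, the classic ARM NOP. Reading the disassembly, skipping the load appears to let `bne` at +52 take the tz != NULL path, which stores 22 (EINVAL) to errno and returns -1 instead of faulting. Below is a small sketch that decodes the data-processing fields of that word; struct dp_fields, decode_dp() and is_mov_self_nop() are hypothetical names, and the decoder deliberately ignores the multiply/miscellaneous encodings that share the same opcode space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an A32 data-processing instruction (bits 27:26 == 00). */
struct dp_fields {
    unsigned cond;     /* bits 31:28; 0xE = always */
    bool imm;          /* bit 25; false = operand2 is a (shifted) register */
    unsigned opcode;   /* bits 24:21; 0xD = MOV */
    bool set_flags;    /* bit 20 (S) */
    unsigned rn;       /* bits 19:16; unused by MOV */
    unsigned rd;       /* bits 15:12; destination register */
    unsigned operand2; /* bits 11:0; Rm in bits 3:0, shift in 11:4 */
};

static bool decode_dp(uint32_t word, struct dp_fields *f)
{
    if (((word >> 26) & 0x3u) != 0)
        return false; /* not in the data-processing space */
    f->cond = (word >> 28) & 0xFu;
    f->imm = ((word >> 25) & 0x1u) != 0;
    f->opcode = (word >> 21) & 0xFu;
    f->set_flags = ((word >> 20) & 0x1u) != 0;
    f->rn = (word >> 16) & 0xFu;
    f->rd = (word >> 12) & 0xFu;
    f->operand2 = word & 0xFFFu;
    return true;
}

/* "mov rX, rX" with no shift, no flag update and rX != pc has no
 * architectural effect, which is why 0xe1a00000 works as a NOP. */
static bool is_mov_self_nop(uint32_t word)
{
    struct dp_fields f;

    return decode_dp(word, &f) && f.cond == 0xEu && !f.imm &&
           f.opcode == 0xDu && !f.set_flags && f.rd != 15u &&
           f.operand2 == f.rd;
}
```

For comparison, `mov r3, r0` (as at settimeofday+40) decodes with the same opcode but rd == 3, so it is not a NOP.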
Mostly for my own memory: the key trick to keep systemd from killing gdb is to whitelist the debugger in do_reexecute()->broadcast_signal(). Following the execve()s and the mount/pivot_root calls is mostly normal gdb-fu, via rd.break and manually sending signal 15 to the process.
Just in case it wasn't obvious:

systemd:

int clock_reset_timewarp(void) {
        const struct timeval *tv_null = NULL;
        struct timezone tz;

        tz.tz_minuteswest = 0;
        tz.tz_dsttime = 0; /* DST_NONE */

        /*
         * The very first call to settimeofday() does time warp magic. Do a
         * dummy call here, so the time warping is sealed and all later calls
         * behave as expected.
         */
        if (settimeofday(tv_null, &tz) < 0)
                return -errno;

        return 0;
}

glibc:

/* Convert a known valid struct timeval into a struct __timeval64.  */
static inline struct __timeval64
valid_timeval_to_timeval64 (const struct timeval tv)
{
  struct __timeval64 tv64;

  tv64.tv_sec = tv.tv_sec;
  tv64.tv_usec = tv.tv_usec;

  return tv64;
}

int
__settimeofday (const struct timeval *tv, const struct timezone *tz)
{
  struct __timeval64 tv64 = valid_timeval_to_timeval64 (*tv);

  return __settimeofday64 (&tv64, tz);
}
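The problem in the glibc wrapper quoted above is that `valid_timeval_to_timeval64 (*tv)` dereferences `tv` unconditionally, while systemd deliberately passes a NULL time to set only the timezone. Below is a hedged model of the idea behind a fix: convert only when `tv` is non-NULL and otherwise forward NULL. All names here (struct timeval64_model, struct timezone_model, settimeofday64_model(), settimeofday_model()) are hypothetical stand-ins; this is not glibc's internal code and not the upstream patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-ins for glibc's internal 64-bit types. */
struct timeval64_model {
    int64_t tv_sec;
    int64_t tv_usec;
};

struct timezone_model {
    int tz_minuteswest;
    int tz_dsttime;
};

/* Record the last call so the behaviour can be checked. */
static int last_had_time = -1;
static int64_t last_sec;

/* Stand-in for the 64-bit implementation; it never touches the clock. */
static int settimeofday64_model(const struct timeval64_model *tv,
                                const struct timezone_model *tz)
{
    (void)tz;
    last_had_time = tv != NULL;
    if (tv != NULL)
        last_sec = tv->tv_sec;
    return 0;
}

/* 32-bit-time wrapper: only convert tv when the caller supplied one, so
 * the "timezone only" call pattern is forwarded instead of faulting. */
static int settimeofday_model(const struct timeval *tv,
                              const struct timezone_model *tz)
{
    if (tv == NULL)
        return settimeofday64_model(NULL, tz);

    struct timeval64_model tv64 = { tv->tv_sec, tv->tv_usec };
    return settimeofday64_model(&tv64, tz);
}
```

With this shape, settimeofday_model(NULL, &tz) reaches the 64-bit stand-in with a NULL time, which is the case clock_reset_timewarp() relies on.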
Oh, sorry. Indeed, glibc broke the compatibility emulation code. I'll fix this upstream. The systemd issue has some background information on why we need to preserve compatibility here.
Upstream fix posted: https://sourceware.org/pipermail/libc-alpha/2020-June/115557.html
Fixed upstream with:

commit 5f40e4b1ba69a22923f6ec692d2d0f65733ccb0b
Author: Florian Weimer <fweimer>
Date:   Tue Jun 30 21:19:43 2020 +0200

    Linux: Fix UTC offset setting in settimeofday for __TIMESIZE != 64

    The time argument is NULL in this case, and attempt to convert it
    leads to a null pointer dereference.

    This fixes commit d2e3b697da2433c08702f95c76458c51545c3df1
    ("y2038: linux: Provide __settimeofday64 implementation").

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella>

Jeremy, many thanks for tracking this one down!
This is now fixed in Fedora Rawhide with the current build of glibc-2.31.9000-17.fc33: https://koji.fedoraproject.org/koji/taskinfo?taskID=46478201