Red Hat Bugzilla – Bug 807019
kernel 3.3.0 (including rcs) broken on ARM: Segfaults and oom-killer during boot.
Last modified: 2012-06-28 08:16:36 EDT
Created attachment 572849 [details]
Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly.
Description of problem:
When booting kernel-tegra-3.3.0-5 on a Trimslice the system fails to come up fully. Warnings start up at the 0.0 timestamp between NR_IRQS and sched_clock messages. Segmentation faults begin when starting the network. Shortly thereafter systemd invokes the oom-killer and backtraces start flying.
Version-Release number of selected component (if applicable):
Present in 3.3.0-5. Was also a problem in earlier 3.3.0-rc4 kernel.
Steps to Reproduce:
1. Install kernel-tegra-3-3-*
3. Watch the fireworks
I'm attaching 2 logs: the first is the bad 3.3.0 kernel boot log, and the second is a successful boot of the same system using a "2.6.41" kernel from F15.
Created attachment 572851 [details]
Log of a kernel 2.6.41-based kernel booting on a trimslice, operating normally
I tried booting a Panda Board (ARM OMAP4) using the 3.3.0-0.rc4.git3.1.fc17.armv7hl.omap kernel with an F17 armv7hl rootfs and I see the same type of errors that I saw on the Trim Slice (Tegra2), for example:
[ 39.046722] BUG: sleeping function called from invalid context at include/linux/freezer.h:46
[ 39.051757] in_atomic(): 0, irqs_disabled(): 128, pid: 1417, name: systemd
[ 39.051757] INFO: lockdep is turned off.
[ 39.051757] irq event stamp: 0
[ 39.051757] hardirqs last enabled at (0): [< (null)>] (null)
[ 39.072662] hardirqs last disabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[ 39.085052] softirqs last enabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[ 39.085449] softirqs last disabled at (0): [< (null)>] (null)
[ 39.100006] [<c0015b24>] (unwind_backtrace+0x0/0xf8) from [<c0010bac>] (do_signal+0x40/0x550)
[ 39.103851] [<c0010bac>] (do_signal+0x40/0x550) from [<c0011654>] (do_notify_resume+0x54/0x60)
[ 39.111694] [<c0011654>] (do_notify_resume+0x54/0x60) from [<c000dec4>] (work_pending+0x24/0x28)
[ 1108.801422] Out of memory: Kill process 368 (systemd) score 1 or sacrifice child
[ 1108.810119] Killed process 375 (systemd) total-vm:5408kB, anon-rss:1320kB, file-rss:68kB
so this issue does not seem to be isolated to Tegra-based systems.
Created attachment 573756 [details]
boot log for trimslice kernel-3.3 built using gcc-4.6
This is the boot log from a Trim Slice using an upstream 3.3 kernel and Fedora Tegra kernel config, built with gcc version 4.6.1 20110908 (Red Hat 4.6.1-9).
The systemd fork-bomb is no longer happening on my dreamplug. The only difference I know of is I have done a yum update recently. I'm still running
I am still getting "sleeping function called from invalid context" traces pretty much constantly.
I have to boot with libertas.disable=1 or udev hangs. I'm investigating that.
I spoke too soon. I just rebooted and the systemd fork-bomb is back. I don't know why it worked once.
Created attachment 578158 [details]
Log output with systemd debugging enabled
According to http://fedoraproject.org/wiki/How_to_debug_Systemd_problems there are kernel boot parameters that can be set to find out what systemd is doing. The attached log was generated with "systemd.log_level=debug systemd.log_target=console" on the command line of a linux-3.4.0-rc4 kernel that exhibits the problem.
Looking through the pids in the "Forked ... as <pid>" messages, one sees that the pid numbers increase very quickly after NetworkManager starts:
Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131
For a non-problem kernel (188.8.131.52-1.fc15.armv7hl.tegra) see:
Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360
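To make the comparison above concrete, here is a minimal Python sketch that extracts the "Forked ... as <pid>" lines and reports how far the pid jumped between consecutive messages. The function name `fork_pid_gaps` and the regular expression are illustrative assumptions, not part of systemd. A large gap only shows that many pids were consumed between two logged forks; it does not say which process consumed them, and pid wrap-around is not handled.

```python
import re

# Matches systemd debug lines such as "Forked /sbin/auditd as 308".
FORK_RE = re.compile(r"Forked (\S+) as (\d+)")

def fork_pid_gaps(log_text):
    """Return (path, pid, gap) tuples; gap is the pid distance
    from the previous "Forked" line (None for the first one)."""
    rows = []
    prev = None
    for match in FORK_RE.finditer(log_text):
        path, pid = match.group(1), int(match.group(2))
        gap = None if prev is None else pid - prev
        rows.append((path, pid, gap))
        prev = pid
    return rows

# Lines copied from the two logs quoted in this comment.
bad_log = """\
Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131
"""

good_log = """\
Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360
"""

for label, log in (("problem kernel", bad_log), ("non-problem kernel", good_log)):
    gaps = [gap for _, _, gap in fork_pid_gaps(log) if gap is not None]
    print(label, "largest pid gap:", max(gaps))
```

On the lines quoted here, the largest gap is 319 for the problem kernel (between NetworkManager and systemd-logind) and 6 for the non-problem kernel.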
I noticed that in attachment "572849: Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly." there are a number of segmentation faults related to networking, such as:
network: /etc/sysconfig/network-scripts/init.ipv6-global: line 67: 461 Segmentation fault /sbin/sysctl -e -w net.ipv6.conf.$sinter
Is there some way to turn off NetworkManager and see if the problem kernel boots up then?
You can turn off NM and enable the old "network" service. Also make sure you have openssl 1.0.0 and not 1.0.1, as the latter has issues.
Disabling both NM and network does not stop the OOM storm.
Booting with a read-only root fs seems somehow to avoid the systemd fork-bomb. I don't know if that's a useful datapoint, but there it is. I can get up to multi-user that way, and then remount the rootfs as rw.
Tried booting trimslice ro, no help. Perhaps this is only effective on armv5tel or only on kirkwood kernels.
On #fedora-arm jonmasters mentioned turning off auditd allows a 3.3/3.4 kernel to boot. Verified able to boot Linux 3.4.0-rc3+ after turning off audit with:
# systemctl disable auditd.service
As a workaround you might append "audit=0" to the kernel cmdline.
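To confirm that the workaround actually reached the kernel, one can inspect the booted command line, which Linux exposes in /proc/cmdline. The following is a minimal Python sketch; the helper name `audit_param` and the sample command line are hypothetical, and the sketch assumes that the last occurrence of a repeated parameter is the one that takes effect. It splits on whitespace only, so quoted parameter values containing spaces are not handled.

```python
def audit_param(cmdline):
    """Return the value of the last "audit=" token in a kernel
    command line string, or None if the parameter is absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("audit="):
            value = token[len("audit="):]
    return value

# Hypothetical command line, for illustration only.
sample = "console=ttyS0,115200n8 root=/dev/sda1 ro audit=0"
print(audit_param(sample))
```

On a running system the same helper could be called as `audit_param(open("/proc/cmdline").read())`.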
A patch was sent upstream that stopped the crash when audit was enabled, but there were still issues with audit on ARM; I'm not sure whether it has been properly fixed. 3.4.x doesn't crash and boots fine on both PandaBoard and Trimslice.