Red Hat Bugzilla – Full Text Bug Listing
|Summary:||kernel 3.3.0 (including rcs) broken on ARM: Segfaults and oom-killer during boot.|
|Product:||[Fedora] Fedora||Reporter:||Brendan Conoboy <blc>|
|Component:||kernel||Assignee:||Jon Masters <jcm>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||17||CC:||alexvillacislasso, dmarlin, gansalmon, itamar, jdisnard, jonathan, kernel-maint, madhu.chinakonda, pbrobinson, wcohen|
|Whiteboard:||first=3.3 audit arm|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2012-06-28 08:16:36 EDT||Type:||---|
Description Brendan Conoboy 2012-03-26 15:10:29 EDT
Created attachment 572849 [details]
Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly.

Description of problem:
When booting kernel-tegra-3.3.0-5 on a Trimslice the system fails to come up fully. Warnings start up at the 0.0 timestamp between NR_IRQS and sched_clock messages. Segmentation faults begin when starting the network. Shortly thereafter systemd invokes the oom-killer and backtraces start flying.

Version-Release number of selected component (if applicable):
Present in 3.3.0-5. Was also a problem in earlier 3.3.0-rc4 kernel.

How reproducible:
Every time.

Steps to Reproduce:
1. Install kernel-tegra-3-3-*
2. Boot
3. Watch the fireworks

Additional info:
I'm attaching 2 logs, the first is the bad 3.3.0 kernel boot log. The second is a successful boot of the same system using a "2.6.41" kernel from F15.
Comment 1 Brendan Conoboy 2012-03-26 15:11:54 EDT
Created attachment 572851 [details] Log of a kernel 2.6.41-based kernel booting on a trimslice, operating normally
Comment 2 D. Marlin 2012-03-28 20:04:10 EDT
I tried booting a Panda Board (ARM OMAP4) using the 3.3.0-0.rc4.git3.1.fc17.armv7hl.omap kernel with an F17 armv7hl rootfs and I see the same type of errors that I saw on Trim Slice (Tegra2), i.e.,

[ 39.046722] BUG: sleeping function called from invalid context at include/linux/freezer.h:46
[ 39.051757] in_atomic(): 0, irqs_disabled(): 128, pid: 1417, name: systemd
[ 39.051757] INFO: lockdep is turned off.
[ 39.051757] irq event stamp: 0
[ 39.051757] hardirqs last enabled at (0): [< (null)>] (null)
[ 39.072662] hardirqs last disabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[ 39.085052] softirqs last enabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[ 39.085449] softirqs last disabled at (0): [< (null)>] (null)
[ 39.100006] [<c0015b24>] (unwind_backtrace+0x0/0xf8) from [<c0010bac>] (do_signal+0x40/0x550)
[ 39.103851] [<c0010bac>] (do_signal+0x40/0x550) from [<c0011654>] (do_notify_resume+0x54/0x60)
[ 39.111694] [<c0011654>] (do_notify_resume+0x54/0x60) from [<c000dec4>] (work_pending+0x24/0x28)
:
[ 1108.801422] Out of memory: Kill process 368 (systemd) score 1 or sacrifice child
[ 1108.810119] Killed process 375 (systemd) total-vm:5408kB, anon-rss:1320kB, file-rss:68kB

so this issue does not seem to be isolated to Tegra-based systems.
Comment 3 D. Marlin 2012-03-29 13:55:09 EDT
Created attachment 573756 [details]
boot log for trimslice kernel-3.3 built using gcc-4.6

This is the boot log from a Trim Slice using an upstream 3.3 kernel and Fedora Tegra kernel config, built with gcc version 4.6.1 20110908 (Red Hat 4.6.1-9).
Comment 4 Guy Streeter 2012-04-12 16:06:12 EDT
The systemd fork-bomb is no longer happening on my dreamplug. The only difference I know of is I have done a yum update recently. I'm still running fedora-arm 3.3.0-0.rc4.git3.1.fc17.armv5tel.kirkwood

I am still getting "sleeping function called from invalid context" traces pretty much constantly.

I have to boot with libertas.disable=1 or udev hangs. I'm investigating that.
Comment 5 Guy Streeter 2012-04-12 16:32:17 EDT
I spoke too soon. I just rebooted and the systemd fork-bomb is back. I don't know why it worked once.
Comment 6 William Cohen 2012-04-17 17:25:22 EDT
Created attachment 578158 [details]
Log output with systemd debugging enabled

According to http://fedoraproject.org/wiki/How_to_debug_Systemd_problems there are kernel boot parameters that can be set to find out what systemd is doing. The attached log was generated with "systemd.log_level=debug systemd.log_target=console" on the command line of a linux-3.4.0-rc4 kernel that exhibits the problem. Looking through the PIDs in the "Forked ... as <pid>" messages, one sees the PID numbers increasing very quickly after NetworkManager starts:

Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131

For a non-problem kernel (188.8.131.52-1.fc15.armv7hl.tegra) see:

Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360

I noticed that in attachment "572849: Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly." there are a number of segmentation faults related to networking, like:

network: /etc/sysconfig/network-scripts/init.ipv6-global: line 67: 461 Segmentation fault /sbin/sysctl -e -w net.ipv6.conf.$sinter

Is there some way to turn off NetworkManager and see if the problem kernel boots up then?
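
The PID comparison above can be automated. The following is a minimal sketch, not part of the original report: it assumes the debug log contains messages of the form "Forked <path> as <pid>" exactly as quoted in this comment, and the helper names (fork_pid_gaps, largest_gap) are hypothetical. It ignores PID wraparound, which does not occur in the excerpts shown.

```python
import re

# Matches the systemd debug messages quoted in this comment,
# e.g. "Forked /sbin/auditd as 308". The format is taken from the
# excerpts above and is an assumption about the full log.
FORKED_RE = re.compile(r"Forked (\S+) as (\d+)")


def fork_pid_gaps(log_text):
    """Return a list of (path, pid, gap) tuples, one per 'Forked' message.

    gap is the difference to the previous forked PID, or None for the
    first entry. A large gap means many other processes were created
    between two consecutive systemd forks.
    """
    result = []
    previous_pid = None
    for match in FORKED_RE.finditer(log_text):
        path, pid = match.group(1), int(match.group(2))
        gap = None if previous_pid is None else pid - previous_pid
        result.append((path, pid, gap))
        previous_pid = pid
    return result


def largest_gap(log_text):
    """Return the largest PID gap in the log, or 0 if there is none."""
    gaps = [gap for _, _, gap in fork_pid_gaps(log_text) if gap is not None]
    return max(gaps, default=0)


# Excerpts copied from this comment.
BAD_BOOT = """
Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131
"""

GOOD_BOOT = """
Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360
"""

if __name__ == "__main__":
    print("problem kernel, largest PID gap:", largest_gap(BAD_BOOT))
    print("working kernel, largest PID gap:", largest_gap(GOOD_BOOT))
```

On the excerpts above, the largest gap is 319 for the problem kernel (between NetworkManager and systemd-logind) and 6 for the working kernel.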
Comment 7 Peter Robinson 2012-04-17 17:36:46 EDT
You can turn off NM and enable the old "network" service. Also make sure you have openssl 1.0.0 and not 1.0.1, as the latter has issues.
Comment 8 Brendan Conoboy 2012-04-17 20:18:00 EDT
Disabling both NM and network does not stop the OOM storm.
Comment 9 Guy Streeter 2012-04-18 16:08:15 EDT
Booting with a read-only root fs seems somehow to avoid the systemd fork-bomb. I don't know if that's a useful datapoint, but there it is. I can get up to multi-user that way, and then remount the rootfs as rw.
Comment 10 Brendan Conoboy 2012-04-18 16:26:36 EDT
Tried booting trimslice ro, no help. Perhaps this is only effective on armv5tel or only on kirkwood kernels.
Comment 11 William Cohen 2012-04-20 09:40:22 EDT
On #fedora-arm jonmasters mentioned turning off auditd allows a 3.3/3.4 kernel to boot. Verified able to boot Linux 3.4.0-rc3+ after turning off audit with:

# systemctl disable auditd.service
Comment 12 Jon Disnard 2012-04-20 17:09:31 EDT
As a workaround you might append "audit=0" to the kernel cmdline.
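
To confirm that the workaround actually reached the running kernel, the command line can be inspected after boot. This is a minimal sketch, not something from the original thread: audit_disabled is a hypothetical helper, the sample command line in the comment is made up for illustration, and the sketch assumes that when "audit=" appears more than once the last occurrence is the one that takes effect.

```python
def audit_disabled(cmdline):
    """Return True if the kernel command line sets audit=0.

    Assumption: if "audit=" is given several times, the last value is
    the one that counts.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("audit="):
            value = token[len("audit="):]
    return value == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was
    # booted with, e.g. "console=ttyS0 root=/dev/sda1 audit=0"
    # (hypothetical example).
    with open("/proc/cmdline") as f:
        running = f.read()
    print("audit=0 on kernel command line:", audit_disabled(running))
```

This only checks the boot parameter; it says nothing about whether auditd.service is enabled, which is the other workaround mentioned in comment 11.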
Comment 13 Peter Robinson 2012-06-28 08:16:36 EDT
There was a patch sent upstream which stopped the crash when audit was enabled, but there were still issues with audit on ARM; I'm not sure of the status of it being properly fixed. 3.4.x doesn't crash and boots fine on both PandaBoard and Trimslice.