Bug 807019
Summary: | kernel 3.3.0 (including rcs) broken on ARM: Segfaults and oom-killer during boot. | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Brendan Conoboy <blc> |
Component: | kernel | Assignee: | Jon Masters <jcm> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 17 | CC: | alexvillacislasso, dmarlin, gansalmon, itamar, jdisnard, jonathan, kernel-maint, madhu.chinakonda, pbrobinson, wcohen |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | arm | ||
OS: | Linux | ||
Whiteboard: | first=3.3 audit arm | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-28 12:16:36 UTC | Type: | --- |
Bug Blocks: | 245418 | ||
Attachments: |
Created attachment 572851
Log of a kernel 2.6.41-based kernel booting on a trimslice, operating normally
I tried booting a Panda Board (ARM OMAP4) using the 3.3.0-0.rc4.git3.1.fc17.armv7hl.omap kernel with an F17 armv7hl rootfs, and I see the same type of errors that I saw on Trim Slice (Tegra2), i.e.,

[ 39.046722] BUG: sleeping function called from invalid context at include/linux/freezer.h:46
[ 39.051757] in_atomic(): 0, irqs_disabled(): 128, pid: 1417, name: systemd
[ 39.051757] INFO: lockdep is turned off.
[ 39.051757] irq event stamp: 0
[ 39.051757] hardirqs last enabled at (0): [< (null)>] (null)
[ 39.072662] hardirqs last disabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[ 39.085052] softirqs last enabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[ 39.085449] softirqs last disabled at (0): [< (null)>] (null)
[ 39.100006] [<c0015b24>] (unwind_backtrace+0x0/0xf8) from [<c0010bac>] (do_signal+0x40/0x550)
[ 39.103851] [<c0010bac>] (do_signal+0x40/0x550) from [<c0011654>] (do_notify_resume+0x54/0x60)
[ 39.111694] [<c0011654>] (do_notify_resume+0x54/0x60) from [<c000dec4>] (work_pending+0x24/0x28)
:
[ 1108.801422] Out of memory: Kill process 368 (systemd) score 1 or sacrifice child
[ 1108.810119] Killed process 375 (systemd) total-vm:5408kB, anon-rss:1320kB, file-rss:68kB

so this issue does not seem to be isolated to Tegra-based systems.

Created attachment 573756
boot log for trimslice kernel-3.3 built using gcc-4.6
This is the boot log from a Trim Slice using an upstream 3.3 kernel and Fedora Tegra kernel config, built with gcc version 4.6.1 20110908 (Red Hat 4.6.1-9).
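For anyone comparing boards, the context line that follows each "sleeping function called from invalid context" report can be pulled out of a saved boot log and grouped by process. This is only a rough sketch: the regular expression is written against the single line quoted in this bug, the function names are made up for illustration, and process names containing spaces would not match.

```python
import re
from collections import Counter

# Matches the context line printed after
# "BUG: sleeping function called from invalid context".
# Written against the one example quoted in this bug report.
CONTEXT_RE = re.compile(
    r"in_atomic\(\): (\d+), irqs_disabled\(\): (\d+), pid: (\d+), name: (\S+)"
)


def sleeping_bug_contexts(log_text):
    """Return one dict per context line found in log_text."""
    return [
        {
            "in_atomic": int(m.group(1)),
            "irqs_disabled": int(m.group(2)),
            "pid": int(m.group(3)),
            "name": m.group(4),
        }
        for m in CONTEXT_RE.finditer(log_text)
    ]


def reports_per_process(log_text):
    """Count context lines per process name."""
    return Counter(c["name"] for c in sleeping_bug_contexts(log_text))


SAMPLE = "[ 39.051757] in_atomic(): 0, irqs_disabled(): 128, pid: 1417, name: systemd"

if __name__ == "__main__":
    print(sleeping_bug_contexts(SAMPLE))
    print(reports_per_process(SAMPLE))
```

Running the same function over the Panda Board and Trim Slice logs would show whether the reports come from the same processes on both.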
The systemd fork-bomb is no longer happening on my dreamplug. The only difference I know of is that I have done a yum update recently. I'm still running fedora-arm 3.3.0-0.rc4.git3.1.fc17.armv5tel.kirkwood. I am still getting "sleeping function called from invalid context" traces pretty much constantly. I have to boot with libertas.disable=1 or udev hangs. I'm investigating that.

I spoke too soon. I just rebooted and the systemd fork-bomb is back. I don't know why it worked once.

Created attachment 578158
Log output with systemd debugging enabled

According to http://fedoraproject.org/wiki/How_to_debug_Systemd_problems there are kernel boot parameters that can be set to find out what systemd is doing. The attached log was generated with "systemd.log_level=debug systemd.log_target=console" on the command line of a linux-3.4.0-rc4 kernel that exhibits the problem. Looking at the PIDs in the "Forked ... as <pid>" messages, one sees the PID numbers increasing very quickly after NetworkManager starts:

Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131

For a non-problem kernel (2.6.42.12-1.fc15.armv7hl.tegra) see:

Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360

Noticed in attachment "572849: Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly." there are a number of segmentation faults related to networking, like:

network[301]: /etc/sysconfig/network-scripts/init.ipv6-global: line 67: 461 Segmentation fault /sbin/sysctl -e -w net.ipv6.conf.$sinter

Is there some way to turn off NetworkManager and see if the problem kernel boots up then?

You can turn off NM and enable the old "network" service. Also make sure you have openssl 1.0.0 and not 1.0.1, as the latter has issues.

Disabling both NM and network does not stop the OOM storm.

Booting with a read-only root fs seems somehow to avoid the systemd fork-bomb. I don't know if that's a useful datapoint, but there it is. I can get up to multi-user that way, and then remount the rootfs as rw.

Tried booting trimslice ro, no help. Perhaps this is only effective on armv5tel or only on kirkwood kernels.

On #fedora-arm jonmasters mentioned that turning off auditd allows a 3.3/3.4 kernel to boot. Verified able to boot Linux 3.4.0-rc3+ after turning off audit with:

# systemctl disable auditd.service

As a workaround you might append "audit=0" to the kernel cmdline.

There was a patch sent upstream which stopped the crash when audit was enabled, but there were still issues with audit on ARM; not sure of the status of it being properly fixed.

3.4.x doesn't crash and boots fine on both PandaBoard and Trimslice |
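The PID comparison in the systemd debug logs above can be reproduced mechanically. The sketch below is only an illustration: it assumes the "Forked <path> as <pid>" wording stays exactly as quoted, the helper names are invented here, and it ignores PID wrap-around (a wrap would show up as a negative gap).

```python
import re

# Pattern for the systemd debug message quoted in this bug.
FORKED_RE = re.compile(r"Forked (\S+) as (\d+)")


def fork_pid_gaps(log_text):
    """Return (path, pid, gap) per fork; gap is the PID increase since the previous fork."""
    rows = []
    previous_pid = None
    for match in FORKED_RE.finditer(log_text):
        path, pid = match.group(1), int(match.group(2))
        gap = None if previous_pid is None else pid - previous_pid
        rows.append((path, pid, gap))
        previous_pid = pid
    return rows


def largest_gap(log_text):
    """Largest PID jump between consecutive forks, or None with fewer than two forks."""
    gaps = [gap for _, _, gap in fork_pid_gaps(log_text) if gap is not None]
    return max(gaps) if gaps else None


# Lines copied from the two logs quoted in this bug.
BAD_BOOT = """\
Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131
"""

GOOD_BOOT = """\
Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360
"""

if __name__ == "__main__":
    print("problem kernel, largest PID gap:", largest_gap(BAD_BOOT))
    print("working kernel, largest PID gap:", largest_gap(GOOD_BOOT))
```

A large gap between two consecutive systemd forks means many other processes were created in between, which is consistent with the fork-bomb behaviour described, though the gap alone does not say which process is doing the forking.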
Created attachment 572849
Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly.

Description of problem:
When booting kernel-tegra-3.3.0-5 on a Trimslice the system fails to come up fully. Warnings start up at the 0.0 timestamp between NR_IRQS and sched_clock messages. Segmentation faults begin when starting the network. Shortly thereafter systemd invokes the oom-killer and backtraces start flying.

Version-Release number of selected component (if applicable):
Present in 3.3.0-5. Was also a problem in earlier 3.3.0-rc4 kernel.

How reproducible:
Every time.

Steps to Reproduce:
1. Install kernel-tegra-3-3-*
2. Boot
3. Watch the fireworks

Additional info:
I'm attaching 2 logs, the first is the bad 3.3.0 kernel boot log. The second is a successful boot of the same system using a "2.6.41" kernel from F15.
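To compare the bad and good boot logs without reading them end to end, the three symptoms described here (segfaults, "sleeping function" reports, OOM kills) can be counted per log. This is a minimal sketch: the search strings are taken from the messages quoted in this bug, while the dictionary keys and function name are arbitrary labels chosen for the example.

```python
from collections import Counter

# Substrings copied from messages quoted in this bug report.
# The keys on the left are arbitrary labels for this example.
SIGNATURES = {
    "segfault": "Segmentation fault",
    "sleeping_bug": "BUG: sleeping function called from invalid context",
    "oom_kill": "Out of memory: Kill process",
}


def count_signatures(lines):
    """Count how many lines contain each known failure signature."""
    counts = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


SAMPLE_LINES = [
    "[ 39.046722] BUG: sleeping function called from invalid context at include/linux/freezer.h:46",
    "network[301]: /etc/sysconfig/network-scripts/init.ipv6-global: line 67: 461 Segmentation fault /sbin/sysctl -e -w net.ipv6.conf.$sinter",
    "[ 1108.801422] Out of memory: Kill process 368 (systemd) score 1 or sacrifice child",
    "[ 39.051757] INFO: lockdep is turned off.",
]

if __name__ == "__main__":
    for label, count in sorted(count_signatures(SAMPLE_LINES).items()):
        print(label, count)
```

To run it against a saved attachment, pass an open file object instead of the sample list, since iterating a text file yields its lines.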