Bug 807019 - kernel 3.3.0 (including rcs) broken on ARM: Segfaults and oom-killer during boot.
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: arm Linux
Priority: unspecified
Severity: high
Assigned To: Jon Masters
QA Contact: Fedora Extras Quality Assurance
Whiteboard: first=3.3 audit arm
Depends On:
Blocks: ARMTracker
Reported: 2012-03-26 15:10 EDT by Brendan Conoboy
Modified: 2012-06-28 08:16 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2012-06-28 08:16:36 EDT


Attachments
Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly. (33.15 KB, text/plain)
2012-03-26 15:10 EDT, Brendan Conoboy
Log of a kernel 2.6.41-based kernel booting on a trimslice, operating normally (34.69 KB, text/plain)
2012-03-26 15:11 EDT, Brendan Conoboy
boot log for trimslice kernel-3.3 built using gcc-4.6 (2.42 MB, text/x-log)
2012-03-29 13:55 EDT, D. Marlin
Log output with systemd debugging enabled (521.73 KB, text/plain)
2012-04-17 17:25 EDT, William Cohen

Description Brendan Conoboy 2012-03-26 15:10:29 EDT
Created attachment 572849
Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly.

Description of problem:

When booting kernel-tegra-3.3.0-5 on a Trimslice the system fails to come up fully.  Warnings start up at the 0.0 timestamp between NR_IRQS and sched_clock messages.  Segmentation faults begin when starting the network.  Shortly thereafter systemd invokes the oom-killer and backtraces start flying.

Version-Release number of selected component (if applicable):

Present in 3.3.0-5.  Was also a problem in earlier 3.3.0-rc4 kernel.

How reproducible:

Every time.

Steps to Reproduce:
1. Install kernel-tegra-3.3.*
2. Boot
3. Watch the fireworks

Additional info:

I'm attaching two logs. The first is the bad 3.3.0 kernel boot log. The second is a successful boot of the same system using a "2.6.41" kernel from F15.
Comment 1 Brendan Conoboy 2012-03-26 15:11:54 EDT
Created attachment 572851
Log of a kernel 2.6.41-based kernel booting on a trimslice, operating normally
Comment 2 D. Marlin 2012-03-28 20:04:10 EDT
I tried booting a Panda Board (ARM OMAP4) using the 3.3.0-0.rc4.git3.1.fc17.armv7hl.omap kernel with an F17 armv7hl rootfs, and I see the same type of errors that I saw on Trim Slice (Tegra2), i.e.,

[   39.046722] BUG: sleeping function called from invalid context at include/linux/freezer.h:46
[   39.051757] in_atomic(): 0, irqs_disabled(): 128, pid: 1417, name: systemd
[   39.051757] INFO: lockdep is turned off.
[   39.051757] irq event stamp: 0
[   39.051757] hardirqs last  enabled at (0): [<  (null)>]   (null)
[   39.072662] hardirqs last disabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[   39.085052] softirqs last  enabled at (0): [<c003b7f0>] copy_process.part.35+0x3dc/0x117c
[   39.085449] softirqs last disabled at (0): [<  (null)>]   (null)
[   39.100006] [<c0015b24>] (unwind_backtrace+0x0/0xf8) from [<c0010bac>] (do_signal+0x40/0x550)
[   39.103851] [<c0010bac>] (do_signal+0x40/0x550) from [<c0011654>] (do_notify_resume+0x54/0x60)
[   39.111694] [<c0011654>] (do_notify_resume+0x54/0x60) from [<c000dec4>] (work_pending+0x24/0x28)
        :
[ 1108.801422] Out of memory: Kill process 368 (systemd) score 1 or sacrifice child
[ 1108.810119] Killed process 375 (systemd) total-vm:5408kB, anon-rss:1320kB, file-rss:68kB


so this issue does not seem to be isolated to Tegra-based systems.
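
To compare boot logs across boards, the symptoms quoted above can be counted mechanically. The following is only a rough sketch: the regular expressions are taken from the messages quoted in this bug, while the labels and the `count_symptoms` helper are made-up names for illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Patterns copied from messages quoted in this bug report.
# The dictionary keys are hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "sleeping_in_atomic": re.compile(
        r"BUG: sleeping function called from invalid context"),
    "oom_kill": re.compile(r"Out of memory: Kill process (\d+) \((\S+)\)"),
    "segfault": re.compile(r"Segmentation fault"),
}


def count_symptoms(lines):
    """Count how many lines match each known symptom pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Two lines quoted in the comment above, used as sample input.
sample = [
    "[   39.046722] BUG: sleeping function called from invalid context "
    "at include/linux/freezer.h:46",
    "[ 1108.801422] Out of memory: Kill process 368 (systemd) score 1 "
    "or sacrifice child",
]

counts = count_symptoms(sample)
print(dict(counts))
```

Feeding it the lines of a saved boot log (for example one of the attachments) would give per-symptom counts that can be compared between a good and a bad kernel.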
Comment 3 D. Marlin 2012-03-29 13:55:09 EDT
Created attachment 573756
boot log for trimslice kernel-3.3 built using gcc-4.6

This is the boot log from a Trim Slice using an upstream 3.3 kernel and Fedora Tegra kernel config, built with gcc version 4.6.1 20110908 (Red Hat 4.6.1-9).
Comment 4 Guy Streeter 2012-04-12 16:06:12 EDT
The systemd fork-bomb is no longer happening on my dreamplug. The only difference I know of is I have done a yum update recently. I'm still running
fedora-arm 3.3.0-0.rc4.git3.1.fc17.armv5tel.kirkwood

I am still getting "sleeping function called from invalid context" traces pretty much constantly.

I have to boot with libertas.disable=1 or udev hangs. I'm investigating that.
Comment 5 Guy Streeter 2012-04-12 16:32:17 EDT
I spoke too soon. I just rebooted and the systemd fork-bomb is back. I don't know why it worked once.
Comment 6 William Cohen 2012-04-17 17:25:22 EDT
Created attachment 578158
Log output with systemd debugging enabled

According to http://fedoraproject.org/wiki/How_to_debug_Systemd_problems there are kernel boot parameters that can be set to find out what systemd is doing.  The attached log was generated with "systemd.log_level=debug systemd.log_target=console" on the command line of a linux-3.4.0-rc4 kernel that exhibits the problem.

Looking at the PIDs in the "Forked ... as <pid>" lines, one sees that the PID numbers increase very quickly after NetworkManager starts:

Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131

For a non-problem kernel (2.6.42.12-1.fc15.armv7hl.tegra) see:

Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360
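
The difference between the two listings can be made explicit by computing the PID gap between consecutive "Forked ... as <pid>" lines. This is a minimal sketch; the `pid_gaps` helper and the regular expression are assumptions made for illustration, and the input is just the two listings above.

```python
import re

# Matches the systemd debug lines quoted above, e.g.
# "Forked /sbin/auditd as 308".
FORKED = re.compile(r"Forked (\S+) as (\d+)")


def pid_gaps(lines):
    """Return (program, gap) pairs, where gap is the PID distance
    from the previously forked program in the log."""
    entries = []
    for line in lines:
        match = FORKED.search(line)
        if match:
            entries.append((match.group(1), int(match.group(2))))
    return [
        (name, pid - prev_pid)
        for (_, prev_pid), (name, pid) in zip(entries, entries[1:])
    ]


bad_kernel = """\
Forked /sbin/auditd as 308
Forked /sbin/auditctl as 309
Forked /usr/sbin/rootfs-resize as 312
Forked /usr/sbin/NetworkManager as 318
Forked /usr/lib/systemd/systemd-logind as 637
Forked /usr/lib/systemd/systemd-user-sessions as 780
Forked /bin/plymouth as 951
Forked /bin/plymouth as 1131
""".splitlines()

good_kernel = """\
Forked /sbin/auditd as 344
Forked /sbin/auditctl as 345
Forked /usr/sbin/rootfs-resize as 348
Forked /usr/sbin/NetworkManager as 354
Forked /usr/lib/systemd/systemd-logind as 356
Forked /usr/lib/systemd/systemd-user-sessions as 357
Forked /bin/plymouth as 358
Forked /bin/plymouth as 360
""".splitlines()

for label, log in (("bad", bad_kernel), ("good", good_kernel)):
    gaps = pid_gaps(log)
    print(label, "largest gap:", max(gap for _, gap in gaps))
```

On the problem kernel the gap jumps from single digits to hundreds right after NetworkManager is forked, while on the 2.6.42 kernel it never exceeds 6. The gap only counts PIDs consumed between the listed forks; it does not say which process consumed them.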

In attachment "572849: Log of a kernel 3.3.0-based kernel booting on a trimslice, going badly." there are a number of segmentation faults related to networking, such as:

network[301]: /etc/sysconfig/network-scripts/init.ipv6-global: line 67:   461 Segmentation fault      /sbin/sysctl -e -w net.ipv6.conf.$sinter

Is there some way to turn off NetworkManager and see if the problem kernel boots up then?
Comment 7 Peter Robinson 2012-04-17 17:36:46 EDT
You can turn off NM and enable the old "network" service. Also make sure you have openssl 1.0.0 and not 1.0.1, as the latter has issues.
Comment 8 Brendan Conoboy 2012-04-17 20:18:00 EDT
Disabling both NM and network does not stop the OOM storm.
Comment 9 Guy Streeter 2012-04-18 16:08:15 EDT
Booting with a read-only root fs seems somehow to avoid the systemd fork-bomb. I don't know if that's a useful datapoint, but there it is. I can get up to multi-user that way, and then remount the rootfs as rw.
Comment 10 Brendan Conoboy 2012-04-18 16:26:36 EDT
Tried booting trimslice ro, no help.  Perhaps this is only effective on armv5tel or only on kirkwood kernels.
Comment 11 William Cohen 2012-04-20 09:40:22 EDT
On #fedora-arm jonmasters mentioned turning off auditd allows a 3.3/3.4 kernel to boot. Verified able to boot Linux 3.4.0-rc3+ after turning off audit with:

# systemctl disable auditd.service
Comment 12 Jon Disnard 2012-04-20 17:09:31 EDT
As a workaround, you might append "audit=0" to the kernel cmdline.
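
To confirm that the parameter actually reached the running kernel, the command line can be checked after boot. The sketch below is an assumption-laden illustration: `cmdline_params` and `audit_disabled` are hypothetical helpers, the sample command line is invented, and the simple whitespace split does not handle quoted values that contain spaces.

```python
def cmdline_params(cmdline):
    """Split a kernel command line into a dict of parameters.

    Bare flags such as "ro" map to None. Quoted values containing
    spaces are not handled by this simple split.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def audit_disabled(cmdline):
    """Return True if the command line carries audit=0."""
    return cmdline_params(cmdline).get("audit") == "0"


# Hypothetical command line, for illustration only.
sample = "console=ttyS0,115200 root=/dev/sda1 ro audit=0"
print(audit_disabled(sample))
```

On a booted Linux system the same check can be run against the contents of /proc/cmdline instead of the sample string.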
Comment 13 Peter Robinson 2012-06-28 08:16:36 EDT
A patch was sent upstream that stopped the crash when audit was enabled, but there were still issues with audit on ARM; I'm not sure whether it has been properly fixed. 3.4.x doesn't crash and boots fine on both PandaBoard and Trimslice.
