Bug 1086404 - kernel crash from lockdep resource exhaustion, due to heavy module heap/lock activity
Summary: kernel crash from lockdep resource exhaustion, due to heavy module heap/lock ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-04-10 19:50 UTC by Frank Ch. Eigler
Modified: 2014-08-10 19:33 UTC (History)
10 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-08-10 19:33:52 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
a demonstration module (887 bytes, text/plain)
2014-04-10 19:51 UTC, Frank Ch. Eigler
no flags Details

Description Frank Ch. Eigler 2014-04-10 19:50:32 UTC
Noticed that during running git systemtap testsuite on rawhide
(3.15.0-0.rc0.git9.1.fc21.x86_64), the testsuite eventually
crashes the kernel.  This appears to be new with the 3.15 series.


[ 3454.344969] BUG: MAX_LOCKDEP_ENTRIES too low!
[ 3454.345021] turning off the locking correctness validator.
[ 3454.345021] Please attach the output of /proc/lock_stat to the bug report
[ 3454.345021] CPU: 0 PID: 3932 Comm: stapio Tainted: GF          O 3.15.0-0.rc0.git9.1.fc21.x86_64 #1
[ 3454.345021] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3454.345021]  0000000000000000 00000000fb02c4ae ffff8800406edc78 ffffffff817db690
[ 3454.345021]  ffff8800984f4168 ffff8800406edc88 ffffffff810f7396 ffff8800406edd58
[ 3454.345021]  ffffffff810fbba1 000000000000e1c0 ffff8800984f35c0 0000000100000000
[ 3454.345021] Call Trace:
[ 3454.345021]  [<ffffffff817db690>] dump_stack+0x4d/0x66
[ 3454.345021]  [<ffffffff810f7396>] add_lock_to_list.isra.28.constprop.47+0xa6/0xb0
[ 3454.345021]  [<ffffffff810fbba1>] __lock_acquire+0x1221/0x1ce0
[ 3454.345021]  [<ffffffff810fce52>] lock_acquire+0xa2/0x1d0
[ 3454.345021]  [<ffffffffa060d52a>] ? utrace_attach_task+0x10a/0x3b0 [stap_81791c9756ecbae81269ea39dd22072f_3932]
[ 3454.345021]  [<ffffffff817e477e>] _raw_spin_lock+0x3e/0x80

etc.

My investigation implicated nothing in the systemtap runtime code,
but instead in the kernel's lockdep subsystem.  It looks as though
some combination of locks, being allocated on a heap, nesting with
various standard kernel locks, ends up resulting in a monotonic
increase of lockdep database consumption.  After enough modules 
come & go, the lockdep error occurs, and a system freeze comes
not long afterwards.

Reproduced without systemtap using attached dummy module that
nests some heap-allocated locks with current->mm->mmap_sem.
insmod & rmmod that module in a nice tight loop, while monitoring
/proc/lockdep_stats.  Note rapid increase in "direct dependencies"
(and others).  Note when the max: limits are hit (nr_list_entries
== MAX_LOCKDEP_ENTRIES), the machine goes down.

I'm not sure what to suggest.  Systemtap could arrange to turn
lockdep off entirely, but that'd be sad.  There is no exported
lockdep_free_key_range() function we could invoke from the module
to have lockdep forget about all our heap/data locks.

I'm not sure to what extent other modules could trigger this same
behavior (and I hope users can't arrange to load/unload them
repeatedly, leading to a DoS).

Comment 1 Frank Ch. Eigler 2014-04-10 19:51:19 UTC
Created attachment 885111 [details]
a demonstration module

Comment 2 Josh Stone 2014-04-10 20:41:42 UTC
With kernel-debug-3.13.9-200.fc20.x86_64, I get the same "BUG: MAX_LOCKDEP_ENTRIES too low!" message, but the system keeps on running.

On rawhide, same kernel as Frank, I also get the hang.  It does eventually start detecting its own stall though.

[  918.352679] BUG: MAX_LOCKDEP_ENTRIES too low!
[  918.353012] turning off the locking correctness validator.
[  918.353012] Please attach the output of /proc/lock_stat to the bug report
[  918.353012] CPU: 1 PID: 5692 Comm: insmod Tainted: GF          O 3.15.0-0.rc0.git9.1.fc21.x86_64 #1
[  918.353012] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  918.353012]  0000000000000000 00000000738becea ffff880064fa9b78 ffffffff817db690
[  918.353012]  ffff88006a6adc48 ffff880064fa9b88 ffffffff810f7396 ffff880064fa9c58
[  918.353012]  ffffffff810fbba1 000000000000c500 ffff88006a6ad0a0 0000000100000000
[  918.353012] Call Trace:
[  918.353012]  [<ffffffff817db690>] dump_stack+0x4d/0x66
[  918.353012]  [<ffffffff810f7396>] add_lock_to_list.isra.28.constprop.47+0xa6/0xb0
[  918.353012]  [<ffffffff810fbba1>] __lock_acquire+0x1221/0x1ce0
[  918.353012]  [<ffffffff811302b9>] ? __module_address+0xf9/0x150
[  918.353012]  [<ffffffff810fce52>] lock_acquire+0xa2/0x1d0
[  918.353012]  [<ffffffffa032e096>] ? init_module+0x96/0xf0 [foo]
[  918.353012]  [<ffffffff817e477e>] _raw_spin_lock+0x3e/0x80
[  918.353012]  [<ffffffffa032e096>] ? init_module+0x96/0xf0 [foo]
[  918.353012]  [<ffffffffa032e096>] init_module+0x96/0xf0 [foo]
[  918.353012]  [<ffffffffa032e000>] ? 0xffffffffa032dfff
[  918.353012]  [<ffffffff8100216a>] do_one_initcall+0xfa/0x1b0
[  918.353012]  [<ffffffff81063f43>] ? set_memory_nx+0x43/0x50
[  918.353012]  [<ffffffff811336b6>] load_module+0x1e16/0x2700
[  918.353012]  [<ffffffff8112e660>] ? store_uevent+0x70/0x70
[  918.353012]  [<ffffffff8123b850>] ? kernel_read+0x50/0x80
[  918.353012]  [<ffffffff81134176>] SyS_finit_module+0xa6/0xd0
[  918.353012]  [<ffffffff817ef669>] system_call_fastpath+0x16/0x1b
[  918.388661] hello ffffc90000531000
[  983.394028] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 1, t=65002 jiffies, g=14322, c=14321, q=250)
[  983.395023] INFO: Stall ended before state dump start
[ 1081.202049] INFO: task kworker/u4:0:6 blocked for more than 120 seconds.
[ 1081.203823]       Tainted: GF          O 3.15.0-0.rc0.git9.1.fc21.x86_64 #1
[ 1081.205271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.207177] kworker/u4:0    D ffff88007bcfd0a0  3912     6      2 0x00000000
[ 1081.208620] Workqueue: dm_bufio_cache work_fn
[ 1081.209651]  ffff88007bcefc80 0000000000000046 ffff88007bcfd0a0 ffff88007bceffd8
[ 1081.211568]  00000000001d5800 00000000001d5800 ffff880066ff1ae0 ffffffff81d2fce0
[ 1081.213544]  ffffffff81d2fce8 0000000000000246 ffff88007bcfd0a0 0000000000000100
[ 1081.216085] Call Trace:
[ 1081.216840]  [<ffffffff817dfed1>] schedule_preempt_disabled+0x31/0x80
[ 1081.218138]  [<ffffffff817e10f4>] mutex_lock_nested+0x194/0x430
[ 1081.219376]  [<ffffffff81620779>] ? work_fn+0x29/0x120
[ 1081.220503]  [<ffffffff81620779>] ? work_fn+0x29/0x120
[ 1081.221627]  [<ffffffff81620779>] work_fn+0x29/0x120
[ 1081.222724]  [<ffffffff810bbec0>] process_one_work+0x220/0x6f0
[ 1081.223892]  [<ffffffff810bbe54>] ? process_one_work+0x1b4/0x6f0
[ 1081.225085]  [<ffffffff810bc4ab>] worker_thread+0x11b/0x3a0
[ 1081.226220]  [<ffffffff810bc390>] ? process_one_work+0x6f0/0x6f0
[ 1081.227426]  [<ffffffff810c487f>] kthread+0xff/0x120
[ 1081.228492]  [<ffffffff810c4780>] ? insert_kthread_work+0x80/0x80
[ 1081.229703]  [<ffffffff817ef5bc>] ret_from_fork+0x7c/0xb0
[ 1081.230841]  [<ffffffff810c4780>] ? insert_kthread_work+0x80/0x80
[ 1081.232049] INFO: lockdep is turned off.
[ 1081.232957] INFO: task rcu_sched:7 blocked for more than 120 seconds.
[ 1081.234162]       Tainted: GF          O 3.15.0-0.rc0.git9.1.fc21.x86_64 #1
[ 1081.235434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.237192] rcu_sched       D ffff88007bcf9ae0  5976     7      2 0x00000000
[ 1081.238512]  ffff88007bd09d00 0000000000000046 ffff88007bcf9ae0 ffff88007bd09fd8
[ 1081.240280]  00000000001d5800 00000000001d5800 ffffffff81c164c0 ffffffff81ca85b8
[ 1081.242069]  ffffffff81ca85c0 0000000000000246 ffff88007bcf9ae0 0000000000000002
[ 1081.243996] Call Trace:
[ 1081.244927]  [<ffffffff817dfed1>] schedule_preempt_disabled+0x31/0x80
[ 1081.246172]  [<ffffffff817e10f4>] mutex_lock_nested+0x194/0x430
[ 1081.247152]  [<ffffffff81117e5b>] ? rcu_gp_kthread+0x13b/0x690
[ 1081.248271]  [<ffffffff81117e5b>] ? rcu_gp_kthread+0x13b/0x690
[ 1081.249361]  [<ffffffff81117e5b>] rcu_gp_kthread+0x13b/0x690
[ 1081.250423]  [<ffffffff810f0620>] ? abort_exclusive_wait+0xb0/0xb0
[ 1081.251545]  [<ffffffff81117d20>] ? __note_gp_changes+0x60/0x60
[ 1081.252639]  [<ffffffff810c487f>] kthread+0xff/0x120
[ 1081.253640]  [<ffffffff810c4780>] ? insert_kthread_work+0x80/0x80
[ 1081.254768]  [<ffffffff817ef5bc>] ret_from_fork+0x7c/0xb0
[ 1081.255771]  [<ffffffff810c4780>] ? insert_kthread_work+0x80/0x80
[ 1081.256855] INFO: lockdep is turned off.

... and many more "blocked for more than 102 seconds" messages...

Comment 3 Josh Stone 2014-04-10 21:10:36 UTC
kernel-debug-3.14.0-1.fc21.x86_64 also keeps going fine after lockdep exhaustion.

Comment 4 Josh Stone 2014-04-10 21:41:37 UTC
kernel-3.15.0-0.rc0.git2.1.fc21.x86_64 stalls.

There are no koji builds between 3.14.0-1 and 3.15.0-0.rc0.git2.1, so I can't easily narrow it down further.  The latter is marked v3.14-751-g683b6c6f82a6, so that's not *too* many raw commits to dig into, I suppose...

The config differences don't look too interesting for the most part, but here are a few that I think might be meaningful:

-# CONFIG_MAXSMP is not set
-CONFIG_NR_CPUS=1024
+CONFIG_MAXSMP=y
+CONFIG_NR_CPUS=8192

-# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
+CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y

-# CONFIG_SLUB_DEBUG_ON is not set
+CONFIG_SLUB_DEBUG_ON=y

Comment 5 Josh Stone 2014-04-10 21:54:31 UTC
I notice there are 4 "lockdep:..." commits in v3.14..683b6c6f82a6.  Oleg, can you comment whether you think any of those may cause this exhaustion issue?

It's fine that we're reaching MAX_LOCKDEP_ENTRIES, although it would be nice to find a way for frequent systemtap insertions to consolidate their entries.  But crashing afterward is a regression.

Comment 6 Oleg Nesterov 2014-04-11 19:48:27 UTC
(In reply to Josh Stone from comment #5)
> I notice there are 4 "lockdep:..." commits in v3.14..683b6c6f82a6.  Oleg,
> can you comment whether you think any of those may cause this exhaustion
> issue?

At first glance, they can't... I'll try to recheck tomorrow, but so far
they "obviously can't". But do not assume I can answer authoritatively,
I look at this code from time to time just to immediately forget what
I have learned ;)

Comment 7 Josh Stone 2014-04-11 23:10:40 UTC
Ok, that was just a wild guess. :)

I ran a git bisect of v3.14..683b6c6f82a6 (3.15.0-0.rc0.git2.1).  My log:

> # bad: [683b6c6f82a60fabf47012581c2cfbf1b037ab95] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> git bisect start '683b6c6f82a6' 'v3.14'
> # bad: [176ab02d4916f09d5d8cb63372d142df4378cdea] Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 176ab02d4916f09d5d8cb63372d142df4378cdea
> # bad: [8c292f11744297dfb3a69f4a0bccbe4a6417b50d] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 8c292f11744297dfb3a69f4a0bccbe4a6417b50d
> # bad: [b3fd4ea9df2d5c39cd6ce08faf965ed669eb3b56] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad b3fd4ea9df2d5c39cd6ce08faf965ed669eb3b56
> # good: [e086481baf9d0436bdd6e9b739bfa4a83fb89ef5] rcutorture: Add a lock_busted to test the test
> git bisect good e086481baf9d0436bdd6e9b739bfa4a83fb89ef5
> # good: [62c206bd514600d4d73751ade00dca8e488390a3] Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
> git bisect good 62c206bd514600d4d73751ade00dca8e488390a3
> # good: [f207dbe63c61158c234e2e8929a3725a7f6b2b9b] Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
> git bisect good f207dbe63c61158c234e2e8929a3725a7f6b2b9b
> # good: [47667fa1502e4d759df87e9cc7fbc0f202483361] locking/mutexes: Modify the way optimistic spinners are queued
> git bisect good 47667fa1502e4d759df87e9cc7fbc0f202483361
> # bad: [6f008e72cd111a119b5d8de8c5438d892aae99eb] locking/mutex: Fix debug checks
> git bisect bad 6f008e72cd111a119b5d8de8c5438d892aae99eb
> # good: [fb0527bd5ea99bfeb2dd91e3c1433ecf745d6b99] locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
> git bisect good fb0527bd5ea99bfeb2dd91e3c1433ecf745d6b99
> # good: [34c6bc2c919a55e5ad4e698510a2f35ee13ab900] locking/mutexes: Add extra reschedule point
> git bisect good 34c6bc2c919a55e5ad4e698510a2f35ee13ab900
> # first bad commit: [6f008e72cd111a119b5d8de8c5438d892aae99eb] locking/mutex: Fix debug checks

Commit 6f008e72cd11 is not very big - I'm not sure how it would cause this.  However, I did confirm the bug still exists on current HEAD (79d2d21ee06f), and reverting just 6f008e72cd11 from there does avoid the crash.  The WARNING that it's supposed to solve also comes back though.

Actually, after reverting that, I don't even exceed MAX_LOCKDEP_ENTRIES -- /proc/lockdep_stats doesn't keep climbing anymore when looping over insmod/rmmod of Frank's module.  So I'm not confident that this is the full root cause, but at least it's the immediate trigger.

Peter, thoughts?

Comment 8 Josh Boyer 2014-04-12 03:21:14 UTC
I think this is the same underlying issue as reported here upstream: http://thread.gmane.org/gmane.linux.kernel/1678494 If so, then I believe this is fixed by http://article.gmane.org/gmane.linux.kernel/1681114 It would be great if Peter could confirm.

Comment 9 Josh Boyer 2014-04-17 13:25:23 UTC
commit a227960fe0cafcc229a8d6bb8b454a3a0b33719d
Author: Peter Zijlstra <peterz>
Date:   Thu Apr 10 16:15:59 2014 +0200

    locking/mutex: Fix debug_mutexes
    
    debug_mutex_unlock() would bail when !debug_locks and forgets to
    actually unlock.
    
    Reported-by: "Michael L. Semon" <mlsemon35>
    Reported-by: "Kirill A. Shutemov" <kirill>
    Reported-by: Valdis Kletnieks <Valdis.Kletnieks>
    Fixes: 6f008e72cd11 ("locking/mutex: Fix debug checks")
    Tested-by: Dave Jones <davej>
    Cc: Jason Low <jason.low2>
    Signed-off-by: Peter Zijlstra <peterz>
    Link: http://lkml.kernel.org/r/20140410141559.GE13658@twins.programming.kick
    Signed-off-by: Ingo Molnar <mingo>


Should be in the rawhide build I do later today.

Comment 10 Josh Boyer 2014-04-17 13:44:11 UTC
Build started.  I'll leave this in MODIFIED for now until Frank or Josh confirms the 3.15.0-0.rc1.git3.1 build works.

Comment 11 Josh Stone 2014-04-22 00:00:30 UTC
I just tried on 3.15.0-0.rc1.git4.1.fc21.x86_64.  Looping Frank's module still reaches MAX_LOCKDEP_ENTRIES, but it continues fine afterward.

Comment 12 Frank Ch. Eigler 2014-08-10 19:33:52 UTC
crash not recently seen


Note You need to log in before you can comment on or make changes to this bug.