Bug 201726 - slab allocator lock recursion.
Summary: slab allocator lock recursion.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Peter Zijlstra
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: FCMETA_LOCKDEP
Reported: 2006-08-08 15:27 UTC by Lonni J Friedman
Modified: 2014-08-11 05:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-11-12 05:45:18 UTC
Type: ---
Embargoed:


Attachments
dmesg which includes the copious backtraces (104.23 KB, text/plain)
2006-08-08 15:27 UTC, Lonni J Friedman
sysreport (373.22 KB, application/x-bzip)
2006-08-08 15:29 UTC, Lonni J Friedman
Serial output showing similar problem. (44.92 KB, text/plain)
2006-08-08 16:23 UTC, Konrad Rzeszutek

Description Lonni J Friedman 2006-08-08 15:27:16 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.17-1.2517.fc6 and 2.6.17-1.2530.fc6

How reproducible:


Steps to Reproduce:
1. Boot. The backtraces start early on, at "checking if image is initramfs..."
2. Even after booting there is a lot more of the same output, seemingly at
random, when doing things as simple as ssh/scp, and also while generating the
attached sysreport.

  
Actual results:

#######
checking if image is initramfs...
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&nc->lock){....}, at: [<ffffffff8020782c>] kmem_cache_free+0x1a1/0x26c

but task is already holding lock:
 (&nc->lock){....}, at: [<ffffffff8020b47e>] kfree+0x1b3/0x27e

other info that might help us debug this:
2 locks held by swapper/1:
 #0:  (&nc->lock){....}, at: [<ffffffff8020b47e>] kfree+0x1b3/0x27e
 #1:  (&parent->list_lock){....}, at: [<ffffffff802dae22>] __drain_alien_cache+0x37/0x77

stack backtrace:

Call Trace:
 [<ffffffff8026e77d>] show_trace+0xae/0x30e
 [<ffffffff8026e9f2>] dump_stack+0x15/0x17
 [<ffffffff802a7f23>] __lock_acquire+0x135/0xa54
 [<ffffffff802a8de3>] lock_acquire+0x4b/0x69
 [<ffffffff8026774b>] _spin_lock+0x25/0x31
 [<ffffffff8020782c>] kmem_cache_free+0x1a1/0x26c
 [<ffffffff802da9a8>] slab_destroy+0x12b/0x138
 [<ffffffff802dab5f>] free_block+0x1aa/0x1ee
 [<ffffffff802dae48>] __drain_alien_cache+0x5d/0x77
 [<ffffffff8020b49b>] kfree+0x1d0/0x27e
 [<ffffffff80968444>] free+0x9/0xb
 [<ffffffff80968461>] huft_free+0x1b/0x27
 [<ffffffff8096961c>] inflate_dynamic+0x4f0/0x525
 [<ffffffff80969b18>] unpack_to_rootfs+0x4c7/0x930
 [<ffffffff80969fe6>] populate_rootfs+0x65/0xe7
 [<ffffffff8026d710>] init+0x190/0x3cd
 [<ffffffff8026135e>] child_rip+0x8/0x12
DWARF2 unwinder stuck at child_rip+0x8/0x12
Leftover inexact backtrace:
 [<ffffffff80267a32>] _spin_unlock_irq+0x2b/0x31
 [<ffffffff8026099c>] restore_args+0x0/0x30
 [<ffffffff8036c696>] acpi_os_acquire_lock+0x9/0xb
 [<ffffffff8026d580>] init+0x0/0x3cd
 [<ffffffff80261356>] child_rip+0x0/0x12

 it is
##############

############
                                                       sibling
  task                 PC          pid father child younger older
init          S ffff8101442eda08     0     1      0     2               (NOTLB)
 ffff8101442eda08 ffff8101442ed988 ffffffff806c0d80 0000000000000008
 ffff81013fc6a040 ffffffff80567e80 000000a047ea2983 0000000000015fe9
 ffff81013fc6a228 ffff810100000000 ffffffff806c0d80 ffff8101442eda08
Call Trace:
 [<ffffffff80265886>] schedule_timeout+0x8c/0xb3
 [<ffffffff8021204c>] do_select+0x470/0x4de
 [<ffffffff802e705b>] core_sys_select+0x1b7/0x266
 [<ffffffff80217106>] sys_select+0x147/0x172
 [<ffffffff8026040e>] system_call+0x7e/0x83
DWARF2 unwinder stuck at system_call+0x7e/0x83
Leftover inexact backtrace:

migration/0   S ffff81013fc77e98     0     2      1             3       (L-TLB)
 ffff81013fc77e98 0000000100000296 0000000000000002 0000000000000001
 ffff8101442ea040 ffffffff80567e80 000000a0598b50f5 000000000000119a
 ffff8101442ea228 ffff810100000000 0000000000000046 ffff81013fc77e78
Call Trace:
 [<ffffffff80246a22>] migration_thread+0x1a2/0x23f
 [<ffffffff802353fe>] kthread+0x100/0x136
 [<ffffffff8026135e>] child_rip+0x8/0x12
DWARF2 unwinder stuck at child_rip+0x8/0x12
Leftover inexact backtrace:
 [<ffffffff80267a32>] _spin_unlock_irq+0x2b/0x31
 [<ffffffff8026099c>] restore_args+0x0/0x30
 [<ffffffff802352fe>] kthread+0x0/0x136
 [<ffffffff80261356>] child_rip+0x0/0x12
#########

Expected results:
No spewage

Additional info:
I'm seeing all of this behavior on an HP xw9300 workstation
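
Note on reading the "possible recursive locking" report above: lockdep validates locking by lock class rather than by individual lock instance, so taking one (&nc->lock) while another lock of the same class is already held is flagged even when the two are distinct locks. Whether the two locks in this trace really are distinct is not established by the report. The following is a minimal Python sketch of that class-based check, not kernel code; the Lock and Task names, the cache names and the "/1" subclass key are all hypothetical.

```python
class Lock:
    """A lock instance tagged with the class key a validator would track."""

    def __init__(self, name, lock_class):
        self.name = name
        self.lock_class = lock_class


class Task:
    """Tracks the locks one task holds, in acquisition order."""

    def __init__(self):
        self.held = []

    def acquire(self, lock):
        """Take `lock`; return a warning if a held lock shares its class."""
        warning = None
        for held in self.held:
            # The comparison is on the class key, not on the instance.
            if held.lock_class == lock.lock_class:
                warning = "possible recursive locking: %s (class %s) while holding %s" % (
                    lock.name, lock.lock_class, held.name)
                break
        self.held.append(lock)
        return warning

    def release(self, lock):
        self.held.remove(lock)


def nested_free(outer, inner):
    """Take `outer`, then `inner`, and return the warning from each acquire."""
    task = Task()
    first = task.acquire(outer)
    second = task.acquire(inner)
    task.release(inner)
    task.release(outer)
    return first, second


# Two different instances sharing one class key: the second acquire is flagged.
same_class = nested_free(Lock("cache_a.nc", "&nc->lock"),
                         Lock("cache_b.nc", "&nc->lock"))

# Same nesting, but the inner lock carries its own (hypothetical) class key.
split_class = nested_free(Lock("cache_a.nc", "&nc->lock"),
                          Lock("cache_b.nc", "&nc->lock/1"))
```

In this model the warning disappears once the nested lock gets its own class key, which is one general way such reports are silenced when the nesting is known to be safe. This bug does not state which fix was applied.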

Comment 1 Lonni J Friedman 2006-08-08 15:27:17 UTC
Created attachment 133794
dmesg which includes the copious backtraces

Comment 2 Lonni J Friedman 2006-08-08 15:29:11 UTC
Created attachment 133795
sysreport

attaching sysreport generated after the spewage

Comment 3 Konrad Rzeszutek 2006-08-08 16:23:23 UTC
Created attachment 133803
Serial output showing similar problem.

I get stuff like this:
Loading scsi_mod.ko module
BUG: warning at kernel/lockdep.c:565/print_infinite_recursion_bug() (Not tainted)

Call Trace:
 [<ffffffff8026e67d>] show_trace+0xae/0x319
 [<ffffffff8026e8fd>] dump_stack+0x15/0x17
 [<ffffffff802a71da>] print_infinite_recursion_bug+0x45/0x49
 [<ffffffff802a7cad>] check_noncircular+0x30/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a7cf2>] check_noncircular+0x75/0x9c
 [<ffffffff802a8558>] __lock_acquire+0x83f/0xa5f
 [<ffffffff802a8d1b>] lock_acquire+0x4b/0x69
 [<ffffffff80266009>] __mutex_lock_slowpath+0xe5/0x261
 [<ffffffff802661af>] mutex_lock+0x2a/0x2e
 [<ffffffff802ac2a4>] lock_cpu_hotplug+0x7a/0x85
 [<ffffffff802bb1ed>] stop_machine_run+0x1a/0x4a
 [<ffffffff802af2d3>] sys_init_module+0x16d6/0x18cc
 [<ffffffff8026030e>] system_call+0x7e/0x83
DWARF2 unwinder stuck at system_call+0x7e/0x83
Leftover inexact backtrace:


-> #20 (&rq->rq_lock_key#16){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #19 (&rq->rq_lock_key#15){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #18 (&rq->rq_lock_key#14){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #17 (&rq->rq_lock_key#13){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #16 (&rq->rq_lock_key#12){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #15 (&rq->rq_lock_key#11){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #14 (&rq->rq_lock_key#10){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #13 (&rq->rq_lock_key#9){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #12 (&rq->rq_lock_key#8){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #11 (&rq->rq_lock_key#7){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #10 (&rq->rq_lock_key#6){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #9 (&rq->rq_lock_key#5){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #8 (&rq->rq_lock_key#4){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #7 (&rq->rq_lock_key#3){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #6 (&rq->rq_lock_key#2){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028d34e>] double_rq_lock+0x2d/0x33
       [<ffffffff8028d61d>] __migrate_task+0x63/0xea
       [<ffffffff802469c8>] migration_thread+0x1e0/0x23f
       [<ffffffff802352e9>] kthread+0xff/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #5 (&rq->rq_lock_key){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8028dcc8>] task_rq_lock+0x41/0x74
       [<ffffffff8024887f>] try_to_wake_up+0x26/0x418
       [<ffffffff8028decc>] default_wake_function+0xc/0xf
       [<ffffffff8028c1cc>] __wake_up_common+0x3d/0x68
       [<ffffffff8028d30c>] complete+0x37/0x4c
       [<ffffffff802352c4>] kthread+0xda/0x136
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #4 (&q->lock){++..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff80267a22>] _spin_lock_irqsave+0x2b/0x3c
       [<ffffffff8023a752>] __wake_up_sync+0x1d/0x4e
       [<ffffffff8029c27e>] do_notify_parent+0x197/0x1b5
       [<ffffffff80216481>] do_exit+0x809/0x919
       [<ffffffff8029feea>] ____call_usermodehelper+0x5e/0x5f
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #3 (&sighand->siglock){....}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff8022e1d9>] flush_old_exec+0x49c/0xb14
       [<ffffffff80219053>] load_elf_binary+0x47d/0x18ee
       [<ffffffff80242803>] search_binary_handler+0xcb/0x2c7
       [<ffffffff80250a4a>] load_script+0x1ae/0x1c4
       [<ffffffff80242803>] search_binary_handler+0xcb/0x2c7
       [<ffffffff80241cc3>] do_execve+0x1a3/0x261
       [<ffffffff802575ea>] sys_execve+0x35/0x4d
       [<ffffffff802612cb>] execve+0x63/0xc8
       [<ffffffff8026d7d3>] init+0x353/0x3cd
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #2 (init_sighand.siglock){....}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026764a>] _spin_lock+0x24/0x31
       [<ffffffff80220e98>] copy_process+0x1196/0x167d
       [<ffffffff8023363c>] do_fork+0x7e/0x17f
       [<ffffffff802611f8>] kernel_thread+0x80/0xde
       [<ffffffff8096e8bb>] start_kernel+0x249/0x24c
       [<ffffffff8096e28a>] _sinittext+0x28a/0x292
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (tasklist_lock){..--}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff8026771e>] _write_lock_irq+0x2a/0x37
       [<ffffffff80220e00>] copy_process+0x10fe/0x167d
       [<ffffffff80291bda>] fork_idle+0x36/0x61
       [<ffffffff80278e08>] do_fork_idle+0x13/0x27
       [<ffffffff8027919f>] __cpu_up+0x298/0x782
       [<ffffffff802abdfd>] _cpu_up+0x80/0xdd
       [<ffffffff802abe84>] cpu_up+0x2a/0x42
       [<ffffffff8026d537>] init+0xb7/0x3cd
       [<ffffffff8026125d>] child_rip+0x7/0x12

-> #0 (cpu_bitmask_lock){--..}:
       [<ffffffff802a8d1a>] lock_acquire+0x4a/0x69
       [<ffffffff80266008>] __mutex_lock_slowpath+0xe4/0x261
       [<ffffffff802661ae>] mutex_lock+0x29/0x2e
       [<ffffffff802ac2a3>] lock_cpu_hotplug+0x79/0x85
       [<ffffffff802bb1ec>] stop_machine_run+0x19/0x4a
       [<ffffffff802af2d2>] sys_init_module+0x16d5/0x18cc
       [<ffffffff8026030d>] system_call+0x7d/0x83

other info that might help us debug this:

1 lock held by insmod/583:
 #0:  (module_mutex){--..}, at: [<ffffffff80265de5>] mutex_lock_interruptible+0x2a/0x2e

stack backtrace:

Call Trace:
 [<ffffffff8026e67d>] show_trace+0xae/0x319
 [<ffffffff8026e8fd>] dump_stack+0x15/0x17
 [<ffffffff802a6f67>] print_circular_bug_tail+0x6c/0x77
 [<ffffffff802a8561>] __lock_acquire+0x848/0xa5f
 [<ffffffff802a8d1b>] lock_acquire+0x4b/0x69
 [<ffffffff80266009>] __mutex_lock_slowpath+0xe5/0x261
 [<ffffffff802661af>] mutex_lock+0x2a/0x2e
 [<ffffffff802ac2a4>] lock_cpu_hotplug+0x7a/0x85
 [<ffffffff802bb1ed>] stop_machine_run+0x1a/0x4a
 [<ffffffff802af2d3>] sys_init_module+0x16d6/0x18cc
 [<ffffffff8026030e>] system_call+0x7e/0x83
DWARF2 unwinder stuck at system_call+0x7e/0x83
Leftover inexact backtrace:

SCSI subsystem initialized
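
Note on the trace above: the 21 nested check_noncircular frames followed by print_infinite_recursion_bug suggest a recursive, depth-bounded walk over the lock dependency graph, and the 16 chained rq_lock_key entries make that chain long. The following is a minimal Python sketch of such a walk, not the kernel's implementation; the function names, the graph encoding and the limit of 20 are assumptions for illustration.

```python
RECURSION_LIMIT = 20  # assumed value, chosen only for this illustration


class RecursionLimitHit(Exception):
    """Raised when the walk goes deeper than RECURSION_LIMIT."""


def reaches(graph, source, target, depth=0):
    """Return True if `target` is reachable from `source`.

    `graph` maps a lock class to the classes that were taken while it was held.
    """
    if depth >= RECURSION_LIMIT:
        raise RecursionLimitHit(depth)
    for nxt in graph.get(source, ()):
        if nxt == target or reaches(graph, nxt, target, depth + 1):
            return True
    return False


def would_deadlock(graph, held, wanted):
    """Taking `wanted` while holding `held` adds the edge held -> wanted.

    That edge closes a cycle if `held` is already reachable from `wanted`.
    """
    return reaches(graph, wanted, held)


# Recorded order: A before B, B before C.
short = {"A": ["B"], "B": ["C"]}
closes_cycle = would_deadlock(short, held="C", wanted="A")  # C -> A inverts it
safe = would_deadlock(short, held="A", wanted="C")          # consistent order

# A long but acyclic chain, loosely modelled on the rq_lock_key sequence.
long_chain = {"rq%d" % i: ["rq%d" % (i + 1)] for i in range(25)}
try:
    would_deadlock(long_chain, held="unrelated", wanted="rq0")
    limit_hit = False
except RecursionLimitHit:
    limit_hit = True
```

The last case shows how a chain with no cycle at all can still trip the depth guard, which is consistent with the warning in this comment being about the walk itself rather than a confirmed deadlock.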

Comment 4 Peter Zijlstra 2006-09-26 13:12:28 UTC
dup of BZ203098?

Comment 5 Dave Jones 2006-11-12 05:45:18 UTC
Should be fixed in 2.6.18-1.2849.fc6, which is now in updates.

