Bug 734519 - possible circular locking dependency detected in restore_regulatory_settings
Summary: possible circular locking dependency detected in restore_regulatory_settings
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-30 16:44 UTC by Mikko Tiihonen
Modified: 2012-09-07 16:07 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-07 16:07:55 UTC
Type: ---
Embargoed:



Description Mikko Tiihonen 2011-08-30 16:44:37 UTC
Description of problem:
=======================================================
[ INFO: possible circular locking dependency detected ]
3.1.0-0.rc4.git0.0.fc16.x86_64 #1
-------------------------------------------------------
kworker/5:2/511 is trying to acquire lock:
 (cfg80211_mutex){+.+.+.}, at: [<ffffffffa02ef603>] restore_regulatory_settings+0x2f/0x2e6 [cfg80211]

but task is already holding lock:
 ((reg_timeout).work){+.+...}, at: [<ffffffff81075b91>] process_one_work+0x14d/0x3e7

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((reg_timeout).work){+.+...}:
       [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
       [<ffffffff81074f79>] wait_on_work+0x55/0xc7
       [<ffffffff810760b3>] __cancel_work_timer+0xcc/0x10a
       [<ffffffff81076103>] cancel_delayed_work_sync+0x12/0x14
       [<ffffffffa02eecb8>] reg_set_request_processed+0x4e/0x68 [cfg80211]
       [<ffffffffa02efeef>] set_regdom+0x43c/0x4c0 [cfg80211]
       [<ffffffffa02f9159>] nl80211_set_reg+0x1ce/0x22a [cfg80211]
       [<ffffffff8143af6e>] genl_rcv_msg+0x1db/0x206
       [<ffffffff8143a997>] netlink_rcv_skb+0x43/0x8f
       [<ffffffff8143ad8c>] genl_rcv+0x26/0x2d
       [<ffffffff8143a493>] netlink_unicast+0xec/0x156
       [<ffffffff8143a781>] netlink_sendmsg+0x284/0x2c5
       [<ffffffff81402f0f>] sock_sendmsg+0xe6/0x109
       [<ffffffff81404cc3>] __sys_sendmsg+0x226/0x2cf
       [<ffffffff81405efb>] sys_sendmsg+0x42/0x60
       [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b

-> #1 (reg_mutex){+.+.+.}:
       [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
       [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
       [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
       [<ffffffffa02ef10d>] reg_todo+0x32/0x4a4 [cfg80211]
       [<ffffffff81075c49>] process_one_work+0x205/0x3e7
       [<ffffffff810768f7>] worker_thread+0xda/0x15d
       [<ffffffff8107a2bd>] kthread+0xa8/0xb0
       [<ffffffff8150d944>] kernel_thread_helper+0x4/0x10

-> #0 (cfg80211_mutex){+.+.+.}:
       [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
       [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
       [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
       [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
       [<ffffffffa02ef603>] restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
       [<ffffffffa02ef8cd>] reg_timeout_work+0x13/0x15 [cfg80211]
       [<ffffffff81075c49>] process_one_work+0x205/0x3e7
       [<ffffffff810768f7>] worker_thread+0xda/0x15d
       [<ffffffff8107a2bd>] kthread+0xa8/0xb0
       [<ffffffff8150d944>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

Chain exists of:
  cfg80211_mutex --> reg_mutex --> (reg_timeout).work

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((reg_timeout).work);
                               lock(reg_mutex);
                               lock((reg_timeout).work);
  lock(cfg80211_mutex);

 *** DEADLOCK ***

2 locks held by kworker/5:2/511:
 #0:  (events){.+.+.+}, at: [<ffffffff81075b91>] process_one_work+0x14d/0x3e7
 #1:  ((reg_timeout).work){+.+...}, at: [<ffffffff81075b91>] process_one_work+0x14d/0x3e7

stack backtrace:
Pid: 511, comm: kworker/5:2 Tainted: G        W   3.1.0-0.rc4.git0.0.fc16.x86_64 #1
Call Trace:
 [<ffffffff814fa254>] print_circular_bug+0x1f8/0x209
 [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
 [<ffffffff8108dd41>] ? mark_lock+0x2d/0x220
 [<ffffffff8108e439>] ? __lock_acquire+0x505/0xd0c
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffffa02ef8ba>] ? restore_regulatory_settings+0x2e6/0x2e6 [cfg80211]
 [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffff810804e6>] ? local_clock+0x36/0x4d
 [<ffffffff81014ded>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff810152b7>] ? native_sched_clock+0x34/0x36
 [<ffffffff8108b9b5>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffffa02ef8ba>] ? restore_regulatory_settings+0x2e6/0x2e6 [cfg80211]
 [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
 [<ffffffffa02ef603>] restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffffa02ef8cd>] reg_timeout_work+0x13/0x15 [cfg80211]
 [<ffffffff81075c49>] process_one_work+0x205/0x3e7
 [<ffffffff81075b91>] ? process_one_work+0x14d/0x3e7
 [<ffffffff8108d01b>] ? lock_acquired+0x210/0x243
 [<ffffffff810768f7>] worker_thread+0xda/0x15d
 [<ffffffff8107681d>] ? manage_workers+0x176/0x176
 [<ffffffff8107a2bd>] kthread+0xa8/0xb0
 [<ffffffff8150d944>] kernel_thread_helper+0x4/0x10
 [<ffffffff81504db4>] ? retint_restore_args+0x13/0x13
 [<ffffffff8107a215>] ? __init_kthread_worker+0x5a/0x5a
 [<ffffffff8150d940>] ? gs_change+0x13/0x13


Version-Release number of selected component (if applicable):
kernel 3.1.0-0.rc4.git0.0.fc16.x86_64

How reproducible:
I had a flaky WLAN connection that kept switching between the world regulatory domain and the local regulatory domain. So far I have only found this warning once in the system logs.
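
For reference, the dependency cycle lockdep reports above can be reduced to a minimal pattern: a delayed-work handler that takes a mutex, combined with a path that calls cancel_delayed_work_sync() on that same work while holding the mutex. Lockdep models "waiting for the work to finish" as acquiring the work's pseudo-lock, so the two orderings together form a cycle. The sketch below is only an illustration, not the cfg80211 code itself; the demo_* names are hypothetical, and the real chain above runs through two mutexes (cfg80211_mutex and reg_mutex) rather than one.

/*
 * Illustrative module reproducing the reported pattern:
 *   - the delayed work handler takes a mutex            (work -> mutex)
 *   - another path cancels the work synchronously
 *     while holding that same mutex                     (mutex -> work)
 */
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static DEFINE_MUTEX(demo_mutex);        /* stands in for reg_mutex/cfg80211_mutex */

static void demo_timeout(struct work_struct *work);
static DECLARE_DELAYED_WORK(demo_work, demo_timeout);   /* stands in for reg_timeout */

/* Work handler: runs in a kworker and grabs the mutex (work -> mutex). */
static void demo_timeout(struct work_struct *work)
{
        mutex_lock(&demo_mutex);
        /* ... restore settings ... */
        mutex_unlock(&demo_mutex);
}

/* Request path: holds the mutex and waits for the work (mutex -> work). */
static void demo_request_processed(void)
{
        mutex_lock(&demo_mutex);
        cancel_delayed_work_sync(&demo_work);   /* may wait for demo_timeout() */
        mutex_unlock(&demo_mutex);
}

static int __init demo_init(void)
{
        schedule_delayed_work(&demo_work, HZ);
        demo_request_processed();  /* with lockdep on, the cycle is reported once both orderings are seen */
        return 0;
}

static void __exit demo_exit(void)
{
        cancel_delayed_work_sync(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");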

Comment 1 Dave Jones 2012-03-22 17:14:33 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.


Comment 4 Ben Greear 2012-05-18 22:48:56 UTC
I hit this same thing in a somewhat-hacked 3.3.6+ kernel, so it seems
that it still exists in the latest stable code.

Comment 5 Josh Boyer 2012-09-07 16:07:55 UTC
This appears to have been fixed with commit fe20b39ec32e975f1054c0b7866c873a954adf05 in 3.5.  That was backported to 3.4.5 and should thus be fixed in F16.
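
The commit itself is not quoted here, but the usual way to break this kind of cycle is to stop waiting synchronously for the work while the mutex is held, either by cancelling with the non-blocking variant or by doing the synchronous cancel outside the lock. A hypothetical sketch, continuing the demo_* names from the description above (not necessarily what commit fe20b39ec32e actually does):

/*
 * Hypothetical rearrangement: drop the mutex -> work dependency by
 * never waiting for a running handler while the mutex is held.
 */
static void demo_request_processed_fixed(void)
{
        mutex_lock(&demo_mutex);
        cancel_delayed_work(&demo_work);        /* non-blocking: no wait, no edge */
        mutex_unlock(&demo_mutex);
}

The trade-off is that a non-blocking cancel does not wait for an already-running handler, so the caller must tolerate one late execution of demo_timeout().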

