Bug 734519

Summary: possible circular locking dependency detected in restore_regulatory_settings
Product: Fedora
Reporter: Mikko Tiihonen <mikko.tiihonen>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: gansalmon, greearb, itamar, jonathan, kernel-maint, madhu.chinakonda, mcgrof
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2012-09-07 12:07:55 EDT
Type: ---
Regression: ---

Description Mikko Tiihonen 2011-08-30 12:44:37 EDT
Description of problem:
=======================================================
[ INFO: possible circular locking dependency detected ]
3.1.0-0.rc4.git0.0.fc16.x86_64 #1
-------------------------------------------------------
kworker/5:2/511 is trying to acquire lock:
 (cfg80211_mutex){+.+.+.}, at: [<ffffffffa02ef603>] restore_regulatory_settings+0x2f/0x2e6 [cfg80211]

but task is already holding lock:
 ((reg_timeout).work){+.+...}, at: [<ffffffff81075b91>] process_one_work+0x14d/0x3e7

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((reg_timeout).work){+.+...}:
       [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
       [<ffffffff81074f79>] wait_on_work+0x55/0xc7
       [<ffffffff810760b3>] __cancel_work_timer+0xcc/0x10a
       [<ffffffff81076103>] cancel_delayed_work_sync+0x12/0x14
       [<ffffffffa02eecb8>] reg_set_request_processed+0x4e/0x68 [cfg80211]
       [<ffffffffa02efeef>] set_regdom+0x43c/0x4c0 [cfg80211]
       [<ffffffffa02f9159>] nl80211_set_reg+0x1ce/0x22a [cfg80211]
       [<ffffffff8143af6e>] genl_rcv_msg+0x1db/0x206
       [<ffffffff8143a997>] netlink_rcv_skb+0x43/0x8f
       [<ffffffff8143ad8c>] genl_rcv+0x26/0x2d
       [<ffffffff8143a493>] netlink_unicast+0xec/0x156
       [<ffffffff8143a781>] netlink_sendmsg+0x284/0x2c5
       [<ffffffff81402f0f>] sock_sendmsg+0xe6/0x109
       [<ffffffff81404cc3>] __sys_sendmsg+0x226/0x2cf
       [<ffffffff81405efb>] sys_sendmsg+0x42/0x60
       [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b

-> #1 (reg_mutex){+.+.+.}:
       [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
       [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
       [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
       [<ffffffffa02ef10d>] reg_todo+0x32/0x4a4 [cfg80211]
       [<ffffffff81075c49>] process_one_work+0x205/0x3e7
       [<ffffffff810768f7>] worker_thread+0xda/0x15d
       [<ffffffff8107a2bd>] kthread+0xa8/0xb0
       [<ffffffff8150d944>] kernel_thread_helper+0x4/0x10

-> #0 (cfg80211_mutex){+.+.+.}:
       [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
       [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
       [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
       [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
       [<ffffffffa02ef603>] restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
       [<ffffffffa02ef8cd>] reg_timeout_work+0x13/0x15 [cfg80211]
       [<ffffffff81075c49>] process_one_work+0x205/0x3e7
       [<ffffffff810768f7>] worker_thread+0xda/0x15d
       [<ffffffff8107a2bd>] kthread+0xa8/0xb0
       [<ffffffff8150d944>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

Chain exists of:
  cfg80211_mutex --> reg_mutex --> (reg_timeout).work

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((reg_timeout).work);
                               lock(reg_mutex);
                               lock((reg_timeout).work);
  lock(cfg80211_mutex);

 *** DEADLOCK ***

2 locks held by kworker/5:2/511:
 #0:  (events){.+.+.+}, at: [<ffffffff81075b91>] process_one_work+0x14d/0x3e7
 #1:  ((reg_timeout).work){+.+...}, at: [<ffffffff81075b91>] process_one_work+0x14d/0x3e7

stack backtrace:
Pid: 511, comm: kworker/5:2 Tainted: G        W   3.1.0-0.rc4.git0.0.fc16.x86_64 #1
Call Trace:
 [<ffffffff814fa254>] print_circular_bug+0x1f8/0x209
 [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
 [<ffffffff8108dd41>] ? mark_lock+0x2d/0x220
 [<ffffffff8108e439>] ? __lock_acquire+0x505/0xd0c
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffffa02ef8ba>] ? restore_regulatory_settings+0x2e6/0x2e6 [cfg80211]
 [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
 [<ffffffffa02ef603>] ? restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffff810804e6>] ? local_clock+0x36/0x4d
 [<ffffffff81014ded>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff810152b7>] ? native_sched_clock+0x34/0x36
 [<ffffffff8108b9b5>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffffa02ef8ba>] ? restore_regulatory_settings+0x2e6/0x2e6 [cfg80211]
 [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
 [<ffffffffa02ef603>] restore_regulatory_settings+0x2f/0x2e6 [cfg80211]
 [<ffffffffa02ef8cd>] reg_timeout_work+0x13/0x15 [cfg80211]
 [<ffffffff81075c49>] process_one_work+0x205/0x3e7
 [<ffffffff81075b91>] ? process_one_work+0x14d/0x3e7
 [<ffffffff8108d01b>] ? lock_acquired+0x210/0x243
 [<ffffffff810768f7>] worker_thread+0xda/0x15d
 [<ffffffff8107681d>] ? manage_workers+0x176/0x176
 [<ffffffff8107a2bd>] kthread+0xa8/0xb0
 [<ffffffff8150d944>] kernel_thread_helper+0x4/0x10
 [<ffffffff81504db4>] ? retint_restore_args+0x13/0x13
 [<ffffffff8107a215>] ? __init_kthread_worker+0x5a/0x5a
 [<ffffffff8150d940>] ? gs_change+0x13/0x13


Version-Release number of selected component (if applicable):
kernel 3.1.0-0.rc4.git0.0.fc16.x86_64

How reproducible:
I had a flaky WLAN connection that kept switching between the world regulatory domain and the local regulatory domain. So far I have seen this bug only once in the system logs.
Comment 1 Dave Jones 2012-03-22 13:14:33 EDT
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.
Comment 4 Ben Greear 2012-05-18 18:48:56 EDT
I hit this same thing in a somewhat-hacked 3.3.6+ kernel, so it seems
that it still exists in the latest stable code.
Comment 5 Josh Boyer 2012-09-07 12:07:55 EDT
This appears to have been fixed with commit fe20b39ec32e975f1054c0b7866c873a954adf05 in 3.5.  That was backported to 3.4.5 and should thus be fixed in F16.