1155825 – possible circular locking dependency detected __drm_modeset_lock_all

Bug 1155825 - possible circular locking dependency detected __drm_modeset_lock_all

Summary: possible circular locking dependency detected __drm_modeset_lock_all

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Rob Clark
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2014-10-22 23:25 UTC by Vinson Lee
Modified:	2014-11-05 23:58 UTC (History)
CC List:	12 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-11-05 23:58:56 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
log from deadlock (21.83 KB, text/plain) 2014-10-28 14:48 UTC, Thomas Hellström	no flags	Details
drm/vmwgfx: fix lock breakage (2.34 KB, patch) 2014-10-30 17:41 UTC, Rob Clark	no flags	Details \| Diff
View All

Description Vinson Lee 2014-10-22 23:25:49 UTC

Description of problem:
possible circular locking dependency detected

Version-Release number of selected component (if applicable):
3.18.0-0.rc0.git9.4.fc22.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot.
2.
3.

Actual results:
[   43.050637] ======================================================
[   43.050639] [ INFO: possible circular locking dependency detected ]
[   43.050642] 3.18.0-0.rc0.git9.4.fc22.x86_64 #1 Not tainted
[   43.050643] -------------------------------------------------------
[   43.050645] Xorg.bin/1090 is trying to acquire lock:
[   43.050647]  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa00767f0>] __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.050666] 
but task is already holding lock:
[   43.050668]  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa0076921>] drm_modeset_lock_crtc+0x41/0xc0 [drm]
[   43.050681] 
which lock already depends on the new lock.

[   43.050684] 
the existing dependency chain (in reverse order) is:
[   43.050685] 
-> #1 (crtc_ww_class_acquire){+.+.+.}:
[   43.050690]        [<ffffffff8110a491>] lock_acquire+0xd1/0x2b0
[   43.050695]        [<ffffffffa007610c>] drm_modeset_acquire_init+0xbc/0xe0 [drm]
[   43.050706]        [<ffffffffa00767fa>] __drm_modeset_lock_all+0x9a/0x120 [drm]
[   43.050715]        [<ffffffffa0076890>] drm_modeset_lock_all+0x10/0x40 [drm]
[   43.050724]        [<ffffffffa0069770>] drm_mode_config_init+0x160/0x240 [drm]
[   43.050734]        [<ffffffffa010ed5e>] vmw_kms_init+0x1e/0x70 [vmwgfx]
[   43.050741]        [<ffffffffa0111038>] vmw_driver_load+0x8c8/0xd90 [vmwgfx]
[   43.050747]        [<ffffffffa0060025>] drm_dev_register+0xb5/0x110 [drm]
[   43.050755]        [<ffffffffa00632cd>] drm_get_pci_dev+0x8d/0x200 [drm]
[   43.050763]        [<ffffffffa010fda5>] vmw_probe+0x15/0x20 [vmwgfx]
[   43.050769]        [<ffffffff81468625>] local_pci_probe+0x45/0xa0
[   43.050773]        [<ffffffff81469b29>] pci_device_probe+0xf9/0x150
[   43.050775]        [<ffffffff8155bb13>] driver_probe_device+0xa3/0x400
[   43.050780]        [<ffffffff8155bf53>] __driver_attach+0xa3/0xb0
[   43.050782]        [<ffffffff815596e3>] bus_for_each_dev+0x73/0xc0
[   43.050785]        [<ffffffff8155b55e>] driver_attach+0x1e/0x20
[   43.050787]        [<ffffffff8155b118>] bus_add_driver+0x188/0x260
[   43.050790]        [<ffffffff8155cdb4>] driver_register+0x64/0xf0
[   43.050793]        [<ffffffff81467eb4>] __pci_register_driver+0x64/0x70
[   43.050795]        [<ffffffffa006353a>] drm_pci_init+0xfa/0x130 [drm]
[   43.050803]        [<ffffffffa0139019>] rpc_call_start+0x9/0x20 [sunrpc]
[   43.050818]        [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[   43.050822]        [<ffffffff811539f2>] load_module+0x20c2/0x2870
[   43.050827]        [<ffffffff81154287>] SyS_init_module+0xe7/0x140
[   43.050829]        [<ffffffff818643a9>] system_call_fastpath+0x12/0x17
[   43.050835] 
-> #0 (&dev->mode_config.mutex){+.+.+.}:
[   43.050838]        [<ffffffff81109ac9>] __lock_acquire+0x1c39/0x1d50
[   43.050841]        [<ffffffff8110a491>] lock_acquire+0xd1/0x2b0
[   43.050844]        [<ffffffff8185f02e>] mutex_lock_nested+0x7e/0x440
[   43.050847]        [<ffffffffa00767f0>] __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.050857]        [<ffffffffa0076890>] drm_modeset_lock_all+0x10/0x40 [drm]
[   43.050865]        [<ffffffffa010e14b>] vmw_du_crtc_cursor_move+0x5b/0xc0 [vmwgfx]
[   43.050870]        [<ffffffffa0067cd7>] drm_mode_cursor_common+0x2c7/0x390 [drm]
[   43.050878]        [<ffffffffa006b4e0>] drm_mode_cursor_ioctl+0x50/0x70 [drm]
[   43.050888]        [<ffffffffa005be0f>] drm_ioctl+0x1df/0x6a0 [drm]
[   43.050895]        [<ffffffffa010ff3d>] vmw_generic_ioctl+0x18d/0x2a0 [vmwgfx]
[   43.050901]        [<ffffffffa0110085>] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
[   43.050905]        [<ffffffff8128aa50>] do_vfs_ioctl+0x2f0/0x520
[   43.050909]        [<ffffffff8128ad01>] SyS_ioctl+0x81/0xa0
[   43.050912]        [<ffffffff818643a9>] system_call_fastpath+0x12/0x17
[   43.050915] 
other info that might help us debug this:

[   43.050917]  Possible unsafe locking scenario:

[   43.050919]        CPU0                    CPU1
[   43.050920]        ----                    ----
[   43.050921]   lock(crtc_ww_class_acquire);
[   43.050923]                                lock(&dev->mode_config.mutex);
[   43.050925]                                lock(crtc_ww_class_acquire);
[   43.050927]   lock(&dev->mode_config.mutex);
[   43.050930] 
 *** DEADLOCK ***



Expected results:


Additional info:

Comment 1 Vinson Lee 2014-10-22 23:28:16 UTC

[   43.050932] 1 lock held by Xorg.bin/1090:
[   43.050934]  #0:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa0076921>] drm_modeset_lock_crtc+0x41/0xc0 [drm]
[   43.050946] 
stack backtrace:
[   43.050950] CPU: 0 PID: 1090 Comm: Xorg.bin Not tainted 3.18.0-0.rc0.git9.4.fc22.x86_64 #1
[   43.050952] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   43.050954]  0000000000000000 00000000b4807c9b ffff88001146b938 ffffffff8185a15d
[   43.050957]  0000000000000000 ffffffff82c02d30 ffff88001146b988 ffffffff818576dc
[   43.050961]  ffff88001145dae8 ffff88001146b9e8 ffff88001145cec0 ffff88001145dab8
[   43.050964] Call Trace:
[   43.050970]  [<ffffffff8185a15d>] dump_stack+0x4e/0x68
[   43.050973]  [<ffffffff818576dc>] print_circular_bug+0x203/0x214
[   43.050977]  [<ffffffff81109ac9>] __lock_acquire+0x1c39/0x1d50
[   43.050980]  [<ffffffff8110a491>] lock_acquire+0xd1/0x2b0
[   43.050991]  [<ffffffffa00767f0>] ? __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.050994]  [<ffffffff8185f02e>] mutex_lock_nested+0x7e/0x440
[   43.051004]  [<ffffffffa00767f0>] ? __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.051013]  [<ffffffffa00767f0>] ? __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.051016]  [<ffffffff8124b9bc>] ? kmem_cache_alloc_trace+0x3cc/0x410
[   43.051025]  [<ffffffffa00767f0>] __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.051034]  [<ffffffffa0076890>] drm_modeset_lock_all+0x10/0x40 [drm]
[   43.051048]  [<ffffffffa010e14b>] vmw_du_crtc_cursor_move+0x5b/0xc0 [vmwgfx]
[   43.051058]  [<ffffffffa0067cd7>] drm_mode_cursor_common+0x2c7/0x390 [drm]
[   43.051062]  [<ffffffff8110aa36>] ? lock_release_non_nested+0x3c6/0x3d0
[   43.051067]  [<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
[   43.051071]  [<ffffffff81216cde>] ? might_fault+0x5e/0xc0
[   43.051082]  [<ffffffffa006b4e0>] drm_mode_cursor_ioctl+0x50/0x70 [drm]
[   43.051091]  [<ffffffffa005be0f>] drm_ioctl+0x1df/0x6a0 [drm]
[   43.051095]  [<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
[   43.051101]  [<ffffffff8139de3a>] ? avc_has_perm+0x15a/0x2f0
[   43.051105]  [<ffffffff8139dd14>] ? avc_has_perm+0x34/0x2f0
[   43.051120]  [<ffffffffa005bc30>] ? drm_getmap+0xf0/0xf0 [drm]
[   43.051127]  [<ffffffffa010ff3d>] vmw_generic_ioctl+0x18d/0x2a0 [vmwgfx]
[   43.051133]  [<ffffffffa0110085>] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
[   43.051136]  [<ffffffff8128aa50>] do_vfs_ioctl+0x2f0/0x520
[   43.051139]  [<ffffffff8128ad01>] SyS_ioctl+0x81/0xa0
[   43.051142]  [<ffffffff818643a9>] system_call_fastpath+0x12/0x17

Comment 2 Vinson Lee 2014-10-22 23:29:03 UTC

3.18.0-0.rc0.git9.4.fc22.x86_64 - bad
3.18.0-0.rc0.git4.1.fc22.x86_64 - good

Comment 3 Thomas Hellström 2014-10-28 09:52:42 UTC

Rob, is this due to one of your changes? 

It appears that Xorg deadlocks on the Vmware platform due to this.

/Thomas

Comment 4 Rob Clark 2014-10-28 12:36:42 UTC

(In reply to Thomas Hellström from comment #3)
> Rob, is this due to one of your changes? 
> 
> It appears that Xorg deadlocks on the Vmware platform due to this.
> 
> /Thomas

I'm guessing due to Daniel's trylock changes (cb597bb3).. I wonder how Xorg can deadlock with this, since it is a locking path on driver load vs a runtime path.  And I'm not quite seeing yet how mode_config.mutex is acquired *after* the drm_modeset_acquire_init(), since all appears to go via the same __drm_modeset_lock_all() path.

Do you have hung-task backtraces from the Xorg deadlock?

Comment 5 Thomas Hellström 2014-10-28 14:48:57 UTC

Created attachment 951411 [details]
log from deadlock

Attaching a log file from an Xorg deadlock.

Comment 6 Thomas Hellström 2014-10-28 14:50:09 UTC

(In reply to Rob Clark from comment #4)
> (In reply to Thomas Hellström from comment #3)
> > Rob, is this due to one of your changes? 
> > 
> > It appears that Xorg deadlocks on the Vmware platform due to this.
> > 
> > /Thomas
> 
> I'm guessing due to Daniel's trylock changes (cb597bb3).. I wonder how Xorg
> can deadlock with this, since it is a locking path on driver load vs a
> runtime path.  And I'm not quite seeing yet how mode_config.mutex is
> acquired *after* the drm_modeset_acquire_init(), since all appears to go via
> the same __drm_modeset_lock_all() path.
> 
> Do you have hung-task backtraces from the Xorg deadlock?

Attached. From a vanilla 3.18.0-rc2 kernel. (Although not Fedora.)

Thanks,
Thomas

Comment 7 Rob Clark 2014-10-30 13:58:39 UTC

(In reply to Thomas Hellström from comment #6)
> (In reply to Rob Clark from comment #4)
> > (In reply to Thomas Hellström from comment #3)
> > > Rob, is this due to one of your changes? 
> > > 
> > > It appears that Xorg deadlocks on the Vmware platform due to this.
> > > 
> > > /Thomas
> > 
> > I'm guessing due to Daniel's trylock changes (cb597bb3).. I wonder how Xorg
> > can deadlock with this, since it is a locking path on driver load vs a
> > runtime path.  And I'm not quite seeing yet how mode_config.mutex is
> > acquired *after* the drm_modeset_acquire_init(), since all appears to go via
> > the same __drm_modeset_lock_all() path.
> > 
> > Do you have hung-task backtraces from the Xorg deadlock?
> 
> Attached. From a vanilla 3.18.0-rc2 kernel. (Although not Fedora.)
> 
> Thanks,
> Thomas

I suspect you want something like:

----------------
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index d2bc2b0..5be6583 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -273,7 +273,7 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 	 * can do this since the caller in the drm core doesn't check anything
 	 * which is protected by any looks.
 	 */
-	drm_modeset_unlock(&crtc->mutex);
+	drm_modeset_unlock_crtc(crtc);
 	drm_modeset_lock_all(dev_priv->dev);
 
 	vmw_cursor_update_position(dev_priv, shown,
@@ -281,7 +281,7 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 				   du->cursor_y + du->hotspot_y);
 
 	drm_modeset_unlock_all(dev_priv->dev);
-	drm_modeset_lock(&crtc->mutex, NULL);
+	drm_modeset_lock_crtc(crtc);
 
 	return 0;
 }

----------------

looks like there might be a couple other spots that want similar treatment.

Comment 8 Thomas Hellström 2014-10-30 14:36:37 UTC

It might be, but whoever broke the code should really be responsible for fixing it up...

Comment 9 Rob Clark 2014-10-30 17:41:58 UTC

Created attachment 952292 [details]
drm/vmwgfx: fix lock breakage

After:

commit d059f652e73c35678d28d4cd09ab2cec89696af9
Author:     Daniel Vetter <daniel.vetter>
AuthorDate: Fri Jul 25 18:07:40 2014 +0200

    drm: Handle legacy per-crtc locking with full acquire ctx

drm_mode_cursor_common() was switched to use drm_modeset_(un)lock_crtc()
which uses full aquire ctx.  So dropping/reaquiring the lock via
drm_modeset_(un)lock() directly isn't the right thing to do, as lockdep
kindly points out.

The 'FIXME's about sorting out whether vmwgfx *really* needs to lock-all
for cursor updates still apply.

Comment 10 Rob Clark 2014-10-31 15:23:27 UTC

fyi, patch posted on dri-devel

http://lists.freedesktop.org/archives/dri-devel/2014-October/070910.html

Comment 11 Josh Boyer 2014-11-05 23:58:56 UTC

This is fixed with commit 21e88620aa21b48d4f62d29275e3e2944a5ea2b5 which is in the rc3 kernels.

Thanks Rob!

Note You need to log in before you can comment on or make changes to this bug.