Bug 1155825

Summary: possible circular locking dependency detected __drm_modeset_lock_all
Product: [Fedora] Fedora Reporter: Vinson Lee <vlee>
Component: kernelAssignee: Rob Clark <rclark>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: brsmith, gansalmon, itamar, jakob, jonathan, kernel-maint, madhu.chinakonda, mchehab, rclark, robdclark, thellstrom, zackr
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-05 23:58:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
log from deadlock
none
drm/vmwgfx: fix lock breakage none

Description Vinson Lee 2014-10-22 23:25:49 UTC
Description of problem:
possible circular locking dependency detected

Version-Release number of selected component (if applicable):
3.18.0-0.rc0.git9.4.fc22.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot.
2.
3.

Actual results:
[   43.050637] ======================================================
[   43.050639] [ INFO: possible circular locking dependency detected ]
[   43.050642] 3.18.0-0.rc0.git9.4.fc22.x86_64 #1 Not tainted
[   43.050643] -------------------------------------------------------
[   43.050645] Xorg.bin/1090 is trying to acquire lock:
[   43.050647]  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa00767f0>] __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.050666] 
but task is already holding lock:
[   43.050668]  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa0076921>] drm_modeset_lock_crtc+0x41/0xc0 [drm]
[   43.050681] 
which lock already depends on the new lock.

[   43.050684] 
the existing dependency chain (in reverse order) is:
[   43.050685] 
-> #1 (crtc_ww_class_acquire){+.+.+.}:
[   43.050690]        [<ffffffff8110a491>] lock_acquire+0xd1/0x2b0
[   43.050695]        [<ffffffffa007610c>] drm_modeset_acquire_init+0xbc/0xe0 [drm]
[   43.050706]        [<ffffffffa00767fa>] __drm_modeset_lock_all+0x9a/0x120 [drm]
[   43.050715]        [<ffffffffa0076890>] drm_modeset_lock_all+0x10/0x40 [drm]
[   43.050724]        [<ffffffffa0069770>] drm_mode_config_init+0x160/0x240 [drm]
[   43.050734]        [<ffffffffa010ed5e>] vmw_kms_init+0x1e/0x70 [vmwgfx]
[   43.050741]        [<ffffffffa0111038>] vmw_driver_load+0x8c8/0xd90 [vmwgfx]
[   43.050747]        [<ffffffffa0060025>] drm_dev_register+0xb5/0x110 [drm]
[   43.050755]        [<ffffffffa00632cd>] drm_get_pci_dev+0x8d/0x200 [drm]
[   43.050763]        [<ffffffffa010fda5>] vmw_probe+0x15/0x20 [vmwgfx]
[   43.050769]        [<ffffffff81468625>] local_pci_probe+0x45/0xa0
[   43.050773]        [<ffffffff81469b29>] pci_device_probe+0xf9/0x150
[   43.050775]        [<ffffffff8155bb13>] driver_probe_device+0xa3/0x400
[   43.050780]        [<ffffffff8155bf53>] __driver_attach+0xa3/0xb0
[   43.050782]        [<ffffffff815596e3>] bus_for_each_dev+0x73/0xc0
[   43.050785]        [<ffffffff8155b55e>] driver_attach+0x1e/0x20
[   43.050787]        [<ffffffff8155b118>] bus_add_driver+0x188/0x260
[   43.050790]        [<ffffffff8155cdb4>] driver_register+0x64/0xf0
[   43.050793]        [<ffffffff81467eb4>] __pci_register_driver+0x64/0x70
[   43.050795]        [<ffffffffa006353a>] drm_pci_init+0xfa/0x130 [drm]
[   43.050803]        [<ffffffffa0139019>] rpc_call_start+0x9/0x20 [sunrpc]
[   43.050818]        [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[   43.050822]        [<ffffffff811539f2>] load_module+0x20c2/0x2870
[   43.050827]        [<ffffffff81154287>] SyS_init_module+0xe7/0x140
[   43.050829]        [<ffffffff818643a9>] system_call_fastpath+0x12/0x17
[   43.050835] 
-> #0 (&dev->mode_config.mutex){+.+.+.}:
[   43.050838]        [<ffffffff81109ac9>] __lock_acquire+0x1c39/0x1d50
[   43.050841]        [<ffffffff8110a491>] lock_acquire+0xd1/0x2b0
[   43.050844]        [<ffffffff8185f02e>] mutex_lock_nested+0x7e/0x440
[   43.050847]        [<ffffffffa00767f0>] __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.050857]        [<ffffffffa0076890>] drm_modeset_lock_all+0x10/0x40 [drm]
[   43.050865]        [<ffffffffa010e14b>] vmw_du_crtc_cursor_move+0x5b/0xc0 [vmwgfx]
[   43.050870]        [<ffffffffa0067cd7>] drm_mode_cursor_common+0x2c7/0x390 [drm]
[   43.050878]        [<ffffffffa006b4e0>] drm_mode_cursor_ioctl+0x50/0x70 [drm]
[   43.050888]        [<ffffffffa005be0f>] drm_ioctl+0x1df/0x6a0 [drm]
[   43.050895]        [<ffffffffa010ff3d>] vmw_generic_ioctl+0x18d/0x2a0 [vmwgfx]
[   43.050901]        [<ffffffffa0110085>] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
[   43.050905]        [<ffffffff8128aa50>] do_vfs_ioctl+0x2f0/0x520
[   43.050909]        [<ffffffff8128ad01>] SyS_ioctl+0x81/0xa0
[   43.050912]        [<ffffffff818643a9>] system_call_fastpath+0x12/0x17
[   43.050915] 
other info that might help us debug this:

[   43.050917]  Possible unsafe locking scenario:

[   43.050919]        CPU0                    CPU1
[   43.050920]        ----                    ----
[   43.050921]   lock(crtc_ww_class_acquire);
[   43.050923]                                lock(&dev->mode_config.mutex);
[   43.050925]                                lock(crtc_ww_class_acquire);
[   43.050927]   lock(&dev->mode_config.mutex);
[   43.050930] 
 *** DEADLOCK ***



Expected results:


Additional info:

Comment 1 Vinson Lee 2014-10-22 23:28:16 UTC
[   43.050932] 1 lock held by Xorg.bin/1090:
[   43.050934]  #0:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa0076921>] drm_modeset_lock_crtc+0x41/0xc0 [drm]
[   43.050946] 
stack backtrace:
[   43.050950] CPU: 0 PID: 1090 Comm: Xorg.bin Not tainted 3.18.0-0.rc0.git9.4.fc22.x86_64 #1
[   43.050952] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   43.050954]  0000000000000000 00000000b4807c9b ffff88001146b938 ffffffff8185a15d
[   43.050957]  0000000000000000 ffffffff82c02d30 ffff88001146b988 ffffffff818576dc
[   43.050961]  ffff88001145dae8 ffff88001146b9e8 ffff88001145cec0 ffff88001145dab8
[   43.050964] Call Trace:
[   43.050970]  [<ffffffff8185a15d>] dump_stack+0x4e/0x68
[   43.050973]  [<ffffffff818576dc>] print_circular_bug+0x203/0x214
[   43.050977]  [<ffffffff81109ac9>] __lock_acquire+0x1c39/0x1d50
[   43.050980]  [<ffffffff8110a491>] lock_acquire+0xd1/0x2b0
[   43.050991]  [<ffffffffa00767f0>] ? __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.050994]  [<ffffffff8185f02e>] mutex_lock_nested+0x7e/0x440
[   43.051004]  [<ffffffffa00767f0>] ? __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.051013]  [<ffffffffa00767f0>] ? __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.051016]  [<ffffffff8124b9bc>] ? kmem_cache_alloc_trace+0x3cc/0x410
[   43.051025]  [<ffffffffa00767f0>] __drm_modeset_lock_all+0x90/0x120 [drm]
[   43.051034]  [<ffffffffa0076890>] drm_modeset_lock_all+0x10/0x40 [drm]
[   43.051048]  [<ffffffffa010e14b>] vmw_du_crtc_cursor_move+0x5b/0xc0 [vmwgfx]
[   43.051058]  [<ffffffffa0067cd7>] drm_mode_cursor_common+0x2c7/0x390 [drm]
[   43.051062]  [<ffffffff8110aa36>] ? lock_release_non_nested+0x3c6/0x3d0
[   43.051067]  [<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
[   43.051071]  [<ffffffff81216cde>] ? might_fault+0x5e/0xc0
[   43.051082]  [<ffffffffa006b4e0>] drm_mode_cursor_ioctl+0x50/0x70 [drm]
[   43.051091]  [<ffffffffa005be0f>] drm_ioctl+0x1df/0x6a0 [drm]
[   43.051095]  [<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
[   43.051101]  [<ffffffff8139de3a>] ? avc_has_perm+0x15a/0x2f0
[   43.051105]  [<ffffffff8139dd14>] ? avc_has_perm+0x34/0x2f0
[   43.051120]  [<ffffffffa005bc30>] ? drm_getmap+0xf0/0xf0 [drm]
[   43.051127]  [<ffffffffa010ff3d>] vmw_generic_ioctl+0x18d/0x2a0 [vmwgfx]
[   43.051133]  [<ffffffffa0110085>] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
[   43.051136]  [<ffffffff8128aa50>] do_vfs_ioctl+0x2f0/0x520
[   43.051139]  [<ffffffff8128ad01>] SyS_ioctl+0x81/0xa0
[   43.051142]  [<ffffffff818643a9>] system_call_fastpath+0x12/0x17

Comment 2 Vinson Lee 2014-10-22 23:29:03 UTC
3.18.0-0.rc0.git9.4.fc22.x86_64 - bad
3.18.0-0.rc0.git4.1.fc22.x86_64 - good

Comment 3 Thomas Hellström 2014-10-28 09:52:42 UTC
Rob, is this due to one of your changes? 

It appears that Xorg deadlocks on the Vmware platform due to this.

/Thomas

Comment 4 Rob Clark 2014-10-28 12:36:42 UTC
(In reply to Thomas Hellström from comment #3)
> Rob, is this due to one of your changes? 
> 
> It appears that Xorg deadlocks on the Vmware platform due to this.
> 
> /Thomas

I'm guessing due to Daniel's trylock changes (cb597bb3).. I wonder how Xorg can deadlock with this, since it is a locking path on driver load vs a runtime path.  And I'm not quite seeing yet how mode_config.mutex is acquired *after* the drm_modeset_acquire_init(), since all appears to go via the same __drm_modeset_lock_all() path.

Do you have hung-task backtraces from the Xorg deadlock?

Comment 5 Thomas Hellström 2014-10-28 14:48:57 UTC
Created attachment 951411 [details]
log from deadlock

Attaching a log file from an Xorg deadlock.

Comment 6 Thomas Hellström 2014-10-28 14:50:09 UTC
(In reply to Rob Clark from comment #4)
> (In reply to Thomas Hellström from comment #3)
> > Rob, is this due to one of your changes? 
> > 
> > It appears that Xorg deadlocks on the Vmware platform due to this.
> > 
> > /Thomas
> 
> I'm guessing due to Daniel's trylock changes (cb597bb3).. I wonder how Xorg
> can deadlock with this, since it is a locking path on driver load vs a
> runtime path.  And I'm not quite seeing yet how mode_config.mutex is
> acquired *after* the drm_modeset_acquire_init(), since all appears to go via
> the same __drm_modeset_lock_all() path.
> 
> Do you have hung-task backtraces from the Xorg deadlock?

Attached. From a vanilla 3.18.0-rc2 kernel. (Although not Fedora.)

Thanks,
Thomas

Comment 7 Rob Clark 2014-10-30 13:58:39 UTC
(In reply to Thomas Hellström from comment #6)
> (In reply to Rob Clark from comment #4)
> > (In reply to Thomas Hellström from comment #3)
> > > Rob, is this due to one of your changes? 
> > > 
> > > It appears that Xorg deadlocks on the Vmware platform due to this.
> > > 
> > > /Thomas
> > 
> > I'm guessing due to Daniel's trylock changes (cb597bb3).. I wonder how Xorg
> > can deadlock with this, since it is a locking path on driver load vs a
> > runtime path.  And I'm not quite seeing yet how mode_config.mutex is
> > acquired *after* the drm_modeset_acquire_init(), since all appears to go via
> > the same __drm_modeset_lock_all() path.
> > 
> > Do you have hung-task backtraces from the Xorg deadlock?
> 
> Attached. From a vanilla 3.18.0-rc2 kernel. (Although not Fedora.)
> 
> Thanks,
> Thomas

I suspect you want something like:

----------------
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index d2bc2b0..5be6583 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -273,7 +273,7 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 	 * can do this since the caller in the drm core doesn't check anything
 	 * which is protected by any looks.
 	 */
-	drm_modeset_unlock(&crtc->mutex);
+	drm_modeset_unlock_crtc(crtc);
 	drm_modeset_lock_all(dev_priv->dev);
 
 	vmw_cursor_update_position(dev_priv, shown,
@@ -281,7 +281,7 @@ int vmw_du_crtc_cursor_move(struct drm_crtc *crtc, int x, int y)
 				   du->cursor_y + du->hotspot_y);
 
 	drm_modeset_unlock_all(dev_priv->dev);
-	drm_modeset_lock(&crtc->mutex, NULL);
+	drm_modeset_lock_crtc(crtc);
 
 	return 0;
 }

----------------

looks like there might be a couple other spots that want similar treatment.

Comment 8 Thomas Hellström 2014-10-30 14:36:37 UTC
It might be, but whoever broke the code should really be responsible for fixing it up...

Comment 9 Rob Clark 2014-10-30 17:41:58 UTC
Created attachment 952292 [details]
drm/vmwgfx: fix lock breakage

After:

commit d059f652e73c35678d28d4cd09ab2cec89696af9
Author:     Daniel Vetter <daniel.vetter>
AuthorDate: Fri Jul 25 18:07:40 2014 +0200

    drm: Handle legacy per-crtc locking with full acquire ctx

drm_mode_cursor_common() was switched to use drm_modeset_(un)lock_crtc()
which uses full aquire ctx.  So dropping/reaquiring the lock via
drm_modeset_(un)lock() directly isn't the right thing to do, as lockdep
kindly points out.

The 'FIXME's about sorting out whether vmwgfx *really* needs to lock-all
for cursor updates still apply.

Comment 10 Rob Clark 2014-10-31 15:23:27 UTC
fyi, patch posted on dri-devel

http://lists.freedesktop.org/archives/dri-devel/2014-October/070910.html

Comment 11 Josh Boyer 2014-11-05 23:58:56 UTC
This is fixed with commit 21e88620aa21b48d4f62d29275e3e2944a5ea2b5 which is in the rc3 kernels.

Thanks Rob!