Bug 1249850 - qxl lockdep warning on boot
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-08-03 20:49 EDT by Andy Lutomirski
Modified: 2016-05-26 15:21 EDT
CC List: 14 users
Doc Type: Bug Fix
Last Closed: 2016-05-26 15:21:27 EDT
Type: Bug

Description Andy Lutomirski 2015-08-03 20:49:34 EDT
Booting Rawhide under libvirt triggers a lockdep warning.  I've made no effort to figure out whether and when this regressed.  Dmesg excerpt below.

[    0.000000] Linux version 4.2.0-0.rc4.git4.1.fc24.x86_64 (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC) ) #1 SMP Fri Jul 31 16:32:22 UTC 2015

...

[    3.445920] [drm] Initialized drm 1.1.0 20060810
[    3.819995] [drm] Device Version 0.0
[    3.819997] [drm] Compression level 0 log level 0
[    3.819998] [drm] Currently using mode #0, list at 0x488
[    3.819999] [drm] 12286 io pages at offset 0x1000000
[    3.820014] [drm] 16777216 byte draw area at offset 0x0
[    3.820015] [drm] RAM header offset: 0x3ffe000
[    3.820016] [drm] rom modes offset 0x488 for 128 modes
[    3.820709] [TTM] Zone  kernel: Available graphics memory: 1013508 kiB
[    3.820712] [TTM] Initializing pool allocator
[    3.820755] [TTM] Initializing DMA pool allocator
[    3.821141] [drm] qxl: 16M of VRAM memory size
[    3.821142] [drm] qxl: 63M of IO pages memory ready (VRAM domain)
[    3.821143] [drm] qxl: 64M of Surface memory size
[    3.823810] [drm] main mem slot 1 [f4000000,3ffe000]
[    3.823813] [drm] surface mem slot 2 [f8000000,4000000]
[    3.824092] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.824094] [drm] No driver support for vblank timestamp query.
[    3.826735] [drm] fb mappable at 0xF4000000, size 3145728
[    3.826737] [drm] fb: depth 24, pitch 4096, width 1024, height 768
[    3.827722] fbcon: qxldrmfb (fb0) is primary device
[    3.849334] Console: switching to colour frame buffer device 128x48
[    3.850716] qxl 0000:00:02.0: fb0: qxldrmfb frame buffer device
[    3.850720] qxl 0000:00:02.0: registered panic notifier
[    3.855152] [drm] Initialized qxl 0.1.0 20120117 for 0000:00:02.0 on minor 0

...

[   13.687692] ======================================================
[   13.687693] [ INFO: possible circular locking dependency detected ]
[   13.687699] 4.2.0-0.rc4.git4.1.fc24.x86_64 #1 Not tainted
[   13.687700] -------------------------------------------------------
[   13.687701] gnome-shell/885 is trying to acquire lock:
[   13.687703]  (reservation_ww_class_acquire){+.+.+.}, at: [<ffffffffa0106c91>] qxl_release_reserve_list+0x51/0x100 [qxl]
[   13.687711] 
               but task is already holding lock:
[   13.687712]  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa00fe70b>] qxl_crtc_page_flip+0xbb/0x210 [qxl]
[   13.687715] 
               which lock already depends on the new lock.

[   13.687717] 
               the existing dependency chain (in reverse order) is:
[   13.687718] 
               -> #1 (reservation_ww_class_mutex){+.+.+.}:
[   13.687720]        [<ffffffff81107a17>] lock_acquire+0xc7/0x270
[   13.687730]        [<ffffffff8186d8df>] __ww_mutex_lock+0x7f/0x720
[   13.687735]        [<ffffffffa009c88a>] ttm_eu_reserve_buffers+0x35a/0x600 [ttm]
[   13.687740]        [<ffffffffa0106c91>] qxl_release_reserve_list+0x51/0x100 [qxl]
[   13.687743]        [<ffffffffa0104831>] qxl_draw_opaque_fb+0xe1/0x3b0 [qxl]
[   13.687745]        [<ffffffffa0100f77>] qxl_fb_dirty_flush+0x1a7/0x250 [qxl]
[   13.687747]        [<ffffffffa0101039>] qxl_fb_work+0x19/0x20 [qxl]
[   13.687750]        [<ffffffff810cb362>] process_one_work+0x232/0x840
[   13.687756]        [<ffffffff810cb9be>] worker_thread+0x4e/0x450
[   13.687758]        [<ffffffff810d2544>] kthread+0x104/0x120
[   13.687760]        [<ffffffff8187145f>] ret_from_fork+0x3f/0x70
[   13.687762] 
               -> #0 (reservation_ww_class_acquire){+.+.+.}:
[   13.687764]        [<ffffffff81106e08>] __lock_acquire+0x1a78/0x1d00
[   13.687766]        [<ffffffff81107a17>] lock_acquire+0xc7/0x270
[   13.687768]        [<ffffffffa009c5fc>] ttm_eu_reserve_buffers+0xcc/0x600 [ttm]
[   13.687771]        [<ffffffffa0106c91>] qxl_release_reserve_list+0x51/0x100 [qxl]
[   13.687774]        [<ffffffffa0104c7f>] qxl_draw_dirty_fb+0x17f/0x470 [qxl]
[   13.687776]        [<ffffffffa00fe750>] qxl_crtc_page_flip+0x100/0x210 [qxl]
[   13.687778]        [<ffffffffa0055a04>] drm_mode_page_flip_ioctl+0x1a4/0x340 [drm]
[   13.687788]        [<ffffffffa0044795>] drm_ioctl+0x125/0x640 [drm]
[   13.687793]        [<ffffffff8128361e>] do_vfs_ioctl+0x2ee/0x550
[   13.687800]        [<ffffffff812838f9>] SyS_ioctl+0x79/0x90
[   13.687802]        [<ffffffff8187102e>] entry_SYSCALL_64_fastpath+0x12/0x76
[   13.687805] 
               other info that might help us debug this:

[   13.687806]  Possible unsafe locking scenario:

[   13.687807]        CPU0                    CPU1
[   13.687808]        ----                    ----
[   13.687808]   lock(reservation_ww_class_mutex);
[   13.687810]                                lock(reservation_ww_class_acquire);
[   13.687811]                                lock(reservation_ww_class_mutex);
[   13.687813]   lock(reservation_ww_class_acquire);
[   13.687814] 
                *** DEADLOCK ***

[   13.687816] 3 locks held by gnome-shell/885:
[   13.687816]  #0:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa005fcb0>] drm_modeset_lock_crtc+0x50/0x110 [drm]
[   13.687826]  #1:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa005f9be>] drm_modeset_lock+0x4e/0xe0 [drm]
[   13.687834]  #2:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa00fe70b>] qxl_crtc_page_flip+0xbb/0x210 [qxl]
[   13.687838] 
               stack backtrace:
[   13.687842] CPU: 0 PID: 885 Comm: gnome-shell Not tainted 4.2.0-0.rc4.git4.1.fc24.x86_64 #1
[   13.687843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[   13.687845]  0000000000000000 00000000d93e0f86 ffff88007986b948 ffffffff81867885
[   13.687847]  0000000000000000 ffffffff82bb1c80 ffff88007986b998 ffffffff81103913
[   13.687849]  00000000001d8240 ffff88007986b9f8 0000000000000003 ffff88007b574888
[   13.687852] Call Trace:
[   13.687858]  [<ffffffff81867885>] dump_stack+0x4c/0x65
[   13.687860]  [<ffffffff81103913>] print_circular_bug+0x1e3/0x250
[   13.687862]  [<ffffffff81106e08>] __lock_acquire+0x1a78/0x1d00
[   13.687865]  [<ffffffff81107a17>] lock_acquire+0xc7/0x270
[   13.687867]  [<ffffffffa0106c91>] ? qxl_release_reserve_list+0x51/0x100 [qxl]
[   13.687869]  [<ffffffff81104f99>] ? trace_hardirqs_on_caller+0x129/0x1b0
[   13.687873]  [<ffffffffa009c5fc>] ttm_eu_reserve_buffers+0xcc/0x600 [ttm]
[   13.687875]  [<ffffffffa0106c91>] ? qxl_release_reserve_list+0x51/0x100 [qxl]
[   13.687877]  [<ffffffffa0103376>] ? qxl_alloc_bo_reserved+0x56/0xb0 [qxl]
[   13.687880]  [<ffffffffa0106c91>] qxl_release_reserve_list+0x51/0x100 [qxl]
[   13.687882]  [<ffffffffa0104c7f>] qxl_draw_dirty_fb+0x17f/0x470 [qxl]
[   13.687885]  [<ffffffffa009645e>] ? ttm_bo_del_sub_from_lru+0x1e/0x50 [ttm]
[   13.687887]  [<ffffffffa00fe750>] qxl_crtc_page_flip+0x100/0x210 [qxl]
[   13.687894]  [<ffffffffa0055a04>] drm_mode_page_flip_ioctl+0x1a4/0x340 [drm]
[   13.687899]  [<ffffffffa0044795>] drm_ioctl+0x125/0x640 [drm]
[   13.687906]  [<ffffffff813991ad>] ? avc_has_perm+0x2d/0x290
[   13.687913]  [<ffffffffa0055860>] ? drm_mode_gamma_get_ioctl+0x130/0x130 [drm]
[   13.687916]  [<ffffffff8139dd95>] ? inode_has_perm.isra.46+0x55/0xa0
[   13.687917]  [<ffffffff8128361e>] do_vfs_ioctl+0x2ee/0x550
[   13.687919]  [<ffffffff8139e47b>] ? selinux_file_ioctl+0x5b/0xf0
[   13.687921]  [<ffffffff812838f9>] SyS_ioctl+0x79/0x90
[   13.687922]  [<ffffffff8187102e>] entry_SYSCALL_64_fastpath+0x12/0x76
Comment 1 Frediano Ziglio 2015-08-04 10:34:10 EDT
According to https://lwn.net/Articles/548909/, you should never take a ww_class_acquire after a ww_class_mutex. All ww_class_mutex locks must be taken between ww_acquire_init and ww_acquire_done, then released with ww_mutex_unlock, and only after that is the ww_class_acquire released with ww_acquire_fini. This means that the logic in the ioctl is wrong.
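
The ordering described above can be sketched as a small user-space model. This is only an illustration, not kernel code: the model_* names and struct model_ctx are hypothetical stand-ins for ww_acquire_init, ww_mutex_lock, ww_acquire_done, ww_mutex_unlock and ww_acquire_fini, and the model only tracks whether the calls happen in the expected order.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's acquire context.
 * It only tracks call ordering; it does no real locking. */
struct model_ctx {
    bool active; /* true between init and fini */
    bool done;   /* true after done: no further locks may be added */
    int held;    /* number of locks currently held under this context */
};

/* Models ww_acquire_init: open the acquire window. */
static void model_acquire_init(struct model_ctx *c)
{
    c->active = true;
    c->done = false;
    c->held = 0;
}

/* Models ww_mutex_lock with a context.
 * Returns false if the lock is taken outside the init..done window. */
static bool model_mutex_lock(struct model_ctx *c)
{
    if (!c->active || c->done)
        return false;
    c->held++;
    return true;
}

/* Models ww_acquire_done: the set of locks is now complete. */
static void model_acquire_done(struct model_ctx *c)
{
    c->done = true;
}

/* Models ww_mutex_unlock. Returns false if nothing is held. */
static bool model_mutex_unlock(struct model_ctx *c)
{
    if (c->held == 0)
        return false;
    c->held--;
    return true;
}

/* Models ww_acquire_fini.
 * Returns false if locks are still held when the context is released. */
static bool model_acquire_fini(struct model_ctx *c)
{
    if (!c->active || c->held != 0)
        return false;
    c->active = false;
    return true;
}
```

In this model, locking after model_acquire_done or calling model_acquire_fini while a lock is still held is reported as an error, which mirrors the sequence the LWN article asks for.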


qxl_release_reserve_list uses a ticket (acquire context), but qxl_bo_reserve (called by qxl_crtc_page_flip) does not. This leads to the problem, as taking multiple ww mutexes without a ticket is not expected.


This means that no function should call qxl_release_reserve_list while holding a bo reserved with qxl_bo_reserve. It looks like only qxl_hw_surface_alloc (rarely used) and qxl_draw_dirty_fb (the one from this bug report) have this problem.
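
To make the reported cycle concrete, here is a minimal user-space sketch of the kind of check lockdep performs. It is not lockdep's real implementation: the graph type, the function names and the two-entry enum are assumptions made for illustration. The two lock classes stand for reservation_ww_class_acquire and reservation_ww_class_mutex from the trace above.

```c
#include <stdbool.h>
#include <string.h>

/* The two lock classes that appear in the warning. */
enum lock_class { CLASS_ACQUIRE, CLASS_MUTEX, NUM_CLASSES };

/* edge[a][b] means "b was taken while a was held". */
struct dep_graph {
    bool edge[NUM_CLASSES][NUM_CLASSES];
};

static void dep_graph_init(struct dep_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reaches(const struct dep_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_CLASSES; next++) {
        if (g->edge[from][next] && !seen[next] && reaches(g, next, to, seen))
            return true;
    }
    return false;
}

/* Record "taken while held". Returns true if the new edge closes a cycle,
 * i.e. 'held' was already reachable from 'taken'. */
static bool dep_graph_acquire(struct dep_graph *g, int held, int taken)
{
    bool seen[NUM_CLASSES] = { false };
    bool cycle = reaches(g, taken, held, seen);

    g->edge[held][taken] = true;
    return cycle;
}
```

Feeding the model the two paths from the trace reproduces the report: the qxl_fb_work path records mutex-after-acquire, which is fine, and the page flip path then records acquire-after-mutex, which closes the cycle.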


From the struct drm_crtc_funcs documentation for page_flip (which is where the call to qxl_draw_dirty_fb comes from):

/*
 * Flip to the given framebuffer.  This implements the page
 * flip ioctl described in drm_mode.h, specifically, the
 * implementation must return immediately and block all
 * rendering to the current fb until the flip has completed.
 * If userspace set the event flag in the ioctl, the event
 * argument will point to an event to send back when the flip
 * completes, otherwise it will be NULL.
 */

I don't know how often users flip the framebuffer (or how to do it).
Comment 2 Frediano Ziglio 2015-09-21 06:29:22 EDT
Got a possible solution: pinning the objects instead of keeping references to them.
Comment 3 Frediano Ziglio 2015-09-24 10:37:25 EDT
Patch posted at http://lists.freedesktop.org/archives/dri-devel/2015-September/090890.html
Comment 4 Christophe Fergeau 2015-10-02 08:26:21 EDT
Moving back to the kernel component, as the patches referenced above are kernel patches. These patches can be added to the Fedora kernel package if needed.
Comment 5 Josh Boyer 2015-10-05 11:25:06 EDT
Patches added in Fedora rawhide git.  Thanks.
