Bug 1571128

Summary: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
Product: [Fedora] Fedora
Reporter: Joseph D. Wagner <joe>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rawhide
CC: airlied, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jcline, jglisse, joe, john.j5live, jonathan, josef, kernel-maint, kraxel, labbott, linville, mchehab, mjg59, patrick, steved
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-10 16:48:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
oops files (flags: none)

Description Joseph D. Wagner 2018-04-24 07:16:14 UTC
Description of problem:
kernel-core crashes about every three seconds.

abrt says:
The backtrace does not contain enough meaningful function frames to be reported. It is annoying but it does not necessary signalize a problem with your computer. ABRT will not allow you to create a report in a bug tracking system but you can contact kernel maintainers via e-mail.

Version-Release number of selected component (if applicable):
4.17.0-0.rc1.git3.1.fc29.x86_64

How reproducible:
100%.

Steps to Reproduce:
1. Boot into Xfce.
2. Log on.
3. Watch abrt notifications every few seconds.

I've tried the following to get crash info:
1. Installed kernel-debuginfo.
2. Added crashkernel=128M to grub2.
3. Started kdump.

But I can't get it to produce a backtrace, or even a dump in /var/crash.

Is there anything I can do to get this info to you? Or is it something in the build that is simply missing?

Comment 1 Joseph D. Wagner 2018-04-24 07:22:03 UTC
Created attachment 1425829 [details]
oops files

Comment 2 Gerd Hoffmann 2018-04-24 13:14:51 UTC
Please retest with the latest rawhide kernel, which has one severe qxl issue fixed.

Comment 3 Joseph D. Wagner 2018-04-25 13:33:57 UTC
As far as I know, this is the latest version in the repository.
"dnf clean all; dnf -y upgrade" did not install a new kernel.

If there is a newer version, please provide a link or push to the repository.

Comment 4 Joseph D. Wagner 2018-04-27 22:53:05 UTC
This issue appears to be resolved in kernel-4.17.0-0.rc2.git0.1.fc29.x86_64, so I'm closing this bug.

Comment 5 Patrick Monnerat 2018-05-04 16:34:41 UTC
I still have the same problem with kernel-4.17.0-0.rc3.git2.1.fc29.x86_64 running as a QEMU/KVM guest on an up-to-date 64-bit F27 host, using the MATE desktop.

-----
May  4 18:31:03 rawhide kernel: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
May  4 18:31:03 rawhide kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 902, name: Xorg
May  4 18:31:03 rawhide kernel: 4 locks held by Xorg/902:
May  4 18:31:03 rawhide kernel: #0: 000000005e74e4e3 (crtc_ww_class_acquire){+.+.}, at: drm_mode_cursor_common+0x90/0x210 [drm]
May  4 18:31:03 rawhide kernel: #1: 00000000154815bd (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0xfb/0x110 [drm]
May  4 18:31:03 rawhide kernel: #2: 00000000f6ef4033 (reservation_ww_class_acquire){+.+.}, at: qxl_release_reserve_list+0x63/0x150 [qxl]
May  4 18:31:03 rawhide kernel: #3: 00000000256c7d08 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x349/0x5b0 [ttm]
May  4 18:31:03 rawhide kernel: CPU: 1 PID: 902 Comm: Xorg Tainted: G        W         4.17.0-0.rc3.git2.1.fc29.x86_64 #1
May  4 18:31:03 rawhide kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
May  4 18:31:03 rawhide kernel: Call Trace:
May  4 18:31:03 rawhide kernel: dump_stack+0x85/0xc0
May  4 18:31:03 rawhide kernel: ___might_sleep.cold.72+0xac/0xbc
May  4 18:31:03 rawhide kernel: ? __mutex_lock+0x56/0xa10
May  4 18:31:03 rawhide kernel: ? _raw_spin_unlock_irqrestore+0x4b/0x60
May  4 18:31:03 rawhide kernel: ? __slab_free+0x153/0x360
May  4 18:31:03 rawhide kernel: ? debug_check_no_obj_freed+0x123/0x204
May  4 18:31:03 rawhide kernel: ? qxl_surface_evict+0x25/0x60 [qxl]
May  4 18:31:03 rawhide kernel: ? qxl_surface_evict+0x25/0x60 [qxl]
May  4 18:31:03 rawhide kernel: ? qxl_gem_object_free+0x37/0x60 [qxl]
May  4 18:31:03 rawhide kernel: ? qxl_bo_unref+0x1d/0x30 [qxl]
May  4 18:31:03 rawhide kernel: ? qxl_cursor_atomic_update+0x270/0x2b0 [qxl]
May  4 18:31:03 rawhide kernel: ? drm_atomic_helper_commit_planes+0xae/0x210 [drm_kms_helper]
May  4 18:31:03 rawhide kernel: ? drm_atomic_helper_commit_tail+0x26/0x60 [drm_kms_helper]
May  4 18:31:03 rawhide kernel: ? commit_tail+0x59/0x70 [drm_kms_helper]
May  4 18:31:03 rawhide kernel: ? drm_atomic_helper_commit+0xdf/0x150 [drm_kms_helper]
May  4 18:31:03 rawhide kernel: ? drm_atomic_helper_update_plane+0xf1/0x110 [drm_kms_helper]
May  4 18:31:03 rawhide kernel: ? __setplane_internal+0x137/0x260 [drm]
May  4 18:31:03 rawhide kernel: ? drm_internal_framebuffer_create+0x2b6/0x490 [drm]
May  4 18:31:03 rawhide kernel: ? drm_mode_cursor_universal+0xed/0x1f0 [drm]
May  4 18:31:03 rawhide kernel: ? drm_mode_cursor_common+0x19e/0x210 [drm]
May  4 18:31:03 rawhide kernel: ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
May  4 18:31:03 rawhide kernel: ? drm_ioctl_kernel+0x5b/0xb0 [drm]
May  4 18:31:03 rawhide kernel: ? drm_ioctl+0x1b3/0x370 [drm]
May  4 18:31:03 rawhide kernel: ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
May  4 18:31:03 rawhide kernel: ? finish_task_switch+0x98/0x2b0
May  4 18:31:03 rawhide kernel: ? do_vfs_ioctl+0xa5/0x6d0
May  4 18:31:03 rawhide kernel: ? __fget+0x10d/0x1f0
May  4 18:31:03 rawhide kernel: ? ksys_ioctl+0x60/0x90
May  4 18:31:03 rawhide kernel: ? __x64_sys_ioctl+0x16/0x20
May  4 18:31:03 rawhide kernel: ? do_syscall_64+0x60/0x1f0
May  4 18:31:03 rawhide kernel: ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
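
For triage, the key fields of a report like the one above can be pulled out with a small script. What follows is a minimal Python sketch and not part of the original report: the function name `summarize_report` and both regular expressions are assumptions written against the exact line format shown in this log, and other kernel versions may format these lines differently.

```python
import re

# Matches: "in_atomic(): 1, irqs_disabled(): 0, pid: 902, name: Xorg"
CONTEXT_RE = re.compile(
    r"in_atomic\(\): (?P<in_atomic>\d+), "
    r"irqs_disabled\(\): (?P<irqs_disabled>\d+), "
    r"pid: (?P<pid>\d+), name: (?P<name>\S+)"
)

# Matches held-lock lines such as:
# "#0: 000000005e74e4e3 (crtc_ww_class_acquire){+.+.}, at: drm_mode_cursor_common+0x90/0x210 [drm]"
LOCK_RE = re.compile(
    r"#\d+: [0-9a-f]+ \((?P<lock>[^)]+)\)\{[^}]*\}, "
    r"at: (?P<func>[^+\s]+)\+\S+(?: \[(?P<module>[^\]]+)\])?"
)


def summarize_report(lines):
    """Return (context, locks) extracted from the syslog lines of one report.

    context is a dict with in_atomic, irqs_disabled, pid and name, or None
    if no such line was found. locks is a list of (lock, function, module)
    tuples in the order the lines appear.
    """
    context = None
    locks = []
    for line in lines:
        match = CONTEXT_RE.search(line)
        if match:
            context = {
                "in_atomic": int(match["in_atomic"]),
                "irqs_disabled": int(match["irqs_disabled"]),
                "pid": int(match["pid"]),
                "name": match["name"],
            }
            continue
        match = LOCK_RE.search(line)
        if match:
            locks.append((match["lock"], match["func"], match["module"]))
    return context, locks


# Three lines copied from the log above.
SAMPLE = [
    "May  4 18:31:03 rawhide kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 902, name: Xorg",
    "May  4 18:31:03 rawhide kernel: #0: 000000005e74e4e3 (crtc_ww_class_acquire){+.+.}, at: drm_mode_cursor_common+0x90/0x210 [drm]",
    "May  4 18:31:03 rawhide kernel: #2: 00000000f6ef4033 (reservation_ww_class_acquire){+.+.}, at: qxl_release_reserve_list+0x63/0x150 [qxl]",
]

context, locks = summarize_report(SAMPLE)
print(context)
for lock, func, module in locks:
    print(f"{lock} taken in {func} [{module}]")
```

In this report, `in_atomic(): 1` together with `irqs_disabled(): 0` indicates that the task was in a context where sleeping is not allowed while interrupts were still enabled, which is why taking a mutex triggered the warning.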

Comment 6 Joseph D. Wagner 2018-05-04 17:20:04 UTC
This bug came back for me too after upgrading to 4.17.0-0.rc3.git2.1.fc29.x86_64.

Comment 7 Joseph D. Wagner 2018-05-04 17:21:37 UTC
Why does abrt say "the backtrace does not contain enough meaningful function frames to be reported"? Could this be improved in the future to facilitate better reporting?

Comment 8 Joseph D. Wagner 2018-05-05 19:57:20 UTC
Issue is ongoing with 4.17.0-0.rc3.git4.1.fc29.x86_64.

Comment 9 Joseph D. Wagner 2018-05-26 23:04:30 UTC
This issue appeared to go away for 4.17.0-0.rc6.git1.1.fc29.x86_64, but it came back in 4.17.0-0.rc6.git2.1.fc29.x86_64.

I hope this info helps.

Comment 10 Patrick Monnerat 2018-05-29 18:28:06 UTC
Last version that worked for me: kernel-4.16.0-0.rc6.git0.1.fc29.x86_64
Not yet fixed in kernel-4.17.0-0.rc6.git3.1.fc29.x86_64

Comment 11 Joseph D. Wagner 2018-05-31 09:19:45 UTC
Appears to be fixed in 4.17.0-0.rc7.git0.1.fc29.x86_64. Can anyone confirm?

Comment 12 Patrick Monnerat 2018-05-31 11:10:23 UTC
> Appears to be fixed in 4.17.0-0.rc7.git0.1.fc29.x86_64. Can anyone confirm?

OK for me too, although I'm not in a position to point to the fix in the source code.

Comment 13 Jeremy Cline 2018-05-31 18:03:12 UTC
Hi Joseph, Patrick,

I believe the reason you keep seeing it "fixed" is that it's not actually fixed: the rc (rc#.git0) builds turn off debugging options, including CONFIG_LOCKDEP, so the warning is simply not emitted. If you install kernel-debug-4.17.0-0.rc7.git0.1.fc29.x86_64 or kernel-4.17.0-0.rc7.git1.1.fc29, you'll likely still see it.
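
The naming rule described in this comment can be written down as a quick check. This is a minimal Python sketch based only on the rule as stated here (kernel-debug packages and non-git0 builds have the debugging options enabled, git0 builds do not); the function name `has_debug_options` is hypothetical, and the heuristic is an assumption that has not been checked against the actual Fedora build configuration.

```python
import re

# The "gitN" component of a rawhide kernel release string, e.g. ".git0."
GIT_RE = re.compile(r"\.git(\d+)\.")


def has_debug_options(package):
    """Guess whether a rawhide kernel build has debugging options enabled.

    Returns True for kernel-debug packages and for builds whose gitN
    component is not git0, False for git0 builds, and None when the
    string has no gitN component to go by.
    """
    if package.startswith("kernel-debug-"):
        return True
    match = GIT_RE.search(package)
    if match is None:
        return None
    return int(match.group(1)) != 0


# Release strings taken from this bug report.
for name in (
    "kernel-4.17.0-0.rc7.git0.1.fc29.x86_64",
    "kernel-debug-4.17.0-0.rc7.git0.1.fc29.x86_64",
    "kernel-4.17.0-0.rc3.git2.1.fc29.x86_64",
):
    print(name, has_debug_options(name))
```

A build for which this returns False can hide the warning without the underlying problem being fixed, so a "fixed" result should be confirmed on a build for which it returns True.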

Comment 14 Patrick Monnerat 2018-05-31 18:16:35 UTC
Thanks for the info Jeremy. I'll check with the next non-git0 update when available.

Comment 15 Jeremy Cline 2018-06-01 00:23:50 UTC
I went ahead and set up a VM, and it seems pretty easy to reproduce. I don't see a fix submitted upstream, and I think I understand the problem, so I'll see about submitting a patch to fix this.

Comment 16 Patrick Monnerat 2018-06-01 09:29:36 UTC
> I think I understand the problem so I'll see about submitting a patch to fix this.
Would be great. Thanks in advance.

Comment 17 Joseph D. Wagner 2018-06-18 18:54:03 UTC
Appears to still be a problem on 4.18.0-0.rc0.git10.1.fc29.x86_64.

Comment 18 Jeremy Cline 2018-06-18 19:15:09 UTC
Hi Joseph,

It looks like the fix is in linux-next. I'll close this bug when it arrives in Linus' tree. I recommend running the non-debug builds (builds with git0 in the release) until then.

Comment 19 Patrick Monnerat 2018-07-10 23:08:42 UTC
It does indeed seem to be fixed in kernel-4.18.0-0.rc3.git3.1.fc29.x86_64. Thanks a lot.