Bug 1618906 - [regression] nouveau DRM: EVO timeout in kernel 4.15 or later
Summary: [regression] nouveau DRM: EVO timeout in kernel 4.15 or later
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-18 00:49 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2021-02-08 19:23 UTC
CC List: 36 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
Journalctl log showing DRM: base-0 timeout error (re. comment #3) (109.16 KB, text/plain)
2018-12-04 03:14 UTC, Lou Hafer


Links
System: freedesktop.org Gitlab
ID: xorg/driver/xf86-video-nouveau/issues/411
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2020-04-08 10:05:04 UTC

Description Dominik 'Rathann' Mierzejewski 2018-08-18 00:49:23 UTC
Description of problem:
After booting any kernel newer than 4.14.18-200.fc26.x86_64 I get a frozen screen. The GPU is G98M / GeForce 9300M GS [10de:06e9].

Version-Release number of selected component (if applicable):
4.17.14-102.fc27.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot any kernel after 4.14.18-200.fc26.x86_64

Actual results:
Frozen screen.

Expected results:
Fully functional screen.

Additional info:
# lspci -d 10de:06e9 -vnn
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G98M [GeForce 9300M GS] [10de:06e9] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. U6V laptop [1043:19b2]
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at dc00 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau
# journalctl --no-hostname -k -b 0 |grep nouveau
Aug 18 01:58:15 kernel: nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: bios: version 62.98.3c.00.44
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: bios: M0203T not found
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: bios: M0203E not matched!
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: fb: 512 MiB DDR2
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB version 4.0
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 01011323 00010034
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000028
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 02022312 00020030
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00000000
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00000140
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00002261
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: DCB conn 07: 00000513
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
Aug 18 01:58:16 kernel: nouveau 0000:01:00.0: DRM: allocated 1440x900 fb: 0x50000, bo 000000006f9828c3
Aug 18 01:58:28 kernel: fbcon: nouveaufb (fb0) is primary device
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: EVO timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:28 kernel: nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
Aug 18 01:58:30 kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
Aug 18 01:58:30 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:32 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:37 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:58:53 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:10 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:11 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:13 kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Aug 18 01:59:52 kernel: nouveau 0000:01:00.0: DRM: EVO timeout
Aug 18 02:04:07 kernel: nouveau 0000:01:00.0: DRM: EVO timeout
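
For anyone triaging a similar log, the timeout messages above can be tallied per message type. The following is a minimal sketch, not part of the original report: the function name count_nouveau_timeouts is made up for illustration, and the pattern only covers the three message forms that appear in this bug (EVO timeout, core notifier timeout, base-N: timeout).

```shell
#!/bin/sh
# Hypothetical helper (not from the report): tally nouveau display timeout
# messages read from stdin, one line per message type, most frequent first.
count_nouveau_timeouts() {
    LC_ALL=C awk '
        /nouveau .*DRM: (EVO timeout|core notifier timeout|base-[0-9]+: timeout)/ {
            sub(/.*DRM: /, "")   # keep only the message text after "DRM: "
            n[$0]++
        }
        END { for (m in n) print n[m], m }
    ' | LC_ALL=C sort -k1,1nr -k2
}

# Example use on an affected machine (kernel messages of the current boot):
#   journalctl --no-hostname -k -b 0 | count_nouveau_timeouts
```

Counting by message type makes it easier to compare boots across kernel versions, since the wording changed from "EVO timeout" to "core notifier timeout" in the later kernels quoted in this report.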

Comment 1 Dominik 'Rathann' Mierzejewski 2018-08-18 20:37:06 UTC
Also reproducible on Fedora 28 with kernel 4.17.14-202.fc28.

Just for fun, I installed the F29 kernel-4.18.1-300.fc29.x86_64 from koji. I get the same issue, with slightly different errors:
[    2.902501] nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
[    2.977795] nouveau 0000:01:00.0: bios: version 62.98.3c.00.44
[    3.010065] nouveau 0000:01:00.0: bios: M0203T not found
[    3.010218] nouveau 0000:01:00.0: bios: M0203E not matched!
[    3.010363] nouveau 0000:01:00.0: fb: 512 MiB DDR2
[    3.185260] nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
[    3.185262] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    3.185268] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    3.185271] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    3.185274] nouveau 0000:01:00.0: DRM: DCB outp 00: 01011323 00010034
[    3.185277] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000028
[    3.185280] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022312 00020030
[    3.185282] nouveau 0000:01:00.0: DRM: DCB conn 00: 00000000
[    3.185284] nouveau 0000:01:00.0: DRM: DCB conn 01: 00000140
[    3.185286] nouveau 0000:01:00.0: DRM: DCB conn 02: 00002261
[    3.185288] nouveau 0000:01:00.0: DRM: DCB conn 07: 00000513
[    3.195497] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[    3.242999] nouveau 0000:01:00.0: DRM: allocated 1440x900 fb: 0x50000, bo (____ptrval____)
[    3.257587] fbcon: nouveaufb (fb0) is primary device
[    5.288414] nouveau 0000:01:00.0: DRM: core notifier timeout
[    7.288412] nouveau 0000:01:00.0: DRM: base-0: timeout
[    7.334784] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[    7.343411] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[   86.317620] nouveau 0000:01:00.0: DRM: core notifier timeout
[   88.989806] nouveau 0000:01:00.0: DRM: core notifier timeout
[   90.909693] nouveau 0000:01:00.0: DRM: base-0: timeout

Bumping to F29, then.

Comment 2 Dominik 'Rathann' Mierzejewski 2018-08-19 22:58:54 UTC
Bumping to rawhide after trying kernel-4.19.0-0.rc0.git5.1:

[    7.193842] nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
[    7.253541] nouveau 0000:01:00.0: bios: version 62.98.3c.00.44
[    7.301129] nouveau 0000:01:00.0: bios: M0203T not found
[    7.301492] nouveau 0000:01:00.0: bios: M0203E not matched!
[    7.301669] nouveau 0000:01:00.0: fb: 512 MiB DDR2
[    7.719129] nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
[    7.719498] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    7.719681] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    7.719851] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    7.720014] nouveau 0000:01:00.0: DRM: DCB outp 00: 01011323 00010034
[    7.720182] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000028
[    7.720387] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022312 00020030
[    7.720567] nouveau 0000:01:00.0: DRM: DCB conn 00: 00000000
[    7.720736] nouveau 0000:01:00.0: DRM: DCB conn 01: 00000140
[    7.720903] nouveau 0000:01:00.0: DRM: DCB conn 02: 00002261
[    7.721066] nouveau 0000:01:00.0: DRM: DCB conn 07: 00000513
[    7.738669] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[    7.784149] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[    7.813006] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[    7.828162] nouveau 0000:01:00.0: DRM: allocated 1440x900 fb: 0x50000, bo (____ptrval____)
[    7.870502] fbcon: nouveaufb (fb0) is primary device
[    7.885068] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[    7.898694] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[    9.902963] nouveau 0000:01:00.0: DRM: core notifier timeout
[   11.903058] nouveau 0000:01:00.0: DRM: base-0: timeout
[   11.908408] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   11.947124] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   11.958855] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[   11.961609] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[   11.972641] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[   11.978613]  #0: (____ptrval____) (drm_connector_list_iter){.+.+}, at: nouveau_backlight_init+0x63/0x450 [nouveau]
[   22.205362] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   32.445359] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   42.685355] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   52.925595] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   63.165373] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   73.405363] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   83.645378] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[   93.890397] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  104.125363] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  107.020185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  107.032965] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[  107.074838] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  107.086752] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[  110.113354] nouveau 0000:01:00.0: DRM: core notifier timeout
[  110.634595] ------------[ cut here ]------------
[  110.634608] nouveau 0000:01:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000010c412000] [size=4096 bytes]
[  110.634630] WARNING: CPU: 1 PID: 1163 at kernel/dma/debug.c:1230 check_sync+0x136/0x670
[  110.634634] Modules linked in: ip_set nfnetlink ebtable_nat ebtable_broute ccm bridge stp llc ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc arc4 snd_hda_codec_realtek snd_hda_codec_generic ath9k snd_hda_intel ath9k_common snd_hda_codec ath9k_hw snd_hda_core uvcvideo btusb snd_hwdep btrtl snd_seq snd_seq_device btbcm btintel snd_pcm mac80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 ath videobuf2_common cfg80211 videodev media bluetooth snd_timer snd coretemp ecdh_generic joydev r592 soundcore asus_laptop memstick sparse_keymap rfkill input_polldev pcc_cpufreq acpi_cpufreq dm_crypt
[  110.634775]  nouveau ata_generic pata_acpi firewire_ohci firewire_core mxm_wmi wmi i2c_algo_bit drm_kms_helper sdhci_pci cqhci sdhci ttm sis190 serio_raw mmc_core mii crc_itu_t drm sata_sis pata_sis video
[  110.634823] CPU: 1 PID: 1163 Comm: Xorg Not tainted 4.19.0-0.rc0.git5.1.fc30.x86_64 #1
[  110.634827] Hardware name: ASUSTeK Computer Inc.  X71SL               /X71SL     , BIOS 206     11/05/2008
[  110.634832] RIP: 0010:check_sync+0x136/0x670
[  110.634837] Code: 48 85 ed 75 04 48 8b 68 10 48 8b 3c 24 e8 e2 38 56 00 48 89 c6 4d 89 e8 4c 89 f9 48 89 ea 48 c7 c7 a8 18 30 b1 e8 ee 77 f6 ff <0f> 0b 8b 05 9a 75 85 01 85 c0 0f 84 81 04 00 00 48 83 c4 28 4c 89
[  110.634841] RSP: 0018:ffffb980412c7a10 EFLAGS: 00010082
[  110.634847] RAX: 0000000000000000 RBX: ffffffffb2f33410 RCX: 0000000000000006
[  110.634851] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff9e12fbbd6ba0
[  110.634855] RBP: ffff9e12f9f82ed0 R08: 0000000000000000 R09: 0000000000000001
[  110.634859] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000286
[  110.634863] R13: 0000000000001000 R14: 0000000000010000 R15: 000000010c412000
[  110.634868] FS:  00007fe0441aeac0(0000) GS:ffff9e12fba00000(0000) knlGS:0000000000000000
[  110.634873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  110.634877] CR2: 00007fe03c0c8d90 CR3: 0000000114a6e000 CR4: 00000000000006e0
[  110.634881] Call Trace:
[  110.634897]  debug_dma_sync_single_for_device+0x7b/0x90
[  110.634915]  ? ttm_bo_mem_compat+0x23/0x60 [ttm]
[  110.634925]  ? kfree+0x188/0x320
[  110.634932]  ? krealloc+0x25/0xa0
[  110.635040]  nouveau_bo_sync_for_device+0x6a/0xb0 [nouveau]
[  110.635098]  nouveau_bo_validate+0x71/0x90 [nouveau]
[  110.635154]  nouveau_gem_ioctl_pushbuf+0x8a5/0x1ad0 [nouveau]
[  110.635222]  ? nouveau_gem_ioctl_new+0xe0/0xe0 [nouveau]
[  110.635240]  ? drm_ioctl_kernel+0xa5/0xf0 [drm]
[  110.635240]  ? nouveau_gem_ioctl_new+0xe0/0xe0 [nouveau]
[  110.635240]  drm_ioctl_kernel+0xa5/0xf0 [drm]
[  110.635240]  drm_ioctl+0x1fc/0x390 [drm]
[  110.635240]  ? nouveau_gem_ioctl_new+0xe0/0xe0 [nouveau]
[  110.635240]  nouveau_drm_ioctl+0x65/0xc0 [nouveau]
[  110.635240]  do_vfs_ioctl+0xa5/0x6e0
[  110.635240]  ksys_ioctl+0x60/0x90
[  110.635240]  __x64_sys_ioctl+0x16/0x20
[  110.635240]  do_syscall_64+0x60/0x1f0
[  110.635240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  110.635240] RIP: 0033:0x7fe041422ec7
[  110.635240] Code: 00 00 90 48 8b 05 d9 7f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 7f 2c 00 f7 d8 64 89 01 48
[  110.635240] RSP: 002b:00007ffcd424fc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  110.635240] RAX: ffffffffffffffda RBX: 0000000000d5ae98 RCX: 00007fe041422ec7
[  110.635240] RDX: 00007ffcd424fcd0 RSI: 00000000c0406481 RDI: 000000000000000e
[  110.635240] RBP: 00007ffcd424fcd0 R08: 0000000000000000 R09: 0000000000d59f20
[  110.635240] R10: 0000000000d6be98 R11: 0000000000000246 R12: 00000000c0406481
[  110.635240] R13: 000000000000000e R14: 0000000000d5a070 R15: 0000000000d59f20
[  110.635240] irq event stamp: 0
[  110.635240] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
[  110.635240] hardirqs last disabled at (0): [<ffffffffb00bb817>] copy_process.part.28+0x747/0x1e70
[  110.635240] softirqs last  enabled at (0): [<ffffffffb00bb817>] copy_process.part.28+0x747/0x1e70
[  110.635240] softirqs last disabled at (0): [<0000000000000000>]           (null)
[  110.635240] ---[ end trace a1450e59d31d3810 ]---
[  114.365372] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  117.080247] nouveau 0000:01:00.0: DRM: core notifier timeout
[  119.080664] nouveau 0000:01:00.0: DRM: base-0: timeout
[  122.562843] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  122.574617] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1
[  124.605473] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  134.845626] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  145.085449] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  155.325447] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  165.565443] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  175.805469] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  186.045466] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  196.285425] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  201.777469] nouveau 0000:01:00.0: DRM: base-0: timeout
[  204.052686] nouveau 0000:01:00.0: DRM: base-0: timeout
[  206.087485] nouveau 0000:01:00.0: DRM: base-0: timeout
[  206.525448] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  216.765543] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  227.005455] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  237.245471] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  247.485448] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  257.725448] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  267.965455] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  270.806584] perf: interrupt took too long (2512 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[  271.508397] nouveau 0000:01:00.0: DRM: base-0: timeout
[  278.205455] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  288.445453] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  298.685448] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  308.925453] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  319.165443] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  329.405454] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for VGA-1
[  331.508453] nouveau 0000:01:00.0: DRM: base-0: timeout
(this keeps repeating)

Comment 3 Lou Hafer 2018-12-04 03:11:23 UTC
I'm seeing this problem on Fedora 28. Kernel 4.18.18-200 works fine; kernels 4.19.2-200 and 4.19.5-200 fail. The GUI doesn't quite lock up, but it's horribly intermittent and jerky. I'm using the kernel nouveau driver on a GeForce GTX 1070 (GP104). The system was updated with 'dnf upgrade' for both 4.19.2-200 and 4.19.5-200, and in both cases I had to drop back to 4.18.18-200 to avoid the lockup. I've attached a log from journalctl. All goes OK through boot (up to 17:02:20), but after playing with Firefox (63.0.3) for a few minutes I get a near lock-up and start seeing error messages from 17:07:14. The system remains completely responsive when accessed via ssh.
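
Dropping back to the last working kernel, as described above, amounts to picking the newest installed version that is not newer than the known-good one. A rough sketch of that selection follows. The helper name newest_not_after is hypothetical, it relies on GNU sort -V (whose ordering only approximates RPM's own version comparison), and the rpm and grubby invocations in the comments are an assumption about a stock Fedora setup rather than something taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: read kernel version strings on stdin (one per line)
# and print the newest one that sorts at or before the known-good version $1.
newest_not_after() {
    good=$1
    while IFS= read -r v; do
        [ -n "$v" ] || continue
        # sort -V puts the older of the two versions first
        first=$(printf '%s\n%s\n' "$v" "$good" | sort -V | head -n 1)
        if [ "$first" = "$v" ]; then
            printf '%s\n' "$v"
        fi
    done | sort -V | tail -n 1
}

# Possible use (assumed, untested here):
#   v=$(rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' |
#       newest_not_after 4.18.18-200.fc28.x86_64)
#   sudo grubby --set-default "/boot/vmlinuz-$v"
```

If nothing at or below the known-good version is installed, the function prints nothing, so the caller should check for an empty result before changing the default boot entry.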

Comment 4 Lou Hafer 2018-12-04 03:14:04 UTC
Created attachment 1511170
Journalctl log showing DRM: base-0 timeout error (re. comment #3)

Journalctl log promised in comment #3 (Hafer).

Comment 5 Sergio Basto 2018-12-04 13:13:48 UTC
My first bad commit: [fdba46ffb4c203b6e6794163493fd310f98bb4be] x86/apic: Get rid of multi CPU affinity (in kernel 4.15.0-git2)

My second bad commit: [a31e58e129f73ab5b04016330b13ed51fde7a961] x86/apic: Switch all APICs to Fixed delivery mode (in kernel-4.15.0-0.rc6.git1.1) [1]. Its commit message says that it fixes fdba46ffb4c2 ("x86/apic: Get rid of multi CPU affinity").


[1]
https://bugs.freedesktop.org/attachment.cgi?id=141327
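
A first bad commit like the ones listed above is normally found with git bisect between a known-good and a known-bad revision. The outline below is a generic sketch, not the exact procedure used in this comment: the function name start_kernel_bisect and the example tags are placeholders, and each bisect step still needs a manual build, boot, and check for the timeout messages.

```shell
#!/bin/sh
# Hypothetical helper: begin a bisect in the git tree at $1, with $2 as the
# known-good revision and $3 as the known-bad one. git then checks out a
# commit roughly halfway between the two for testing.
start_kernel_bisect() {
    tree=$1 good=$2 bad=$3
    git -C "$tree" bisect start &&
    git -C "$tree" bisect bad "$bad" &&
    git -C "$tree" bisect good "$good"
}

# Assumed usage in a mainline kernel checkout (tags are examples only):
#   start_kernel_bisect ~/linux v4.14 v4.15
#   ... build, boot, look for "DRM: EVO timeout" in the kernel log ...
#   git -C ~/linux bisect good    # or: bisect bad
#   git -C ~/linux bisect reset   # when finished
```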

Comment 6 Lou Hafer 2018-12-08 18:37:25 UTC
The problem persists as of kernel 4.19.6-200.fc28.x86_64. Still have to drop back to 4.18.18-200 to avoid an unusable GUI.

Comment 7 Lou Hafer 2018-12-24 01:23:13 UTC
The problem persists as of kernel 4.19.10-200.fc28.x86_64. Still have to drop back to 4.18.18-200 to avoid an unusable GUI.

Comment 8 Mitchel Humpherys 2018-12-31 02:50:45 UTC
I'm also affected by this bug on Fedora 29.

I hopped on the kernel mainline and started poking around and noticed that I see this bug on 4.19 but not on 4.20.  So I bisected to find the commit in the 4.20 series that fixes the bug.  The fix appears to be from Ben Skeggs (the assignee of this bug, go Ben!):

commit 970a5ee41c72df46e3b0f307528c7d8ef7734a2e
Author: Ben Skeggs <bskeggs>
Date:   Wed Dec 12 16:51:17 2018 +1000

    drm/nouveau/kms/nv50-: also flush fb writes when rewinding push buffer

    Should hopefully fix a regression some people have been seeing since EVO
    push buffers were moved to VRAM by default on Pascal GPUs.

    Fixes: d00ddd9da ("drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascal")
    Signed-off-by: Ben Skeggs <bskeggs>
    Cc: <stable.org> # 4.19+

I can cherry pick just this commit on top of 4.19 and I get a stable system.

Looks like this patch just needs to be pulled into the Fedora kernel.
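
Whether a given tree already carries this fix, and applying it when it does not, can be sketched with two git commands. This is an illustration under assumptions rather than the Fedora packaging procedure: the helper names are hypothetical, and because stable backports receive new commit hashes, the ancestry test only recognises the original mainline commit.

```shell
#!/bin/sh
# Hypothetical helper: succeed if commit $2 is an ancestor of (that is,
# already contained in) revision $3 of the git tree at $1.
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Hypothetical helper: cherry-pick commit $2 onto the current branch of the
# tree at $1 unless HEAD already contains it.
apply_fix_if_missing() {
    tree=$1 fix=$2
    if has_commit "$tree" "$fix" HEAD; then
        echo "already contains $fix"
    else
        # -x records the original commit hash in the new commit message
        git -C "$tree" cherry-pick -x "$fix"
    fi
}

# Assumed usage in a kernel checkout that has the mainline history fetched:
#   git -C ~/linux checkout -b v4.19-nouveau-fix v4.19
#   apply_fix_if_missing ~/linux 970a5ee41c72df46e3b0f307528c7d8ef7734a2e
```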

Comment 9 Jelle de Jong 2019-01-23 12:16:41 UTC
I am also seeing the nouveau driver freeze: the system still does something, but the screens become unusable.

Linux dw093.wdm.local 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@dw093 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core) 

[root@dw093 ~]# grep "DRM: core notifier timeout" /var/log/messages*
/var/log/messages:Jan 23 09:05:13 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 09:22:51 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 09:34:03 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 09:34:13 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:04:34 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:05:57 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:06:01 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:06:03 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:06:08 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:12:07 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:12:09 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:12:39 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:14:47 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:14:49 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
/var/log/messages:Jan 23 13:14:51 dw093 kernel: nouveau 0000:01:00.0: DRM: core notifier timeout


[root@dw093 ~]# modinfo nouveau
filename:       /lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/drivers/gpu/drm/nouveau/nouveau.ko.xz
firmware:       nvidia/gp100/gr/sw_method_init.bin
firmware:       nvidia/gp100/gr/sw_bundle_init.bin
firmware:       nvidia/gp100/gr/sw_nonctx.bin
firmware:       nvidia/gp100/gr/sw_ctx.bin
firmware:       nvidia/gp100/gr/gpccs_sig.bin
firmware:       nvidia/gp100/gr/gpccs_data.bin
firmware:       nvidia/gp100/gr/gpccs_inst.bin
firmware:       nvidia/gp100/gr/gpccs_bl.bin
firmware:       nvidia/gp100/gr/fecs_sig.bin
firmware:       nvidia/gp100/gr/fecs_data.bin
firmware:       nvidia/gp100/gr/fecs_inst.bin
firmware:       nvidia/gp100/gr/fecs_bl.bin
firmware:       nvidia/gp100/acr/ucode_unload.bin
firmware:       nvidia/gp100/acr/ucode_load.bin
firmware:       nvidia/gp100/acr/bl.bin
firmware:       nvidia/gm206/gr/sw_method_init.bin
firmware:       nvidia/gm206/gr/sw_bundle_init.bin
firmware:       nvidia/gm206/gr/sw_nonctx.bin
firmware:       nvidia/gm206/gr/sw_ctx.bin
firmware:       nvidia/gm206/gr/gpccs_sig.bin
firmware:       nvidia/gm206/gr/gpccs_data.bin
firmware:       nvidia/gm206/gr/gpccs_inst.bin
firmware:       nvidia/gm206/gr/gpccs_bl.bin
firmware:       nvidia/gm206/gr/fecs_sig.bin
firmware:       nvidia/gm206/gr/fecs_data.bin
firmware:       nvidia/gm206/gr/fecs_inst.bin
firmware:       nvidia/gm206/gr/fecs_bl.bin
firmware:       nvidia/gm206/acr/ucode_unload.bin
firmware:       nvidia/gm206/acr/ucode_load.bin
firmware:       nvidia/gm206/acr/bl.bin
firmware:       nvidia/gm204/gr/sw_method_init.bin
firmware:       nvidia/gm204/gr/sw_bundle_init.bin
firmware:       nvidia/gm204/gr/sw_nonctx.bin
firmware:       nvidia/gm204/gr/sw_ctx.bin
firmware:       nvidia/gm204/gr/gpccs_sig.bin
firmware:       nvidia/gm204/gr/gpccs_data.bin
firmware:       nvidia/gm204/gr/gpccs_inst.bin
firmware:       nvidia/gm204/gr/gpccs_bl.bin
firmware:       nvidia/gm204/gr/fecs_sig.bin
firmware:       nvidia/gm204/gr/fecs_data.bin
firmware:       nvidia/gm204/gr/fecs_inst.bin
firmware:       nvidia/gm204/gr/fecs_bl.bin
firmware:       nvidia/gm204/acr/ucode_unload.bin
firmware:       nvidia/gm204/acr/ucode_load.bin
firmware:       nvidia/gm204/acr/bl.bin
firmware:       nvidia/gm200/gr/sw_method_init.bin
firmware:       nvidia/gm200/gr/sw_bundle_init.bin
firmware:       nvidia/gm200/gr/sw_nonctx.bin
firmware:       nvidia/gm200/gr/sw_ctx.bin
firmware:       nvidia/gm200/gr/gpccs_sig.bin
firmware:       nvidia/gm200/gr/gpccs_data.bin
firmware:       nvidia/gm200/gr/gpccs_inst.bin
firmware:       nvidia/gm200/gr/gpccs_bl.bin
firmware:       nvidia/gm200/gr/fecs_sig.bin
firmware:       nvidia/gm200/gr/fecs_data.bin
firmware:       nvidia/gm200/gr/fecs_inst.bin
firmware:       nvidia/gm200/gr/fecs_bl.bin
firmware:       nvidia/gm200/acr/ucode_unload.bin
firmware:       nvidia/gm200/acr/ucode_load.bin
firmware:       nvidia/gm200/acr/bl.bin
firmware:       nvidia/gm20b/pmu/sig.bin
firmware:       nvidia/gm20b/pmu/image.bin
firmware:       nvidia/gm20b/pmu/desc.bin
firmware:       nvidia/gm20b/gr/sw_method_init.bin
firmware:       nvidia/gm20b/gr/sw_bundle_init.bin
firmware:       nvidia/gm20b/gr/sw_nonctx.bin
firmware:       nvidia/gm20b/gr/sw_ctx.bin
firmware:       nvidia/gm20b/gr/gpccs_data.bin
firmware:       nvidia/gm20b/gr/gpccs_inst.bin
firmware:       nvidia/gm20b/gr/fecs_sig.bin
firmware:       nvidia/gm20b/gr/fecs_data.bin
firmware:       nvidia/gm20b/gr/fecs_inst.bin
firmware:       nvidia/gm20b/gr/fecs_bl.bin
firmware:       nvidia/gm20b/acr/ucode_load.bin
firmware:       nvidia/gm20b/acr/bl.bin
firmware:       nvidia/gp107/sec2/sig.bin
firmware:       nvidia/gp107/sec2/image.bin
firmware:       nvidia/gp107/sec2/desc.bin
firmware:       nvidia/gp107/nvdec/scrubber.bin
firmware:       nvidia/gp107/gr/sw_method_init.bin
firmware:       nvidia/gp107/gr/sw_bundle_init.bin
firmware:       nvidia/gp107/gr/sw_nonctx.bin
firmware:       nvidia/gp107/gr/sw_ctx.bin
firmware:       nvidia/gp107/gr/gpccs_sig.bin
firmware:       nvidia/gp107/gr/gpccs_data.bin
firmware:       nvidia/gp107/gr/gpccs_inst.bin
firmware:       nvidia/gp107/gr/gpccs_bl.bin
firmware:       nvidia/gp107/gr/fecs_sig.bin
firmware:       nvidia/gp107/gr/fecs_data.bin
firmware:       nvidia/gp107/gr/fecs_inst.bin
firmware:       nvidia/gp107/gr/fecs_bl.bin
firmware:       nvidia/gp107/acr/ucode_unload.bin
firmware:       nvidia/gp107/acr/ucode_load.bin
firmware:       nvidia/gp107/acr/unload_bl.bin
firmware:       nvidia/gp107/acr/bl.bin
firmware:       nvidia/gp106/sec2/sig.bin
firmware:       nvidia/gp106/sec2/image.bin
firmware:       nvidia/gp106/sec2/desc.bin
firmware:       nvidia/gp106/nvdec/scrubber.bin
firmware:       nvidia/gp106/gr/sw_method_init.bin
firmware:       nvidia/gp106/gr/sw_bundle_init.bin
firmware:       nvidia/gp106/gr/sw_nonctx.bin
firmware:       nvidia/gp106/gr/sw_ctx.bin
firmware:       nvidia/gp106/gr/gpccs_sig.bin
firmware:       nvidia/gp106/gr/gpccs_data.bin
firmware:       nvidia/gp106/gr/gpccs_inst.bin
firmware:       nvidia/gp106/gr/gpccs_bl.bin
firmware:       nvidia/gp106/gr/fecs_sig.bin
firmware:       nvidia/gp106/gr/fecs_data.bin
firmware:       nvidia/gp106/gr/fecs_inst.bin
firmware:       nvidia/gp106/gr/fecs_bl.bin
firmware:       nvidia/gp106/acr/ucode_unload.bin
firmware:       nvidia/gp106/acr/ucode_load.bin
firmware:       nvidia/gp106/acr/unload_bl.bin
firmware:       nvidia/gp106/acr/bl.bin
firmware:       nvidia/gp104/sec2/sig.bin
firmware:       nvidia/gp104/sec2/image.bin
firmware:       nvidia/gp104/sec2/desc.bin
firmware:       nvidia/gp104/nvdec/scrubber.bin
firmware:       nvidia/gp104/gr/sw_method_init.bin
firmware:       nvidia/gp104/gr/sw_bundle_init.bin
firmware:       nvidia/gp104/gr/sw_nonctx.bin
firmware:       nvidia/gp104/gr/sw_ctx.bin
firmware:       nvidia/gp104/gr/gpccs_sig.bin
firmware:       nvidia/gp104/gr/gpccs_data.bin
firmware:       nvidia/gp104/gr/gpccs_inst.bin
firmware:       nvidia/gp104/gr/gpccs_bl.bin
firmware:       nvidia/gp104/gr/fecs_sig.bin
firmware:       nvidia/gp104/gr/fecs_data.bin
firmware:       nvidia/gp104/gr/fecs_inst.bin
firmware:       nvidia/gp104/gr/fecs_bl.bin
firmware:       nvidia/gp104/acr/ucode_unload.bin
firmware:       nvidia/gp104/acr/ucode_load.bin
firmware:       nvidia/gp104/acr/unload_bl.bin
firmware:       nvidia/gp104/acr/bl.bin
firmware:       nvidia/gp102/sec2/sig.bin
firmware:       nvidia/gp102/sec2/image.bin
firmware:       nvidia/gp102/sec2/desc.bin
firmware:       nvidia/gp102/nvdec/scrubber.bin
firmware:       nvidia/gp102/gr/sw_method_init.bin
firmware:       nvidia/gp102/gr/sw_bundle_init.bin
firmware:       nvidia/gp102/gr/sw_nonctx.bin
firmware:       nvidia/gp102/gr/sw_ctx.bin
firmware:       nvidia/gp102/gr/gpccs_sig.bin
firmware:       nvidia/gp102/gr/gpccs_data.bin
firmware:       nvidia/gp102/gr/gpccs_inst.bin
firmware:       nvidia/gp102/gr/gpccs_bl.bin
firmware:       nvidia/gp102/gr/fecs_sig.bin
firmware:       nvidia/gp102/gr/fecs_data.bin
firmware:       nvidia/gp102/gr/fecs_inst.bin
firmware:       nvidia/gp102/gr/fecs_bl.bin
firmware:       nvidia/gp102/acr/ucode_unload.bin
firmware:       nvidia/gp102/acr/ucode_load.bin
firmware:       nvidia/gp102/acr/unload_bl.bin
firmware:       nvidia/gp102/acr/bl.bin
firmware:       nvidia/gv100/sec2/sig.bin
firmware:       nvidia/gv100/sec2/image.bin
firmware:       nvidia/gv100/sec2/desc.bin
firmware:       nvidia/gv100/nvdec/scrubber.bin
firmware:       nvidia/gv100/gr/sw_method_init.bin
firmware:       nvidia/gv100/gr/sw_bundle_init.bin
firmware:       nvidia/gv100/gr/sw_nonctx.bin
firmware:       nvidia/gv100/gr/sw_ctx.bin
firmware:       nvidia/gv100/gr/gpccs_sig.bin
firmware:       nvidia/gv100/gr/gpccs_data.bin
firmware:       nvidia/gv100/gr/gpccs_inst.bin
firmware:       nvidia/gv100/gr/gpccs_bl.bin
firmware:       nvidia/gv100/gr/fecs_sig.bin
firmware:       nvidia/gv100/gr/fecs_data.bin
firmware:       nvidia/gv100/gr/fecs_inst.bin
firmware:       nvidia/gv100/gr/fecs_bl.bin
firmware:       nvidia/gv100/acr/ucode_unload.bin
firmware:       nvidia/gv100/acr/ucode_load.bin
firmware:       nvidia/gv100/acr/unload_bl.bin
firmware:       nvidia/gv100/acr/bl.bin
firmware:       nvidia/gp108/sec2/sig.bin
firmware:       nvidia/gp108/sec2/image.bin
firmware:       nvidia/gp108/sec2/desc.bin
firmware:       nvidia/gp108/nvdec/scrubber.bin
firmware:       nvidia/gp108/gr/sw_method_init.bin
firmware:       nvidia/gp108/gr/sw_bundle_init.bin
firmware:       nvidia/gp108/gr/sw_nonctx.bin
firmware:       nvidia/gp108/gr/sw_ctx.bin
firmware:       nvidia/gp108/gr/gpccs_sig.bin
firmware:       nvidia/gp108/gr/gpccs_data.bin
firmware:       nvidia/gp108/gr/gpccs_inst.bin
firmware:       nvidia/gp108/gr/gpccs_bl.bin
firmware:       nvidia/gp108/gr/fecs_sig.bin
firmware:       nvidia/gp108/gr/fecs_data.bin
firmware:       nvidia/gp108/gr/fecs_inst.bin
firmware:       nvidia/gp108/gr/fecs_bl.bin
firmware:       nvidia/gp108/acr/ucode_unload.bin
firmware:       nvidia/gp108/acr/ucode_load.bin
firmware:       nvidia/gp108/acr/unload_bl.bin
firmware:       nvidia/gp108/acr/bl.bin
firmware:       nvidia/gp10b/pmu/sig.bin
firmware:       nvidia/gp10b/pmu/image.bin
firmware:       nvidia/gp10b/pmu/desc.bin
firmware:       nvidia/gp10b/gr/sw_method_init.bin
firmware:       nvidia/gp10b/gr/sw_bundle_init.bin
firmware:       nvidia/gp10b/gr/sw_nonctx.bin
firmware:       nvidia/gp10b/gr/sw_ctx.bin
firmware:       nvidia/gp10b/gr/gpccs_sig.bin
firmware:       nvidia/gp10b/gr/gpccs_data.bin
firmware:       nvidia/gp10b/gr/gpccs_inst.bin
firmware:       nvidia/gp10b/gr/gpccs_bl.bin
firmware:       nvidia/gp10b/gr/fecs_sig.bin
firmware:       nvidia/gp10b/gr/fecs_data.bin
firmware:       nvidia/gp10b/gr/fecs_inst.bin
firmware:       nvidia/gp10b/gr/fecs_bl.bin
firmware:       nvidia/gp10b/acr/ucode_load.bin
firmware:       nvidia/gp10b/acr/bl.bin
license:        GPL and additional rights
description:    nVidia Riva/TNT/GeForce/Quadro/Tesla/Tegra K1+
author:         Nouveau Project
retpoline:      Y
rhelversion:    7.6
srcversion:     464415DA74D2AF7BF0C5E06
alias:          pci:v000012D2d*sv*sd*bc03sc*i*
alias:          pci:v000010DEd*sv*sd*bc03sc*i*
depends:        drm,drm_kms_helper,ttm,mxm-wmi,wmi,video,i2c-algo-bit
intree:         Y
vermagic:       3.10.0-957.1.3.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        E7:CE:F3:61:3A:9B:8B:D0:12:FA:E7:49:82:72:15:9B:B1:87:9C:65
sig_hashalgo:   sha256
parm:           vram_pushbuf:Create DMA push buffers in VRAM (int)
parm:           tv_norm:Default TV norm.
		Supported: PAL, PAL-M, PAL-N, PAL-Nc, NTSC-M, NTSC-J,
			hd480i, hd480p, hd576i, hd576p, hd720p, hd1080i.
		Default: PAL
		*NOTE* Ignored for cards with external TV encoders. (charp)
parm:           nofbaccel:Disable fbcon acceleration (int)
parm:           fbcon_bpp:fbcon bits-per-pixel (default: auto) (int)
parm:           mst:Enable DisplayPort multi-stream (default: enabled) (int)
parm:           tv_disable:Disable TV-out detection (int)
parm:           ignorelid:Ignore ACPI lid status (int)
parm:           duallink:Allow dual-link TMDS (default: enabled) (int)
parm:           hdmimhz:Force a maximum HDMI pixel clock (in MHz) (int)
parm:           config:option string to pass to driver core (charp)
parm:           debug:debug string to pass to driver core (charp)
parm:           noaccel:disable kernel/abi16 acceleration (int)
parm:           modeset:enable driver (default: auto, 0 = disabled, 1 = enabled, 2 = headless) (int)
parm:           atomic:Expose atomic ioctl (default: disabled) (int)
parm:           runpm:disable (0), force enable (1), optimus only default (-1) (int)

Comment 10 Alejandro Ochoa 2019-01-28 17:14:47 UTC
I'm happy to say 4.20.3-200.fc29.x86_64, which is available as a regular "dnf update" on Fedora 29, appears to have the bug fixed.  I've been running it for a few days with none of the issues I used to have with the 4.19 series (frequent hangs, as others had reported before).  Before trying the 4.20 kernel above, the only one that worked for me was 4.18.16-300.fc29, as Lou had reported before.

For completeness, my affected computer has the GP107GL [Quadro P400] Nvidia card.

Comment 11 Ivan Font 2019-04-05 01:40:23 UTC
I believe I'm seeing the same issue on Fedora 29 with 5.0.5-200.fc29.x86_64. This is a hybrid-graphics system: nouveau times out while trying to initialize, on each CPU, which locks up the system.

$ lspci -d 10de:1cba -vnn
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] [10de:1cba] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device [17aa:2266]
	Flags: bus master, fast devsel, latency 0, IRQ 131
	Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 60000000 (64-bit, prefetchable) [size=256M]
	Memory at 70000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	Expansion ROM at a4080000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau



$ journalctl --no-hostname -k -b -1 | grep nouveau
Apr 04 10:17:43 kernel: nouveau: detected PR support, will not use DSM
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: enabling device (0006 -> 0007)
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: NVIDIA GP107 (137000a1)
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: bios: version 86.07.63.00.35
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: BIT table 'A' not found
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: BIT table 'L' not found
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB version 4.1
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f76 04600020
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f46 04600010
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 01033f56 04600020
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00001246
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: DCB conn 03: 00002346
Apr 04 10:17:43 kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
Apr 04 10:17:43 kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
Apr 04 17:17:55 kernel: nouveau 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Apr 04 17:17:58 kernel: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
Apr 04 17:18:00 kernel: nouveau 0000:01:00.0: timeout
Apr 04 17:18:00 kernel: WARNING: CPU: 4 PID: 1594 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1524 gf100_gr_init_ctxctl_ext+0x323/0x7d0 [nouveau]
Apr 04 17:18:00 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:18:00 kernel: RIP: 0010:gf100_gr_init_ctxctl_ext+0x323/0x7d0 [nouveau]
Apr 04 17:18:00 kernel:  gf100_gr_init_ctxctl+0x2e/0x2b0 [nouveau]
Apr 04 17:18:00 kernel:  ? gf100_gr_init+0x53c/0x580 [nouveau]
Apr 04 17:18:00 kernel:  nvkm_engine_init+0xaa/0x1e0 [nouveau]
Apr 04 17:18:00 kernel:  nvkm_subdev_init+0xb2/0x200 [nouveau]
Apr 04 17:18:00 kernel:  nvkm_engine_ref.part.0+0x43/0x60 [nouveau]
Apr 04 17:18:00 kernel:  nvkm_ioctl_new+0x125/0x220 [nouveau]
Apr 04 17:18:00 kernel:  ? nvkm_fifo_chan_child_del+0x90/0x90 [nouveau]
Apr 04 17:18:00 kernel:  ? gf100_gr_dtor+0xd0/0xd0 [nouveau]
Apr 04 17:18:00 kernel:  nvkm_ioctl+0xd8/0x170 [nouveau]
Apr 04 17:18:00 kernel:  usif_ioctl+0x6a3/0x700 [nouveau]
Apr 04 17:18:00 kernel:  nouveau_drm_ioctl+0xac/0xc0 [nouveau]
Apr 04 17:18:00 kernel: nouveau 0000:01:00.0: gr: init failed, -16
Apr 04 17:18:02 kernel: nouveau 0000:01:00.0: timeout
Apr 04 17:18:02 kernel: WARNING: CPU: 10 PID: 1594 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x17b/0x190 [nouveau]
Apr 04 17:18:02 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:18:02 kernel: RIP: 0010:gf100_vmm_flush_+0x17b/0x190 [nouveau]
Apr 04 17:18:02 kernel:  nvkm_vmm_iter.constprop.9+0x352/0x810 [nouveau]
Apr 04 17:18:02 kernel:  ? nvkm_vmm_free_insert+0x80/0x80 [nouveau]
Apr 04 17:18:02 kernel:  ? gf100_vmm_aper+0x20/0x20 [nouveau]
Apr 04 17:18:02 kernel:  nvkm_vmm_ptes_unmap_put+0x2a/0x40 [nouveau]
Apr 04 17:18:02 kernel:  ? gf100_vmm_aper+0x20/0x20 [nouveau]
Apr 04 17:18:02 kernel:  nvkm_vmm_put_locked+0xf5/0x210 [nouveau]
Apr 04 17:18:02 kernel:  nvkm_uvmm_mthd+0x37b/0x830 [nouveau]
Apr 04 17:18:02 kernel:  nvkm_ioctl+0xd8/0x170 [nouveau]
Apr 04 17:18:02 kernel:  nvif_object_mthd+0x108/0x130 [nouveau]
Apr 04 17:18:02 kernel:  nvif_vmm_put+0x5c/0x80 [nouveau]
Apr 04 17:18:02 kernel:  nouveau_vma_del+0x70/0xc0 [nouveau]
Apr 04 17:18:02 kernel:  nouveau_gem_object_close+0x1d4/0x200 [nouveau]
Apr 04 17:18:02 kernel:  nouveau_drm_ioctl+0x65/0xc0 [nouveau]
Apr 04 17:18:04 kernel: nouveau 0000:01:00.0: timeout
Apr 04 17:18:04 kernel: WARNING: CPU: 4 PID: 1594 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x17b/0x190 [nouveau]
Apr 04 17:18:04 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:18:04 kernel: RIP: 0010:gf100_vmm_flush_+0x17b/0x190 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_vmm_unref_pdes+0xeb/0x1f0 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_vmm_unref_ptes+0x1bc/0x1f0 [nouveau]
Apr 04 17:18:04 kernel:  ? nv50_instobj_release+0x74/0xc0 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_vmm_iter.constprop.9+0x26d/0x810 [nouveau]
Apr 04 17:18:04 kernel:  ? nvkm_vmm_free_insert+0x80/0x80 [nouveau]
Apr 04 17:18:04 kernel:  ? gf100_vmm_aper+0x20/0x20 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_vmm_ptes_unmap_put+0x2a/0x40 [nouveau]
Apr 04 17:18:04 kernel:  ? gf100_vmm_aper+0x20/0x20 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_vmm_put_locked+0xf5/0x210 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_uvmm_mthd+0x37b/0x830 [nouveau]
Apr 04 17:18:04 kernel:  nvkm_ioctl+0xd8/0x170 [nouveau]
Apr 04 17:18:04 kernel:  nvif_object_mthd+0x108/0x130 [nouveau]
Apr 04 17:18:04 kernel:  nvif_vmm_put+0x5c/0x80 [nouveau]
Apr 04 17:18:04 kernel:  nouveau_vma_del+0x70/0xc0 [nouveau]
Apr 04 17:18:04 kernel:  nouveau_gem_object_close+0x1d4/0x200 [nouveau]
Apr 04 17:18:04 kernel:  nouveau_drm_ioctl+0x65/0xc0 [nouveau]
...
...
...
Apr 04 17:19:09 kernel: nouveau 0000:01:00.0: timeout
Apr 04 17:19:09 kernel: WARNING: CPU: 0 PID: 336 at drivers/gpu/drm/nouveau/nvkm/engine/disp/sornv50.c:43 nv50_sor_power_wait+0x99/0xb0 [nouveau]
Apr 04 17:19:09 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:19:09 kernel: RIP: 0010:nv50_sor_power_wait+0x99/0xb0 [nouveau]
Apr 04 17:19:09 kernel:  nv50_sor_power+0xa6/0x130 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_disp_init+0xb6/0xd0 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_engine_init+0xaa/0x1e0 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_subdev_init+0xb2/0x200 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_device_fini+0xb7/0x1c0 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_udevice_fini+0x4c/0x60 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_object_fini+0xbc/0x150 [nouveau]
Apr 04 17:19:09 kernel:  nvkm_object_fini+0x73/0x150 [nouveau]
Apr 04 17:19:09 kernel:  nouveau_do_suspend+0xfd/0x2c0 [nouveau]
Apr 04 17:19:09 kernel:  nouveau_pmops_runtime_suspend+0x42/0xa0 [nouveau]
Apr 04 17:19:11 kernel: nouveau 0000:01:00.0: timeout
Apr 04 17:19:11 kernel: WARNING: CPU: 0 PID: 336 at drivers/gpu/drm/nouveau/nvkm/engine/disp/sornv50.c:63 nv50_sor_power+0x127/0x130 [nouveau]
Apr 04 17:19:11 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:19:11 kernel: RIP: 0010:nv50_sor_power+0x127/0x130 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_disp_init+0xb6/0xd0 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_engine_init+0xaa/0x1e0 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_subdev_init+0xb2/0x200 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_device_fini+0xb7/0x1c0 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_udevice_fini+0x4c/0x60 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_object_fini+0xbc/0x150 [nouveau]
Apr 04 17:19:11 kernel:  nvkm_object_fini+0x73/0x150 [nouveau]
Apr 04 17:19:11 kernel:  nouveau_do_suspend+0xfd/0x2c0 [nouveau]
Apr 04 17:19:11 kernel:  nouveau_pmops_runtime_suspend+0x42/0xa0 [nouveau]
Apr 04 17:19:11 kernel: nouveau: DRM-master:00000000:00000080: suspend failed with -110
Apr 04 17:19:13 kernel: nouveau 0000:01:00.0: timeout
Apr 04 17:19:13 kernel: WARNING: CPU: 0 PID: 336 at drivers/gpu/drm/nouveau/nvkm/engine/disp/piocgf119.c:63 gf119_disp_pioc_init+0xdc/0x130 [nouveau]
Apr 04 17:19:13 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:19:13 kernel: RIP: 0010:gf119_disp_pioc_init+0xdc/0x130 [nouveau]
Apr 04 17:19:13 kernel:  nvkm_object_init+0x3e/0x100 [nouveau]
Apr 04 17:19:13 kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Apr 04 17:19:13 kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Apr 04 17:19:13 kernel:  nvkm_object_init+0x71/0x100 [nouveau]
Apr 04 17:19:13 kernel:  nvkm_object_fini+0x137/0x150 [nouveau]
Apr 04 17:19:13 kernel:  nouveau_do_suspend+0xfd/0x2c0 [nouveau]
Apr 04 17:19:13 kernel:  nouveau_pmops_runtime_suspend+0x42/0xa0 [nouveau]
Apr 04 17:19:13 kernel: nouveau 0000:01:00.0: disp: ch 20 init: bad00100
Apr 04 17:19:13 kernel: nouveau: DRM:00000000:0000917a: init failed with -16
Apr 04 17:19:13 kernel: nouveau: DRM:00000000:00009870: init failed with -16
Apr 04 17:19:13 kernel: nouveau: DRM:00000000:00000080: init failed with -16
Apr 04 17:19:13 kernel: nouveau: DRM-master:00000000:00000000: init failed with -16
Apr 04 17:19:13 kernel: RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
Apr 04 17:19:13 kernel:  core507d_init+0x1d/0x70 [nouveau]
Apr 04 17:19:13 kernel:  nv50_display_init+0x34/0xf0 [nouveau]
Apr 04 17:19:13 kernel:  nouveau_display_init+0x36/0xe0 [nouveau]
Apr 04 17:19:13 kernel:  nouveau_display_resume+0x39/0x250 [nouveau]
Apr 04 17:19:13 kernel:  nouveau_do_suspend+0x156/0x2c0 [nouveau]
Apr 04 17:19:13 kernel:  nouveau_pmops_runtime_suspend+0x42/0xa0 [nouveau]
Apr 04 17:19:13 kernel:  videobuf2_memops btintel snd_hda_core videobuf2_v4l2 mdev iwlwifi vfio_iommu_type1 snd_hwdep videobuf2_common bluetooth vfio snd_seq joydev videodev snd_seq_device kvm media snd_pcm cfg80211 wmi_bmof intel_wmi_thunderbolt ecdh_generic idma64 thinkpad_acpi ucsi_acpi mei_me snd_timer processor_thermal_device thunderbolt intel_lpss_pci i2c_i801 typec_ucsi mei ledtrig_audio irqbypass intel_pch_thermal intel_lpss intel_soc_dts_iosf snd typec soundcore rfkill int3403_thermal int340x_thermal_zone pcc_cpufreq acpi_pad int3400_thermal acpi_thermal_rel dm_crypt nouveau crct10dif_pclmul mxm_wmi i2c_algo_bit crc32_pclmul drm_kms_helper ttm crc32c_intel nvme drm e1000e ghash_clmulni_intel nvme_core serio_raw wmi video uas usb_storage
Apr 04 17:19:13 kernel: RIP: 0010:evo_wait+0x5a/0x130 [nouveau]
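For anyone triaging a log like the one above, it can help to collapse the repeated WARNING lines into a list of distinct source locations. The following is only a minimal sketch, not part of the original report: the function name `warning_sites` and the regular expression are assumptions made for illustration, and the sample text is copied from the excerpt above.

```python
import re
from collections import Counter

# Matches kernel lines of the form:
#   WARNING: CPU: <n> PID: <n> at <source path>:<line> <function>+<offset>/<size> [module]
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<path>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)

def warning_sites(log_text):
    """Count WARNING sites as (source file, line number, function) tuples."""
    sites = Counter()
    for match in WARNING_RE.finditer(log_text):
        sites[(match["path"], int(match["line"]), match["func"])] += 1
    return sites

# Three WARNING lines copied from the journalctl excerpt in this comment.
sample = """\
Apr 04 17:18:00 kernel: WARNING: CPU: 4 PID: 1594 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1524 gf100_gr_init_ctxctl_ext+0x323/0x7d0 [nouveau]
Apr 04 17:18:02 kernel: WARNING: CPU: 10 PID: 1594 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x17b/0x190 [nouveau]
Apr 04 17:18:04 kernel: WARNING: CPU: 4 PID: 1594 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x17b/0x190 [nouveau]
"""

if __name__ == "__main__":
    # Print the most frequent warning sites first.
    for (path, line, func), count in warning_sites(sample).most_common():
        print(f"{count}x {func} at {path}:{line}")
```

The trailing codes in the log (-16 and -110) are negated Linux errno values, EBUSY and ETIMEDOUT respectively.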

Comment 12 Ben Cotton 2019-08-13 16:57:12 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 13 Ben Cotton 2019-08-13 19:28:46 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 14 Eugene Mah 2019-11-03 18:00:30 UTC
I'm seeing this error on one of my systems with the 5.3.8 kernel on F31.  The screen freezes and I need to reboot the system.  After rebooting, the system is functional for some seemingly random period of time before it starts acting up again.

~> lspci -d 10de:05e2 -vnn
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT200 [GeForce GTX 260] [10de:05e2] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. Device [3842:1255]
Flags: bus master, fast devsel, latency 0, IRQ 44
Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f6000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at af00 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nouveau
Kernel modules: nouveau

From journalctl --no-hostname -k -b 0 |grep nouveau output, the problem seems to start with 

Nov 03 12:28:20 kernel: nouveau 0000:03:00.0: disp: ERROR 5 [INVALID_STATE] 0b [] chid 0 mthd 0080 data 00000000

and then followed by a stream of these errors

Nov 03 12:40:49 kernel: nouveau: evo channel stalled
Nov 03 12:40:51 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout

Comment 15 Eugene Mah 2019-11-30 23:53:44 UTC
From the most recent occurrence on my system with kernel 5.3.13-300

Nov 30 18:41:58 kernel: nouveau 0000:03:00.0: disp: ERROR 1 [PUSHBUFFER_ERR] 01 [] chid 0 mthd 0000 data 00000000
Nov 30 18:42:00 kernel: nouveau 0000:03:00.0: DRM: core notifier timeout
Nov 30 18:42:02 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:04 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:06 kernel: nouveau: evo channel stalled
Nov 30 18:42:08 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:10 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:12 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:14 kernel: nouveau 0000:03:00.0: DRM: core notifier timeout
Nov 30 18:42:14 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:16 kernel: nouveau 0000:03:00.0: DRM: base-0: timeout
Nov 30 18:42:18 kernel: nouveau 0000:03:00.0: DRM: base-0: timeout
Nov 30 18:42:18 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:20 kernel: nouveau 0000:03:00.0: DRM: base-0: timeout
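To compare reports across machines, the timeout messages can be tallied by source (base-N or core notifier). This is a rough sketch under the assumption that the messages keep the exact wording seen in the excerpt above; `tally_timeouts` is a hypothetical helper name, not something provided by nouveau or journalctl.

```python
import re
from collections import Counter

# Matches "DRM: base-<n>: timeout" and "DRM: core notifier timeout"
# as they appear in the excerpt above.
TIMEOUT_RE = re.compile(
    r"nouveau \S+ DRM: (?P<what>base-\d+|core notifier):? timeout"
)

def tally_timeouts(log_text):
    """Count DRM timeout messages per source (base-N or core notifier)."""
    return Counter(m["what"] for m in TIMEOUT_RE.finditer(log_text))

# The first lines of the excerpt in this comment; the "evo channel stalled"
# line is deliberately included to show that it is not counted.
sample = """\
Nov 30 18:42:00 kernel: nouveau 0000:03:00.0: DRM: core notifier timeout
Nov 30 18:42:02 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:04 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
Nov 30 18:42:06 kernel: nouveau: evo channel stalled
Nov 30 18:42:08 kernel: nouveau 0000:03:00.0: DRM: base-1: timeout
"""

if __name__ == "__main__":
    for what, count in tally_timeouts(sample).most_common():
        print(f"{what}: {count}")
```

In practice the text would come from something like the `journalctl --no-hostname -k -b 0 | grep nouveau` output quoted in this thread, saved to a file and read in.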

Comment 16 Yaniv Ferszt 2019-12-12 11:31:35 UTC
I see the same issue

cat /etc/redhat-release 
Fedora release 31 (Thirty One)

$ uname -a
Linux yferszt-fc 5.3.15-300.fc31.x86_64 #1 SMP Thu Dec 5 15:04:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

lspci | grep -e VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)

=====
Dec 12 12:07:51 yferszt-fc kernel: nouveau: evo channel stalled
Dec 12 12:08:02 yferszt-fc kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
Dec 12 12:08:12 yferszt-fc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Dec 12 12:08:15 yferszt-fc kernel: nouveau 0000:01:00.0: DRM: core notifier timeout
=====


The GUI freezes while ssh keeps working. Rebooting the system fixes the issue until it randomly happens again.

Comment 17 Dominik 'Rathann' Mierzejewski 2020-04-08 20:57:01 UTC
Still reproducible on F31:
$ uname -r
5.5.15-200.fc31.x86_64
$ dmesg|grep nouveau
[    6.105904] nouveau 0000:01:00.0: NVIDIA G98 (298480a2)
[    6.196288] nouveau 0000:01:00.0: bios: version 62.98.3c.00.44
[    6.234673] nouveau 0000:01:00.0: bios: M0203T not found
[    6.234677] nouveau 0000:01:00.0: bios: M0203E not matched!
[    6.234680] nouveau 0000:01:00.0: fb: 512 MiB DDR2
[    6.347632] nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
[    6.347634] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    6.347640] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    6.347643] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    6.347646] nouveau 0000:01:00.0: DRM: DCB outp 00: 01011323 00010034
[    6.347649] nouveau 0000:01:00.0: DRM: DCB outp 01: 02000300 00000028
[    6.347651] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022312 00020030
[    6.347653] nouveau 0000:01:00.0: DRM: DCB conn 00: 00000000
[    6.347655] nouveau 0000:01:00.0: DRM: DCB conn 01: 00000140
[    6.347657] nouveau 0000:01:00.0: DRM: DCB conn 02: 00002261
[    6.347659] nouveau 0000:01:00.0: DRM: DCB conn 07: 00000513
[    6.351313] nouveau 0000:01:00.0: DRM: MM: using M2MF for buffer copies
[    6.440483] nouveau 0000:01:00.0: DRM: allocated 1440x900 fb: 0x50000, bo (____ptrval____)
[    6.440646] fbcon: nouveaudrmfb (fb0) is primary device
[    8.442191] nouveau 0000:01:00.0: DRM: core notifier timeout
[   10.442203] nouveau 0000:01:00.0: DRM: base-0: timeout
[   10.446983] nouveau 0000:01:00.0: fb0: nouveaudrmfb frame buffer device
[   10.454934] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
[   12.627942] nouveau 0000:01:00.0: DRM: core notifier timeout
[   14.628130] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 2126.158743] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 2129.529458] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 2131.550202] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 2153.738236] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 2155.738333] nouveau 0000:01:00.0: DRM: core notifier timeout
[ 2161.063900] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 2163.080106] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 2165.182431] nouveau 0000:01:00.0: DRM: base-0: timeout
[ 2167.203106] nouveau 0000:01:00.0: DRM: base-0: timeout

This is the only GPU in this machine, so I don't have the luxury of using a built-in Intel GPU while waiting for a fix.
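Within each burst in the dmesg output above, the timeout messages arrive roughly two seconds apart. A small sketch for measuring that spacing from the bracketed dmesg timestamps follows; `timeout_gaps` is a hypothetical helper and the pattern assumes the message format shown above.

```python
import re

# Matches dmesg lines such as:
#   [   10.442203] nouveau 0000:01:00.0: DRM: base-0: timeout
STAMP_RE = re.compile(r"^\[\s*(?P<secs>\d+\.\d+)\] nouveau \S+ DRM: .*timeout$")

def timeout_gaps(dmesg_text):
    """Return the gaps in seconds between consecutive DRM timeout messages."""
    stamps = []
    for line in dmesg_text.splitlines():
        match = STAMP_RE.match(line)
        if match:
            stamps.append(float(match["secs"]))
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]

# Lines copied from the dmesg excerpt in this comment; the fb0 line is
# not a timeout message and is skipped.
sample = """\
[    8.442191] nouveau 0000:01:00.0: DRM: core notifier timeout
[   10.442203] nouveau 0000:01:00.0: DRM: base-0: timeout
[   10.446983] nouveau 0000:01:00.0: fb0: nouveaudrmfb frame buffer device
[   12.627942] nouveau 0000:01:00.0: DRM: core notifier timeout
[   14.628130] nouveau 0000:01:00.0: DRM: base-0: timeout
"""

if __name__ == "__main__":
    print(timeout_gaps(sample))
```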

Comment 18 infrandomness 2020-04-15 00:29:08 UTC
Happening on F31 with

Linux ir-pc 5.3.7-301.fc31.x86_64 #1 SMP Mon Oct 21 19:18:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Apr 15 02:01:43 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout
Apr 15 02:01:45 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout
Apr 15 02:01:47 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout
Apr 15 02:01:49 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout
Apr 15 02:01:51 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout
Apr 15 02:01:53 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout
Apr 15 02:01:55 ir-pc kernel: nouveau 0000:1d:00.0: DRM: base-1: timeout

Comment 19 Manuel Andres Garcia Vazquez 2020-04-15 02:00:29 UTC
Same here!

$ cat /etc/system-release
Fedora release 31 (Thirty One)


$ uname -a
Linux 5.5.15-200.fc31.x86_64 #1 SMP Thu Apr 2 19:16:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


$ lspci -d 10de:1b80 -vnn
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device [1043:8592]
        Flags: bus master, fast devsel, latency 0, IRQ 136
        Memory at de000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Kernel driver in use: nouveau
        Kernel modules: nouveau


$ dmesg -w | grep nouveau
[    2.115632] fb0: switching to nouveaufb from EFI VGA
[    2.115708] nouveau 0000:01:00.0: NVIDIA GP104 (134000a1)
[    2.221635] nouveau 0000:01:00.0: bios: version 86.04.17.00.1c
[    2.222157] nouveau 0000:01:00.0: fb: 8192 MiB GDDR5X
[    2.229078] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
[    2.229079] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[    2.229080] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[    2.229081] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[    2.229082] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    2.229083] nouveau 0000:01:00.0: DRM: DCB version 4.1
[    2.229084] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f42 00020030
[    2.229085] nouveau 0000:01:00.0: DRM: DCB outp 01: 04811f96 04600020
[    2.229086] nouveau 0000:01:00.0: DRM: DCB outp 02: 04011f92 00020020
[    2.229087] nouveau 0000:01:00.0: DRM: DCB outp 03: 04822f86 04600010
[    2.229088] nouveau 0000:01:00.0: DRM: DCB outp 04: 04022f82 00020010
[    2.229089] nouveau 0000:01:00.0: DRM: DCB outp 06: 02033f62 00020010
[    2.229089] nouveau 0000:01:00.0: DRM: DCB outp 08: 02044f72 00020020
[    2.229090] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
[    2.229091] nouveau 0000:01:00.0: DRM: DCB conn 01: 02000146
[    2.229092] nouveau 0000:01:00.0: DRM: DCB conn 02: 01000246
[    2.229093] nouveau 0000:01:00.0: DRM: DCB conn 03: 00010361
[    2.229093] nouveau 0000:01:00.0: DRM: DCB conn 04: 00020461
[    2.229390] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[    2.857595] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x200000, bo 00000000188c5ccc
[    2.892257] fbcon: nouveaudrmfb (fb0) is primary device
[    2.892259] nouveau 0000:01:00.0: fb0: nouveaudrmfb frame buffer device
[    2.944672] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: nouveau, rivafb, nvidiafb or rivatv
[33245.082049] nouveau 0000:01:00.0: disp: chid 0 mthd 0080 data 00000002 00005080 00000015
[33245.082050] nouveau 0000:01:00.0: disp: Core:
[33245.082054] nouveau 0000:01:00.0: disp:      0080: 00000000 -> 00000002
[33245.082057] nouveau 0000:01:00.0: disp:      0084: 00000000 -> 80000000
[33245.082060] nouveau 0000:01:00.0: disp:      0088: f0000000
[33245.082061] nouveau 0000:01:00.0: disp: Core - DAC 0:
[33245.082065] nouveau 0000:01:00.0: disp:      0180: 00000000
[33245.082068] nouveau 0000:01:00.0: disp:      0184: 00000000
[33245.082073] nouveau 0000:01:00.0: disp:      0188: 00000000
[33245.082075] nouveau 0000:01:00.0: disp:      0190: 00000000
[33245.082076] nouveau 0000:01:00.0: disp: Core - DAC 1:
[33245.082095] nouveau 0000:01:00.0: disp:      01a0: 00000000
[33245.082099] nouveau 0000:01:00.0: disp:      01a4: 00000000
[33245.082105] nouveau 0000:01:00.0: disp:      01a8: 00000000
[33245.082110] nouveau 0000:01:00.0: disp:      01b0: 00000000
{...}
[33267.065439] nouveau 0000:01:00.0: disp:      0e54: 00000000
[33267.065442] nouveau 0000:01:00.0: disp:      0e58: 00000000
[33267.065446] nouveau 0000:01:00.0: disp:      0e5c: 00000001
[33269.548289] nouveau 0000:01:00.0: DRM: base-0: timeout
[33271.562467] nouveau: evo channel stalled
[33273.562548] nouveau 0000:01:00.0: DRM: base-0: timeout
[33275.565515] nouveau 0000:01:00.0: DRM: base-0: timeout
[33277.568617] nouveau 0000:01:00.0: DRM: base-0: timeout
[33279.570770] nouveau 0000:01:00.0: DRM: base-0: timeout
[33281.575781] nouveau 0000:01:00.0: DRM: base-0: timeout
[33283.595087] nouveau 0000:01:00.0: DRM: base-0: timeout
[33285.598140] nouveau 0000:01:00.0: DRM: base-0: timeout
[33287.601285] nouveau 0000:01:00.0: DRM: base-0: timeout
[33289.604393] nouveau 0000:01:00.0: DRM: base-0: timeout
[33291.644816] nouveau 0000:01:00.0: DRM: base-0: timeout
{end}

Comment 20 ravvle 2020-04-23 01:44:29 UTC
I get this issue on Fedora 32 beta with kernel 5.6.5-300.fc32.x86_64; I'm using an Nvidia GeForce GTX 1070. I don't get it straight away on startup; it occurs at random throughout the day, usually 3 or 4 times a day, and even once while writing this :)

using: 01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)

In my journalctl logs I get:
Apr 23 10:57:24 pc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Apr 23 10:57:22 pc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Apr 23 10:57:20 pc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Apr 23 10:57:18 pc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Apr 23 10:57:16 pc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Apr 23 10:57:14 pc kernel: nouveau 0000:01:00.0: DRM: base-0: timeout
Apr 23 10:57:12 pc kernel: nouveau 0000:01:00.0: DRM: core notifier timeout

A reboot seems to be the only way to fix it.

Comment 21 Jan Pokorný [poki] 2020-07-27 21:19:30 UTC
Also hit this problem of frozen screen with Nvidia GP107GL [Quadro P400].

The system in the background was working just fine, as attested by
blindly switching to the next VT (where I was also logged in) and killing
a sound-producing program by name.

NOTE: would be nice to have a command handy to try to recover from
      this problem, e.g. when one can still enter the commands
      as was my case per above, or when ssh connection is an option
      -- if at all possible, of course

Luckily, it was the only manifestation of the problem in about 3 months
IIRC.

Jul 27 17:40:43 sway[2504]: 2020-07-27 17:40:43 - [sway/commands.c:255] Handling command 'workspace 9'
Jul 27 17:40:43 sway[2504]: 2020-07-27 17:40:43 - [sway/commands.c:255] Handling command 'workspace 10'
Jul 27 17:40:44 sway[2504]: 2020-07-27 17:40:44 - [sway/commands.c:255] Handling command 'workspace 9'
Jul 27 17:40:44 sway[2504]: 2020-07-27 17:40:44 - [sway/commands.c:255] Handling command 'workspace 8'
Jul 27 17:40:45 sway[2504]: 2020-07-27 17:40:45 - [sway/commands.c:255] Handling command 'workspace 7'
Jul 27 17:40:45 sway[2504]: 2020-07-27 17:40:45 - [sway/commands.c:255] Handling command 'workspace 6'
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a00 data 0000a004 10003a00 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a04 data 0000cf00 10003a04 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a08 data 00040084 10003a08 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a0c data 00000010 10003a0c 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a10 data 000400c0 10003a10 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a14 data fb0000fe 10003a14 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a18 data 00140400 10003a18 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a1c data 002e4000 10003a1c 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a20 data 00000000 10003a20 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a24 data 05a00a00 10003a24 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a28 data 0000a004 10003a28 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a2c data 0000cf00 10003a2c 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a30 data 00040080 10003a30 00000000
Jul 27 17:40:45 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a34 data 00000000 10003a34 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:40:47 sway[2504]: 2020-07-27 17:40:47 - [sway/commands.c:255] Handling command 'workspace 6'
Jul 27 17:40:47 sway[2504]: 2020-07-27 17:40:47 - [sway/commands.c:255] Handling command 'workspace 4'
Jul 27 17:40:47 sway[2504]: 2020-07-27 17:40:47 - [sway/commands.c:255] Handling command 'workspace 3'
Jul 27 17:40:47 sway[2504]: 2020-07-27 17:40:47 - [sway/commands.c:255] Handling command 'workspace 4'
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a38 data 000800a0 10003a38 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a3c data 00000130 10003a3c 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a40 data f0000000 10003a40 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a44 data 00040084 10003a44 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a48 data 00000010 10003a48 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a4c data 000400c0 10003a4c 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a50 data fb0000fe 10003a50 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a54 data 00140400 10003a54 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a58 data 003c8000 10003a58 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a5c data 00000000 10003a5c 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a60 data 05a00a00 10003a60 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a64 data 0000a004 10003a64 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a68 data 0000cf00 10003a68 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a6c data 00040080 10003a6c 00000000
Jul 27 17:40:47 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a70 data 00000000 10003a70 00000000
Jul 27 17:40:49 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:40:49 sway[2504]: 2020-07-27 17:40:49 - [sway/commands.c:255] Handling command 'workspace 4'
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a74 data 000800a0 10003a74 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a78 data 00000120 10003a78 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a7c data f0000000 10003a7c 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a80 data 00040084 10003a80 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a84 data 00000010 10003a84 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a88 data 000400c0 10003a88 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a8c data fb0000fe 10003a8c 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a90 data 00140400 10003a90 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a94 data 002e4000 10003a94 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a98 data 00000000 10003a98 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0a9c data 05a00a00 10003a9c 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0aa0 data 0000a004 10003aa0 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0aa4 data 0000cf00 10003aa4 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0aa8 data 00040080 10003aa8 00000000
Jul 27 17:40:50 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0aac data 00000000 10003aac 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ab0 data 000800a0 10003ab0 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ab4 data 00000130 10003ab4 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ab8 data f0000000 10003ab8 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0abc data 00040084 10003abc 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ac0 data 00000010 10003ac0 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ac4 data 000400c0 10003ac4 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ac8 data fb0000fe 10003ac8 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0acc data 00140400 10003acc 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ad0 data 003c8000 10003ad0 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ad4 data 00000000 10003ad4 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ad8 data 05a00a00 10003ad8 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0adc data 0000a004 10003adc 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ae0 data 0000cf00 10003ae0 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ae4 data 00040080 10003ae4 00000000
Jul 27 17:40:52 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ae8 data 00000001 10003ae8 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0aec data 000800a0 10003aec 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0af0 data 00000120 10003af0 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0af4 data f0000000 10003af4 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0af8 data 00040084 10003af8 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0afc data 00000010 10003afc 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b00 data 000400c0 10003b00 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b04 data fb0000fe 10003b04 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b08 data 00140400 10003b08 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b0c data 002e4000 10003b0c 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b10 data 00000000 10003b10 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b14 data 05a00a00 10003b14 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b18 data 0000a004 10003b18 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b1c data 0000cf00 10003b1c 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b20 data 00040080 10003b20 00000000
Jul 27 17:40:54 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b24 data 00000000 10003b24 00000000
Jul 27 17:40:54 sway[2504]: 2020-07-27 17:40:54 - [sway/commands.c:255] Handling command 'workspace 2'
Jul 27 17:40:56 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:40:56 sway[2504]: 2020-07-27 17:40:56 - [sway/commands.c:255] Handling command 'workspace 2'
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: DRM: core notifier timeout
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b28 data 000800a0 10003b28 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b2c data 00000130 10003b2c 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b30 data f0000000 10003b30 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b34 data 00040084 10003b34 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b38 data 00000010 10003b38 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b3c data 000400c0 10003b3c 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b40 data fb0000fe 10003b40 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b44 data 00140400 10003b44 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b48 data 003c8000 10003b48 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b4c data 00000000 10003b4c 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b50 data 05a00a00 10003b50 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b54 data 0000a004 10003b54 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b58 data 0000cf00 10003b58 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b5c data 00040080 10003b5c 00000000
Jul 27 17:40:58 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b60 data 00000000 10003b60 00000000
Jul 27 17:41:00 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: DRM: core notifier timeout
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b64 data 000800a0 10003b64 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b68 data 00000120 10003b68 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b6c data f0000000 10003b6c 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b70 data 00040084 10003b70 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b74 data 00000010 10003b74 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b78 data 000400c0 10003b78 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b7c data fb0000fe 10003b7c 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b80 data 00140400 10003b80 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b84 data 002e4000 10003b84 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b88 data 00000000 10003b88 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b8c data 05a00a00 10003b8c 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b90 data 0000a004 10003b90 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b94 data 0000cf00 10003b94 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b98 data 00040080 10003b98 00000000
Jul 27 17:41:02 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0b9c data 00000000 10003b9c 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ba0 data 000800a0 10003ba0 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ba4 data 00000130 10003ba4 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0ba8 data f0000000 10003ba8 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bac data 00040084 10003bac 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bb0 data 00000010 10003bb0 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bb4 data 000400c0 10003bb4 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bb8 data fb0000fe 10003bb8 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bbc data 00140400 10003bbc 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bc0 data 003c8000 10003bc0 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bc4 data 00000000 10003bc4 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bc8 data 05a00a00 10003bc8 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bcc data 0000a004 10003bcc 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bd0 data 0000cf00 10003bd0 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bd4 data 00040080 10003bd4 00000000
Jul 27 17:41:04 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bd8 data 00000000 10003bd8 00000000
Jul 27 17:41:06 kernel: nouveau 0000:65:00.0: DRM: base-3: timeout
Jul 27 17:41:08 kernel: nouveau 0000:65:00.0: DRM: core notifier timeout
Jul 27 17:41:08 kernel: nouveau 0000:65:00.0: disp: chid 4 mthd 0bdc data 000800a0 10003bdc 00000000
[...]

As you can see, the problem might have been provoked by rather heavy
workspace (virtual desktop) switching under the Sway WM.

# uname -r
5.7.0-0.rc7.20200529gitb0c3ba31be3e.1.fc33.x86_64

(hence bumped to Rawhide)

# lspci -d 10de:1cb3 -vnn
65:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P400] [10de:1cb3] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device [10de:11be]
        Flags: bus master, fast devsel, latency 0, IRQ 49, NUMA node 0
        Memory at df000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at b000 [size=128]
        Expansion ROM at e0000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Kernel driver in use: nouveau
        Kernel modules: nouveau

Note that I suspect the original component was wrong; this is rather a problem
with the Nouveau kernel driver as such, since I use a minimum of Xorg (in my case
the user-space part of the driver stack would rather be mesa-dri-drivers,
if any, I think).

But please correct me if I am wrong.

Comment 22 Sergio Basto 2020-07-27 21:32:58 UTC

I ended up using kernel-longterm-4.14 [1] , since I can't boot the computer with any newer kernel release ...

[1] https://copr.fedorainfracloud.org/coprs/kwizart/kernel-longterm-4.14/

Comment 23 Jan Pokorný [poki] 2020-07-28 17:28:46 UTC
Thanks, Sergio.

I have no problem booting up, and all was working very reasonably until recently,
but it seems that I am now running into the problem rather frequently -- I just
observed another such freeze.

I use the sway WM (sway-1.4-7.fc33.x86_64 + wlroots-0.10.1-2.fc33.x86_64), and
I am starting to suspect the problem is related to operations involving:

- some sort of video playback (vlc, zoom)

- heavy use of manual dragging of free-floating "surfaces"
  (overflow-windowed Zoom app, control bar with play/pause etc. of VLC)

- perhaps but not sure, Firefox involvement (one of the conspiracy theories
  being that newer Firefox does more of chunked redrawing or something
  like that)

Anyway, I was now able to get far enough to also observe this message,
which I didn't see originally. It appeared when switching from TTY1,
with sway running, to TTY2 with a plain VT and back, whereby sway/wlroots
attempted to regain direct screen access:

Jul 28 18:51:06 sway[2476]: 2020-07-28 18:51:06 - [backend/drm/backend.c:124] DRM fd paused
Jul 28 18:51:27 systemd[1]: getty: Succeeded.
[...]
Jul 28 18:51:42 sway[2476]: 2020-07-28 18:51:42 - [backend/drm/backend.c:91] DRM fd resumed
Jul 28 18:51:42 sway[2476]: 2020-07-28 18:51:42 - [backend/drm/drm.c:1272] Scanning DRM connectors
Jul 28 18:51:43 sway[2476]: 2020-07-28 18:51:43 - [backend/drm/drm.c:693] Modesetting 'DP-1' with '2560x1440@59951 mHz'
Jul 28 18:51:45 kernel: nouveau 0000:65:00.0: DRM: base-0: timeout
Jul 28 18:51:48 kernel: nouveau: evo channel stalled

Also, I was able to blindly kill sway and attempt to run it anew, with
another surprising observation:

Jul 28 18:55:42 sway[44306]: 2020-07-28 18:55:42 - [sway/server.c:207] Running compositor on wayland display 'wayland-0'
[...nothing interesting...]
Jul 28 18:58:34 kernel: INFO: task kworker/u24:4:37180 blocked for more than 122 seconds.
Jul 28 18:58:34 kernel:       Tainted: G        W  O     --------- ---  5.7.0-0.rc7.20200529gitb0c3ba31be3e.1.fc33.x86_64 #1
Jul 28 18:58:34 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 28 18:58:34 kernel: kworker/u24:4   D12240 37180      2 0x80004000
Jul 28 18:58:34 kernel: Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Jul 28 18:58:34 kernel: Call Trace:
Jul 28 18:58:34 kernel:  __schedule+0x33e/0xa30
Jul 28 18:58:34 kernel:  ? sched_clock+0x5/0x10
Jul 28 18:58:34 kernel:  schedule+0x5f/0xd0
Jul 28 18:58:34 kernel:  schedule_timeout+0xe4/0x120
Jul 28 18:58:34 kernel:  ? mark_held_locks+0x2d/0x80
Jul 28 18:58:34 kernel:  ? _raw_spin_unlock_irqrestore+0x46/0x60
Jul 28 18:58:34 kernel:  ? lockdep_hardirqs_on+0x11e/0x1b0
Jul 28 18:58:34 kernel:  dma_fence_default_wait+0x176/0x210
Jul 28 18:58:34 kernel:  ? dma_fence_free+0x20/0x20
Jul 28 18:58:34 kernel:  dma_fence_wait_timeout+0x1b2/0x250
Jul 28 18:58:34 kernel:  drm_atomic_helper_wait_for_fences+0x7f/0xf0 [drm_kms_helper]
Jul 28 18:58:34 kernel:  nv50_disp_atomic_commit_tail+0x79/0x760 [nouveau]
Jul 28 18:58:34 kernel:  ? sched_clock+0x5/0x10
Jul 28 18:58:34 kernel:  process_one_work+0x269/0x5c0
Jul 28 18:58:34 kernel:  worker_thread+0x55/0x3d0
Jul 28 18:58:34 kernel:  ? process_one_work+0x5c0/0x5c0
Jul 28 18:58:34 kernel:  kthread+0x131/0x150
Jul 28 18:58:34 kernel:  ? __kthread_bind_mask+0x60/0x60
Jul 28 18:58:34 kernel:  ret_from_fork+0x3a/0x50
Jul 28 18:58:34 kernel: 
                        Showing all locks held in the system:
Jul 28 18:58:34 kernel: 1 lock held by khungtaskd/73:
Jul 28 18:58:34 kernel:  #0: ffffffffb8a98760 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x16f
Jul 28 18:58:34 kernel: 1 lock held by fuse mainloop/3463:
Jul 28 18:58:34 kernel:  #0: ffff8e582a085c70 (&pipe->mutex/1){+.+.}-{3:3}, at: do_splice+0x5cb/0x790
Jul 28 18:58:34 kernel: 1 lock held by fuse mainloop/3464:
Jul 28 18:58:34 kernel:  #0: ffff8e5829b5ca70 (&pipe->mutex/1){+.+.}-{3:3}, at: do_splice+0x5cb/0x790
Jul 28 18:58:34 kernel: 2 locks held by kworker/u24:4/37180:
Jul 28 18:58:34 kernel:  #0: ffff8e58f8411948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d4/0x5c0
Jul 28 18:58:34 kernel:  #1: ffffa310feacfe70 ((work_completion)(&state->commit_work)){+.+.}-{0:0}, at: process_one_work+0x1d4/0x5c0
Jul 28 18:58:34 kernel: 2 locks held by sway/44306:
Jul 28 18:58:34 kernel:  #0: ffffa310f5e07d08 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_gamma_set_ioctl+0x8f/0x1f0 [drm]
Jul 28 18:58:34 kernel:  #1: ffff8e58e934f8c8 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0xd7/0x1c0 [drm]
Jul 28 18:58:34 kernel: 
Jul 28 18:58:34 kernel: =============================================

I hope this stack trace can help to move this forward.

Otherwise, please suggest good diagnostic steps that
one can run blindly or with a prepared script.
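As a starting point for such a prepared script, here is a rough sketch that gathers the same details already posted in this thread (kernel version, NVIDIA device listing, nouveau kernel messages from the current boot) into one file. The function name `collect_nouveau_report` and the default file name are made up for illustration; it only uses `uname`, `lspci` and `journalctl` as seen above and does not attempt any recovery.

```shell
# Hypothetical helper: collect basic nouveau diagnostics into a single file.
# Usage: collect_nouveau_report [output-file]
collect_nouveau_report() {
    out="${1:-nouveau-report.txt}"   # default file name is an assumption
    {
        echo "== uname -r =="
        uname -r
        echo "== lspci -d 10de: -vnn =="
        # 10de is the NVIDIA PCI vendor ID, as in the lspci calls in this bug
        lspci -d 10de: -vnn 2>&1
        echo "== journalctl -k -b (nouveau lines) =="
        # kernel messages of the current boot, restricted to nouveau
        journalctl -k -b --no-pager 2>&1 | grep -i nouveau
    } > "$out"
    # print the path so the caller knows where the report went
    echo "$out"
}
```

Saved to a file that calls the function at the end, this could be started blindly from a second VT or over ssh while the screen is frozen, and the resulting file attached here.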

Comment 24 Jordi Sanfeliu 2020-07-29 12:09:20 UTC
(In reply to Sergio Basto from comment #22)
> 
> I ended up using kernel-longterm-4.14 [1] , since I can't boot the computer
> with any newer kernel release ...
> 
> [1] https://copr.fedorainfracloud.org/coprs/kwizart/kernel-longterm-4.14/

Sergio,

I also ended up using kwizart's kernel-longterm, but in my case 4.19, which reaches EOL almost one year later than 4.14.
Good results so far.

Comment 25 Steeve McCauley 2021-02-08 19:23:27 UTC
FWIW, still seeing this on Fedora 33, 5.10.13-200.fc33.x86_64

$ sudo lspci -s 01:00.0 -v
01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 560 Ti OEM] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: PC Partner Limited / Sapphire Technology Device 5207
	Flags: bus master, fast devsel, latency 0, IRQ 39
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=128M]
	Memory at d8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

