Created attachment 1273911 [details]
Backtrace output
Description of problem:
All the released 4.10 kernels have hung my machine hard under wayland. This seems to happen a few times a day. Added kdump and have a backtrace from 4.10.10-200
Version-Release number of selected component (if applicable):
4.10.10-200
How reproducible:
Wait long enough during the day
Steps to Reproduce:
1. Run Wayland desktop
2. Desktop hangs (often seems in Window overview mode)
Additional info:
Managed to get a BT from kdump.
PANIC: "divide error: 0000 [#1] SMP"
[exception RIP: drm_calc_vbltimestamp_from_scanoutpos+358]
Complete BT attached.
Still seeing this with 4.10.13-200.fc25.x86_64
[ 502.862336] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x166/0x310 [drm]
This looks like it may be the upstream bug reported here: https://bugs.freedesktop.org/show_bug.cgi?id=100691
I would have thought it would be seen more commonly, though.
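For anyone comparing reports: the kdump backtrace in the first comment gives the offset in decimal (+358) while dmesg prints it in hex (+0x166), and these are the same location. A small sketch for normalising both notations follows; the function name `parse_rip` and the regular expression are illustrative only and not part of any existing tool.

```python
import re

# Accepts crash-style "sym+358" and dmesg-style "sym+0x166/0x310".
RIP_RE = re.compile(
    r"(?P<sym>[A-Za-z_]\w*)\+(?P<off>0x[0-9a-fA-F]+|\d+)"
    r"(?:/(?P<size>0x[0-9a-fA-F]+))?"
)


def parse_rip(text):
    """Return (symbol, offset, size_or_None) for the first sym+offset in text."""
    m = RIP_RE.search(text)
    if m is None:
        return None
    offset = int(m.group("off"), 0)  # base 0 handles both "358" and "0x166"
    size = int(m.group("size"), 0) if m.group("size") else None
    return m.group("sym"), offset, size


crash_style = parse_rip("[exception RIP: drm_calc_vbltimestamp_from_scanoutpos+358]")
dmesg_style = parse_rip("RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x166/0x310 [drm]")

# Same symbol and same offset, so both reports point at the same instruction.
print(crash_style[:2] == dmesg_style[:2])
```

Offsets only compare meaningfully within one kernel build; a different build of the same function will generally place the instruction at a different offset.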
Still hosed in 4.10.14-200.fc25.x86_64 RIP: drm_calc_vbltimestamp_from_scanoutpos+0x166/0x310 [drm] RSP: ffff9cb97dc03aa8 Usually goes when you move the mouse to the corner and get Window overview mode.
Let's have the nouveau team look at this
Just keeping this ticket up to date. Kernel 4.10.15-200.fc25.x86_64 still has this issue. Thanks for passing this to the nouveau team.
And still in 4.10.17-200.fc25.x86_64
RIP: drm_calc_vbltimestamp_from_scanoutpos+0x166/0x310 [drm] RSP: ffff895abdc03aa8
Does the nouveau team have any thoughts on this bug? I'm concerned it may make it into F26. Thanks
And still in 4.11.3-200.fc25.x86_64
RIP: drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x2d0 [drm] RSP: ffff89ec3dd83af8
Nouveau team, is there any information? Can you say something about this?
Still in 4.11.4-200.fc25.x86_64 RIP: drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x2d0 [drm] RSP: ffffa0897dc43af8 You aren't exactly giving me much faith in Nouveau. For example, I have a problem in RHEL open with RH support and one of their suggestions was to switch from NVIDIA proprietary to Nouveau. I have no faith in doing this, given that there seems to be little interest in fixing bleeding edge bugs in Nouveau. Also what's the point in a RHEL customer trying Fedora for future RHEL technology previews, if there is no interest (not even a reply) on the issues we find.
Please test 4.11.5 which should have a fix. *** This bug has been marked as a duplicate of bug 1460456 ***
Sorry about that, I misread the function that was panicking so I don't think that particular fix will apply but it's still worth testing 4.11.5 which may have other nouveau fixes.
Can you post a full kernel log please? I'm aware of this issue from the upstream report, but have never seen it myself, despite testing on a LOT of different hardware in the same timeframe. I also cannot pinpoint why this would be happening from a scan over the relevant changes. In this case, since it's random, doing a bisect between 4.9 and 4.10 to determine the exact commit causing it would be problematic. It'd be useful if it could be managed, but there'd be some doubt in the result due to the randomness.
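To put rough numbers on why a bisect of a random hang is awkward, here is a minimal sketch. It assumes the hangs arrive as a Poisson process, takes "a few times a day" as 3 per day, and uses 13000 as a placeholder commit count for the 4.9 to 4.10 range; all three are assumptions for illustration, not measurements from this report.

```python
import math


def soak_hours(hangs_per_day, confidence):
    """Hours a kernel must run hang-free before calling it 'good'.

    Assumes hangs follow a Poisson process (an assumption, not something
    this report establishes). On a bad kernel the chance of seeing no hang
    in t hours is exp(-rate * t); solve for that chance == 1 - confidence.
    """
    rate_per_hour = hangs_per_day / 24.0
    return -math.log(1.0 - confidence) / rate_per_hour


def bisect_steps(num_commits):
    """Worst-case number of kernels a bisect over num_commits has to test."""
    return math.ceil(math.log2(num_commits))


hangs_per_day = 3      # assumed reading of "a few times a day"
num_commits = 13000    # hypothetical size of the 4.9..4.10 range

steps = bisect_steps(num_commits)
hours = soak_hours(hangs_per_day, 0.95)
print("steps:", steps)
print("hang-free hours per 'good' verdict at 95%%: %.1f" % hours)
```

Only the "good" verdicts need the full soak, since a bad kernel announces itself as soon as it hangs. Even so, a 5% chance of a wrong "good" at each step compounds over the whole bisect, which is the doubt in the result mentioned above.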
Not sure what you mean by the full kernel log. I have a complete crashdump from 4.11.5-200, if there is some way I can send that to you.
I now have a complete crashdump for 4.11.6-201.fc25.x86_64. How can I get this to you to move this forward?
Created attachment 1292407 [details] vmcore-dmesg-4.11.6-201
Attached what I thought you may have needed.
This is now happening on F26, which is not hugely surprising as it hasn't been addressed. But I no longer have an old kernel to back out to.
Linux version 4.11.9-300.fc26.x86_64
RIP: drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x2d0 [drm] RSP: ffff8ce7fdc03b08
Ben, I asked how to send you a crash dump but never heard back from you. Can you let me know how we can address this bug?
I'd hoped to stop this happening by running Xorg instead of Wayland for my Gnome session. But arghh it still happens with Xorg. RIP: drm_calc_vbltimestamp_from_scanoutpos+0x14f/0x2d0 [drm] RSP: ffff8d2ebddc3b08
This has started happening in RHEL after upgrading from 7.3 to 7.4.
3.10.0-693.2.1.el7.x86_64
[1029086.283151] divide error: 0000 [#1] SMP
[1029086.283167] Modules linked in: sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_nat veth nls_utf8 cifs dns_resolver tcp_lp fuse xt_addrtype br_netfilter xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter btrfs raid6_pq xor dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio snd_hda_codec_hdmi intel_powerclamp
[1029086.283360] coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel mei_wdt hp_wmi ppdev sparse_keymap aesni_intel rfkill snd_hda_codec_realtek lrw gf128mul snd_hda_codec_generic glue_helper ablk_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core i2c_i801 pcspkr snd_hwdep snd_seq joydev snd_seq_device snd_pcm parport_pc snd_timer parport snd sg soundcore mei_me mei tpm_infineon shpchp acpi_pad nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic nouveau mxm_wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci e1000e libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ptp pps_core i2c_core wmi video dm_mirror dm_region_hash dm_log dm_mod
[1029086.283558] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-693.1.1.el7.x86_64 #1
[1029086.283575] Hardware name: HP HP EliteDesk 800 G2 SFF/8054, BIOS N01 Ver. 02.16 08/08/2016
[1029086.283594] task: ffffffff819f9480 ti: ffffffff819e4000 task.ti: ffffffff819e4000
[1029086.283612] RIP: 0010:[<ffffffffc0172ad9>] [<ffffffffc0172ad9>] drm_calc_vbltimestamp_from_scanoutpos+0x169/0x320 [drm]
[1029086.283646] RSP: 0018:ffff88082dc03a58 EFLAGS: 00010002
[1029086.283659] RAX: 0001377b41ca6940 RBX: 0000000000000000 RCX: 0000000000000000
[1029086.283674] RDX: 0000000000000000 RSI: 00000000ffffffe0 RDI: 0003a7f26059f405
[1029086.283690] RBP: ffff88082dc03af8 R08: 0000000000000001 R09: ffff8808073b9000
[1029086.283707] R10: 00000000001f006f R11: ffff8808087fe910 R12: 0000000000000007
[1029086.283723] R13: ffff8807a37fd420 R14: 0000000000000002 R15: 0000000000000003
[1029086.283739] FS: 0000000000000000(0000) GS:ffff88082dc00000(0000) knlGS:0000000000000000
[1029086.283757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1029086.283770] CR2: 00007f3fb303af00 CR3: 00000000019f2000 CR4: 00000000003407f0
[1029086.283786] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1029086.283801] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1029086.283817] Stack:
[1029086.283822] ffff88082dc03ac0 ffff8807a37fd420 0000000000059399 00000000000fb3dc
[1029086.283840] 000000000005952b ffff880800000002 ffff880000000000 00000000000fb3dc
[1029086.283858] ffff88082dc03b68 ffff88082dc03b1c 00000001814c669d 000002bdffffffe2
[1029086.283876] Call Trace:
[1029086.283882] <IRQ>
[1029086.283889]
[1029086.283914] [<ffffffffc02a62de>] nouveau_display_vblstamp+0x6e/0x80 [nouveau]
[1029086.283934] [<ffffffffc01727a3>] drm_get_last_vbltimestamp+0x53/0x90 [drm]
[1029086.283956] [<ffffffffc0172f79>] drm_update_vblank_count+0x79/0x2c0 [drm]
[1029086.283978] [<ffffffffc0173c66>] drm_handle_vblank+0x96/0x280 [drm]
[1029086.283998] [<ffffffffc0173e67>] drm_crtc_handle_vblank+0x17/0x20 [drm]
[1029086.284031] [<ffffffffc02a5f55>] nouveau_display_vblank_handler+0x15/0x20 [nouveau]
[1029086.284058] [<ffffffffc01f0cdc>] nvif_notify+0xac/0x1a0 [nouveau]
[1029086.284089] [<ffffffffc0254514>] ? nv50_disp_vblank_fini_+0x14/0x20 [nouveau]
[1029086.284122] [<ffffffffc029c27a>] nvkm_client_ntfy+0x6a/0x70 [nouveau]
[1029086.284146] [<ffffffffc01f0fe2>] nvkm_client_notify+0x22/0x30 [nouveau]
[1029086.284171] [<ffffffffc01f41e6>] nvkm_notify_send+0x86/0x160 [nouveau]
[1029086.284195] [<ffffffffc01f1f1f>] nvkm_event_send+0xdf/0x100 [nouveau]
[1029086.284226] [<ffffffffc0253c21>] nvkm_disp_vblank+0x41/0x60 [nouveau]
[1029086.284257] [<ffffffffc0256f5b>] gf119_disp_intr+0x10b/0x270 [nouveau]
[1029086.284288] [<ffffffffc0254553>] nv50_disp_intr_+0x13/0x20 [nouveau]
[1029086.284317] [<ffffffffc02538b4>] nvkm_disp_intr+0x14/0x20 [nouveau]
[1029086.284341] [<ffffffffc01f177f>] nvkm_engine_intr+0x1f/0x30 [nouveau]
[1029086.284366] [<ffffffffc01f5a37>] nvkm_subdev_intr+0x17/0x20 [nouveau]
[1029086.284396] [<ffffffffc023bf47>] nvkm_mc_intr+0xf7/0x1a0 [nouveau]
[1029086.284425] [<ffffffffc0240e23>] nvkm_pci_intr+0x53/0x80 [nouveau]
[1029086.284441] [<ffffffff81130a2e>] __handle_irq_event_percpu+0x3e/0x1c0
[1029086.284457] [<ffffffff81130be2>] handle_irq_event_percpu+0x32/0x80
[1029086.284471] [<ffffffff81130c6c>] handle_irq_event+0x3c/0x60
[1029086.284484] [<ffffffff811338f7>] handle_edge_irq+0x77/0x130
[1029086.284499] [<ffffffff8102d2c8>] handle_irq+0xb8/0x150
[1029086.284512] [<ffffffff810f3ddc>] ? tick_check_idle+0x8c/0xd0
[1029086.284526] [<ffffffff816b053a>] ? atomic_notifier_call_chain+0x1a/0x20
[1029086.284543] [<ffffffff816b75ed>] do_IRQ+0x4d/0xe0
[1029086.284554] [<ffffffff816ac1ed>] common_interrupt+0x6d/0x6d
[1029086.284567] <EOI>
[1029086.284572]
[1029086.285527] [<ffffffff81527c22>] ? cpuidle_enter_state+0x52/0xc0
[1029086.286484] [<ffffffff81527c18>] ? cpuidle_enter_state+0x48/0xc0
[1029086.287430] [<ffffffff81527d68>] cpuidle_idle_call+0xd8/0x210
[1029086.288372] [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
[1029086.289310] [<ffffffff810e7bca>] cpu_startup_entry+0x14a/0x1c0
[1029086.290251] [<ffffffff81692c57>] rest_init+0x77/0x80
[1029086.291186] [<ffffffff81b45060>] start_kernel+0x439/0x45a
[1029086.292121] [<ffffffff81b44a30>] ? repair_env_string+0x5c/0x5c
[1029086.293052] [<ffffffff81b44120>] ? early_idt_handler_array+0x120/0x120
[1029086.293979] [<ffffffff81b445ef>] x86_64_start_reservations+0x24/0x26
[1029086.294908] [<ffffffff81b44740>] x86_64_start_kernel+0x14f/0x172
[1029086.295814] Code: e0 02 83 f8 01 41 8b 85 a8 00 00 00 45 19 ff 0f af 44 24 58 41 83 e7 fe 41 83 c7 03 03 44 24 5c 48 98 48 69 c0 40 42 0f 00 48 99 <48> f7 f9 49 89 c5 8b 05 47 a5 03 00 85 c0 0f 84 b3 00 00 00 e8
[1029086.296827] RIP [<ffffffffc0172ad9>] drm_calc_vbltimestamp_from_scanoutpos+0x169/0x320 [drm]
[1029086.297826] RSP <ffff88082dc03a58>
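The oops above can be read a little further without the vmcore. The kernel marks the faulting byte in the Code: line with <..>, and here the bytes at that point are 48 f7 f9. The sketch below decodes just that one x86-64 pattern (REX.W + F7 /6 or /7 with a register operand) and looks up the divisor in the register dump. The helper names are made up for illustration, and the decoder is deliberately tiny rather than a general disassembler.

```python
import re

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes of an oops 'Code:' line starting at the <xx> marker."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_div(b):
    """Decode 64-bit DIV/IDIV by a legacy register, else return None.

    Only handles REX.W set, REX.B clear, opcode F7, ModRM mod == 3.
    """
    if len(b) < 3 or (b[0] & 0xF0) != 0x40 or b[1] != 0xF7:
        return None
    rex, modrm = b[0], b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg not in (6, 7):
        return None
    if not (rex & 0x08) or (rex & 0x01):
        return None  # not 64-bit, or divisor is r8..r15 (not handled here)
    return ("div" if reg == 6 else "idiv", REG64[rm])


def register_value(dump, name):
    """Pull a 64-bit register value such as 'RCX' out of oops register lines."""
    m = re.search(r"\b%s: ([0-9a-fA-F]{16})\b" % name, dump)
    return int(m.group(1), 16) if m else None


# Excerpts copied from the oops in this comment.
CODE = "Code: 48 98 48 69 c0 40 42 0f 00 48 99 <48> f7 f9 49 89 c5"
REGS = "RAX: 0001377b41ca6940 RBX: 0000000000000000 RCX: 0000000000000000"

op = decode_div(faulting_bytes(CODE))
if op is not None:
    mnemonic, divisor = op
    print(mnemonic, divisor, hex(register_value(REGS, divisor.upper())))
    # prints "idiv rcx 0x0"
```

So the faulting instruction is a signed divide by RCX, and RCX is zero in the dump, which matches the "divide error" at the top of the oops. Which variable in drm_calc_vbltimestamp_from_scanoutpos ended up in RCX cannot be told from this sketch; that needs the matching kernel source or debuginfo.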
Sadly, the way I have resolved this is by moving to 4.13.0-1.fc27 on Fedora 26, which is not really an option with RHEL 7.4. When I discussed this with a RH engineer, they said they couldn't reproduce it with the hardware they had, and that it was a hard issue to debug. I guess a support call is the next step, if you have that option?
This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.