Bug 1547037 - [regression] BUG unable to handle kernel NULL pointer dereference in kernel 4.15.3
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 27
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-20 11:51 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2019-01-02 11:43 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-27 20:13:14 UTC
Type: Bug
Embargoed:


Attachments
full dmesg (77.47 KB, text/plain)
2018-02-20 11:51 UTC, Dominik 'Rathann' Mierzejewski
dmesg (4.71 KB, text/plain)
2018-02-28 14:50 UTC, Stefano Biagiotti
dmesg with kernel-4.15.11-300.fc27 (4.72 KB, text/plain)
2018-03-22 14:56 UTC, Stefano Biagiotti
journalctl (kernel-4.15.12-301.fc27.x86_64 (4.66 KB, text/plain)
2018-03-26 09:40 UTC, Stefano Biagiotti


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 105174 0 None None None 2018-02-20 11:51:22 UTC

Description Dominik 'Rathann' Mierzejewski 2018-02-20 11:51:23 UTC
Created attachment 1398214 [details]
full dmesg

Description of problem:
After updating to Fedora kernel 4.15.3-300.fc27.x86_64 and rebooting, I get no output on the second screen attached to the HDMI port, and the Xorg session doesn't fully start after login. I can only see the wallpaper on the built-in display. The mouse cursor moves but doesn't respond to clicks. The machine remains accessible via ssh. There are no errors or warnings in the Xorg log.

The following message is visible in dmesg:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: nouveau_mem_host+0x47/0x1b0 [nouveau]

Full dmesg attached. The hardware is Dell XPS 15 with Intel and nVidia GPU in Optimus configuration.

Version-Release number of selected component (if applicable):
kernel-4.15.3-300.fc27.x86_64
xorg-x11-drv-nouveau-1.0.15-3.fc27.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Boot 4.15.3-300.fc27.x86_64 kernel with second screen connected to HDMI
2. Log in (via lightdm)

Actual results:
Wallpaper on built-in LCD, no signal on HDMI. Errors in dmesg.

Expected results:
Built-in LCD should be blank, normal output on HDMI (as configured for this user).

Additional info:
kernel-4.14.18-300.fc27.x86_64 works fine.

Comment 1 Stefano Biagiotti 2018-02-28 14:50:56 UTC
Created attachment 1401910 [details]
dmesg

Same bug hit here on kernel-4.15.4-300.fc27.x86_64 and xorg-x11-drv-nouveau-1.0.15-3.fc27.x86_64.

Kernel-4.14.16-300.fc27.x86_64 seems to work fine.

Hardware is a Dell Vostro 220 with two monitors connected to (from lspci)
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)

The freeze happens unpredictably, but often right after logging in via lightdm.

When the freeze happens the mouse pointer is still alive (I can move it around), and I can use ssh to log into the system and reboot.

Comment 2 Dominik 'Rathann' Mierzejewski 2018-03-09 19:54:01 UTC
Patch provided in the upstream bug report (and sent to nouveau mailing list: https://lists.freedesktop.org/archives/nouveau/2018-March/029959.html) fixes this for me, please include it in the next Fedora kernel build.

Comment 3 André Johansen 2018-03-17 15:26:18 UTC
I see the same problem: I am unable to log in to the desktop. Fedora 27/x86-64.
Kernels 4.15.4 and 4.15.9 fail; 4.14.14 works fine.

Mar 17 15:31:27 argonath kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
Mar 17 15:31:27 argonath kernel: IP: nouveau_mem_host+0x47/0x1b0 [nouveau]
Mar 17 15:31:27 argonath kernel: PGD 0 P4D 0 
Mar 17 15:31:27 argonath kernel: Oops: 0000 [#1] SMP PTI
Mar 17 15:31:27 argonath kernel: Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc coretemp kvm_intel kvm iTCO_wdt gpio_ich iTCO_vendor_support irqbypass snd_virtuoso snd_oxygen_lib snd_mpu401_uart snd_rawmidi joydev snd_seq snd_seq_device snd_pcm i2c_i801 snd_timer lpc_ich snd soundcore shpchp asus_atk0110 acpi_cpufreq binfmt_misc uas usb_storage nouveau firewire_ohci video mxm_wmi wmi i2c_algo_bit
Mar 17 15:31:27 argonath kernel: drm_kms_helper ttm firewire_core serio_raw drm ata_generic atl1 pata_acpi crc_itu_t mii pata_jmicron
Mar 17 15:31:27 argonath kernel: CPU: 1 PID: 1249 Comm: plasmashell Not tainted 4.15.4-300.fc27.x86_64 #1
Mar 17 15:31:27 argonath kernel: Hardware name: System manufacturer P5K/P5K, BIOS 1201    10/14/2008
Mar 17 15:31:27 argonath kernel: RIP: 0010:nouveau_mem_host+0x47/0x1b0 [nouveau]
Mar 17 15:31:27 argonath kernel: RSP: 0018:ffffa77a0209f800 EFLAGS: 00010246
Mar 17 15:31:27 argonath kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
Mar 17 15:31:27 argonath kernel: RDX: 000000003452d000 RSI: ffff999979837b80 RDI: ffffa77a0209f958
Mar 17 15:31:27 argonath kernel: RBP: ffff999979837100 R08: 000000ffffffffff R09: 0000000000000000
Mar 17 15:31:27 argonath kernel: R10: ffff999936128890 R11: 0000000000000e10 R12: ffffa77a0209f958
Mar 17 15:31:27 argonath kernel: R13: 0000000000000000 R14: ffff999979837100 R15: ffffa77a0209f958
Mar 17 15:31:27 argonath kernel: FS:  00007ff93042b940(0000) GS:ffff99997fc80000(0000) knlGS:0000000000000000
Mar 17 15:31:27 argonath kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 17 15:31:27 argonath kernel: CR2: 00007ff91a5a3000 CR3: 0000000054442000 CR4: 00000000000006e0
Mar 17 15:31:27 argonath kernel: Call Trace:
Mar 17 15:31:27 argonath kernel: nv50_sgdma_bind+0x18/0x30 [nouveau]
Mar 17 15:31:27 argonath kernel: ttm_tt_bind+0x3f/0x60 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_bo_handle_move_mem+0x5da/0x610 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_bo_evict+0x14d/0x330 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_mem_evict_first+0x161/0x1d0 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_bo_mem_space+0x344/0x4c0 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_bo_validate+0xce/0x150 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_bo_init_reserved+0x385/0x430 [ttm]
Mar 17 15:31:27 argonath kernel: ttm_bo_init+0x2f/0x90 [ttm]
Mar 17 15:31:27 argonath kernel: ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
Mar 17 15:31:27 argonath kernel: ? _cond_resched+0x15/0x40
Mar 17 15:31:27 argonath kernel: nouveau_bo_new+0x416/0x590 [nouveau]
Mar 17 15:31:27 argonath kernel: ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
Mar 17 15:31:27 argonath kernel: ? nouveau_gem_new+0x120/0x120 [nouveau]
Mar 17 15:31:27 argonath kernel: nouveau_gem_new+0x5d/0x120 [nouveau]
Mar 17 15:31:27 argonath kernel: ? smp_call_function_many+0x23f/0x250
Mar 17 15:31:27 argonath kernel: nouveau_gem_ioctl_new+0x51/0xd0 [nouveau]
Mar 17 15:31:27 argonath kernel: drm_ioctl_kernel+0x5b/0xb0 [drm]
Mar 17 15:31:27 argonath kernel: drm_ioctl+0x2d5/0x370 [drm]
Mar 17 15:31:27 argonath kernel: ? nouveau_gem_new+0x120/0x120 [nouveau]
Mar 17 15:31:27 argonath kernel: nouveau_drm_ioctl+0x64/0xc0 [nouveau]
Mar 17 15:31:27 argonath kernel: do_vfs_ioctl+0xa4/0x620
Mar 17 15:31:27 argonath kernel: SyS_ioctl+0x74/0x80
Mar 17 15:31:27 argonath kernel: do_syscall_64+0x75/0x180
Mar 17 15:31:27 argonath kernel: entry_SYSCALL_64_after_hwframe+0x21/0x86
Mar 17 15:31:27 argonath kernel: RIP: 0033:0x7ff927d618e7
Mar 17 15:31:27 argonath kernel: RSP: 002b:00007ffd9847ea48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 17 15:31:27 argonath kernel: RAX: ffffffffffffffda RBX: 000055c31d940d90 RCX: 00007ff927d618e7
Mar 17 15:31:27 argonath kernel: RDX: 00007ffd9847eaa0 RSI: 00000000c0306480 RDI: 0000000000000011
Mar 17 15:31:27 argonath kernel: RBP: 00007ffd9847eaa0 R08: 0000000000000000 R09: 00007ff928032c80
Mar 17 15:31:27 argonath kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0306480
Mar 17 15:31:27 argonath kernel: R13: 0000000000000011 R14: 000055c31d951618 R15: 000055c31d28a340
Mar 17 15:31:27 argonath kernel: Code: 04 25 28 00 00 00 48 89 44 24 20 31 c0 49 8b 1e 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10 00 00 00 00 48 c7 44 24 18 00 00 00 00 <48> 8b 7b 40 48 8d 83 f8 00 00 00 44 0f b6 6b 39 48 89 04 24 48 
Mar 17 15:31:27 argonath kernel: RIP: nouveau_mem_host+0x47/0x1b0 [nouveau] RSP: ffffa77a0209f800
Mar 17 15:31:27 argonath kernel: CR2: 0000000000000040
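
Editorial note on reading the trace above: the bytes marked `<48> 8b 7b 40` in the Code: line decode to `mov 0x40(%rbx),%rdi`, and RBX is 0 in the register dump, which is consistent with the reported fault address 0x40 (a read through a NULL pointer at offset 0x40). The Python sketch below only parses the header lines quoted in this comment and checks that arithmetic. The helper name `summarize_oops` and the regular expressions are illustrative assumptions, not part of any kernel tooling, and the sketch does not say which structure or field is involved.

```python
import re

# Header and register lines copied from the oops in this comment
# (journal timestamp and host prefix dropped).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: nouveau_mem_host+0x47/0x1b0 [nouveau]
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
"""

# Hypothetical patterns, written for the 4.15-era oops layout shown above.
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]{16})")
IP_RE = re.compile(r"IP: (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]")
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})")


def summarize_oops(text):
    """Return the fault address, faulting symbol and register values."""
    fault = FAULT_RE.search(text)
    ip = IP_RE.search(text)
    if fault is None or ip is None:
        raise ValueError("no oops header found")
    regs = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    return {
        "fault_addr": int(fault.group(1), 16),
        "symbol": ip.group(1),
        "offset": int(ip.group(2), 16),   # offset of the faulting instruction
        "size": int(ip.group(3), 16),     # size of the function
        "module": ip.group(4),
        "regs": regs,
    }


info = summarize_oops(OOPS)
# The faulting instruction reads from RBX + 0x40.
base_plus_disp = info["regs"]["RBX"] + 0x40
print(info["module"], info["symbol"], hex(info["offset"]))
print("fault address equals RBX + 0x40:", base_plus_disp == info["fault_addr"])
```

The same check applies to any of the traces in this bug that show the same IP and fault address.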

Comment 4 Dominik 'Rathann' Mierzejewski 2018-03-19 09:31:12 UTC
I made a scratch build of kernel-4.15.7 with this patch applied. If anyone is interested, feel free to test before it gets garbage collected by koji:

https://koji.fedoraproject.org/koji/taskinfo?taskID=25579883

Comment 5 Dominik 'Rathann' Mierzejewski 2018-03-19 09:31:59 UTC
Also, reassigning to the kernel component since it's a bug in the kernel driver.

Comment 6 Laura Abbott 2018-03-19 15:25:24 UTC
We also track kernel graphics crashes with the associated userspace driver package since the same people work on both teams. Moving it back there for tracking.

I'll see about picking up the fix for today's stable kernel.

Comment 7 Dominik 'Rathann' Mierzejewski 2018-03-19 15:46:49 UTC
(In reply to Laura Abbott from comment #6)
> We also track kernel graphics crashes with the associated userspace driver
> package since the same people work on both teams. Moving it back there for
> tracking.

That's good to know, thanks!

> I'll see about picking up the fix for today's stable kernel.

I'd appreciate this, thanks again!

Comment 8 Fedora Update System 2018-03-20 01:18:55 UTC
kernel-4.15.11-300.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-350dc6f6b6

Comment 9 Fedora Update System 2018-03-20 01:20:53 UTC
kernel-4.15.11-200.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-f0c97206ba

Comment 10 Fedora Update System 2018-03-20 18:58:55 UTC
kernel-4.15.11-200.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-f0c97206ba

Comment 11 Fedora Update System 2018-03-20 19:38:15 UTC
kernel-4.15.11-300.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-350dc6f6b6

Comment 12 Fedora Update System 2018-03-21 23:45:49 UTC
kernel-4.15.12-300.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-61ce49be88

Comment 13 Fedora Update System 2018-03-21 23:46:41 UTC
kernel-4.15.12-200.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-fdbeaded74

Comment 14 Stefano Biagiotti 2018-03-22 14:56:13 UTC
Created attachment 1411755 [details]
dmesg with kernel-4.15.11-300.fc27

Installed kernel-4.15.11-300.fc27 from updates-testing, but it doesn't resolve the problem.
 $ LANG=en dnf list kernel-4.15.11-300.fc27
 Failed to set locale, defaulting to C
 Last metadata expiration check: 0:00:15 ago on Thu Mar 22 15:50:44 2018.
 Installed Packages
 kernel.x86_64           4.15.11-300.fc27                @updates-testing

Same freeze after login via lightdm.
Dmesg attached.

Comment 15 Dominik 'Rathann' Mierzejewski 2018-03-22 16:36:49 UTC
@Stefano, could you mention that in the upstream report, too?

Comment 16 Fedora Update System 2018-03-22 17:39:54 UTC
kernel-4.15.12-300.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-61ce49be88

Comment 17 Stefano Biagiotti 2018-03-23 09:32:11 UTC
(In reply to Dominik 'Rathann' Mierzejewski from comment #15)
> @Stefano, could you mention that in the upstream report, too?
BTDT.

Comment 18 Fedora Update System 2018-03-23 13:38:27 UTC
kernel-4.15.12-200.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-fdbeaded74

Comment 19 Fedora Update System 2018-03-23 15:19:53 UTC
kernel-4.15.12-301.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-e378863e47

Comment 20 Fedora Update System 2018-03-23 15:20:33 UTC
kernel-4.15.12-201.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-ba39fc0e07

Comment 21 Fedora Update System 2018-03-23 17:48:41 UTC
kernel-4.15.12-301.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-e378863e47

Comment 22 Fedora Update System 2018-03-24 03:34:43 UTC
kernel-4.15.12-201.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-ba39fc0e07

Comment 23 Stefano Biagiotti 2018-03-26 09:40:53 UTC
Created attachment 1413042 [details]
journalctl (kernel-4.15.12-301.fc27.x86_64

kernel-4.15.12-301.fc27.x86_64 from updates-testing still doesn't resolve.

Display adapter (from lspci) is:
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)

Attached is an excerpt from "journalctl -k -b -1 --no-pager --no-hostname".
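
Editorial note: for anyone who wants to produce a similar excerpt, here is a minimal Python sketch that trims a kernel log down to the oops itself. It assumes the oops has the same shape as the one pasted in comment 3, i.e. it starts at the "BUG: unable to handle kernel" line and ends at a trailing line containing only CR2. The function names are hypothetical, and the sample lines marked as placeholders are not real log output.

```python
import subprocess


def extract_oops(lines):
    """Return lines from the first 'BUG:' line through the trailing 'CR2:' line."""
    block = []
    recording = False
    for line in lines:
        if not recording and "BUG: unable to handle kernel" in line:
            recording = True
        if recording:
            block.append(line)
            # The register dump also prints CR2, but together with CR3 and CR4;
            # the line that closes the oops carries CR2 alone.
            if "CR2:" in line and "CR3:" not in line:
                break
    return block


def previous_boot_kernel_log():
    """Read the previous boot's kernel log using the command quoted above."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager", "--no-hostname"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


# Lines taken from comment 3, surrounded by clearly marked placeholders.
SAMPLE = [
    "(placeholder: earlier boot messages)",
    "kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040",
    "kernel: IP: nouveau_mem_host+0x47/0x1b0 [nouveau]",
    "kernel: CR2: 00007ff91a5a3000 CR3: 0000000054442000 CR4: 00000000000006e0",
    "kernel: RIP: nouveau_mem_host+0x47/0x1b0 [nouveau] RSP: ffffa77a0209f800",
    "kernel: CR2: 0000000000000040",
    "(placeholder: later messages)",
]

block = extract_oops(SAMPLE)
print("\n".join(block))
```

On an affected machine, `extract_oops(previous_boot_kernel_log())` would apply the same filter to the real journal after rebooting from the freeze.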

Comment 24 Fedora Update System 2018-03-27 19:29:07 UTC
kernel-4.15.12-201.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Fedora Update System 2018-03-27 20:13:14 UTC
kernel-4.15.12-301.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 26 André Johansen 2018-04-09 17:42:33 UTC
The problem is still present with kernel-4.15.14-300.fc27.x86_64.

Excerpt from dmesg when my KDE desktop was mostly blank/black (retrieved on a virtual terminal) -- same output as my previous comment:

[    0.000000] microcode: microcode updated early to revision 0xba, date = 2010-10-03
[    0.000000] Linux version 4.15.14-300.fc27.x86_64 (mockbuild.fedoraproject.org) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Thu Mar 29 16:13:44 UTC 2018
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.15.14-300.fc27.x86_64 root=/dev/mapper/argonath-fedora_root ro rd.lvm.lv=argonath/fedora_root rd.lvm.lv=argonath/swap rhgb quiet LANG=en_GB.UTF-8
...
[   24.194379] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[   24.194438] IP: nouveau_mem_host+0x47/0x1b0 [nouveau]
[   24.194442] PGD 0 P4D 0 
[   24.194448] Oops: 0000 [#1] SMP PTI
....
[   24.194617] Call Trace:
[   24.194651]  nv50_sgdma_bind+0x18/0x30 [nouveau]
[   24.194661]  ttm_tt_bind+0x3f/0x60 [ttm]
[   24.194668]  ttm_bo_handle_move_mem+0x5da/0x610 [ttm]
[   24.194676]  ttm_bo_evict+0x14d/0x330 [ttm]
...

This is on an old computer/graphics card, Fedora 27 fully updated:
01:00.0 VGA compatible controller: NVIDIA Corporation G84 [GeForce 8600 GTS] (rev a1)

Please reopen.
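
Editorial note: the kernel versions reported as working or failing in this bug so far can be tabulated to narrow the regression window. The sketch below is only a summary aid. It mixes reports from different machines (Dell XPS 15, Dell Vostro 220 with a G98, and the G84 system in this comment), so it shows where the reports change from good to bad, not that every machine hits the same root cause. The function names are hypothetical.

```python
# (kernel version, works?) as reported in this bug up to this comment.
REPORTS = [
    ("4.14.14", True),   # comment 3
    ("4.14.16", True),   # comment 1 ("seems to work fine")
    ("4.14.18", True),   # description
    ("4.15.3", False),   # description
    ("4.15.4", False),   # comments 1 and 3
    ("4.15.9", False),   # comment 3
    ("4.15.11", False),  # comment 14
    ("4.15.12", False),  # comment 23
    ("4.15.14", False),  # comment 26
]


def version_key(version):
    """Turn '4.15.11' into (4, 15, 11) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(reports):
    """Return (newest working version, oldest failing version)."""
    good = [version for version, works in reports if works]
    bad = [version for version, works in reports if not works]
    return max(good, key=version_key), min(bad, key=version_key)


last_good, first_bad = regression_window(REPORTS)
print("last reported good:", last_good)
print("first reported bad:", first_bad)
```

Comparing versions as integer tuples matters here: as plain strings, "4.15.11" would sort before "4.15.3".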

Comment 27 Edgar Hoch 2018-05-14 10:02:42 UTC
This may be a duplicate of bug 1559178

Comment 28 Dominik 'Rathann' Mierzejewski 2019-01-02 11:43:57 UTC
(In reply to André Johansen from comment #26)
> The problem is still present with kernel-4.15.14-300.fc27.x86_64.
> 
> Excerpt from dmesg when my KDE desktop was mostly blank/black (retrieved on
> a virtual terminal) -- same output as my previous comment:
> 
> [    0.000000] microcode: microcode updated early to revision 0xba, date =
> 2010-10-03
> [    0.000000] Linux version 4.15.14-300.fc27.x86_64
> (mockbuild.fedoraproject.org) (gcc version 7.3.1 20180303
> (Red Hat 7.3.1-5) (GCC)) #1 SMP Thu Mar 29 16:13:44 UT
> C 2018
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.15.14-300.fc27.x86_64
> root=/dev/mapper/argonath-fedora_root ro rd.lvm.lv=argonath/fedora_root
> rd.lvm.lv=argonath/swap rhgb quiet 
> LANG=en_GB.UTF-8
> ...
> [   24.194379] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000040
> [   24.194438] IP: nouveau_mem_host+0x47/0x1b0 [nouveau]
> [   24.194442] PGD 0 P4D 0 
> [   24.194448] Oops: 0000 [#1] SMP PTI
> ....
> [   24.194617] Call Trace:
> [   24.194651]  nv50_sgdma_bind+0x18/0x30 [nouveau]
> [   24.194661]  ttm_tt_bind+0x3f/0x60 [ttm]
> [   24.194668]  ttm_bo_handle_move_mem+0x5da/0x610 [ttm]
> [   24.194676]  ttm_bo_evict+0x14d/0x330 [ttm]
> ...
> 
> This is on an old computer/graphics card, Fedora 27 fully updated:
> 01:00.0 VGA compatible controller: NVIDIA Corporation G84 [GeForce 8600 GTS]
> (rev a1)
> 
> Please reopen.

Does the patch mentioned in https://bugs.freedesktop.org/show_bug.cgi?id=105174#c20 fix this for you?

I can't reproduce this anymore, though I've encountered a different bug in the meantime: https://bugs.freedesktop.org/show_bug.cgi?id=109187 .

I'd open a new bug report and mention it's a regression if you can still reproduce it on a supported Fedora release.

