Bug 1508088 - nouveau fails to boot falcon on Nvidia GeForce GTX1060 most of the time
Summary: nouveau fails to boot falcon on Nvidia GeForce GTX1060 most of the time
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 26
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-31 19:55 UTC by Peter Backes
Modified: 2018-05-29 12:18 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-29 12:18:33 UTC
Type: Bug
Embargoed:


Attachments
dmesg output (71.65 KB, text/plain)
2017-11-12 16:14 UTC, stupidfrog66
dmesg output on F28 desktop system also affected (85.99 KB, text/plain)
2018-05-03 23:12 UTC, Brion Vibber

Description Peter Backes 2017-10-31 19:55:57 UTC
Description of problem:
I recently bought a laptop produced by Clevo, model P65xHP (according to /sys/devices/virtual/dmi/id/product_name), rebranded as XMG P507 1060 2017. The laptop has a discrete Nvidia GeForce GTX 1060 (Pascal family) and an Intel Core i7-7700HQ with HD Graphics 630.

% /sbin/lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile 6GB] (rev a1)

After booting and logging in to X11, in about 95% of cases the session falls back to the llvmpipe software renderer instead of using the Nvidia GPU. This is caused by nouveau not being able to correctly initialize the Falcon.

I suspect a race condition in the initialization code for the Nvidia chip, because the problem is not deterministic: it occurs in about 19 of 20 boots. The error ("error during falcon reset: -110") is written to the console before the initramfs asks for the LUKS password. If I keep rebooting from the LUKS password prompt whenever the error occurs, I get a good falcon initialization after about 20 reboots. X11 then properly uses the modesetting driver instead of llvmpipe, and PRIME works.
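
To put the "19 of 20" figure in perspective, here is a rough sketch of the arithmetic. It assumes every boot is an independent trial with the same success probability (0.05 for the 1-in-20 rate above); that independence is an assumption, not something measured. The function names are made up for illustration.

```c
/* Rough model only: treats each boot as an independent trial with a fixed
 * success probability p. Whether real boots behave this way is an assumption. */

/* Probability that at least one of n boots initializes the falcon. */
double p_success_within(double p, int n)
{
    double all_fail = 1.0;

    for (int i = 0; i < n; i++)
        all_fail *= (1.0 - p);   /* every one of the n boots fails */
    return 1.0 - all_fail;
}

/* Mean number of boots until the first success (geometric distribution). */
double expected_boots(double p)
{
    return 1.0 / p;
}
```

With p = 0.05 the mean is 20 boots, which matches the "about 20 reboots" above, but under this model 20 reboots in a row still give only about a 64% chance of hitting a good initialization.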

Version-Release number of selected component (if applicable):
kernel-4.13.8-200.fc26.x86_64

How reproducible:
Not always, about 95% of all boots

Steps to Reproduce:
1. boot

Actual results:
...
pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
nouveau: detected PR support, will not use DSM
nouveau 0000:01:00.0: enabling device (0006 -> 0007)
nouveau 0000:01:00.0: NVIDIA GP106 (136000a1)
[drm] Memory usable by graphics device = 4096M
fb: switching to inteldrmfb from EFI VGA
Console: switching to colour dummy device 80x25
[drm] Replacing VGA console driver
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
[drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
ACPI: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:11/LNXVIDEO:01/input/input12
fbcon: inteldrmfb (fb0) is primary device
...
nouveau 0000:01:00.0: bios: version 86.06.3f.00.16
...
nouveau 0000:01:00.0: fb: 6144 MiB GDDR5
vga_switcheroo: enabled
[TTM] Zone  kernel: Available graphics memory: 8088720 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau 0000:01:00.0: DRM: VRAM: 6144 MiB
nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
nouveau 0000:01:00.0: DRM: BIT table 'A' not found
nouveau 0000:01:00.0: DRM: BIT table 'L' not found
nouveau 0000:01:00.0: DRM: TMDS table version 2.0
nouveau 0000:01:00.0: DRM: DCB version 4.1
nouveau 0000:01:00.0: DRM: DCB outp 00: 02022f62 00020010
nouveau 0000:01:00.0: DRM: DCB outp 01: 04844f86 04600010
nouveau 0000:01:00.0: DRM: DCB outp 02: 04844f82 00020010
nouveau 0000:01:00.0: DRM: DCB outp 03: 04855f96 04600020
nouveau 0000:01:00.0: DRM: DCB outp 04: 04855f92 00020020
nouveau 0000:01:00.0: DRM: DCB conn 02: 00010261
nouveau 0000:01:00.0: DRM: DCB conn 04: 01000446
nouveau 0000:01:00.0: DRM: DCB conn 05: 02000546
nouveau 0000:01:00.0: DRM: Pointer to flat panel table invalid
...
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
...
nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
...
nouveau 0000:01:00.0: timeout
------------[ cut here ]------------
WARNING: CPU: 1 PID: 350 at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:184 acr_ls_sec2_post_run+0x223/0x270 [nouveau]
Modules linked in: uas usb_storage i915 nouveau(+) ttm rtsx_pci_sdmmc i2c_algo_bit mxm_wmi mmc_core drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel nvme nvme_core rtsx_pci serio_raw wmi video
CPU: 1 PID: 350 Comm: systemd-udevd Not tainted 4.13.8-200.fc26.x86_64 #1
Hardware name: XMG P65xHP/P65xHP, BIOS 1.05.06 06/28/2017
task: ffff8ca16bf80000 task.stack: ffff9d8c82104000
RIP: 0010:acr_ls_sec2_post_run+0x223/0x270 [nouveau]
RSP: 0018:ffff9d8c82107320 EFLAGS: 00010282
RAX: 000000000000001d RBX: ffff8ca16f148350 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000246
RBP: ffff9d8c82107360 R08: 000000000000001d R09: 000000000002d51c
R10: 0000000000000000 R11: 000000000000037b R12: ffff8ca16afb20c0
R13: 0000000000000000 R14: 0000000000000040 R15: ffff8ca16bf4a000
FS:  00007f825d89b8c0(0000) GS:ffff8ca182040000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4bda0f3d8 CR3: 00000004aad61000 CR4: 00000000003406e0
Call Trace:
 acr_r352_bootstrap+0x244/0x280 [nouveau]
 acr_r352_reset+0x39/0x240 [nouveau]
 nvkm_secboot_reset+0x2f/0x70 [nouveau]
 gf100_gr_init_ctxctl+0x23d/0x9a0 [nouveau]
 gp100_gr_init+0x6f0/0x720 [nouveau]
 gf100_gr_init_+0x55/0x60 [nouveau]
 nvkm_gr_init+0x17/0x20 [nouveau]
 nvkm_engine_init+0x68/0x1f0 [nouveau]
 nvkm_subdev_init+0xb0/0x200 [nouveau]
 nvkm_engine_ref+0x4f/0x70 [nouveau]
 nvkm_ioctl_new+0x2b4/0x300 [nouveau]
 ? nvkm_fifo_chan_dtor+0xe0/0xe0 [nouveau]
 ? gf100_gr_chsw_load+0x50/0x50 [nouveau]
 nvkm_ioctl+0x118/0x280 [nouveau]
 nvkm_client_ioctl+0x12/0x20 [nouveau]
 nvif_object_ioctl+0x41/0x50 [nouveau]
 nvif_object_init+0xc8/0x120 [nouveau]
 nvc0_fbcon_accel_init+0x5b/0x910 [nouveau]
 nouveau_fbcon_create+0x50b/0x5e0 [nouveau]
 ? drm_setup_crtcs+0x409/0x9d0 [drm_kms_helper]
 drm_fb_helper_initial_config+0x1f5/0x420 [drm_kms_helper]
 nouveau_fbcon_init+0x105/0x170 [nouveau]
 ? nouveau_bo_move_init+0xb2/0xf0 [nouveau]
 nouveau_drm_load+0x23c/0x8f0 [nouveau]
 ? sysfs_do_create_link_sd.isra.2+0x6c/0xc0
 drm_dev_register+0x146/0x1d0 [drm]
 drm_get_pci_dev+0x9a/0x180 [drm]
 nouveau_drm_probe+0x1d7/0x260 [nouveau]
 local_pci_probe+0x42/0xa0
 ? pci_assign_irq+0x2b/0x120
 pci_device_probe+0x18d/0x1a0
 driver_probe_device+0x2ff/0x450
 __driver_attach+0xa4/0xe0
 ? driver_probe_device+0x450/0x450
 bus_for_each_dev+0x6e/0xb0
 driver_attach+0x1e/0x20
 bus_add_driver+0x1c7/0x270
 ? 0xffffffffc0143000
 driver_register+0x60/0xe0
 ? 0xffffffffc0143000
 __pci_register_driver+0x4c/0x50
 drm_pci_init+0xde/0xf0 [drm]
 ? 0xffffffffc0143000
 nouveau_drm_init+0x1e0/0x1000 [nouveau]
 do_one_initcall+0x50/0x190
 ? __vunmap+0x81/0xb0
 ? kmem_cache_alloc_trace+0x15f/0x1c0
 ? do_init_module+0x27/0x1e9
 do_init_module+0x5f/0x1e9
 load_module+0x2602/0x2c30
 SYSC_init_module+0x170/0x1a0
 ? SYSC_init_module+0x170/0x1a0
 SyS_init_module+0xe/0x10
 do_syscall_64+0x67/0x140
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f825c4f218a
RSP: 002b:00007ffd77d38828 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055f4bcf146d0 RCX: 00007f825c4f218a
RDX: 00007f825d0289c5 RSI: 000000000028108b RDI: 000055f4bd78e340
RBP: 00007f825d0289c5 R08: 000055f4bcf0eda0 R09: 0000000000000030
R10: 00007f825c7acb00 R11: 0000000000000246 R12: 000055f4bd78e340
R13: 000055f4bcf13bf0 R14: 0000000000020000 R15: 000055f4bc6f3f4a
Code: 37 ef e9 d6 fe ff ff 49 8b 7f 10 48 8b 5f 50 48 85 db 74 51 e8 6f 40 37 ef 48 89 da 48 89 c6 48 c7 c7 83 77 2f c0 e8 4e 51 ef ee <0f> ff e9 93 fe ff ff 48 8b 45 c8 48 8b 78 10 48 8b 5f 50 48 85 
---[ end trace 3c842e828c462e5e ]---
...
nouveau 0000:01:00.0: secboot: error during falcon reset: -110
nouveau 0000:01:00.0: gr: init failed, -110
nouveau 0000:01:00.0: DRM: allocated 1920x1200 fb: 0x60000, bo ffff8ca16ae72000
nouveau 0000:01:00.0: fb1: nouveaufb frame buffer device
[drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
...
nouveau 0000:01:00.0: gr: FECS falcon already acquired by gr!
nouveau 0000:01:00.0: gr: init failed, -16

Expected results:
...
nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
...
nouveau 0000:01:00.0: DRM: allocated 1920x1200 fb: 0x60000, bo ffffa081ea1c1000
...
nouveau 0000:01:00.0: fb1: nouveaufb frame buffer device
[drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1


Additional info:
"Pointer to flat panel table invalid" doesn't sound good either.

I unsuccessfully tried the following hacks.

Increasing the timeout does not help:

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
index ee98921..e8f94f2 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
@@ -178,14 +178,14 @@ acr_ls_sec2_post_run(const struct nvkm_acr *acr, const struct nvkm_secboot *sb)
         * Once started, the falcon will end up in STOPPED condition (bit 5)
         * if successful, or in HALT condition (bit 4) if not.
         */
-       nvkm_msec(device, 1, 
+       nvkm_msec(device, 1000,
                  if ((reg = nvkm_falcon_rd32(sb->boot_falcon, 0x100) & 0x30) != 0)
                          break;
        );
        if (reg & BIT(4)) {
                nvkm_debug(subdev, "applying workaround for start bug...");
                nvkm_falcon_start(sb->boot_falcon);
-               nvkm_msec(subdev->device, 1,
+               nvkm_msec(subdev->device, 1000,
                        if ((reg = nvkm_rd32(subdev->device,
                                             sb->boot_falcon->addr + 0x100)
                             & 0x30) != 0)
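
For readers not familiar with the polling pattern in this patch, the following is a simplified userspace sketch of it, not the nouveau code. It assumes a fake device (`struct fake_falcon`, hypothetical) in place of the real register read, and it counts poll iterations instead of milliseconds. The 0x30 mask and the -110 return value are taken from the patch and the log above.

```c
#include <stdint.h>

#define ERR_TIMEDOUT 110  /* Linux ETIMEDOUT; shows up as -110 in the log */

/* Hypothetical stand-in for the hardware: reports `status` from the
 * `ready_after`-th read onward and 0 before that. */
struct fake_falcon {
    unsigned ready_after;
    unsigned reads;
    uint32_t status;
};

uint32_t fake_read_status(struct fake_falcon *f)
{
    f->reads++;
    return (f->reads >= f->ready_after) ? f->status : 0;
}

/* Poll until any bit in `mask` is set or the budget is used up.
 * Returns 0 and stores the masked status on success, -ERR_TIMEDOUT otherwise. */
int poll_falcon(struct fake_falcon *f, uint32_t mask,
                unsigned max_polls, uint32_t *out)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t v = fake_read_status(f) & mask;

        if (v != 0) {
            *out = v;
            return 0;
        }
    }
    *out = 0;
    return -ERR_TIMEDOUT;
}
```

If the status bits never get set, a larger budget only makes the loop run longer before it returns the same timeout, which is consistent with the longer timeout making no difference here.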

Making it use the existing start bug workaround in this situation does not help either:

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
index ee98921..a2d5a30 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
@@ -182,7 +182,11 @@ acr_ls_sec2_post_run(const struct nvkm_acr *acr, const struct nvkm_secboot *sb)
 		  if ((reg = nvkm_falcon_rd32(sb->boot_falcon, 0x100) & 0x30) != 0)
 			  break;
 	);
-	if (reg & BIT(4)) {
+	if ((reg & 0x30) == 0)
+		nvkm_error(subdev, "%s failed to start before timeout (reg=0x%x), retrying\n",
+		       nvkm_secboot_falcon_name[acr->boot_falcon], (unsigned int)reg);
+
+	if (reg & BIT(4) || (reg & 0x30) == 0) {
 		nvkm_debug(subdev, "applying workaround for start bug...");
 		nvkm_falcon_start(sb->boot_falcon);
 		nvkm_msec(subdev->device, 1,
@@ -196,6 +200,11 @@ acr_ls_sec2_post_run(const struct nvkm_acr *acr, const struct nvkm_secboot *sb)
 			       nvkm_secboot_falcon_name[acr->boot_falcon]);
 			return -EINVAL;
 		}
+		if ((reg & 0x30) == 0) {
+			nvkm_error(subdev, "%s failed to start before timeout again (reg=0x%x), giving up\n",
+			       nvkm_secboot_falcon_name[acr->boot_falcon], (unsigned int)reg);
+			return -EINVAL;
+		}
 	}
 
 	nvkm_debug(&sb->subdev, "%s started\n",

Both start attempts fail.
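
To make the three cases handled by the second patch explicit, here is a small standalone sketch of how the masked status value is interpreted. The bit meanings come from the comment in the driver source quoted in the first diff (bit 5 = STOPPED on success, bit 4 = HALT on failure); the enum and function names are hypothetical and do not exist in nouveau.

```c
#include <stdint.h>

#define FALCON_STATUS_MASK 0x30u     /* bits 4 and 5, as masked in the patch */
#define FALCON_HALT        (1u << 4) /* falcon halted: start failed */
#define FALCON_STOPPED     (1u << 5) /* falcon stopped: start succeeded */

enum falcon_outcome {
    FALCON_OK,         /* STOPPED bit set */
    FALCON_HALTED,     /* HALT bit set, the case the existing workaround retries */
    FALCON_TIMED_OUT   /* neither bit set when polling gave up */
};

/* Interpret the value read from falcon register 0x100 after polling. */
enum falcon_outcome classify_falcon_status(uint32_t reg)
{
    uint32_t s = reg & FALCON_STATUS_MASK;

    if (s == 0)
        return FALCON_TIMED_OUT;
    if (s & FALCON_HALT)   /* checked first, like `reg & BIT(4)` in the driver */
        return FALCON_HALTED;
    return FALCON_OK;
}
```

On the affected boots the result corresponds to FALCON_TIMED_OUT both before and after the retry: neither bit is ever observed.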

At https://www.reddit.com/r/linuxquestions/comments/760oc8/nouveau_acceleration_doesnt_work_on_cold_boot/ it is claimed that the problem can be circumvented by first booting into Windows. I cannot say whether that works because I do not have Windows.

A related issue might be https://bugs.freedesktop.org/show_bug.cgi?id=103382 -- the attachment "dmesg output" shows the same error messages. However, the reporter of that bug describes different symptoms. I do not get the flickering/artifacts described there when running with llvmpipe.

Comment 1 Jeremy Cline 2017-11-01 18:27:23 UTC
Hey, thank you for the bug report!

This bug is in a video subsystem that has a kernel part. We track and work on these bugs via the driver package name instead of leaving them assigned to the kernel.

Comment 2 stupidfrog66 2017-11-10 17:01:35 UTC
(In reply to Peter Backes from comment #0)

I am actually the person who made that Reddit post. In my case the firmware always fails to load unless I reboot from Windows or immediately after removing the Nvidia blob.

Just in case it's an AIB-specific VBIOS issue, may I ask what the make of your 1060 is?

Comment 3 Peter Backes 2017-11-10 20:37:15 UTC
(In reply to stupidfrog66 from comment #2)
> I am actually the person who made that Reddit post, in my case the firmware
> always fails to load unless I reboot from Windows or immediately after
> removing the Nvidia blob.

Please post the full relevant dmesg entries (see my bug report as an example), at least those of the nouveau module. Only then can we see whether you actually have the same issue as I do.

This bug is not about loading the firmware, but about initializing the falcon (aka fuc). Apparently, the falcon initialization fails with a timeout. Do you see this timeout in your dmesg logs, too?
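
For anyone comparing logs: the negative numbers in the nouveau messages ("error during falcon reset: -110", "gr: init failed, -16") are negated Linux errno values. Below is a minimal lookup sketch covering only the codes that appear in this report. The numeric values are the generic Linux ones, hard-coded so the sketch does not depend on the host's errno.h; the function name is hypothetical.

```c
/* Generic Linux errno values, hard-coded on purpose so the result does not
 * depend on the platform this is compiled on. */
#define LINUX_EBUSY      16
#define LINUX_EINVAL     22
#define LINUX_ETIMEDOUT 110

/* Map a negative kernel return code from the log to its symbolic name. */
const char *linux_errno_name(int ret)
{
    switch (-ret) {
    case LINUX_EBUSY:     return "EBUSY";
    case LINUX_EINVAL:    return "EINVAL";
    case LINUX_ETIMEDOUT: return "ETIMEDOUT";
    default:              return "unknown";
    }
}
```

So -110 is the timeout itself, and the later -16 is EBUSY, which fits the "FECS falcon already acquired by gr!" message logged just before it.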

What do you mean by "always fails"? How often did you actually try? As I have stressed, the problem seems to be there most of the time for me, but about one in 20 reboots actually succeeds, without any binary blob or Windows. It can take quite a lot of attempts.

> Just in it's an AIB specific VBIOS issue may I ask what's the make of your
> 1060?

As I have said in my bug report, I am using the 1060 built into the Clevo P65xHP laptop, and the various parameters are there, too ("01:00.0 VGA compatible controller: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile 6GB] (rev a1)", "NVIDIA GP106 (136000a1)", "bios: version 86.06.3f.00.16"). I don't know more than that.

Comment 4 Peter Backes 2017-11-10 21:09:45 UTC
This thread seems to discuss the same or a closely related issue: https://lists.freedesktop.org/archives/nouveau/2017-September/028813.html

Comment 5 stupidfrog66 2017-11-12 16:14:09 UTC
Created attachment 1351244
dmesg output

Here is my dmesg output. It's the exact same error in ls_ucode_msgqueue.c.

Comment 6 Fedora End Of Life 2018-05-03 07:58:18 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 7 Brion Vibber 2018-05-03 23:04:19 UTC
I am experiencing what appears to be the same issue in Fedora 28, also with a GTX 1060 (as a discrete add-in card to a desktop system). Never saw it under Fedora 27 previously. There is no other GPU or VGA card on the system. At 2160p the system is unusably slow and barely responsive using llvmpipe.

Will add dmesg output below...

Comment 8 Brion Vibber 2018-05-03 23:12:51 UTC
Created attachment 1430946
dmesg output on F28 desktop system also affected

Comment 9 Fedora End Of Life 2018-05-29 12:18:33 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

