Bug 1508088
| Field | Value |
|---|---|
| Summary | nouveau fails to boot falcon on Nvidia GeForce GTX1060 most of the time |
| Product | [Fedora] Fedora |
| Component | xorg-x11-drv-nouveau |
| Version | 26 |
| Status | CLOSED EOL |
| Severity | high |
| Priority | unspecified |
| Hardware | All |
| OS | Linux |
| Type | Bug |
| Reporter | Peter Backes <rtc> |
| Assignee | Ben Skeggs <bskeggs> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | airlied, ajax, brion, bskeggs, ewk, garethsime, hdegoede, ichavero, itamar, jarodwilson, jcline, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, rtc, steved, stupidfrog66 |
| Last Closed | 2018-05-29 12:18:33 UTC |

Description
Peter Backes
2017-10-31 19:55:57 UTC
Hey, thank you for the bug report! This bug is in a video subsystem that has a kernel part. We track and work on these bugs via the driver package name instead of leaving them assigned to the kernel.

(In reply to Peter Backes from comment #0)
> Description of problem:
> I recently bought a laptop produced by Clevo, model P65xHP (according to /sys/devices/virtual/dmi/id/product_name), rebranded as XMG P507 1060 2017. The laptop has a discrete Nvidia GeForce GTX 1060 (Pascal family) and an Intel Core i7-7700HQ with HD Graphics 630.
>
> % /sbin/lspci | grep -i vga
> 00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
> 01:00.0 VGA compatible controller: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile 6GB] (rev a1)
>
> When booting, and logging into X11, in 95% of the cases, it uses the llvmpipe driver for the nvidia GPU. This is caused by nouveau not being able to correctly initialize the Falcon.
>
> I suppose it must be a race condition in the initialization code for the nvidia chip, because the problem is not deterministic. It occurs only in about 19 of 20 boots. The problem is written to the console ("error during falcon reset: -110") before initramfs asks for the LUKS password. If I continuously reboot on the LUKS password prompt after the error occurs, I get a good falcon initialization after about 20 reboots. X11 then properly uses the modesetting driver instead of llvmpipe, and PRIME works.
>
> Version-Release number of selected component (if applicable):
> kernel-4.13.8-200.fc26.x86_64
>
> How reproducible:
> Not always, about 95% of all boots
>
> Steps to Reproduce:
> 1. boot
>
> Actual results:
> ...
> pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
> VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
> nouveau: detected PR support, will not use DSM
> nouveau 0000:01:00.0: enabling device (0006 -> 0007)
> nouveau 0000:01:00.0: NVIDIA GP106 (136000a1)
> [drm] Memory usable by graphics device = 4096M
> fb: switching to inteldrmfb from EFI VGA
> Console: switching to colour dummy device 80x25
> [drm] Replacing VGA console driver
> [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [drm] Driver supports precise vblank timestamp query.
> i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
> [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
> [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
> ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
> input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input11
> ACPI: Video Device [PEGP] (multi-head: no rom: yes post: no)
> input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:11/LNXVIDEO:01/input/input12
> fbcon: inteldrmfb (fb0) is primary device
> ...
> nouveau 0000:01:00.0: bios: version 86.06.3f.00.16
> ...
> nouveau 0000:01:00.0: fb: 6144 MiB GDDR5
> vga_switcheroo: enabled
> [TTM] Zone kernel: Available graphics memory: 8088720 kiB
> [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [TTM] Initializing pool allocator
> [TTM] Initializing DMA pool allocator
> nouveau 0000:01:00.0: DRM: VRAM: 6144 MiB
> nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
> nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> nouveau 0000:01:00.0: DRM: DCB version 4.1
> nouveau 0000:01:00.0: DRM: DCB outp 00: 02022f62 00020010
> nouveau 0000:01:00.0: DRM: DCB outp 01: 04844f86 04600010
> nouveau 0000:01:00.0: DRM: DCB outp 02: 04844f82 00020010
> nouveau 0000:01:00.0: DRM: DCB outp 03: 04855f96 04600020
> nouveau 0000:01:00.0: DRM: DCB outp 04: 04855f92 00020020
> nouveau 0000:01:00.0: DRM: DCB conn 02: 00010261
> nouveau 0000:01:00.0: DRM: DCB conn 04: 01000446
> nouveau 0000:01:00.0: DRM: DCB conn 05: 02000546
> nouveau 0000:01:00.0: DRM: Pointer to flat panel table invalid
> ...
> [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [drm] Driver supports precise vblank timestamp query.
> ...
> nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> ...
> nouveau 0000:01:00.0: timeout
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 350 at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c:184 acr_ls_sec2_post_run+0x223/0x270 [nouveau]
> Modules linked in: uas usb_storage i915 nouveau(+) ttm rtsx_pci_sdmmc i2c_algo_bit mxm_wmi mmc_core drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel nvme nvme_core rtsx_pci serio_raw wmi video
> CPU: 1 PID: 350 Comm: systemd-udevd Not tainted 4.13.8-200.fc26.x86_64 #1
> Hardware name: XMG P65xHP/P65xHP, BIOS 1.05.06 06/28/2017
> task: ffff8ca16bf80000 task.stack: ffff9d8c82104000
> RIP: 0010:acr_ls_sec2_post_run+0x223/0x270 [nouveau]
> RSP: 0018:ffff9d8c82107320 EFLAGS: 00010282
> RAX: 000000000000001d RBX: ffff8ca16f148350 RCX: 0000000000000002
> RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000246
> RBP: ffff9d8c82107360 R08: 000000000000001d R09: 000000000002d51c
> R10: 0000000000000000 R11: 000000000000037b R12: ffff8ca16afb20c0
> R13: 0000000000000000 R14: 0000000000000040 R15: ffff8ca16bf4a000
> FS: 00007f825d89b8c0(0000) GS:ffff8ca182040000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055f4bda0f3d8 CR3: 00000004aad61000 CR4: 00000000003406e0
> Call Trace:
> acr_r352_bootstrap+0x244/0x280 [nouveau]
> acr_r352_reset+0x39/0x240 [nouveau]
> nvkm_secboot_reset+0x2f/0x70 [nouveau]
> gf100_gr_init_ctxctl+0x23d/0x9a0 [nouveau]
> gp100_gr_init+0x6f0/0x720 [nouveau]
> gf100_gr_init_+0x55/0x60 [nouveau]
> nvkm_gr_init+0x17/0x20 [nouveau]
> nvkm_engine_init+0x68/0x1f0 [nouveau]
> nvkm_subdev_init+0xb0/0x200 [nouveau]
> nvkm_engine_ref+0x4f/0x70 [nouveau]
> nvkm_ioctl_new+0x2b4/0x300 [nouveau]
> ? nvkm_fifo_chan_dtor+0xe0/0xe0 [nouveau]
> ? gf100_gr_chsw_load+0x50/0x50 [nouveau]
> nvkm_ioctl+0x118/0x280 [nouveau]
> nvkm_client_ioctl+0x12/0x20 [nouveau]
> nvif_object_ioctl+0x41/0x50 [nouveau]
> nvif_object_init+0xc8/0x120 [nouveau]
> nvc0_fbcon_accel_init+0x5b/0x910 [nouveau]
> nouveau_fbcon_create+0x50b/0x5e0 [nouveau]
> ? drm_setup_crtcs+0x409/0x9d0 [drm_kms_helper]
> drm_fb_helper_initial_config+0x1f5/0x420 [drm_kms_helper]
> nouveau_fbcon_init+0x105/0x170 [nouveau]
> ? nouveau_bo_move_init+0xb2/0xf0 [nouveau]
> nouveau_drm_load+0x23c/0x8f0 [nouveau]
> ? sysfs_do_create_link_sd.isra.2+0x6c/0xc0
> drm_dev_register+0x146/0x1d0 [drm]
> drm_get_pci_dev+0x9a/0x180 [drm]
> nouveau_drm_probe+0x1d7/0x260 [nouveau]
> local_pci_probe+0x42/0xa0
> ? pci_assign_irq+0x2b/0x120
> pci_device_probe+0x18d/0x1a0
> driver_probe_device+0x2ff/0x450
> __driver_attach+0xa4/0xe0
> ? driver_probe_device+0x450/0x450
> bus_for_each_dev+0x6e/0xb0
> driver_attach+0x1e/0x20
> bus_add_driver+0x1c7/0x270
> ? 0xffffffffc0143000
> driver_register+0x60/0xe0
> ? 0xffffffffc0143000
> __pci_register_driver+0x4c/0x50
> drm_pci_init+0xde/0xf0 [drm]
> ? 0xffffffffc0143000
> nouveau_drm_init+0x1e0/0x1000 [nouveau]
> do_one_initcall+0x50/0x190
> ? __vunmap+0x81/0xb0
> ? kmem_cache_alloc_trace+0x15f/0x1c0
> ? do_init_module+0x27/0x1e9
> do_init_module+0x5f/0x1e9
> load_module+0x2602/0x2c30
> SYSC_init_module+0x170/0x1a0
> ? SYSC_init_module+0x170/0x1a0
> SyS_init_module+0xe/0x10
> do_syscall_64+0x67/0x140
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f825c4f218a
> RSP: 002b:00007ffd77d38828 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 000055f4bcf146d0 RCX: 00007f825c4f218a
> RDX: 00007f825d0289c5 RSI: 000000000028108b RDI: 000055f4bd78e340
> RBP: 00007f825d0289c5 R08: 000055f4bcf0eda0 R09: 0000000000000030
> R10: 00007f825c7acb00 R11: 0000000000000246 R12: 000055f4bd78e340
> R13: 000055f4bcf13bf0 R14: 0000000000020000 R15: 000055f4bc6f3f4a
> Code: 37 ef e9 d6 fe ff ff 49 8b 7f 10 48 8b 5f 50 48 85 db 74 51 e8 6f 40 37 ef 48 89 da 48 89 c6 48 c7 c7 83 77 2f c0 e8 4e 51 ef ee <0f> ff e9 93 fe ff ff 48 8b 45 c8 48 8b 78 10 48 8b 5f 50 48 85
> ---[ end trace 3c842e828c462e5e ]---
> ...
> nouveau 0000:01:00.0: secboot: error during falcon reset: -110
> nouveau 0000:01:00.0: gr: init failed, -110
> nouveau 0000:01:00.0: DRM: allocated 1920x1200 fb: 0x60000, bo ffff8ca16ae72000
> nouveau 0000:01:00.0: fb1: nouveaufb frame buffer device
> [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
> ...
> nouveau 0000:01:00.0: gr: FECS falcon already acquired by gr!
> nouveau 0000:01:00.0: gr: init failed, -16
>
> Expected results:
> ...
> nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> ...
> nouveau 0000:01:00.0: DRM: allocated 1920x1200 fb: 0x60000, bo ffffa081ea1c1000
> ...
> nouveau 0000:01:00.0: fb1: nouveaufb frame buffer device
> [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
>
> Additional info:
> "Pointer to flat panel table invalid" doesn't sound good either.
>
> I unsuccessfully tried the following hacks.
>
> Increasing the timeout does not help:
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
> index ee98921..e8f94f2 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
> @@ -178,14 +178,14 @@ acr_ls_sec2_post_run(const struct nvkm_acr *acr, const struct nvkm_secboot *sb)
> * Once started, the falcon will end up in STOPPED condition (bit 5)
> * if successful, or in HALT condition (bit 4) if not.
> */
> - nvkm_msec(device, 1,
> + nvkm_msec(device, 1000,
> if ((reg = nvkm_falcon_rd32(sb->boot_falcon, 0x100) & 0x30) != 0)
> break;
> );
> if (reg & BIT(4)) {
> nvkm_debug(subdev, "applying workaround for start bug...");
> nvkm_falcon_start(sb->boot_falcon);
> - nvkm_msec(subdev->device, 1,
> + nvkm_msec(subdev->device, 1000,
> if ((reg = nvkm_rd32(subdev->device,
> sb->boot_falcon->addr + 0x100)
> & 0x30) != 0)
>
> Making it use the existing start bug workaround in this situation does not help either:
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
> index ee98921..a2d5a30 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/ls_ucode_msgqueue.c
> @@ -182,7 +182,11 @@ acr_ls_sec2_post_run(const struct nvkm_acr *acr, const struct nvkm_secboot *sb)
> if ((reg = nvkm_falcon_rd32(sb->boot_falcon, 0x100) & 0x30) != 0)
> break;
> );
> - if (reg & BIT(4)) {
> + if ((reg & 0x30) == 0)
> + nvkm_error(subdev, "%s failed to start before timeout (reg=0x%x), retrying\n",
> + nvkm_secboot_falcon_name[acr->boot_falcon], (unsigned int)reg);
> +
> + if (reg & BIT(4) || (reg & 0x30) == 0) {
> nvkm_debug(subdev, "applying workaround for start bug...");
> nvkm_falcon_start(sb->boot_falcon);
> nvkm_msec(subdev->device, 1,
> @@ -196,6 +200,11 @@ acr_ls_sec2_post_run(const struct nvkm_acr *acr, const struct nvkm_secboot *sb)
> nvkm_secboot_falcon_name[acr->boot_falcon]);
> return -EINVAL;
> }
> + if ((reg & 0x30) == 0) {
> + nvkm_error(subdev, "%s failed to start before timeout again (reg=0x%x), giving up\n",
> + nvkm_secboot_falcon_name[acr->boot_falcon], (unsigned int)reg);
> + return -EINVAL;
> + }
> }
>
> nvkm_debug(&sb->subdev, "%s started\n",
>
> Both start attempts fail.
>
> At https://www.reddit.com/r/linuxquestions/comments/760oc8/nouveau_acceleration_doesnt_work_on_cold_boot/ it is claimed that the problem can be circumvented by first booting into Windows. I cannot say whether that works because I do not have Windows.
>
> A related issue might be https://bugs.freedesktop.org/show_bug.cgi?id=103382 -- the attachment "dmesg output" shows the same error messages. However, the reporter of that bug describes different symptoms. I do not get the flickering/artifacts described there when running with llvmpipe.

I am actually the person who made that Reddit post; in my case the firmware always fails to load unless I reboot from Windows or immediately after removing the Nvidia blob. Just in case it's an AIB-specific VBIOS issue, may I ask what's the make of your 1060?

(In reply to stupidfrog66 from comment #2)
> I am actually the person who made that Reddit post; in my case the firmware always fails to load unless I reboot from Windows or immediately after removing the Nvidia blob.

Please post the full relevant dmesg entries (see my bug report as an example), at least those of the nouveau module. Only then can we see whether you actually have the same issue as me. This bug is not about loading the firmware, but about initializing the falcon (aka fuc). Apparently, the falcon initialization fails with a timeout. Do you see this timeout in your dmesg logs, too? What do you mean by "always fails"? How often did you actually try?
As I have stressed, the problem seems to be there most of the time for me, but about one in 20 reboots actually succeeds, without any binary blob or Windows. It can take quite a lot of attempts.

> Just in case it's an AIB-specific VBIOS issue, may I ask what's the make of your 1060?

As I have said in my bug report, I am using the 1060 built into the Clevo P65xHP laptop, and the various parameters are there, too ("01:00.0 VGA compatible controller: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile 6GB] (rev a1)", "NVIDIA GP106 (136000a1)", "bios: version 86.06.3f.00.16"). I don't know more than that.

This thread seems to discuss the same or a closely related issue: https://lists.freedesktop.org/archives/nouveau/2017-September/028813.html

Created attachment 1351244
dmesg output
Here is my dmesg output. It's the exact same error in ls_ucode_msgqueue.c
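
For readers comparing their own logs against this report, the status check and return codes quoted above can be decoded with a small standalone helper. This is only a sketch and not nouveau code: the names `falcon_state`, `classify_falcon_status` and `describe_errno` are hypothetical. The bit meanings (bit 4 = HALT, bit 5 = STOPPED, mask 0x30) come from the source comment quoted in the patch, and the numeric codes assume the usual Linux errno numbering, which is why they are hard-coded rather than taken from the build host's `errno.h`.

```c
#include <stdint.h>

/* Hypothetical names for illustration; these are not nouveau identifiers. */
enum falcon_state {
    FALCON_NO_STATUS, /* neither bit set: the poll ran into its timeout */
    FALCON_HALTED,    /* bit 4 set: start failed, per the quoted comment */
    FALCON_STOPPED    /* bit 5 set: start succeeded, per the quoted comment */
};

/*
 * Classify a value read from falcon register 0x100 using the same 0x30 mask
 * as the quoted code. HALT is tested first, mirroring the driver's
 * "if (reg & BIT(4))" check.
 */
static enum falcon_state classify_falcon_status(uint32_t reg)
{
    reg &= 0x30;
    if (reg & (1u << 4))
        return FALCON_HALTED;
    if (reg & (1u << 5))
        return FALCON_STOPPED;
    return FALCON_NO_STATUS;
}

/*
 * Map the negative return codes seen in the logs and patches to errno names.
 * Values are the Linux ones (assumed), hard-coded so the result does not
 * depend on the platform this is compiled on.
 */
static const char *describe_errno(int err)
{
    switch (-err) {
    case 110: return "ETIMEDOUT";
    case 16:  return "EBUSY";
    case 22:  return "EINVAL";
    default:  return "unknown";
    }
}
```

Applied to the log in this report, `describe_errno(-110)` gives "ETIMEDOUT", which matches the "timeout" line printed before the warning, and `describe_errno(-16)` gives "EBUSY" for the later "FECS falcon already acquired by gr!" failure. A register value with neither bit set corresponds to the case the second patch tried to retry.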
This message is a reminder that Fedora 26 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '26'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 26 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

I am experiencing what appears to be the same issue in Fedora 28, also with a GTX 1060 (as a discrete add-in card in a desktop system). I never saw it under Fedora 27 previously. There is no other GPU or VGA card in the system. At 2160p the system is unusably slow and barely responsive using llvmpipe. Will add dmesg output below...

Created attachment 1430946
dmesg output on F28 desktop system also affected
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.