Bug 1400623 - Nvidia NV50 not working with ARM64 64k pages [NEEDINFO]
Summary: Nvidia NV50 not working with ARM64 64k pages
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: ARMTracker
Reported: 2016-12-01 16:08 UTC by Jeremy Linton
Modified: 2019-01-09 12:54 UTC (History)
Clone Of:
Last Closed: 2017-04-28 17:17:32 UTC
jforbes: needinfo?


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
FreeDesktop.org 94757 None None None 2016-12-03 07:21 UTC

Description Jeremy Linton 2016-12-01 16:08:22 UTC
Description of problem: With a 9600GT (NV50) installed in a SoftIron 3k machine, the nouveau driver fails to claim the board with -ENOMEM. This appears to be a 64k page bug in the ttm subsystem (and therefore likely affects more than just the NV50). Building the Fedora kernel with 4k pages fixes the problem, and the console then appears on a monitor connected to the board.
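
Since the failure depends on the kernel page size, it helps to confirm which page size a given boot is actually using before testing. A minimal sketch, assuming a Linux system with Python 3 available; the helper names page_size_bytes and describe are illustrative only and do not come from this report:

```python
import os


def page_size_bytes() -> int:
    """Return the page size of the running kernel, in bytes."""
    return os.sysconf("SC_PAGE_SIZE")


def describe(size: int) -> str:
    """Format a page size in bytes as a short label such as '64k pages'."""
    return f"{size // 1024}k pages"


if __name__ == "__main__":
    # A stock aarch64 Fedora 25 kernel is expected to report 64k pages here;
    # a kernel rebuilt with 4k pages should report 4k pages.
    print(describe(page_size_bytes()))
```

Running this on both the stock kernel and the 4k rebuild gives a quick check that the comparison is really between 64k and 4k page configurations.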

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Install an NV50 card in a machine running stock Fedora.
2. Look at the attached card.

Actual results:
[root@mammon-seattle-raw ~]# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1a00
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1a01
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1a02
01:00.0 VGA compatible controller: NVIDIA Corporation G94 [GeForce 9600 GT] (rev a1)

[root@mammon-seattle-raw modules]# modprobe nouveau
[drm] Initialized
nouveau 0000:01:00.0: NVIDIA G94 (094100a1)
nouveau 0000:01:00.0: bios: version 62.94.0d.00.04
nouveau: probe of 0000:01:00.0 failed with error -22
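
For reference, the probe line above reports error -22, while the description mentions -ENOMEM. A minimal sketch, assuming standard Linux errno numbering, that decodes negative kernel return codes with Python's standard errno module; the helper name decode_kernel_errno is illustrative only:

```python
import errno


def decode_kernel_errno(code: int) -> str:
    """Map a negative kernel return code (e.g. -22) to its symbolic name."""
    # Kernel functions return -errno, so negate before the lookup.
    return errno.errorcode.get(abs(code), f"unknown ({code})")


if __name__ == "__main__":
    for code in (-22, -12):
        print(code, decode_kernel_errno(code))
```

With standard numbering, -22 decodes to EINVAL and -12 to ENOMEM, so the code reported by the probe is not the same as the -ENOMEM named in the description.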

Expected results:


Additional info:
There are a couple of other gotchas here. First, when built with ACPI the nvidia driver requires the acpi_button module, which won't load in a DT environment, so at the moment the machine must be running with ACPI even with a 4k kernel. Second, there is a nasty message with ACPI if the ACPI/ARM/MSI patches (mainline) aren't in place.


With debug=spam on nouveau and dyndbg=+p for ttm and a little extra custom debug info, the failure that kills it seems to be in ttm_bo_validate.

[77216.692605] [<ffff000001404120>] ttm_bo_validate+0xb0/0x1e8 [ttm]
[77216.698697] [<ffff0000014045ac>] ttm_bo_init+0x354/0x410 [ttm]
[77216.704706] [<ffff0000019d7bd0>] nouveau_bo_new+0x1f4/0x314 [nouveau]
[77216.711308] [<ffff0000019e4620>] nv50_display_create+0x10c/0xa1c [nouveau]
[77216.718340] [<ffff0000019df898>] nouveau_display_create+0x50c/0x59c [nouveau]
[77216.725632] [<ffff0000019d3e24>] nouveau_drm_load+0x22c/0x8c0 [nouveau]
[77216.732286] [<ffff00000137a1a0>] drm_dev_register+0xc0/0xf0 [drm]
[77216.738409] [<ffff00000137b8a4>] drm_get_pci_dev+0xbc/0x188 [drm]
[77216.744663] [<ffff0000019d35e8>] nouveau_drm_probe+0x180/0x208 [nouveau]
[77216.751354] [<ffff0000084c30dc>] local_pci_probe+0x50/0xb4
[77216.756827] [<ffff0000084c3e40>] pci_device_probe+0xf8/0x148
[77216.762474] [<ffff0000085b6a10>] driver_probe_device+0x284/0x420
[77216.768467] [<ffff0000085b6ccc>] __driver_attach+0x120/0x124
[77216.774115] [<ffff0000085b446c>] bus_for_each_dev+0x6c/0xac
[77216.779673] [<ffff0000085b6204>] driver_attach+0x2c/0x34
[77216.784972] [<ffff0000085b5cb4>] bus_add_driver+0x244/0x2b0
[77216.790531] [<ffff0000085b78e4>] driver_register+0x68/0xfc
[77216.796004] [<ffff0000084c29a8>] __pci_register_driver+0x60/0x6c
[77216.802047] [<ffff00000137bcb8>] drm_pci_init+0x108/0x138 [drm]
[77216.808146] [<ffff000001530158>] nouveau_drm_init+0x158/0x10000 [nouveau]
[77216.814922] [<ffff0000080831a8>] do_one_initcall+0x44/0x128
[77216.820483] [<ffff0000081cad6c>] do_init_module+0x68/0x1e0
[77216.825957] [<ffff000008150d84>] load_module+0xfac/0x12bc
[77216.831343] [<ffff00000815132c>] SyS_finit_module+0xe4/0xf0
[77216.836902] [<ffff000008082b70>] el0_svc_naked+0x24/0x28

This is where the reallocate fails, though that may not be the root cause.
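
To make a trace like the one above easier to compare against other reports, the frames can be reduced to (symbol, module) pairs. A minimal sketch, assuming the frame format shown in this trace; the regular expression and the parse_frames helper are illustrative, and frames without a module tag are labelled "vmlinux" purely as a convention of this sketch:

```python
import re

# Matches frames such as:
#   [<ffff000001404120>] ttm_bo_validate+0xb0/0x1e8 [ttm]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\][ \t]+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(text: str) -> list[tuple[str, str]]:
    """Return (symbol, module) for each stack frame found in text."""
    return [
        (m.group("symbol"), m.group("module") or "vmlinux")
        for m in FRAME_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "[77216.692605] [<ffff000001404120>] ttm_bo_validate+0xb0/0x1e8 [ttm]\n"
        "[77216.751354] [<ffff0000084c30dc>] local_pci_probe+0x50/0xb4\n"
    )
    for symbol, module in parse_frames(sample):
        print(f"{module:10s} {symbol}")
```

Applied to the full trace, this shows the failing path passing from nouveau (nv50_display_create, nouveau_bo_new) into ttm (ttm_bo_init, ttm_bo_validate).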

Comment 1 Jeremy Linton 2016-12-02 21:26:39 UTC
This is apparently a known defect.

https://bugs.freedesktop.org/show_bug.cgi?id=94757

Comment 2 Justin M. Forbes 2017-04-11 14:45:01 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 25 kernel bugs.

Fedora 25 has now been rebased to 4.10.9-200.fc25.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 3 Justin M. Forbes 2017-04-28 17:17:32 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

