Description of problem:
I'm experiencing a nouveau driver page fault when trying to use the fedora kernel with gnome-shell on jetson-tk1 (armhfp).

Version-Release number of selected component (if applicable):
kernel-5.11.5-300.fc34.armv7hl

How reproducible:
always

Steps to Reproduce:
1. On jetson-tk1 with gnome: systemctl isolate graphical

Actual results:
page:1706ccc7 refcount:0 mapcount:0 mapping:29d7e10e index:0x10039 pfn:0xf0481
aops:anon_aops.1 ino:48d7
flags: 0xf800000()
raw: 0f800000 eec8a24c efbe1678 c2686110 00010039 00000000 ffffffff 00000000
raw: 00000000
page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1179!
Internal error: Oops - BUG: 0 [#1] SMP ARM
Modules linked in: rfkill ofpart spi_nor mtd snd_soc_tegra30_i2s snd_soc_tegra_pcm tegra_drm snd_soc_tegra_rt5640 snd_soc_tegra_utils snd_soc_rt5640 snd_hda_codec_hdmi snd_soc_rl6231 snd_hd>
CPU: 2 PID: 859 Comm: gnome-shell Not tainted 5.11.5-300.fc34.armv7hl #1
Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
PC is at get_page+0x20/0x38
LR is at __dump_page+0x110/0x464
pc : [<c04caeec>] lr : [<c04c69ec>] psr: 60000113
sp : c73ebdf0 ip : 2eb7a000 fp : a747e000
r10: a747f000 r9 : 0000071f r8 : c44b5600
r7 : a747f000 r6 : c75d21fc r5 : 00000000 r4 : eec8a224
r3 : 00000027 r2 : 00000027 r1 : 00000000 r0 : 00000059
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 873e406a DAC: 00000051
Process gnome-shell (pid: 859, stack limit = 0xdd661172)
Stack: (0xc73ebdf0 to 0xc73ec000)
bde0: eec8a224 c04cd81c c2b399c0 eefc919c
be00: eec8a224 0000071f c2b399c0 a747f000 000f0481 00000001 00000000 c04cdbb4
be20: 00000000 00000000 00000000 00000000 00000000 c5089000 00000001 c7208d00
be40: c2b399c0 00000001 0000071f c04cdcec 0000071f 00000000 000f0481 bf29a5c0
be60: 0000071f 00000001 00000080 00000010 00000000 00000001 00000000 00000000
be80: 00000000 00000000 c352ea30 00000000 c73ebef4 c5089000 c2b399c0 c73ebfb0
bea0: 00000040 c44b5648 00000800 bf3bda84 c73ebef4 c2b399c0 00000255 a747e000
bec0: c73ebfb0 c04cb7f4 00000001 c2b399c0 00000255 c04ceff0 c1b9a894 bebbf974
bee0: c5ec4200 c099f9f4 fffffff3 c099f9f4 00000000 c2b399c0 00000255 00100cca
bf00: 00010038 a747e000 c73e69d0 c73e69d0 00000000 00000000 00000000 00000000
bf20: 00000000 eefc9164 c03002a4 c73ebfb0 a747e000 c2b399c0 c44b5600 00000805
bf40: 00000255 c44b5648 00000800 c0d37788 00000000 c04023b8 c5bdfa28 c5bdf800
bf60: c5bdfa1c 00000805 a747e000 ffffffff c73ebfb0 c1510e20 aef3db00 00001000
bf80: 00000000 c031400c a747e000 00000805 c73ebfb0 0000906e ae9757f0 40000010
bfa0: ffffffff 10c5387d 10c5387d c0300e80 0000906e 2001e000 a747e000 a747e000
bfc0: 0152f7f8 0152eee8 0152f7f8 0152ed38 0152ee58 aef3db00 00001000 00000000
bfe0: 00000000 bebbf9c0 afd22fec ae9757f0 40000010 ffffffff 00000000 00000000
[<c04caeec>] (get_page) from [<c04cd81c>] (insert_page+0xa8/0x114)
[<c04cd81c>] (insert_page) from [<c04cdbb4>] (__vm_insert_mixed+0x94/0x1ac)
[<c04cdbb4>] (__vm_insert_mixed) from [<c04cdcec>] (vmf_insert_mixed_prot+0x20/0x28)
[<c04cdcec>] (vmf_insert_mixed_prot) from [<bf29a5c0>] (ttm_bo_vm_fault_reserved+0x280/0x318 [ttm])
[<bf29a5c0>] (ttm_bo_vm_fault_reserved [ttm]) from [<bf3bda84>] (nouveau_ttm_fault+0x60/0x90 [nouveau])
[<bf3bda84>] (nouveau_ttm_fault [nouveau]) from [<c04cb7f4>] (__do_fault+0x58/0xb0)
[<c04cb7f4>] (__do_fault) from [<c04ceff0>] (handle_mm_fault+0x7c0/0x97c)
[<c04ceff0>] (handle_mm_fault) from [<c0d37788>] (do_page_fault+0x2c0/0x348)
[<c0d37788>] (do_page_fault) from [<c031400c>] (do_DataAbort+0x3c/0xbc)
[<c031400c>] (do_DataAbort) from [<c0300e80>] (__dabt_usr+0x40/0x60)
Exception stack(0xc73ebfb0 to 0xc73ebff8)
bfa0: 0000906e 2001e000 a747e000 a747e000
bfc0: 0152f7f8 0152eee8 0152f7f8 0152ed38 0152ee58 aef3db00 00001000 00000000
bfe0: 00000000 bebbf9c0 afd22fec ae9757f0 40000010 ffffffff
Code: e353007f 8a000002 e59f1014 ebffef94 (e7f001f2)
---[ end trace 38b95f8878f32175 ]---

Expected results:
no page fault.

Additional info:
I cannot reproduce this with the grate downstream kernel based on linux-next 20210302. I will try to reproduce with vanilla linux-next in the coming days.
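For readers unfamiliar with the VM_BUG_ON_PAGE condition quoted in the oops, here is a rough Python sketch of the arithmetic as I read it. It assumes a 32-bit unsigned int; the helper name is made up for illustration and is not a kernel function. The idea is that adding 127 with unsigned wraparound makes the check true for a refcount of 0 or a slightly negative one, which matches the refcount:0 shown in the page dump.

```python
MASK32 = 0xFFFFFFFF  # assumption: unsigned int is 32 bits wide


def zero_or_close_to_overflow(refcount):
    """Emulate ((unsigned int) refcount + 127u <= 127u) with 32-bit wraparound.

    Hypothetical helper for illustration only, not a kernel API.
    """
    as_unsigned = refcount & MASK32          # the (unsigned int) cast
    wrapped_sum = (as_unsigned + 127) & MASK32  # unsigned addition wraps
    return wrapped_sum <= 127


# refcount:0 from the page dump satisfies the condition, so the BUG fires.
print(zero_or_close_to_overflow(0))
# A page that is still referenced (refcount >= 1) does not.
print(zero_or_close_to_overflow(1))
```

In other words, the check in get_page only passes for pages that already hold at least one reference, so the trace suggests the fault handler was handed a page whose refcount had already dropped to zero.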
FYI, I cannot reproduce this with linux-next 20210302. Will try with 5.12-rc1...
5.12-rc1 also (still) has the page fault bug. But the fault triggered is a different one (related to polkit), and with it I can get a graphical display... (but it is too unstable to verify GPU acceleration).

[ 58.003759] BUG: Bad page state in process polkitd pfn:ee9b1
[ 58.009509] page:8a64ce78 refcount:2 mapcount:129 mapping:473e54ab index:0x0 pfn:0xee9b1
[ 58.017597] aops:0xc0b0ea14 ino:1749
[ 58.021177] flags: 0x40000000()
[ 58.024339] raw: 40000000 00000100 00000122 c43d81f8 00000000 00000000 00000080 00000002
[ 58.032422] page dumped because: nonzero _refcount
[ 58.037204] Modules linked in: nouveau tegra_drm host1x drm_ttm_helper tegra_soctherm ttm iova zram zsmalloc xhci_tegra ci_hdrc_tegra phy_tegra_xusb ahci_tegra libahci_platform tegra124_e
[ 58.061017] CPU: 2 PID: 689 Comm: polkitd Not tainted 5.12.0-rc2-tegra+ #198
[ 58.068051] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 58.074305] [<c010ec40>] (unwind_backtrace) from [<c010a1ec>] (show_stack+0x10/0x14)
[ 58.082039] [<c010a1ec>] (show_stack) from [<c0a86b20>] (dump_stack+0xc0/0xd4)
[ 58.089250] [<c0a86b20>] (dump_stack) from [<c02341ec>] (bad_page+0xdc/0x10c)
[ 58.096373] [<c02341ec>] (bad_page) from [<c02383d4>] (get_page_from_freelist+0xde8/0x116c)
[ 58.104709] [<c02383d4>] (get_page_from_freelist) from [<c0238cd8>] (__alloc_pages_nodemask+0x17c/0x1014)
[ 58.114258] [<c0238cd8>] (__alloc_pages_nodemask) from [<c021e478>] (__pte_alloc+0x24/0x178)
[ 58.122679] [<c021e478>] (__pte_alloc) from [<c021fb40>] (copy_page_range+0x6e4/0xa18)
[ 58.130580] [<c021fb40>] (copy_page_range) from [<c011f154>] (dup_mm+0x328/0x458)
[ 58.138050] [<c011f154>] (dup_mm) from [<c011fee4>] (copy_process+0x980/0x16c4)
[ 58.145344] [<c011fee4>] (copy_process) from [<c0120e9c>] (kernel_clone+0xa4/0x3e4)
[ 58.152986] [<c0120e9c>] (kernel_clone) from [<c01214a0>] (sys_clone+0x74/0x90)
[ 58.160281] [<c01214a0>] (sys_clone) from [<c01000c0>] (ret_fast_syscall+0x0/0x58)
[ 58.167835] Exception stack(0xc56fffa8 to 0xc56ffff0)
[ 58.172873] ffa0: b491e078 00000001 01200011 00000000 00000000 00000000
[ 58.181032] ffc0: b491e078 00000001 b4face1c 00000078 bea4a000 b491e550 00000001 bea4a264
[ 58.189188] ffe0: b491e010 bea49e38 b4f018ec b4f017fc
[ 58.194225] Disabling lock debugging due to kernel taint
[ 58.199523] BUG: Bad page state in process polkitd pfn:ee9b2
[ 58.205253] page:8be0376d refcount:2 mapcount:129 mapping:473e54ab index:0x0 pfn:0xee9b2
[ 58.213328] aops:0xc0b0ea14 ino:1749
[ 58.216892] flags: 0x40000000()
[ 58.220025] raw: 40000000 00000100 00000122 c43d81f8 00000000 00000000 00000080 00000002
[ 58.228096] page dumped because: nonzero _refcount
[ 58.232872] Modules linked in: nouveau tegra_drm host1x drm_ttm_helper tegra_soctherm ttm iova zram zsmalloc xhci_tegra ci_hdrc_tegra phy_tegra_xusb ahci_tegra libahci_platform tegra124_e
[ 58.256679] CPU: 2 PID: 689 Comm: polkitd Tainted: G B 5.12.0-rc2-tegra+ #198
[ 58.265097] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 58.271348] [<c010ec40>] (unwind_backtrace) from [<c010a1ec>] (show_stack+0x10/0x14)
[ 58.279077] [<c010a1ec>] (show_stack) from [<c0a86b20>] (dump_stack+0xc0/0xd4)
[ 58.286284] [<c0a86b20>] (dump_stack) from [<c02341ec>] (bad_page+0xdc/0x10c)
[ 58.293405] [<c02341ec>] (bad_page) from [<c02383d4>] (get_page_from_freelist+0xde8/0x116c)
[ 58.301739] [<c02383d4>] (get_page_from_freelist) from [<c0238cd8>] (__alloc_pages_nodemask+0x17c/0x1014)
[ 58.311288] [<c0238cd8>] (__alloc_pages_nodemask) from [<c021e478>] (__pte_alloc+0x24/0x178)
[ 58.319709] [<c021e478>] (__pte_alloc) from [<c021fb40>] (copy_page_range+0x6e4/0xa18)
[ 58.327609] [<c021fb40>] (copy_page_range) from [<c011f154>] (dup_mm+0x328/0x458)
[ 58.335077] [<c011f154>] (dup_mm) from [<c011fee4>] (copy_process+0x980/0x16c4)
[ 58.342371] [<c011fee4>] (copy_process) from [<c0120e9c>] (kernel_clone+0xa4/0x3e4)
[ 58.350013] [<c0120e9c>] (kernel_clone) from [<c01214a0>] (sys_clone+0x74/0x90)
[ 58.357308] [<c01214a0>] (sys_clone) from [<c01000c0>] (ret_fast_syscall+0x0/0x58)
[ 58.364861] Exception stack(0xc56fffa8 to 0xc56ffff0)
[ 58.369900] ffa0: b491e078 00000001 01200011 00000000 00000000 00000000
[ 58.378057] ffc0: b491e078 00000001 b4face1c 00000078 bea4a000 b491e550 00000001 bea4a264
[ 58.386214] ffe0: b491e010 bea49e38 b4f018ec b4f017fc
[ 58.391250] BUG: Bad page state in process polkitd pfn:ee9b3
[ 58.396981] page:32413595 refcount:2 mapcount:129 mapping:473e54ab index:0x0 pfn:0xee9b3
[ 58.405054] aops:0xc0b0ea14 ino:1749
Created attachment 1762323: dmesg with fedora kernel.
As far as this bug is concerned:
5.10.16-200.fc33.armv7hl is known good (doesn't exhibit the page fault).
5.11.0-rc6-next-20210201-tegra+ is known bad (already exhibits the issue).
5.11.0-rc4-next-20210119-tegra+ is known bad.
461619f5c3242aaee9ec3f0b7072719bd86ea207 is the first bad commit
drm/nouveau: switch to new allocator
(Will try to revert on top of 5.11.5)

git bisect start
# bad: [5c8fe583cce542aa0b84adc939ce85293de36e5e] Linux 5.11-rc1
git bisect bad 5c8fe583cce542aa0b84adc939ce85293de36e5e
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [2911ed9f47b47cb5ab87d03314b3b9fe008e607f] Merge tag 'char-misc-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad 2911ed9f47b47cb5ab87d03314b3b9fe008e607f
# bad: [ac73e3dc8acd0a3be292755db30388c3580f5674] Merge branch 'akpm' (patches from Andrew)
git bisect bad ac73e3dc8acd0a3be292755db30388c3580f5674
# bad: [b10733527bfd864605c33ab2e9a886eec317ec39] Merge tag 'amd-drm-next-5.11-2020-12-09' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad b10733527bfd864605c33ab2e9a886eec317ec39
# bad: [9713158cb2a918c3f6f5522eed23cdeb61f22e75] drm/amdgpu: Add and use seperate reg headers for dcn302
git bisect bad 9713158cb2a918c3f6f5522eed23cdeb61f22e75
# bad: [c0f98d2f8b076bf3e3183aa547395f919c943a14] Merge tag 'drm-misc-next-2020-11-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect bad c0f98d2f8b076bf3e3183aa547395f919c943a14
# good: [6a6e5988a2657cd0c91f6f1a3e7d194599248b6d] drm/ttm: replace last move_notify with delete_mem_notify
git bisect good 6a6e5988a2657cd0c91f6f1a3e7d194599248b6d
# good: [f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8] drm/i915: Force VT'd workarounds when running as a guest OS
git bisect good f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8
# good: [e76ab2cf21c38331155ea613cdf18582f011c30f] drm/i915: Remove per-platform IIR HPD masking
git bisect good e76ab2cf21c38331155ea613cdf18582f011c30f
# bad: [268af50f38b1f2199a2e85e38073d7a25c20190c] drm/panfrost: Support cache-coherent integrations
git bisect bad 268af50f38b1f2199a2e85e38073d7a25c20190c
# good: [e000650375b65ff77c5ee852b5086f58c741179e] fbdev/atafb: Remove unused extern variables
git bisect good e000650375b65ff77c5ee852b5086f58c741179e
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# good: [d099fc8f540add80f725014fdd4f7f49f3c58911] drm/ttm: new TT backend allocation pool v3
git bisect good d099fc8f540add80f725014fdd4f7f49f3c58911
# good: [e93b2da9799e5cb97760969f3e1f02a5bdac29fe] drm/amdgpu: switch to new allocator v2
git bisect good e93b2da9799e5cb97760969f3e1f02a5bdac29fe
# good: [0fe3cf3a53b5c1205ec7d321be1185b075dff205] drm/radeon: switch to new allocator v2
git bisect good 0fe3cf3a53b5c1205ec7d321be1185b075dff205
# first bad commit: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
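In case it helps anyone cross-checking the bisect, here is a small Python sketch that pulls the good/bad verdicts and the first bad commit out of a log like the one above. It only relies on the "# good:", "# bad:" and "# first bad commit:" comment lines visible in the log; the function name and the SAMPLE excerpt (a few lines copied from the log) are there purely for illustration.

```python
import re

# Matches the comment lines seen in the bisect log, e.g.
# "# bad: [<40-hex sha>] <subject>"
ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (tested, first_bad) parsed from bisect log text.

    Hypothetical helper: tested is a list of (verdict, sha, subject)
    tuples in log order; first_bad is (sha, subject) or None.
    """
    tested = []
    first_bad = None
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip the "git bisect ..." command lines
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            tested.append((verdict, sha, subject))
    return tested, first_bad


# Excerpt of the log above, shortened for the example.
SAMPLE = """\
git bisect start
# bad: [5c8fe583cce542aa0b84adc939ce85293de36e5e] Linux 5.11-rc1
git bisect bad 5c8fe583cce542aa0b84adc939ce85293de36e5e
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# first bad commit: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
"""

tested, first_bad = summarize_bisect_log(SAMPLE)
for verdict, sha, subject in tested:
    print(verdict, sha[:12], subject)
print("first bad:", first_bad)
```

Running it over the full log should list every tested commit in order, which makes it easier to see that the amdgpu and radeon allocator switches were marked good and only the nouveau one was marked bad.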
With 5.14-rc5 as a base + tegra-next + tegra-drm-next + tegra-drm-fixes (scheduled for next) + PM patches (scheduled for 5.16, but optional), and using the libdrm scheduled for the new tegra uABI, I no longer have any issue getting a graphical display using Wayland on the Workstation spin (jetson-tk1).
Actually, it doesn't seem that reliable on a second boot... So we might need to wait for 5.16 to see more improvements (especially regarding iommu/memory/dGPU support...).