Bug 1937129
| Summary: | page fault with nouveau on jetson-tk1 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Nicolas Chauvet (kwizart) <kwizart> | ||||
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 34 | CC: | acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, pbrobinson, ptalbert, steved | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-08-17 14:10:02 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 245418 | ||||||
| Attachments: |
Description
Nicolas Chauvet (kwizart)
2021-03-09 22:17:45 UTC
FYI, I'm not reproducing using linux-next 20210302. Will try with 5.12-rc1...

5.12-rc1 also (still) has the page fault bug. But the triggered fault is a different one (related to polkit), and there I can have a graphical display... (but too unstable to verify gpu acceleration).

[ 58.003759] BUG: Bad page state in process polkitd pfn:ee9b1
[ 58.009509] page:8a64ce78 refcount:2 mapcount:129 mapping:473e54ab index:0x0 pfn:0xee9b1
[ 58.017597] aops:0xc0b0ea14 ino:1749
[ 58.021177] flags: 0x40000000()
[ 58.024339] raw: 40000000 00000100 00000122 c43d81f8 00000000 00000000 00000080 00000002
[ 58.032422] page dumped because: nonzero _refcount
[ 58.037204] Modules linked in: nouveau tegra_drm host1x drm_ttm_helper tegra_soctherm ttm iova zram zsmalloc xhci_tegra ci_hdrc_tegra phy_tegra_xusb ahci_tegra libahci_platform tegra124_e
[ 58.061017] CPU: 2 PID: 689 Comm: polkitd Not tainted 5.12.0-rc2-tegra+ #198
[ 58.068051] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 58.074305] [<c010ec40>] (unwind_backtrace) from [<c010a1ec>] (show_stack+0x10/0x14)
[ 58.082039] [<c010a1ec>] (show_stack) from [<c0a86b20>] (dump_stack+0xc0/0xd4)
[ 58.089250] [<c0a86b20>] (dump_stack) from [<c02341ec>] (bad_page+0xdc/0x10c)
[ 58.096373] [<c02341ec>] (bad_page) from [<c02383d4>] (get_page_from_freelist+0xde8/0x116c)
[ 58.104709] [<c02383d4>] (get_page_from_freelist) from [<c0238cd8>] (__alloc_pages_nodemask+0x17c/0x1014)
[ 58.114258] [<c0238cd8>] (__alloc_pages_nodemask) from [<c021e478>] (__pte_alloc+0x24/0x178)
[ 58.122679] [<c021e478>] (__pte_alloc) from [<c021fb40>] (copy_page_range+0x6e4/0xa18)
[ 58.130580] [<c021fb40>] (copy_page_range) from [<c011f154>] (dup_mm+0x328/0x458)
[ 58.138050] [<c011f154>] (dup_mm) from [<c011fee4>] (copy_process+0x980/0x16c4)
[ 58.145344] [<c011fee4>] (copy_process) from [<c0120e9c>] (kernel_clone+0xa4/0x3e4)
[ 58.152986] [<c0120e9c>] (kernel_clone) from [<c01214a0>] (sys_clone+0x74/0x90)
[ 58.160281] [<c01214a0>] (sys_clone) from [<c01000c0>] (ret_fast_syscall+0x0/0x58)
[ 58.167835] Exception stack(0xc56fffa8 to 0xc56ffff0)
[ 58.172873] ffa0: b491e078 00000001 01200011 00000000 00000000 00000000
[ 58.181032] ffc0: b491e078 00000001 b4face1c 00000078 bea4a000 b491e550 00000001 bea4a264
[ 58.189188] ffe0: b491e010 bea49e38 b4f018ec b4f017fc
[ 58.194225] Disabling lock debugging due to kernel taint
[ 58.199523] BUG: Bad page state in process polkitd pfn:ee9b2
[ 58.205253] page:8be0376d refcount:2 mapcount:129 mapping:473e54ab index:0x0 pfn:0xee9b2
[ 58.213328] aops:0xc0b0ea14 ino:1749
[ 58.216892] flags: 0x40000000()
[ 58.220025] raw: 40000000 00000100 00000122 c43d81f8 00000000 00000000 00000080 00000002
[ 58.228096] page dumped because: nonzero _refcount
[ 58.232872] Modules linked in: nouveau tegra_drm host1x drm_ttm_helper tegra_soctherm ttm iova zram zsmalloc xhci_tegra ci_hdrc_tegra phy_tegra_xusb ahci_tegra libahci_platform tegra124_e
[ 58.256679] CPU: 2 PID: 689 Comm: polkitd Tainted: G B 5.12.0-rc2-tegra+ #198
[ 58.265097] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 58.271348] [<c010ec40>] (unwind_backtrace) from [<c010a1ec>] (show_stack+0x10/0x14)
[ 58.279077] [<c010a1ec>] (show_stack) from [<c0a86b20>] (dump_stack+0xc0/0xd4)
[ 58.286284] [<c0a86b20>] (dump_stack) from [<c02341ec>] (bad_page+0xdc/0x10c)
[ 58.293405] [<c02341ec>] (bad_page) from [<c02383d4>] (get_page_from_freelist+0xde8/0x116c)
[ 58.301739] [<c02383d4>] (get_page_from_freelist) from [<c0238cd8>] (__alloc_pages_nodemask+0x17c/0x1014)
[ 58.311288] [<c0238cd8>] (__alloc_pages_nodemask) from [<c021e478>] (__pte_alloc+0x24/0x178)
[ 58.319709] [<c021e478>] (__pte_alloc) from [<c021fb40>] (copy_page_range+0x6e4/0xa18)
[ 58.327609] [<c021fb40>] (copy_page_range) from [<c011f154>] (dup_mm+0x328/0x458)
[ 58.335077] [<c011f154>] (dup_mm) from [<c011fee4>] (copy_process+0x980/0x16c4)
[ 58.342371] [<c011fee4>] (copy_process) from [<c0120e9c>] (kernel_clone+0xa4/0x3e4)
[ 58.350013] [<c0120e9c>] (kernel_clone) from [<c01214a0>] (sys_clone+0x74/0x90)
[ 58.357308] [<c01214a0>] (sys_clone) from [<c01000c0>] (ret_fast_syscall+0x0/0x58)
[ 58.364861] Exception stack(0xc56fffa8 to 0xc56ffff0)
[ 58.369900] ffa0: b491e078 00000001 01200011 00000000 00000000 00000000
[ 58.378057] ffc0: b491e078 00000001 b4face1c 00000078 bea4a000 b491e550 00000001 bea4a264
[ 58.386214] ffe0: b491e010 bea49e38 b4f018ec b4f017fc
[ 58.391250] BUG: Bad page state in process polkitd pfn:ee9b3
[ 58.396981] page:32413595 refcount:2 mapcount:129 mapping:473e54ab index:0x0 pfn:0xee9b3
[ 58.405054] aops:0xc0b0ea14 ino:1749

Created attachment 1762323
dmesg with Fedora kernel.
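
For anyone triaging a longer dmesg capture, the "Bad page state" reports quoted above can be tallied with a few lines of Python. This is only a minimal sketch: the regular expression is written against the line format visible in this report, and the names `BAD_PAGE_RE`, `bad_page_reports` and `summarize` are hypothetical helpers, not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches report headers such as:
#   [ 58.003759] BUG: Bad page state in process polkitd pfn:ee9b1
# (format assumed from the log quoted in this bug)
BAD_PAGE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: Bad page state in process "
    r"(?P<comm>\S+)\s+pfn:(?P<pfn>[0-9a-f]+)"
)


def bad_page_reports(dmesg_text):
    """Return one (timestamp, process, pfn) tuple per 'Bad page state' report."""
    return [
        (float(m.group("ts")), m.group("comm"), int(m.group("pfn"), 16))
        for m in BAD_PAGE_RE.finditer(dmesg_text)
    ]


def summarize(reports):
    """Count reports per process and collect the distinct pfns, sorted."""
    per_process = Counter(comm for _, comm, _ in reports)
    pfns = sorted({pfn for _, _, pfn in reports})
    return per_process, pfns


# Header lines copied from the log in this report.
sample = """\
[ 58.003759] BUG: Bad page state in process polkitd pfn:ee9b1
[ 58.199523] BUG: Bad page state in process polkitd pfn:ee9b2
[ 58.391250] BUG: Bad page state in process polkitd pfn:ee9b3
"""

reports = bad_page_reports(sample)
per_process, pfns = summarize(reports)
print(per_process)
print([hex(p) for p in pfns])
```

Because the pattern is applied with `finditer` over the whole text, it does not depend on each log entry sitting on its own line, so it also works on a capture whose line breaks were lost.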
As far as this bug is concerned:
5.10.16-200.fc33.armv7hl is known good (doesn't exhibit the page fault).
5.11.0-rc6-next-20210201-tegra+ is known bad (already exhibits the issue).
5.11.0-rc4-next-20210119-tegra+ is known bad.

461619f5c3242aaee9ec3f0b7072719bd86ea207 is the first bad commit
drm/nouveau: switch to new allocator
(Will try to revert on top of 5.11.5)

git bisect start
# bad: [5c8fe583cce542aa0b84adc939ce85293de36e5e] Linux 5.11-rc1
git bisect bad 5c8fe583cce542aa0b84adc939ce85293de36e5e
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [2911ed9f47b47cb5ab87d03314b3b9fe008e607f] Merge tag 'char-misc-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad 2911ed9f47b47cb5ab87d03314b3b9fe008e607f
# bad: [ac73e3dc8acd0a3be292755db30388c3580f5674] Merge branch 'akpm' (patches from Andrew)
git bisect bad ac73e3dc8acd0a3be292755db30388c3580f5674
# bad: [b10733527bfd864605c33ab2e9a886eec317ec39] Merge tag 'amd-drm-next-5.11-2020-12-09' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad b10733527bfd864605c33ab2e9a886eec317ec39
# bad: [9713158cb2a918c3f6f5522eed23cdeb61f22e75] drm/amdgpu: Add and use seperate reg headers for dcn302
git bisect bad 9713158cb2a918c3f6f5522eed23cdeb61f22e75
# bad: [c0f98d2f8b076bf3e3183aa547395f919c943a14] Merge tag 'drm-misc-next-2020-11-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect bad c0f98d2f8b076bf3e3183aa547395f919c943a14
# good: [6a6e5988a2657cd0c91f6f1a3e7d194599248b6d] drm/ttm: replace last move_notify with delete_mem_notify
git bisect good 6a6e5988a2657cd0c91f6f1a3e7d194599248b6d
# good: [f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8] drm/i915: Force VT'd workarounds when running as a guest OS
git bisect good f566fdcd6cc49a9d5b5d782f56e3e7cb243f01b8
# good: [e76ab2cf21c38331155ea613cdf18582f011c30f] drm/i915: Remove per-platform IIR HPD masking
git bisect good e76ab2cf21c38331155ea613cdf18582f011c30f
# bad: [268af50f38b1f2199a2e85e38073d7a25c20190c] drm/panfrost: Support cache-coherent integrations
git bisect bad 268af50f38b1f2199a2e85e38073d7a25c20190c
# good: [e000650375b65ff77c5ee852b5086f58c741179e] fbdev/atafb: Remove unused extern variables
git bisect good e000650375b65ff77c5ee852b5086f58c741179e
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# good: [d099fc8f540add80f725014fdd4f7f49f3c58911] drm/ttm: new TT backend allocation pool v3
git bisect good d099fc8f540add80f725014fdd4f7f49f3c58911
# good: [e93b2da9799e5cb97760969f3e1f02a5bdac29fe] drm/amdgpu: switch to new allocator v2
git bisect good e93b2da9799e5cb97760969f3e1f02a5bdac29fe
# good: [0fe3cf3a53b5c1205ec7d321be1185b075dff205] drm/radeon: switch to new allocator v2
git bisect good 0fe3cf3a53b5c1205ec7d321be1185b075dff205
# first bad commit: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator

With 5.14-rc5 as a base + tegra-next + tegra-drm-next + tegra-drm-fixes (scheduled for next) + PM patches (scheduled for 5.16, but optional), and using the libdrm scheduled for the new tegra uABI, I no longer have any issue getting a graphical display using Wayland on the Workstation spin (jetson-tk1).

Actually, it doesn't seem that reliable on a second boot... So we might need to wait for 5.16 to see more improvements (especially regarding iommu/memory/dGPU support...).
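
The bisect log quoted earlier in this report is in the format printed by `git bisect log`, so its verdicts can be extracted mechanically. The sketch below assumes the log has been saved with one entry per line. `ENTRY_RE` and `parse_bisect_log` are hypothetical names chosen for illustration, and the sample is a shortened excerpt of the log from this bug.

```python
import re

# Comment lines written by `git bisect log`, for example:
#   # good: [<40-hex sha>] <subject>
#   # bad: [<40-hex sha>] <subject>
#   # first bad commit: [<40-hex sha>] <subject>
ENTRY_RE = re.compile(
    r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$"
)


def parse_bisect_log(text):
    """Return (verdicts, first_bad).

    verdicts is a list of (kind, sha, subject) for each good/bad entry, in
    log order; first_bad is (sha, subject), or None if the log has no
    'first bad commit' line.
    """
    verdicts = []
    first_bad = None
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue  # skip the 'git bisect ...' command lines
        kind, sha, subject = m.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            verdicts.append((kind, sha, subject))
    return verdicts, first_bad


# Shortened excerpt of the bisect log from this report.
sample = """\
git bisect start
# bad: [5c8fe583cce542aa0b84adc939ce85293de36e5e] Linux 5.11-rc1
git bisect bad 5c8fe583cce542aa0b84adc939ce85293de36e5e
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
git bisect bad 461619f5c3242aaee9ec3f0b7072719bd86ea207
# good: [d099fc8f540add80f725014fdd4f7f49f3c58911] drm/ttm: new TT backend allocation pool v3
git bisect good d099fc8f540add80f725014fdd4f7f49f3c58911
# first bad commit: [461619f5c3242aaee9ec3f0b7072719bd86ea207] drm/nouveau: switch to new allocator
"""

verdicts, first_bad = parse_bisect_log(sample)
print(first_bad)
print([(kind, sha[:12]) for kind, sha, _ in verdicts])
```

A log saved in this form can also be fed back to git with `git bisect replay <logfile>` to reproduce the same bisection state in another checkout.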