Bug 1965784 - [PineBook Pro] panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x0000000015600000
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mesa
Version: 34
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2021-05-29 21:58 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2021-06-21 01:03 UTC (History)

Fixed In Version: mesa-21.1.3-1.fc34
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-21 01:03:41 UTC
Type: Bug


Attachments
dmesg from kernel 5.12.7 showing panfrost GPU faults (965.40 KB, text/plain)
2021-05-29 21:58 UTC, Dominik 'Rathann' Mierzejewski


Links
System ID Private Priority Status Summary Last Updated
freedesktop.org Gitlab mesa mesa issues 4737 0 None closed panfrost ff9a0000.gpu: Unhandled Page fault in AS1 at VA 0x0000000009801200 2021-05-31 21:20:20 UTC

Description Dominik 'Rathann' Mierzejewski 2021-05-29 21:58:10 UTC
Created attachment 1788039
dmesg from kernel 5.12.7 showing panfrost GPU faults

1. Please describe the problem:
On a Pinebook Pro machine, the following messages are repeated in the kernel log:
May 29 22:49:51 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x0000000015600000
                        Reason: TODO
                        raw fault status: 0x660003C2
                        decoded fault status: SLAVE FAULT
                        exception type 0xC2: TRANSLATION_FAULT_LEVEL2
                        access type 0x3: WRITE
                        source id 0x6600
...
May 29 22:49:51 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x0000000008000000
                        Reason: TODO
                        raw fault status: 0x660003C3
                        decoded fault status: SLAVE FAULT
                        exception type 0xC3: TRANSLATION_FAULT_LEVEL3
                        access type 0x3: WRITE
                        source id 0x6600
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3300, status=0x8, head=0x53e7000, tail>
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FD80
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3301, status=0x8, head=0x59d0500, tail>
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=0, config=0x3300, status=0x8, head=0x51e3680, tail>
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=0, config=0x3300, status=0x8, head=0x77af0c0, tail>
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS1 at VA 0x0000000015E00000
                        Reason: TODO
                        raw fault status: 0x660003C2
                        decoded fault status: SLAVE FAULT
                        exception type 0xC2: TRANSLATION_FAULT_LEVEL2
                        access type 0x3: WRITE
                        source id 0x6600
...
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3301, status=0x8, head=0x77cd000, tail>
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3301, status=0x8, head=0x5993000, tail>
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=0, config=0x3300, status=0x8, head=0x37d10c0, tail>
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS1 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0

and similar.
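
For readers following the log above, here is a minimal Python sketch of how the "raw fault status" word appears to split into the fields printed beneath it. The bit layout (low byte = exception type, next two bits = access type, upper 16 bits = source id) is an assumption inferred from the values in this log, not taken from the driver source, and the name `decode_fault_status` is hypothetical. Only the access-type names that actually appear in the log are mapped.

```python
# Assumed layout, inferred from the log lines in this report:
#   bits 0-7   exception type (e.g. 0xC2 = TRANSLATION_FAULT_LEVEL2)
#   bits 8-9   access type    (0x2 = READ, 0x3 = WRITE in this log)
#   bits 16+   source id
ACCESS_TYPES = {0x2: "READ", 0x3: "WRITE"}  # only values seen in this log


def decode_fault_status(raw):
    """Split a raw fault status word into the fields shown in the log."""
    return {
        "exception_type": raw & 0xFF,
        "access_type": (raw >> 8) & 0x3,
        "source_id": raw >> 16,
    }


if __name__ == "__main__":
    # The three distinct raw values reported above.
    for raw in (0x660003C2, 0x660003C3, 0xF002C1):
        f = decode_fault_status(raw)
        name = ACCESS_TYPES.get(f["access_type"], "not seen in this log")
        print(
            f"raw 0x{raw:X}: exception type 0x{f['exception_type']:X}, "
            f"access type 0x{f['access_type']:X} ({name}), "
            f"source id 0x{f['source_id']:X}"
        )
```

Under that assumed layout, the WRITE faults (source id 0x6600) and the READ faults (source id 0xF0) in the log fall into two separate groups.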

2. What is the Version-Release number of the kernel:
5.12.7-300.fc34.aarch64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
I think this started with the Mesa upgrade to 21.1.1.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
1. Enable OpenGL compositing in Firefox, i.e. the following preferences:
gfx.canvas.azure.accelerated	true
gfx.xrender.enabled	true
layers.accelerate-all	true
layers.acceleration.force-enabled	true
webgl.out-of-process	true
2. Go to https://njumobile.pl/

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I haven't tried.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
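
Once a log has been saved this way, the repeated faults can be tallied per address space and virtual address. The following is a rough Python sketch, assuming the message format matches the "Unhandled Page fault in AS<n> at VA 0x..." lines quoted in this report; `count_faults` is a hypothetical helper name and the sample lines are copied from the description above.

```python
import re
from collections import Counter

# Matches the fault header line as it appears in this report's log.
FAULT_RE = re.compile(r"Unhandled Page fault in AS(\d+) at VA (0x[0-9A-Fa-f]+)")


def count_faults(lines):
    """Count page faults keyed by (address space, virtual address)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2), 16))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 29 22:49:52 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00",
        "May 29 22:49:53 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS1 at VA 0x0000000015E00000",
        "May 29 22:49:53 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00",
    ]
    for (as_nr, va), n in count_faults(sample).most_common():
        print(f"AS{as_nr} VA 0x{va:016X}: {n} fault(s)")
```

To run it on a saved log, the `sample` list could be replaced with `open("dmesg.txt")`, since a file object iterates line by line.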

Comment 1 Peter Robinson 2021-05-30 11:19:25 UTC
Please use abrt or similar to get a full log with debug symbols.

Comment 2 Dominik 'Rathann' Mierzejewski 2021-05-31 10:38:47 UTC
I'm pretty sure mesa-21.1.1 is the cause/trigger as downgrading to mesa-21.0.2 makes the issue stop.

abrt is not catching anything. Any hints on how to get a full log? Should I run a rawhide kernel?

I got a kernel WARNING this time, too:
May 31 12:30:53 kernel: ------------[ cut here ]------------
May 31 12:30:53 kernel: Memory manager not clean during takedown.
May 31 12:30:53 kernel: WARNING: CPU: 4 PID: 7032 at drivers/gpu/drm/drm_mm.c:998 drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfcomm snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_f>
May 31 12:30:53 kernel:  videobuf2_memops industrialio_triggered_buffer snd_pcm brcmutil videobuf2_v4l2 kfifo_buf cfg80211 videobuf2_common nvmem_rockchip_efuse videodev mc snd_timer snd rfkill indus>
May 31 12:30:53 kernel: CPU: 4 PID: 7032 Comm: JS Helper Tainted: G        WC        5.12.7-300.fc34.aarch64 #1
May 31 12:30:53 kernel: Hardware name:  /, BIOS 2021.04 04/28/2021
May 31 12:30:53 kernel: pstate: 40400005 (nZcv daif +PAN -UAO -TCO BTYPE=--)
May 31 12:30:53 kernel: pc : drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel: lr : drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel: sp : ffff8000132cbb80
May 31 12:30:53 kernel: x29: ffff8000132cbb80 x28: ffff30dcbdf70000 
May 31 12:30:53 kernel: x27: ffff30dc85b76ca8 x26: ffffa2abdc402000 
May 31 12:30:53 kernel: x25: ffff30dc85b76cf0 x24: ffff30dc81d8c800 
May 31 12:30:53 kernel: x23: 0000000000000000 x22: ffff30dc81d8c800 
May 31 12:30:53 kernel: x21: ffff30dc85b76c00 x20: ffff30dc85b76cc8 
May 31 12:30:53 kernel: x19: ffff30dcf0946c00 x18: 0000000000000000 
May 31 12:30:53 kernel: x17: 0000000000000000 x16: ffffa2abdb7a0a54 
May 31 12:30:53 kernel: x15: 0000000000000040 x14: 0000000000000000 
May 31 12:30:53 kernel: x13: 0000000000000040 x12: ffff30dc91b756d0 
May 31 12:30:53 kernel: x11: ffffa2abdc9fcc98 x10: 00000000ffffe000 
May 31 12:30:53 kernel: x9 : ffffa2abda9e0d80 x8 : 00000000ffffdfff 
May 31 12:30:53 kernel: x7 : ffffa2abdc9fcc98 x6 : 0000000000000001 
May 31 12:30:53 kernel: x5 : ffff30dd76ef3148 x4 : 0000000000000000 
May 31 12:30:53 kernel: x3 : 0000000000000027 x2 : 0000000000000023 
May 31 12:30:53 kernel: x1 : ffff30dd76ef3150 x0 : 0000000000000029 
May 31 12:30:53 kernel: Call trace:
May 31 12:30:53 kernel:  drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel:  panfrost_postclose+0x40/0x5c [panfrost]
May 31 12:30:53 kernel:  drm_file_free.part.0+0x1ac/0x250 [drm]
May 31 12:30:53 kernel:  drm_close_helper.isra.0+0x74/0x84 [drm]
May 31 12:30:53 kernel:  drm_release+0x78/0x154 [drm]
May 31 12:30:53 kernel:  __fput+0x88/0x244
May 31 12:30:53 kernel:  ____fput+0x1c/0x30
May 31 12:30:53 kernel:  task_work_run+0xcc/0x22c
May 31 12:30:53 kernel:  do_exit+0x1cc/0x460
May 31 12:30:53 kernel:  do_group_exit+0x44/0xac
May 31 12:30:53 kernel:  get_signal+0x1e4/0x940
May 31 12:30:53 kernel:  do_signal+0x84/0x270
May 31 12:30:53 kernel:  do_notify_resume+0xe0/0x390
May 31 12:30:53 kernel:  work_pending+0xc/0x498
May 31 12:30:53 kernel: ---[ end trace b5135138d9d2c413 ]---
May 31 12:31:28 kernel: panfrost_gem_shrinker_scan: 60 callbacks suppressed

Comment 3 Dominik 'Rathann' Mierzejewski 2021-05-31 12:09:03 UTC
It seems to be fixed in mesa main branch. I've just built 21.2.0-devel (git 234e1b7) and I'm unable to reproduce this anymore. Reassigning to mesa, then.

Comment 4 Dominik 'Rathann' Mierzejewski 2021-05-31 13:07:37 UTC
Upstream commit https://gitlab.freedesktop.org/mesa/mesa/-/commit/a89bc59980b3ea7b2f03d2994bae7dda689f637f looks relevant here. I'm going to try building 21.1.1 with that patch applied and report back.

Comment 5 Dominik 'Rathann' Mierzejewski 2021-05-31 13:52:05 UTC
Sadly, the patch depends on other patches not present in 21.1.1.

Comment 6 Dominik 'Rathann' Mierzejewski 2021-05-31 21:20:25 UTC
Upstream fix: https://gitlab.freedesktop.org/mesa/mesa/-/commit/fe9d37b0c6e89f11a5f25022a851da81d19dab73.patch . Scratch build with that patch fixes the issue: https://koji.fedoraproject.org/koji/taskinfo?taskID=69050219 . Upstream says fix will be included in the next 21.1.x release.

Comment 7 Fedora Update System 2021-06-18 20:39:27 UTC
FEDORA-2021-0ec322843a has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-0ec322843a

Comment 8 Fedora Update System 2021-06-19 01:11:18 UTC
FEDORA-2021-0ec322843a has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-0ec322843a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-0ec322843a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2021-06-21 01:03:41 UTC
FEDORA-2021-0ec322843a has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.

