Bug 1965784

Summary: [PineBook Pro] panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x0000000015600000
Product: [Fedora] Fedora Reporter: Dominik 'Rathann' Mierzejewski <dominik>
Component: mesa Assignee: Adam Jackson <ajax>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 34 CC: acaringi, adscvr, airlied, ajax, alciregi, bskeggs, caillon+fedoraproject, hdegoede, igor.raits, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, lyude, masami256, mchehab, pbrobinson, ptalbert, rclark, rhughes, rstrode, steved, tstellar
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: mesa-21.1.3-1.fc34
Last Closed: 2021-06-21 01:03:41 UTC Type: Bug
Bug Blocks: 245418    
Attachments:
dmesg from kernel 5.12.7 showing panfrost GPU faults (flags: none)

Description Dominik 'Rathann' Mierzejewski 2021-05-29 21:58:10 UTC
Created attachment 1788039
dmesg from kernel 5.12.7 showing panfrost GPU faults

1. Please describe the problem:
On a Pinebook Pro machine, the following messages are repeated in the kernel log:
May 29 22:49:51 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x0000000015600000
                        Reason: TODO
                        raw fault status: 0x660003C2
                        decoded fault status: SLAVE FAULT
                        exception type 0xC2: TRANSLATION_FAULT_LEVEL2
                        access type 0x3: WRITE
                        source id 0x6600
...
May 29 22:49:51 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x0000000008000000
                        Reason: TODO
                        raw fault status: 0x660003C3
                        decoded fault status: SLAVE FAULT
                        exception type 0xC3: TRANSLATION_FAULT_LEVEL3
                        access type 0x3: WRITE
                        source id 0x6600
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3300, status=0x8, head=0x53e7000, tail>
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FD80
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3301, status=0x8, head=0x59d0500, tail>
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=0, config=0x3300, status=0x8, head=0x51e3680, tail>
May 29 22:49:52 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=0, config=0x3300, status=0x8, head=0x77af0c0, tail>
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS1 at VA 0x0000000015E00000
                        Reason: TODO
                        raw fault status: 0x660003C2
                        decoded fault status: SLAVE FAULT
                        exception type 0xC2: TRANSLATION_FAULT_LEVEL2
                        access type 0x3: WRITE
                        source id 0x6600
...
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3301, status=0x8, head=0x77cd000, tail>
May 29 22:49:53 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=1, config=0x3301, status=0x8, head=0x5993000, tail>
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS0 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: gpu sched timeout, js=0, config=0x3300, status=0x8, head=0x37d10c0, tail>
May 29 22:49:54 kernel: panfrost ff9a0000.gpu: Unhandled Page fault in AS1 at VA 0x00000000A315FC00
                        Reason: TODO
                        raw fault status: 0xF002C1
                        decoded fault status: SLAVE FAULT
                        exception type 0xC1: TRANSLATION_FAULT_LEVEL1
                        access type 0x2: READ
                        source id 0xF0

and similar.
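The decoded fields in these messages line up with fixed bit ranges of the raw fault status. The following is a minimal Python sketch of that decoding. The bit layout (bits 0-7 exception type, bits 8-9 access type, bits 16 and up source id) is an assumption inferred from the values logged above, not taken from hardware documentation, and the name tables contain only the values that appear in this report. The function name `decode_fault_status` is made up for illustration.

```python
# Sketch: split a panfrost "raw fault status" value into the fields
# printed in the kernel log. The bit layout is inferred from the log
# lines in this report (an assumption, not a documented register map):
#   bits 0-7    exception type
#   bits 8-9    access type
#   bits 16+    source id

# Only the names that actually appear in this report.
EXCEPTION_NAMES = {
    0xC1: "TRANSLATION_FAULT_LEVEL1",
    0xC2: "TRANSLATION_FAULT_LEVEL2",
    0xC3: "TRANSLATION_FAULT_LEVEL3",
}
ACCESS_NAMES = {0x2: "READ", 0x3: "WRITE"}


def decode_fault_status(raw: int) -> dict:
    """Return the exception type, access type and source id of `raw`."""
    exception_type = raw & 0xFF
    access_type = (raw >> 8) & 0x3
    source_id = raw >> 16
    return {
        "exception_type": exception_type,
        "exception_name": EXCEPTION_NAMES.get(exception_type, "UNKNOWN"),
        "access_type": access_type,
        "access_name": ACCESS_NAMES.get(access_type, "UNKNOWN"),
        "source_id": source_id,
    }


if __name__ == "__main__":
    # The three distinct raw values seen in the log above.
    for raw in (0x660003C2, 0x660003C3, 0xF002C1):
        fields = decode_fault_status(raw)
        print(
            f"raw 0x{raw:X}: "
            f"exception 0x{fields['exception_type']:X} ({fields['exception_name']}), "
            f"access 0x{fields['access_type']:X} ({fields['access_name']}), "
            f"source id 0x{fields['source_id']:X}"
        )
```

Read this way, the 0x6600 source produces WRITE faults at translation levels 2 and 3, while the 0xF0 source produces READ faults at level 1.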

2. What is the Version-Release number of the kernel:
5.12.7-300.fc34.aarch64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
I think this started with the Mesa upgrade to 21.1.1.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
1. Enable OpenGL compositing in Firefox, i.e. the following preferences:
gfx.canvas.azure.accelerated	true
gfx.xrender.enabled	true
layers.accelerate-all	true
layers.acceleration.force-enabled	true
webgl.out-of-process	true
2. Go to https://njumobile.pl/

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
I haven't tried.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
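Once the kernel log has been saved to a file with the journalctl command above, the panfrost messages can be tallied to see which address spaces, virtual addresses and job slots are involved. The following is a rough Python sketch. The regular expressions are written against the message format quoted in this report and are an assumption about what other kernel versions print. The helper name `summarize` is made up for illustration.

```python
# Sketch: count panfrost page faults and scheduler timeouts in a saved
# kernel log. The patterns match the message format quoted in this
# report; other kernel versions may word these messages differently.
import re
import sys
from collections import Counter

FAULT_RE = re.compile(
    r"panfrost \S+: Unhandled Page fault in AS(?P<as_nr>\d+) "
    r"at VA (?P<va>0x[0-9A-Fa-f]+)"
)
TIMEOUT_RE = re.compile(r"panfrost \S+: gpu sched timeout, js=(?P<js>\d+)")


def summarize(log_text: str) -> dict:
    """Count faults per (address space, VA) and timeouts per job slot."""
    faults = Counter()
    timeouts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            key = (int(match.group("as_nr")), int(match.group("va"), 16))
            faults[key] += 1
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[int(match.group("js"))] += 1
    return {"faults": faults, "timeouts": timeouts}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 summarize_panfrost.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        result = summarize(handle.read())
    for (as_nr, va), count in result["faults"].most_common():
        print(f"AS{as_nr} VA 0x{va:016X}: {count} fault(s)")
    for js, count in sorted(result["timeouts"].items()):
        print(f"js={js}: {count} timeout(s)")
```

The script name `summarize_panfrost.py` is only a placeholder; `dmesg.txt` is the file produced by the journalctl command above.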

Comment 1 Peter Robinson 2021-05-30 11:19:25 UTC
Please use abrt or similar to get a full log with debug symbols.

Comment 2 Dominik 'Rathann' Mierzejewski 2021-05-31 10:38:47 UTC
I'm pretty sure mesa-21.1.1 is the cause/trigger as downgrading to mesa-21.0.2 makes the issue stop.

abrt is not catching anything. Any hints on how to get a full log? Should I run a rawhide kernel?

I got a kernel WARNING this time, too:
May 31 12:30:53 kernel: ------------[ cut here ]------------
May 31 12:30:53 kernel: Memory manager not clean during takedown.
May 31 12:30:53 kernel: WARNING: CPU: 4 PID: 7032 at drivers/gpu/drm/drm_mm.c:998 drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfcomm snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_f>
May 31 12:30:53 kernel:  videobuf2_memops industrialio_triggered_buffer snd_pcm brcmutil videobuf2_v4l2 kfifo_buf cfg80211 videobuf2_common nvmem_rockchip_efuse videodev mc snd_timer snd rfkill indus>
May 31 12:30:53 kernel: CPU: 4 PID: 7032 Comm: JS Helper Tainted: G        WC        5.12.7-300.fc34.aarch64 #1
May 31 12:30:53 kernel: Hardware name:  /, BIOS 2021.04 04/28/2021
May 31 12:30:53 kernel: pstate: 40400005 (nZcv daif +PAN -UAO -TCO BTYPE=--)
May 31 12:30:53 kernel: pc : drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel: lr : drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel: sp : ffff8000132cbb80
May 31 12:30:53 kernel: x29: ffff8000132cbb80 x28: ffff30dcbdf70000 
May 31 12:30:53 kernel: x27: ffff30dc85b76ca8 x26: ffffa2abdc402000 
May 31 12:30:53 kernel: x25: ffff30dc85b76cf0 x24: ffff30dc81d8c800 
May 31 12:30:53 kernel: x23: 0000000000000000 x22: ffff30dc81d8c800 
May 31 12:30:53 kernel: x21: ffff30dc85b76c00 x20: ffff30dc85b76cc8 
May 31 12:30:53 kernel: x19: ffff30dcf0946c00 x18: 0000000000000000 
May 31 12:30:53 kernel: x17: 0000000000000000 x16: ffffa2abdb7a0a54 
May 31 12:30:53 kernel: x15: 0000000000000040 x14: 0000000000000000 
May 31 12:30:53 kernel: x13: 0000000000000040 x12: ffff30dc91b756d0 
May 31 12:30:53 kernel: x11: ffffa2abdc9fcc98 x10: 00000000ffffe000 
May 31 12:30:53 kernel: x9 : ffffa2abda9e0d80 x8 : 00000000ffffdfff 
May 31 12:30:53 kernel: x7 : ffffa2abdc9fcc98 x6 : 0000000000000001 
May 31 12:30:53 kernel: x5 : ffff30dd76ef3148 x4 : 0000000000000000 
May 31 12:30:53 kernel: x3 : 0000000000000027 x2 : 0000000000000023 
May 31 12:30:53 kernel: x1 : ffff30dd76ef3150 x0 : 0000000000000029 
May 31 12:30:53 kernel: Call trace:
May 31 12:30:53 kernel:  drm_mm_takedown+0x34/0x44 [drm]
May 31 12:30:53 kernel:  panfrost_postclose+0x40/0x5c [panfrost]
May 31 12:30:53 kernel:  drm_file_free.part.0+0x1ac/0x250 [drm]
May 31 12:30:53 kernel:  drm_close_helper.isra.0+0x74/0x84 [drm]
May 31 12:30:53 kernel:  drm_release+0x78/0x154 [drm]
May 31 12:30:53 kernel:  __fput+0x88/0x244
May 31 12:30:53 kernel:  ____fput+0x1c/0x30
May 31 12:30:53 kernel:  task_work_run+0xcc/0x22c
May 31 12:30:53 kernel:  do_exit+0x1cc/0x460
May 31 12:30:53 kernel:  do_group_exit+0x44/0xac
May 31 12:30:53 kernel:  get_signal+0x1e4/0x940
May 31 12:30:53 kernel:  do_signal+0x84/0x270
May 31 12:30:53 kernel:  do_notify_resume+0xe0/0x390
May 31 12:30:53 kernel:  work_pending+0xc/0x498
May 31 12:30:53 kernel: ---[ end trace b5135138d9d2c413 ]---
May 31 12:31:28 kernel: panfrost_gem_shrinker_scan: 60 callbacks suppressed

Comment 3 Dominik 'Rathann' Mierzejewski 2021-05-31 12:09:03 UTC
It seems to be fixed in the Mesa main branch. I've just built 21.2.0-devel (git 234e1b7) and I'm unable to reproduce this anymore. Reassigning to mesa, then.

Comment 4 Dominik 'Rathann' Mierzejewski 2021-05-31 13:07:37 UTC
Upstream commit https://gitlab.freedesktop.org/mesa/mesa/-/commit/a89bc59980b3ea7b2f03d2994bae7dda689f637f looks relevant here. I'm going to try building 21.1.1 with that patch applied and report back.

Comment 5 Dominik 'Rathann' Mierzejewski 2021-05-31 13:52:05 UTC
Sadly, the patch depends on other patches not present in 21.1.1.

Comment 6 Dominik 'Rathann' Mierzejewski 2021-05-31 21:20:25 UTC
Upstream fix: https://gitlab.freedesktop.org/mesa/mesa/-/commit/fe9d37b0c6e89f11a5f25022a851da81d19dab73.patch . A scratch build with that patch fixes the issue: https://koji.fedoraproject.org/koji/taskinfo?taskID=69050219 . Upstream says the fix will be included in the next 21.1.x release.

Comment 7 Fedora Update System 2021-06-18 20:39:27 UTC
FEDORA-2021-0ec322843a has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-0ec322843a

Comment 8 Fedora Update System 2021-06-19 01:11:18 UTC
FEDORA-2021-0ec322843a has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-0ec322843a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-0ec322843a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2021-06-21 01:03:41 UTC
FEDORA-2021-0ec322843a has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make a note of it in this bug report.