Bug 1753327
Summary: | Firefox crash in LLVM-generated shader code | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Ben Crocker <bcrocker> | ||||||
Component: | mesa | Assignee: | Ben Crocker <bcrocker> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 8.1 | CC: | bugproxy, csoriano, hannsj_uhl, mprchlik, sguelton, tpelka, tstellar | ||||||
Target Milestone: | rc | ||||||||
Target Release: | 8.2 | ||||||||
Hardware: | ppc64le | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2020-04-28 15:41:41 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1711971 | ||||||||
Attachments: |
|
Description
Ben Crocker
2019-09-18 15:24:34 UTC
Created attachment 1616284 [details]
Two crashes on Power9 in fragment shader code
On a Power9 system with LLVM 8, here are a couple of crashes
in fragment shader code.
------- Comment From nemanjai.com 2019-09-20 09:27 EDT------- It appears clear from the analysis in this PR and the related one that we are accessing an invalid memory location off the TOC. There have been issues in the past with the runtime linker making a mistake as to whether it needs to branch to the local or the global entry point on a call. So I think it would be valuable information to find out whether the global entry point - namely, the sequence: 0x00007fff68000000: addis r2,r12,28778 0x00007fff68000004: addi r2,r2,-32768 is actually executed in the function that the failure is in. If it does not, this is an indication that we have considered this function local and branched to the local entry point when we probably needed to branch to a global entry point (i.e. the call is across a DSO boundary). Would it be possible to confirm where the function was entered? First of all, sorry for any confusion; the file attached to comment 1 (Two crashes on Power9 in fragment shader code) actually contains attachment 1 [details] (Crash on Power8 in vertex shader code) at the beginning. In the vertex shader crash, the faulting instruction is too far along in the code for register reconstruction to be possible (i.e., r12 has been overwritten three times), BUT in the fragment shader cases, the faulting instruction is very early, so I was able to reconstruct the register operations and confirm that the register state is consistent with the function having been entered via the global entry point: first fragment shader case: Dump of assembler code from 0x7fff7c5d0000 to 0x7fff7c5d0020: 0x00007fff7c5d0000: addis r2,r12,27421 ;; Prologue: set up TOC ptr in r2; 27421=0x6b1d, r12=0x7fff.7c5d.0000; r2 <- 0x7fff.e77a.0000 0x00007fff7c5d0004: addi r2,r2,-32768 ;; finish setting up r2; -32768=0x8000; exts()=0xffffffffffff8000; r2 <- 0x7fff.e779.8000 0x00007fff7c5d0008: stdu r1,-368(r1) 0x00007fff7c5d000c: ld r4,480(r1) 0x00007fff7c5d0010: ld r5,16(r4) 0x00007fff7c5d0014: addi r5,r5,1 0x00007fff7c5d0018: std r5,16(r4) 0x00007fff7c5d001c: ld r4,-32768(r2) ;; Faulting instruction; first use of r2 End of assembler dump. (gdb) info reg r0 0x0 0 r1 0x7fff25e8e240 140733829407296 r2 0x7fffe7798000 140737076887552 r3 0x1000857a900 1099651590400 r4 0x100086d71c8 1099653018056 r5 0x1 1 r6 0x1 1 r7 0x10008582510 1099651622160 r8 0x10008582520 1099651622176 r9 0x10008582530 1099651622192 r10 0x7fff25e8e468 140733829407848 r11 0x7fff7c5d0000 140735279857664 r12 0x7fff7c5d0000 140735279857664 r13 0x7fff25e968c0 140733829441728 r14 0x7fff8ef90000 140735592071168 r15 0x7fff25680000 140733820960768 r16 0x7fff8ef04410 140735591498768 r17 0x7fff8ef04420 140735591498784 r18 0x100086d6710 1099653015312 r19 0x7fff8ef00318 140735591482136 r20 0x10 16 r21 0x20 32 r22 0x30 48 r23 0x0 0 r24 0x100086d7170 1099653017968 r25 0x0 0 r26 0x100086d7148 1099653017928 r27 0x1000d266d90 1099732250000 r28 0x10008582500 1099651622144 r29 0x10009879980 1099671509376 r30 0x100086d71c8 1099653018056 r31 0x400 1024 pc 0x7fff2613ca24 0x7fff2613ca24 <lp_rast_shade_quads_mask+644> msr 0x800000000280f033 9223372036896780339 cr 0x42224488 1109542024 lr 0x7fff2613ca24 0x7fff2613ca24 <lp_rast_shade_quads_mask+644> ctr 0x7fff7c5d0000 140735279857664 xer 0x0 0 fpscr 0xb2004100 2986361088 vscr 0x1 1 vrsave 0xffffffff -1 ppr 0xc000000000000 3377699720527872 dscr 0x17 23 tar 0x0 0 second fragment shader case: Dump of assembler code from 0x7fffa42f0000 to 0x7fffa42f0020: 0x00007fffa42f0000: addis r2,r12,28354 ;; Prologue: set up TOC ptr in r2; 28354=0x6ec2, r12=0x7fff.a42f.0000; r2 <- 0x8000.12f1.0000 0x00007fffa42f0004: addi r2,r2,-32768 ;; finish setting up r2; r2 <- 0x8000.12f0.8000 0x00007fffa42f0008: stdu r1,-720(r1) 0x00007fffa42f000c: std r28,616(r1) 0x00007fffa42f0010: std r29,624(r1) 0x00007fffa42f0014: std r30,632(r1) 0x00007fffa42f0018: mtfprwa f1,r5 0x00007fffa42f001c: ld r5,-32768(r2) ;; Faulting instruction; first use of r2 End of assembler dump. (gdb) info reg r0 0x0 0 r1 0x7fff377edee0 140734124449504 r2 0x800012f08000 140737806106624 r3 0x10047780c90 1100710677648 r4 0x0 0 r5 0x0 0 r6 0x1 1 r7 0x10047787850 1100710705232 r8 0x10047787870 1100710705264 r9 0x10047787890 1100710705296 r10 0x7fff377ee268 140734124450408 r11 0x7fffa42f0000 140735947931648 r12 0x7fffa42f0000 140735947931648 r13 0x7fff377f68c0 140734124484800 r14 0x110000100900001 76561197969702913 r15 0x7fff377ee4e0 140734124451040 r16 0x100001 1048577 r17 0x200000 2097152 r18 0x200000 2097152 r19 0x0 0 r20 0x30000000100000 13510798883160064 r21 0x70000000500000 31525197396836352 r22 0x0 0 r23 0xffffffffffe00000 18446744073707454464 r24 0x3f00000 66060288 r25 0x0 0 r26 0x100351ed478 1100402840696 r27 0x100378b4f60 1100443504480 r28 0x10047787840 1100710705216 r29 0x10048ef7340 1100735279936 r30 0x100351ed4f8 1100402840824 r31 0x80 128 pc 0x7fffa42f001c 0x7fffa42f001c msr 0x800000000280f033 9223372036896780339 cr 0x48224428 1210205224 lr 0x7fff5df2ca24 0x7fff5df2ca24 <lp_rast_shade_quads_mask+644> ctr 0x7fffa42f0000 140735947931648 xer 0x20040000 537133056 fpscr 0xbf204300 3206562560 vscr 0x1 1 vrsave 0xffffffff -1 ppr 0xc000000000000 3377699720527872 dscr 0x17 23 tar 0x0 0 In the Power8 vertex shader case, just to review: (gdb) bt #0 0x00007fff680000d0 in () #1 0x00007ffedfaca834 in llvm_pipeline_generic (middle=0x7ffff377de70, fetch_info=<optimized out>, in_prim_info=0x30) at ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:400 #2 0x00000100247287d0 in () (gdb) disas 0x00007fff68000000,0x00007fff680000e0 Dump of assembler code from 0x7fff68000000 to 0x7fff680000e0: 0x00007fff68000000: addis r2,r12,28778 0x00007fff68000004: addi r2,r2,-32768 0x00007fff68000008: clrldi r0,r1,59 0x00007fff6800000c: std r30,-32(r1) 0x00007fff68000010: mr r30,r1 0x00007fff68000014: subfic r0,r0,-352 0x00007fff68000018: stdux r1,r1,r0 0x00007fff6800001c: li r8,-256 0x00007fff68000020: mtfprwz f0,r7 0x00007fff68000024: li r7,12 0x00007fff68000028: addi r10,r6,-1 0x00007fff6800002c: lwz r11,8(r5) 0x00007fff68000030: std r29,-40(r30) 0x00007fff68000034: lwz r29,24(r5) 0x00007fff68000038: stxvd2x vs56,r30,r8 0x00007fff6800003c: li r8,-240 0x00007fff68000040: mtfprwz f2,r10 0x00007fff68000044: std r28,-48(r30) 0x00007fff68000048: lhz r12,0(r9) 0x00007fff6800004c: xxspltw vs34,vs0,1 0x00007fff68000050: addi r28,r11,-7 0x00007fff68000054: std r26,-64(r30) 0x00007fff68000058: std r27,-56(r30) 0x00007fff6800005c: addi r26,r29,-7 0x00007fff68000060: xxlxor vs6,vs6,vs6 0x00007fff68000064: xxspltw vs35,vs2,1 0x00007fff68000068: cmplw r28,r11 0x00007fff6800006c: vspltisw v1,8 0x00007fff68000070: std r23,-88(r30) 0x00007fff68000074: li r23,192 0x00007fff68000078: lwa r10,4(r9) 0x00007fff6800007c: stxvd2x vs57,r30,r8 0x00007fff68000080: li r8,-224 0x00007fff68000084: lwa r0,20(r9) 0x00007fff68000088: lhz r9,16(r9) 0x00007fff6800008c: stxvd2x vs58,r30,r8 0x00007fff68000090: li r8,-208 0x00007fff68000094: mtvsrd vs5,r12 0x00007fff68000098: ld r12,0(r5) 0x00007fff6800009c: std r24,-80(r30) 0x00007fff680000a0: std r25,-72(r30) 0x00007fff680000a4: subf r27,r10,r28 0x00007fff680000a8: xxlxor vs36,vs36,vs36 0x00007fff680000ac: vspltisw v5,1 0x00007fff680000b0: li r25,88 0x00007fff680000b4: li r24,140 0x00007fff680000b8: std r18,-128(r30) 0x00007fff680000bc: cmplw cr1,r27,r28 0x00007fff680000c0: subf r11,r0,r26 0x00007fff680000c4: mtvsrd vs7,r9 0x00007fff680000c8: add r9,r12,r10 0x00007fff680000cc: xxswapd vs5,vs5 => 0x00007fff680000d0: ld r28,-32768(r2) ;; Faulting instruction; first use of r2 0x00007fff680000d4: stxvd2x vs59,r30,r8 0x00007fff680000d8: li r8,-192 0x00007fff680000dc: cror 4*cr5+lt,gt,4*cr1+gt End of assembler dump. (gdb) info reg r2 r2 0x7fffd8698000 140736824180736 So, register r2 SEEMS to contain a reasonable value; however, 'maintenance info sections' reveals something interesting: (gdb) maint info sections Exec file: `/usr/lib64/firefox/firefox', file type elf64-powerpcle. [0] 0x00000238->0x00000249 at 0x00000238: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS ... (a LOT of data omitted for the sake of brevity) ... [467] 0x7fff68000000->0x7fff68010000 at 0x3a540000: load240 ALLOC LOAD READONLY CODE HAS_CONTENTS ... (^ This is where the code listed above resides)... ... (a [1154] 0x7fff80a00000->0x7fff80a20000 at 0x40190000: load848 ALLOC LOAD READONLY CODE HAS_CONTENTS [1155] 0x7fff80a20000->0x7fff80a30000 at 0x401b0000: load849a ALLOC LOAD READONLY CODE HAS_CONTENTS [1156] 0x7fff80a30000->0x7fff80a30000 at 0x401c0000: load849b ALLOC READONLY CODE [1157] 0x7fff80a50000->0x7fff80a60000 at 0x401c0000: load850 ALLOC LOAD READONLY HAS_CONTENTS [1158] 0x7fff80a60000->0x7fff80a70000 at 0x401d0000: load851 ALLOC LOAD HAS_CONTENTS [1159] 0x7ffff3750000->0x7ffff3790000 at 0x401e0000: load852 ALLOC LOAD HAS_CONTENTS <end of section info> Note that r2 points somewhere BETWEEN [1158] and [1159]; in fact, it points to the base of [1158] + 0x57c38000. Can you also post the LLVM IR? I did a scratch build of mesa built against LLVM 9, can you test this and see if it's still an issue: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24006318 ------- Comment From nemanjai.com 2019-10-11 09:47 EDT------- Hmm... I wonder if this is just going past what we can fit into the medium code model. We are trying to reach 1,885,962,240 bytes away from the function address when setting the TOC. Is it possible that we actually meant to reach more than 2,147,483,647 bytes away from the function address and "run time dynamic linker" just truncated the relocation rather than trapping? Basically, I wonder if this issue goes away with switching to the large code model. The static compiler option would be -mcmodel=large, I am not sure how to switch the code model for the JIT. (In reply to IBM Bug Proxy from comment #7) > ------- Comment From nemanjai.com 2019-10-11 09:47 EDT------- > Hmm... I wonder if this is just going past what we can fit into the medium > code model. We are trying to reach 1,885,962,240 bytes away from the > function address when setting the TOC. Is it possible that we actually meant > to reach more than 2,147,483,647 bytes away from the function address and > "run time dynamic linker" just truncated the relocation rather than trapping? > It looks like the default code model for JIT is actually small not medium: https://github.com/llvm-project/llvm/blob/release_80/lib/Target/PowerPC/PPCTargetMachine.cpp#L225 > Basically, I wonder if this issue goes away with switching to the large code > model. The static compiler option would be -mcmodel=large, I am not sure how > to switch the code model for the JIT. We have two options, we can change the default in LLVM or we can have mesa override the default with something else. Nemanja, do you have any opinion about which approach would be better? ------- Comment From nemanjai.com 2019-10-28 11:56 EDT------- I am fine with changing the default - small seems like a strange default anyway. But it would be good to confirm that this actually fixes the problem. Have you confirmed that changing the default fixes the problem? I added a call to builder.setCodeModel(CodeModel::Large) in a ppc64le-specific section of Mesa's lp_bld_misc.cpp:lp_build_create_jit_compiler_for_module() and got my first clean run of Web GL Conformance on the Power8 machine on which I observed this problem. The only real costs of the Large code model as compared to the Medium model appear to be: 1) an 8-byte pointer (stored in the code section immediately before the entry point); 2) a load instruction to load this pointer (vs. an add-immediate instruction). I would assert that these costs are negligible in the context of any graphics application utilizing the LLVM pipe. On the face of it, it (now) seems that the choice of anything other than the Large code model for dynamically generated, JIT-compiled shader programs was incorrect. How can we reasonably choose a code model with restricted addressing in the context of an application (any appication) that may easily outgrow the limits of that restricted addressing? (Unless we have foreknowledge of the application's requirements that would be outside the scope of what might reasonably be expected of the LLVM compiler/linker/loader.) I propose to commit the change to lp_bld_misc.cpp that I described above and thereby render the fix somewhat independent of any changes to LLVM, but I suggest we make the suggested changes to LLVM as well. If need be, we could make the code model configurable via environment variable. At the same time, I would like to ask whether it would have been possible for the linker (by which I mean, I think, the LLVM runtime linker) to detect the out-of-range access attempt and flag it as an error. OR, does the runtime linker already generate an error which Mesa is incorrectly ignoring? (In reply to Ben Crocker from comment #11) > I added a call to builder.setCodeModel(CodeModel::Large) > in a ppc64le-specific section of Mesa's > lp_bld_misc.cpp:lp_build_create_jit_compiler_for_module() > > and got my first clean run of Web GL Conformance on the Power8 machine > on which I observed this problem. > > The only real costs of the Large code model as compared to the Medium > model appear to be: > > 1) an 8-byte pointer (stored in the code section immediately before > the entry point); > > 2) a load instruction to load this pointer (vs. an add-immediate > instruction). > > I would assert that these costs are negligible in the context of any > graphics application utilizing the LLVM pipe. > > On the face of it, it (now) seems that the choice of anything other > than the Large code model for dynamically generated, JIT-compiled > shader programs was incorrect. How can we reasonably choose a code > model with restricted addressing in the context of an application (any > appication) that may easily outgrow the limits of that restricted > addressing? (Unless we have foreknowledge of the application's > requirements that would be outside the scope of what might reasonably > be expected of the LLVM compiler/linker/loader.) > > I propose to commit the change to lp_bld_misc.cpp that I described > above and thereby render the fix somewhat independent of any changes > to LLVM, but I suggest we make the suggested changes to LLVM as well. > I think this is a good solution. I can propose an LLVM change upstream to change the default to large. > If need be, we could make the code model configurable via environment > variable. > > At the same time, I would like to ask whether it would have been > possible for the linker (by which I mean, I think, the LLVM runtime > linker) to detect the out-of-range access attempt and flag it as an > error. OR, does the runtime linker already generate an error which > Mesa is incorrectly ignoring? I'm not sure about this. Fix checked in upstream on 2019-11-13 in git://anongit.freedesktop.org/mesa/mesa commit 9c3be6d21fa6a45852045d0286b80fb420f82fe3 Author: Ben Crocker <bcrocker> Date: Wed Nov 13 20:27:24 2019 +0000 llvmpipe: use ppc64le/ppc64 Large code model for JIT-compiled shaders Large programs, e.g. gnome-shell and firefox, may tax the addressability of the Medium code model once a (potentially unbounded) number of dynamically generated JIT-compiled shader programs are linked in and relocated. Yet the default code model as of LLVM 8 is Medium or even Small. The cost of changing from Medium to Large is negligible: - an additional 8-byte pointer stored immediately before the shader entrypoint; - change an add-immediate (addis) instruction to a load (ld). Testing with WebGL Conformance (https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html) yields clean runs with this change (and crashes without it). Testing with glxgears shows no detectable performance difference. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1753327, 1753789, 1543572, 1747110, and 1582226 Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/223 Co-authored by: Nemanja Ivanovic <nemanjai.com>, Tom Stellard <tstellar> CC: mesa-stable.org Signed-off-by: Ben Crocker <bcrocker> Fixed in mesa-19.3.0-3.rc4.el8, build ID 1018360. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1633 |