Bug 1753327

Summary: Firefox crash in LLVM-generated shader code
Product: Red Hat Enterprise Linux 8 Reporter: Ben Crocker <bcrocker>
Component: mesaAssignee: Ben Crocker <bcrocker>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: 8.1CC: bugproxy, csoriano, hannsj_uhl, mprchlik, sguelton, tpelka, tstellar
Target Milestone: rc   
Target Release: 8.2   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-28 15:41:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1711971    
Attachments:
Description Flags
Crash on Power8 in vertex shader code
none
Two crashes on Power9 in fragment shader code none

Description Ben Crocker 2019-09-18 15:24:34 UTC
Created attachment 1616283 [details]
Crash on Power8 in vertex shader code

Description of problem: Process crash in LLVM-generated shader code


Version-Release number of selected component (if applicable):
LLVM 7, 8
Firefox 68


How reproducible: Just about every time


Steps to Reproduce:
1. Make sure TigerVNC and Firefox are installed on target machine:
2. Start VNC server on target; I do this with a script like this:
    #!/bin/bash
    vncserver :1 -geometry 1920x1080 +iglx &
3. On remote host, vncviewer <target>:1
4. (If GNOME shell does not appear in the viewer window, start it
   manually on the target: export DISPLAY=:1 gnome-shell)
5. Start Firefox on the target
6. Go to the WebGL Conformance test page:
   https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html
7. Click on "[run] all"

Actual results: Page changes to error page proclaiming
"Gah.  Your tab just crashed."
Firefox dumps core with SEGV.
coredumpctl shows that the faulting instruction is a reference to the
TOC (table of contents) segment pointed to by r2; see attachments.

Expected results: Tests run to completion


Additional info:

Comment 1 Ben Crocker 2019-09-18 15:30:50 UTC
Created attachment 1616284 [details]
Two crashes on Power9 in fragment shader code

On a Power9 system with LLVM 8, here are a couple of crashes
in fragment shader code.

Comment 2 IBM Bug Proxy 2019-09-20 13:30:26 UTC
------- Comment From nemanjai.com 2019-09-20 09:27 EDT-------
It appears clear from the analysis in this PR and the related one that we are accessing an invalid memory location off the TOC. There have been issues in the past with the runtime linker making a mistake as to whether it needs to branch to the local or the global entry point on a call.
So I think it would be valuable information to find out whether the global entry point - namely, the sequence:
0x00007fff68000000:	addis   r2,r12,28778
0x00007fff68000004:	addi    r2,r2,-32768

is actually executed in the function that the failure is in. If it does not, this is an indication that we have considered this function local and branched to the local entry point when we probably needed to branch to a global entry point (i.e. the call is across a DSO boundary).

Would it be possible to confirm where the function was entered?

Comment 3 Ben Crocker 2019-09-24 18:06:17 UTC
First of all, sorry for any confusion; the file attached to comment 1
(Two crashes on Power9 in fragment shader code) actually contains
attachment 1 [details] (Crash on Power8 in vertex shader code) at the beginning.

In the vertex shader crash, the faulting instruction is too far along
in the code for register reconstruction to be possible (i.e., r12 has
been overwritten three times), BUT in the fragment shader cases, the
faulting instruction is very early, so I was able to reconstruct
the register operations and confirm that the register state is consistent
with the function having been entered via the global entry point:

first fragment shader case:

Dump of assembler code from 0x7fff7c5d0000 to 0x7fff7c5d0020:
   0x00007fff7c5d0000:	addis   r2,r12,27421	;; Prologue: set up TOC ptr in r2; 27421=0x6b1d, r12=0x7fff.7c5d.0000; r2 <- 0x7fff.e77a.0000
   0x00007fff7c5d0004:	addi    r2,r2,-32768	;; finish setting up r2; -32768=0x8000; exts()=0xffffffffffff8000; r2 <- 0x7fff.e779.8000
   0x00007fff7c5d0008:	stdu    r1,-368(r1)
   0x00007fff7c5d000c:	ld      r4,480(r1)
   0x00007fff7c5d0010:	ld      r5,16(r4)
   0x00007fff7c5d0014:	addi    r5,r5,1
   0x00007fff7c5d0018:	std     r5,16(r4)
   0x00007fff7c5d001c:	ld      r4,-32768(r2)	;; Faulting instruction; first use of r2
End of assembler dump.
(gdb) info reg
r0             0x0                 0
r1             0x7fff25e8e240      140733829407296
r2             0x7fffe7798000      140737076887552
r3             0x1000857a900       1099651590400
r4             0x100086d71c8       1099653018056
r5             0x1                 1
r6             0x1                 1
r7             0x10008582510       1099651622160
r8             0x10008582520       1099651622176
r9             0x10008582530       1099651622192
r10            0x7fff25e8e468      140733829407848
r11            0x7fff7c5d0000      140735279857664
r12            0x7fff7c5d0000      140735279857664
r13            0x7fff25e968c0      140733829441728
r14            0x7fff8ef90000      140735592071168
r15            0x7fff25680000      140733820960768
r16            0x7fff8ef04410      140735591498768
r17            0x7fff8ef04420      140735591498784
r18            0x100086d6710       1099653015312
r19            0x7fff8ef00318      140735591482136
r20            0x10                16
r21            0x20                32
r22            0x30                48
r23            0x0                 0
r24            0x100086d7170       1099653017968
r25            0x0                 0
r26            0x100086d7148       1099653017928
r27            0x1000d266d90       1099732250000
r28            0x10008582500       1099651622144
r29            0x10009879980       1099671509376
r30            0x100086d71c8       1099653018056
r31            0x400               1024
pc             0x7fff2613ca24      0x7fff2613ca24 <lp_rast_shade_quads_mask+644>
msr            0x800000000280f033  9223372036896780339
cr             0x42224488          1109542024
lr             0x7fff2613ca24      0x7fff2613ca24 <lp_rast_shade_quads_mask+644>
ctr            0x7fff7c5d0000      140735279857664
xer            0x0                 0
fpscr          0xb2004100          2986361088
vscr           0x1                 1
vrsave         0xffffffff          -1
ppr            0xc000000000000     3377699720527872
dscr           0x17                23
tar            0x0                 0

second fragment shader case:

Dump of assembler code from 0x7fffa42f0000 to 0x7fffa42f0020:
   0x00007fffa42f0000:  addis   r2,r12,28354	;; Prologue: set up TOC ptr in r2; 28354=0x6ec2, r12=0x7fff.a42f.0000; r2 <- 0x8000.12f1.0000
   0x00007fffa42f0004:  addi    r2,r2,-32768	;; finish setting up r2; r2 <- 0x8000.12f0.8000
   0x00007fffa42f0008:  stdu    r1,-720(r1)
   0x00007fffa42f000c:  std     r28,616(r1)
   0x00007fffa42f0010:  std     r29,624(r1)
   0x00007fffa42f0014:  std     r30,632(r1)
   0x00007fffa42f0018:  mtfprwa f1,r5
   0x00007fffa42f001c:  ld      r5,-32768(r2)	;; Faulting instruction; first use of r2
End of assembler dump.
(gdb) info reg
r0             0x0                 0
r1             0x7fff377edee0      140734124449504
r2             0x800012f08000      140737806106624
r3             0x10047780c90       1100710677648
r4             0x0                 0
r5             0x0                 0
r6             0x1                 1
r7             0x10047787850       1100710705232
r8             0x10047787870       1100710705264
r9             0x10047787890       1100710705296
r10            0x7fff377ee268      140734124450408
r11            0x7fffa42f0000      140735947931648
r12            0x7fffa42f0000      140735947931648
r13            0x7fff377f68c0      140734124484800
r14            0x110000100900001   76561197969702913
r15            0x7fff377ee4e0      140734124451040
r16            0x100001            1048577
r17            0x200000            2097152
r18            0x200000            2097152
r19            0x0                 0
r20            0x30000000100000    13510798883160064
r21            0x70000000500000    31525197396836352
r22            0x0                 0
r23            0xffffffffffe00000  18446744073707454464
r24            0x3f00000           66060288
r25            0x0                 0
r26            0x100351ed478       1100402840696
r27            0x100378b4f60       1100443504480
r28            0x10047787840       1100710705216
r29            0x10048ef7340       1100735279936
r30            0x100351ed4f8       1100402840824
r31            0x80                128
pc             0x7fffa42f001c      0x7fffa42f001c
msr            0x800000000280f033  9223372036896780339
cr             0x48224428          1210205224
lr             0x7fff5df2ca24      0x7fff5df2ca24 <lp_rast_shade_quads_mask+644>
ctr            0x7fffa42f0000      140735947931648
xer            0x20040000          537133056
fpscr          0xbf204300          3206562560
vscr           0x1                 1
vrsave         0xffffffff          -1
ppr            0xc000000000000     3377699720527872
dscr           0x17                23
tar            0x0                 0

Comment 4 Ben Crocker 2019-09-25 15:57:25 UTC
In the Power8 vertex shader case, just to review:

(gdb) bt
#0  0x00007fff680000d0 in  ()
#1  0x00007ffedfaca834 in llvm_pipeline_generic (middle=0x7ffff377de70, fetch_info=<optimized out>, in_prim_info=0x30)
    at ../src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c:400
#2  0x00000100247287d0 in  ()
(gdb) disas 0x00007fff68000000,0x00007fff680000e0
Dump of assembler code from 0x7fff68000000 to 0x7fff680000e0:
   0x00007fff68000000:	addis   r2,r12,28778
   0x00007fff68000004:	addi    r2,r2,-32768
   0x00007fff68000008:	clrldi  r0,r1,59
   0x00007fff6800000c:	std     r30,-32(r1)
   0x00007fff68000010:	mr      r30,r1
   0x00007fff68000014:	subfic  r0,r0,-352
   0x00007fff68000018:	stdux   r1,r1,r0
   0x00007fff6800001c:	li      r8,-256
   0x00007fff68000020:	mtfprwz f0,r7
   0x00007fff68000024:	li      r7,12
   0x00007fff68000028:	addi    r10,r6,-1
   0x00007fff6800002c:	lwz     r11,8(r5)
   0x00007fff68000030:	std     r29,-40(r30)
   0x00007fff68000034:	lwz     r29,24(r5)
   0x00007fff68000038:	stxvd2x vs56,r30,r8
   0x00007fff6800003c:	li      r8,-240
   0x00007fff68000040:	mtfprwz f2,r10
   0x00007fff68000044:	std     r28,-48(r30)
   0x00007fff68000048:	lhz     r12,0(r9)
   0x00007fff6800004c:	xxspltw vs34,vs0,1
   0x00007fff68000050:	addi    r28,r11,-7
   0x00007fff68000054:	std     r26,-64(r30)
   0x00007fff68000058:	std     r27,-56(r30)
   0x00007fff6800005c:	addi    r26,r29,-7
   0x00007fff68000060:	xxlxor  vs6,vs6,vs6
   0x00007fff68000064:	xxspltw vs35,vs2,1
   0x00007fff68000068:	cmplw   r28,r11
   0x00007fff6800006c:	vspltisw v1,8
   0x00007fff68000070:	std     r23,-88(r30)
   0x00007fff68000074:	li      r23,192
   0x00007fff68000078:	lwa     r10,4(r9)
   0x00007fff6800007c:	stxvd2x vs57,r30,r8
   0x00007fff68000080:	li      r8,-224
   0x00007fff68000084:	lwa     r0,20(r9)
   0x00007fff68000088:	lhz     r9,16(r9)
   0x00007fff6800008c:	stxvd2x vs58,r30,r8
   0x00007fff68000090:	li      r8,-208
   0x00007fff68000094:	mtvsrd  vs5,r12
   0x00007fff68000098:	ld      r12,0(r5)
   0x00007fff6800009c:	std     r24,-80(r30)
   0x00007fff680000a0:	std     r25,-72(r30)
   0x00007fff680000a4:	subf    r27,r10,r28
   0x00007fff680000a8:	xxlxor  vs36,vs36,vs36
   0x00007fff680000ac:	vspltisw v5,1
   0x00007fff680000b0:	li      r25,88
   0x00007fff680000b4:	li      r24,140
   0x00007fff680000b8:	std     r18,-128(r30)
   0x00007fff680000bc:	cmplw   cr1,r27,r28
   0x00007fff680000c0:	subf    r11,r0,r26
   0x00007fff680000c4:	mtvsrd  vs7,r9
   0x00007fff680000c8:	add     r9,r12,r10
   0x00007fff680000cc:	xxswapd vs5,vs5
=> 0x00007fff680000d0:	ld      r28,-32768(r2)      ;; Faulting instruction; first use of r2
   0x00007fff680000d4:	stxvd2x vs59,r30,r8
   0x00007fff680000d8:	li      r8,-192
   0x00007fff680000dc:	cror    4*cr5+lt,gt,4*cr1+gt
End of assembler dump.
(gdb) info reg r2
r2             0x7fffd8698000      140736824180736

So, register r2 SEEMS to contain a reasonable value; however,
'maintenance info sections' reveals something interesting:

(gdb) maint info sections
Exec file:
    `/usr/lib64/firefox/firefox', file type elf64-powerpcle.
 [0]     0x00000238->0x00000249 at 0x00000238: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS
... (a LOT of data omitted for the sake of brevity) ...
 [467]     0x7fff68000000->0x7fff68010000 at 0x3a540000: load240 ALLOC LOAD READONLY CODE HAS_CONTENTS
... (^ This is where the code listed above resides)...
... (a 
 [1154]     0x7fff80a00000->0x7fff80a20000 at 0x40190000: load848 ALLOC LOAD READONLY CODE HAS_CONTENTS
 [1155]     0x7fff80a20000->0x7fff80a30000 at 0x401b0000: load849a ALLOC LOAD READONLY CODE HAS_CONTENTS
 [1156]     0x7fff80a30000->0x7fff80a30000 at 0x401c0000: load849b ALLOC READONLY CODE
 [1157]     0x7fff80a50000->0x7fff80a60000 at 0x401c0000: load850 ALLOC LOAD READONLY HAS_CONTENTS
 [1158]     0x7fff80a60000->0x7fff80a70000 at 0x401d0000: load851 ALLOC LOAD HAS_CONTENTS
 [1159]     0x7ffff3750000->0x7ffff3790000 at 0x401e0000: load852 ALLOC LOAD HAS_CONTENTS
<end of section info>

Note that r2 points somewhere BETWEEN [1158] and [1159]; in fact, it points to
the base of [1158] + 0x57c38000.

Comment 5 Tom Stellard 2019-10-04 14:36:01 UTC
Can you also post the LLVM IR?

Comment 6 Tom Stellard 2019-10-11 05:42:07 UTC
I did a scratch build of mesa built against LLVM 9, can you test this and see if it's still an issue: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24006318

Comment 7 IBM Bug Proxy 2019-10-11 13:51:30 UTC
------- Comment From nemanjai.com 2019-10-11 09:47 EDT-------
Hmm... I wonder if this is just going past what we can fit into the medium code model. We are trying to reach 1,885,962,240 bytes away from the function address when setting the TOC. Is it possible that we actually meant to reach more than 2,147,483,647 bytes away from the function address and "run time dynamic linker" just truncated the relocation rather than trapping?

Basically, I wonder if this issue goes away with switching to the large code model. The static compiler option would be -mcmodel=large, I am not sure how to switch the code model for the JIT.

Comment 8 Tom Stellard 2019-10-11 16:33:04 UTC
(In reply to IBM Bug Proxy from comment #7)
> ------- Comment From nemanjai.com 2019-10-11 09:47 EDT-------
> Hmm... I wonder if this is just going past what we can fit into the medium
> code model. We are trying to reach 1,885,962,240 bytes away from the
> function address when setting the TOC. Is it possible that we actually meant
> to reach more than 2,147,483,647 bytes away from the function address and
> "run time dynamic linker" just truncated the relocation rather than trapping?
> 

It looks like the default code model for JIT is actually small not medium:
https://github.com/llvm-project/llvm/blob/release_80/lib/Target/PowerPC/PPCTargetMachine.cpp#L225

> Basically, I wonder if this issue goes away with switching to the large code
> model. The static compiler option would be -mcmodel=large, I am not sure how
> to switch the code model for the JIT.

We have two options, we can change the default in LLVM or we can have mesa override the default with something else.

Comment 9 Tom Stellard 2019-10-24 18:27:03 UTC
Nemanja, do you have any opinion about which approach would be better?

Comment 10 IBM Bug Proxy 2019-10-28 16:00:19 UTC
------- Comment From nemanjai.com 2019-10-28 11:56 EDT-------
I am fine with changing the default - small seems like a strange default anyway. But it would be good to confirm that this actually fixes the problem.

Have you confirmed that changing the default fixes the problem?

Comment 11 Ben Crocker 2019-11-05 19:32:01 UTC
I added a call to builder.setCodeModel(CodeModel::Large)
in a ppc64le-specific section of Mesa's
lp_bld_misc.cpp:lp_build_create_jit_compiler_for_module()

and got my first clean run of Web GL Conformance on the Power8 machine
on which I observed this problem.

The only real costs of the Large code model as compared to the Medium
model appear to be:

1) an 8-byte pointer (stored in the code section immediately before
   the entry point);

2) a load instruction to load this pointer (vs. an add-immediate
   instruction).

I would assert that these costs are negligible in the context of any
graphics application utilizing the LLVM pipe.

On the face of it, it (now) seems that the choice of anything other
than the Large code model for dynamically generated, JIT-compiled
shader programs was incorrect.  How can we reasonably choose a code
model with restricted addressing in the context of an application (any
appication) that may easily outgrow the limits of that restricted
addressing?  (Unless we have foreknowledge of the application's
requirements that would be outside the scope of what might reasonably
be expected of the LLVM compiler/linker/loader.)

I propose to commit the change to lp_bld_misc.cpp that I described
above and thereby render the fix somewhat independent of any changes
to LLVM, but I suggest we make the suggested changes to LLVM as well.

If need be, we could make the code model configurable via environment
variable.

At the same time, I would like to ask whether it would have been
possible for the linker (by which I mean, I think, the LLVM runtime
linker) to detect the out-of-range access attempt and flag it as an
error.  OR, does the runtime linker already generate an error which
Mesa is incorrectly ignoring?

Comment 12 Tom Stellard 2019-11-05 19:57:11 UTC
(In reply to Ben Crocker from comment #11)
> I added a call to builder.setCodeModel(CodeModel::Large)
> in a ppc64le-specific section of Mesa's
> lp_bld_misc.cpp:lp_build_create_jit_compiler_for_module()
> 
> and got my first clean run of Web GL Conformance on the Power8 machine
> on which I observed this problem.
> 
> The only real costs of the Large code model as compared to the Medium
> model appear to be:
> 
> 1) an 8-byte pointer (stored in the code section immediately before
>    the entry point);
> 
> 2) a load instruction to load this pointer (vs. an add-immediate
>    instruction).
> 
> I would assert that these costs are negligible in the context of any
> graphics application utilizing the LLVM pipe.
> 
> On the face of it, it (now) seems that the choice of anything other
> than the Large code model for dynamically generated, JIT-compiled
> shader programs was incorrect.  How can we reasonably choose a code
> model with restricted addressing in the context of an application (any
> appication) that may easily outgrow the limits of that restricted
> addressing?  (Unless we have foreknowledge of the application's
> requirements that would be outside the scope of what might reasonably
> be expected of the LLVM compiler/linker/loader.)
> 
> I propose to commit the change to lp_bld_misc.cpp that I described
> above and thereby render the fix somewhat independent of any changes
> to LLVM, but I suggest we make the suggested changes to LLVM as well.
> 

I think this is a good solution.  I can propose an LLVM change upstream to
change the default to large.

> If need be, we could make the code model configurable via environment
> variable.
> 
> At the same time, I would like to ask whether it would have been
> possible for the linker (by which I mean, I think, the LLVM runtime
> linker) to detect the out-of-range access attempt and flag it as an
> error.  OR, does the runtime linker already generate an error which
> Mesa is incorrectly ignoring?

I'm not sure about this.

Comment 13 Ben Crocker 2019-11-15 21:51:45 UTC
Fix checked in upstream on 2019-11-13 in git://anongit.freedesktop.org/mesa/mesa

commit 9c3be6d21fa6a45852045d0286b80fb420f82fe3
Author: Ben Crocker <bcrocker>
Date:   Wed Nov 13 20:27:24 2019 +0000

    llvmpipe: use ppc64le/ppc64 Large code model for JIT-compiled shaders
    
    Large programs, e.g. gnome-shell and firefox, may tax the
    addressability of the Medium code model once a (potentially unbounded)
    number of dynamically generated JIT-compiled shader programs are
    linked in and relocated.  Yet the default code model as of LLVM 8 is
    Medium or even Small.
    
    The cost of changing from Medium to Large is negligible:
    - an additional 8-byte pointer stored immediately before the shader entrypoint;
    - change an add-immediate (addis) instruction to a load (ld).
    
    Testing with WebGL Conformance
    (https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html)
    yields clean runs with this change (and crashes without it).
    
    Testing with glxgears shows no detectable performance difference.
    
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1753327, 1753789, 1543572, 1747110, and 1582226
    
    Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/223
    
    Co-authored by: Nemanja Ivanovic <nemanjai.com>, Tom Stellard <tstellar>
    
    CC: mesa-stable.org
    
    Signed-off-by: Ben Crocker <bcrocker>

Comment 14 Ben Crocker 2019-12-09 19:07:56 UTC
Fixed in mesa-19.3.0-3.rc4.el8, build ID 1018360.

Comment 18 errata-xmlrpc 2020-04-28 15:41:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1633