Description of problem:
Starting GNOME on an ARM64 machine without hardware GL support results in a crash involving large pointers in llvmpipe JIT'ed code. Running the mesa unit tests results in failures too. In both cases, running the programs under valgrind makes the problem disappear.

../../../../bin/test-driver: line 107: 7204 Segmentation fault (core dumped) "$@" > $log_file 2>&1
FAIL: lp_test_format
../../../../bin/test-driver: line 107: 7212 Segmentation fault (core dumped) "$@" > $log_file 2>&1
FAIL: lp_test_arit
PASS: lp_test_blend
PASS: lp_test_conv
PASS: lp_test_printf
============================================================================
Testsuite summary for Mesa 17.0.0
============================================================================
# TOTAL: 5
# PASS: 3
# SKIP: 0
# XFAIL: 0
# FAIL: 2
# XPASS: 0
# ERROR: 0
============================================================================
See src/gallium/drivers/llvmpipe/test-suite.log
Please report to https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
============================================================================

Version-Release number of selected component (if applicable):
mesa-dri-drivers-17.0.0-1.fc26.aarch64

How reproducible:
100% of the time, either by starting gnome-shell (via vncserver) or by running 'make check' in the fedora mesa repo.

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:
Running `src/gallium/drivers/llvmpipe/lp_test_arit -v`

rsqrt.v4(0): ref = inf, out = inf, precision = 24.000000 bits, PASS
rsqrt.v4(1): ref = 1, out = 1, precision = 24.000000 bits, PASS
rsqrt.v4(1.00000001e-07): ref = 3162.27783, out = 3162.27783, precision = 24.000000 bits, PASS
rsqrt.v4(4): ref = 0.5, out = 0.5, precision = 24.000000 bits, PASS
rsqrt.v4(100000): ref = 0.00316227786, out = 0.00316227786, precision = 24.000000 bits, PASS
rsqrt.v4(1.00000004e+35): ref = 3.16227777e-18, out = 3.16227777e-18, precision = 24.000000 bits, PASS
rsqrt.v4(5.8799997e-39): ref = 1.30410138e+19, out = 1.30410138e+19, precision = 24.000000 bits, PASS
rsqrt.v4(inf): ref = 0, out = 0, precision = 24.000000 bits, PASS
Segmentation fault (core dumped)

The backtrace looks like:

(gdb) bt
#0  0x0000ffff9b2400d0 in ?? ()
#1  0x0000000000000001 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

It seems that this is a call to the function returned from gallivm_jit_function(struct gallivm_state *gallivm, LLVMValueRef func)

(gdb) disassemble 0x0000ffff9b2400c0,0x0000ffff9b2400e0
Dump of assembler code from 0xffff9b2400c0 to 0xffff9b2400e0:
   0x0000ffff9b2400c0:  scvtf   s1, w12
   0x0000ffff9b2400c4:  movk    x11, #0x9b25, lsl #16
   0x0000ffff9b2400c8:  movk    x9, #0x1c
   0x0000ffff9b2400cc:  movk    x10, #0x20
=> 0x0000ffff9b2400d0:  ldr     s7, [x13]
   0x0000ffff9b2400d4:  fmadd   s2, s1, s2, s5
   0x0000ffff9b2400d8:  movk    x11, #0x24
   0x0000ffff9b2400dc:  ldr     s5, [x9]

(gdb) info registers
x0             0x51250a0            85086368
x1             0x51250c0            85086400
x2             0xffff9b240000       281473284571136
x3             0x4531e0             4534752
x4             0x515b870            85309552
x5             0x0                  0
x6             0xffffffffff         1099511627775
x7             0x514dcb0            85253296
x8             0x80000000           2147483648
x9             0xffff9b25001c       281473284636700
x10            0xffff9b250020       281473284636704
x11            0xffff9b25000c       281473284636684
x12            0x80000000           2147483648
x13            0xffffffff9b250014   -1692073964
x14            0x0                  0
x15            0x2                  2
x16            0xffff9b2001c8       281473284309448
x17            0xffff988f57f8       281473241274360
x18            0x0                  0
x19            0x51250c0            85086400
x20            0x1                  1
x21            0x512ddc0            85122496
x22            0x7f800000           2139095040
x23            0x51250a0            85086368
x24            0x1                  1
x25            0x0                  0
x26            0x1                  1
x27            0x0                  0
x28            0x4065e0             4220384
x29            0xfffff4eefc50       281474791046224
x30            0x406f1c             4222748
sp             0xfffff4eefc50       0xfffff4eefc50
pc             0xffff9b2400d0       0xffff9b2400d0
cpsr           0x60000000           [ EL=0 C Z ]
fpsr           0x0                  0
fpcr           0x0                  0

(more to come)
OK, found a fun pile of unit test failures in llvm too (starting with the fact that ARM64 can't DC against a page without write permissions).

The bottom line is that there is an llvmpipe code generation problem. It's trying to load the constant 2.44331568e-05 into s7 (in this example), and it's loading the address of that constant 16 bits at a time into x13 with movk's, but it fails to load the top 16 bits, leaving whatever happens to be in that part of the register stale. Interestingly, it seems that it has code which is trying to clear the top 16 bits as well, but the target register (xzr in this case!) seems to be incorrect.

The code in question is in lp_build_sin_or_cos() and looks like:

LLVMValueRef coscof_p0 = lp_build_const_vec(gallivm, bld->type, 2.443315711809948E-005);

Setting GALLIVM_DEBUG="nopt" makes the problem go away!
Possible upstream fix here: https://reviews.llvm.org/D27609
Changing component to llvm.
Created attachment 1261680 [details] Fix aarch64 relocation
To be clear, the patch is against LLVM 3.9.1 in F26 and rawhide.
mesa-17.0.1-2.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-701f4d0d08
mesa-17.0.1-2.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-701f4d0d08
For reference there was a similar problem seen with "root" package building that was fixed by this too. Reference: https://pagure.io/releng/issue/6653
I updated my hikey to the latest rawhide last night and VNC/gnome-shell/firefox were working. I will run a clean F26 install in the next couple days on seattle/juno.
Proposed as a Freeze Exception for 26-alpha by Fedora user pbrobinson using the blocker tracking app because: This causes the GNOME desktop to crash on aarch64 when using the llvmpipe driver, which is used for a number of use cases on aarch64.
mesa-17.0.2-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-741d36d0b1
+1 FE
Would this affect llvmpipe usage on x86_64 as well?
(In reply to Adam Williamson from comment #13)
> Would this affect llvmpipe usage on x86_64 as well?

No, it was an LLVM issue confined to aarch64-specific codepaths.
Discussed during the 2017-03-27 blocker review meeting: [1]

The decision was made to classify this bug as an AcceptedFreezeException, as it would be nice to have this fixed in the Alpha release.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2017-03-27/f26-blocker-review.2017-03-27-16.01.txt
mesa-17.0.1-2.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
mesa-17.0.2-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Not directly a 48-bit VA problem, but definitely irritated by it.