1545239 – miniruby crashing when compiled with -O2 or -O1 on aarch64

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gcc
Sub Component:
Version:	28
Hardware:	aarch64
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	ARMTracker

Reported:	2018-02-14 13:05 UTC by Dan Horák
Modified:	2018-04-15 02:38 UTC
CC List:	14 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-04-15 02:38:26 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
Preprocessed source of vm.c (all as one file) (158.10 KB, application/x-gzip) 2018-02-15 19:28 UTC, Dave Malcolm	no flags
Preprocessed source of vm.c (all as one file), with __attribute__ ((optimize("omit-frame-pointer"))) (141.72 KB, application/x-gzip) 2018-02-19 21:34 UTC, Dave Malcolm	no flags
Generated asm for source and compile flags in comment #24 (1.91 MB, text/plain) 2018-02-20 15:41 UTC, Dave Malcolm	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNU Compiler Collection	84521	0	None	None	None	2019-06-26 10:31:47 UTC
Ruby	14480	0	None	None	None	2018-02-16 08:56:05 UTC

Description Dan Horák 2018-02-14 13:05:02 UTC

The miniruby binary used to bootstrap ruby during the build crashes when compiled with -O2 or -O1, but runs well with -O0. More details will follow.

The symptom is
...
gcc -O1 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fstack-clash-protection -fPIC  -L. -Wl,-z,relro  -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -fstack-protector -rdynamic -Wl,-export-dynamic -fstack-protector  main.o dmydln.o miniinit.o dmyext.o miniprelude.o array.o bignum.o class.o compar.o compile.o complex.o cont.o debug.o debug_counter.o dir.o dln_find.o encoding.o enum.o enumerator.o error.o eval.o file.o gc.o hash.o inits.o io.o iseq.o load.o marshal.o math.o node.o numeric.o object.o pack.o parse.o proc.o process.o random.o range.o rational.o re.o regcomp.o regenc.o regerror.o regexec.o regparse.o regsyntax.o ruby.o safe.o signal.o sprintf.o st.o strftime.o string.o struct.o symbol.o thread.o time.o transcode.o util.o variable.o version.o vm.o vm_backtrace.o vm_dump.o vm_trace.o probes.o enc/ascii.o enc/us_ascii.o enc/unicode.o enc/utf_8.o enc/trans/newline.o setproctitle.o strlcat.o strlcpy.o addr2line.o   -lpthread -lgmp -ldl -lcrypt -lm   -o miniruby
:
./miniruby -I./lib -I. -I.ext/common  -n \
-e 'BEGIN{version=ARGV.shift;mis=ARGV.dup}' \
-e 'END{abort "UNICODE version mismatch: #{mis}" unless mis.empty?}' \
-e '(mis.delete(ARGF.path); ARGF.close) if /ONIG_UNICODE_VERSION_STRING +"#{Regexp.quote(version)}"/o' \
10.0.0 ./enc/unicode/10.0.0/casefold.h ./enc/unicode/10.0.0/name2ctype.h 
generating encdb.h
./miniruby -I./lib -I. -I.ext/common  ./tool/generic_erb.rb -c -o encdb.h ./template/encdb.h.tmpl ./enc enc
generating prelude.c
./miniruby -I./lib -I. -I.ext/common  ./tool/generic_erb.rb -I. -c -o prelude.c \
	./template/prelude.c.tmpl ./prelude.rb ./gem_prelude.rb ./abrt_prelude.rb
*** stack smashing detected ***: <unknown> terminated
encdb.h updated
...

When the stack protector is disabled, a segfault appears instead.


Version-Release number of selected component (if applicable):
gcc-8.0.1-0.13.fc28.aarch64

Comment 1 Dan Horák 2018-02-14 13:07:32 UTC

task in koji - https://koji.fedoraproject.org/koji/taskinfo?taskID=25036130

Comment 2 Vít Ondruch 2018-02-14 14:31:25 UTC

Two findings:

1) In Koschei, it appears to fail since this build:

https://apps.fedoraproject.org/koschei/build/4061354

2) I tried -fno-stack-protector in this scratch build:

https://koji.fedoraproject.org/koji/taskinfo?taskID=25040883

and this was the output:

~~~
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.HMxFF5
+ umask 022
+ cd /builddir/build/BUILD
+ cd ruby-2.5.0
+ sed -i s/-fstack-protector/-fno-stack-protector/ configure.ac
+ autoconf
+ export 'CFLAGS=-O2 -fno-stack-protector'
+ CFLAGS='-O2 -fno-stack-protector'
+ export 'LDFLAGS=-O2 -fno-stack-protector'
+ LDFLAGS='-O2 -fno-stack-protector'
+ export 'DLDFLAGS=-O2 -fno-stack-protector'
+ DLDFLAGS='-O2 -fno-stack-protector'
+ ./configure

... snip ...

linking miniruby
gcc -O2 -fno-stack-protector  -L. -O2 -fno-stack-protector -rdynamic -Wl,-export-dynamic -fno-stack-protector -pie  main.o dmydln.o miniinit.o dmyext.o miniprelude.o array.o bignum.o class.o compar.o compile.o complex.o cont.o debug.o debug_counter.o dir.o dln_find.o encoding.o enum.o enumerator.o error.o eval.o file.o gc.o hash.o inits.o io.o iseq.o load.o marshal.o math.o node.o numeric.o object.o pack.o parse.o proc.o process.o random.o range.o rational.o re.o regcomp.o regenc.o regerror.o regexec.o regparse.o regsyntax.o ruby.o safe.o signal.o sprintf.o st.o strftime.o string.o struct.o symbol.o thread.o time.o transcode.o util.o variable.o version.o vm.o vm_backtrace.o vm_dump.o vm_trace.o probes.o enc/ascii.o enc/us_ascii.o enc/unicode.o enc/utf_8.o enc/trans/newline.o setproctitle.o strlcat.o strlcpy.o addr2line.o   -lpthread -lgmp -ldl -lcrypt -lm   -o miniruby
:
./miniruby -I./lib -I. -I.ext/common  -n \
-e 'BEGIN{version=ARGV.shift;mis=ARGV.dup}' \
-e 'END{abort "UNICODE version mismatch: #{mis}" unless mis.empty?}' \
-e '(mis.delete(ARGF.path); ARGF.close) if /ONIG_UNICODE_VERSION_STRING +"#{Regexp.quote(version)}"/o' \
10.0.0 ./enc/unicode/10.0.0/casefold.h ./enc/unicode/10.0.0/name2ctype.h 
generating encdb.h
./miniruby -I./lib -I. -I.ext/common  ./tool/generic_erb.rb -c -o encdb.h ./template/encdb.h.tmpl ./enc enc
generating prelude.c
./miniruby -I./lib -I. -I.ext/common  ./tool/generic_erb.rb -I. -c -o prelude.c \
	./template/prelude.c.tmpl ./prelude.rb ./gem_prelude.rb 
./tool/generic_erb.rb:39: [BUG] Segmentation fault at 0x0000fffff025c658
ruby 2.5.0p0 (2017-12-25 revision 61468) [aarch64-linux]
-- Control frame information -----------------------------------------------
c:0004 p:0054 s:0027 E:000680 BLOCK  ./tool/generic_erb.rb:39 [FINISH]
c:0003 p:---- s:0022 e:000021 CFUNC  :map
c:0002 p:0135 s:0018 E:001510 EVAL   ./tool/generic_erb.rb:36 [FINISH]
c:0001 p:0000 s:0003 E:002280 (none) [FINISH]
-- Ruby level backtrace information ----------------------------------------
./tool/generic_erb.rb:36:in `<main>'
./tool/generic_erb.rb:36:in `map'
./tool/generic_erb.rb:39:in `block in <main>'
-- C level backtrace information -------------------------------------------
./miniruby(0x1a2ec8) [0xaaaad7376ec8]
./miniruby(0x1a2f48) [0xaaaad7376f48]
./miniruby(0x7f46c) [0xaaaad725346c]
./miniruby(0x13993c) [0xaaaad730d93c]
linux-vdso.so.1(0xffffbe19b66c) [0xffffbe19b66c]
[0xfffff025c658]
-- Other runtime information -----------------------------------------------
* Loaded script: ./tool/generic_erb.rb
* Loaded features:
    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 /builddir/build/BUILD/ruby-2.5.0/lib/cgi/util.rb
    5 /builddir/build/BUILD/ruby-2.5.0/lib/erb.rb
    6 /builddir/build/BUILD/ruby-2.5.0/lib/optparse.rb
    7 /builddir/build/BUILD/ruby-2.5.0/lib/fileutils.rb
    8 /builddir/build/BUILD/ruby-2.5.0/tool/vpath.rb
    9 /builddir/build/BUILD/ruby-2.5.0/tool/colorize.rb
* Process memory map:
aaaad71d4000-aaaad7452000 r-xp 00000000 fc:03 268080                     /builddir/build/BUILD/ruby-2.5.0/miniruby
aaaad746f000-aaaad7474000 r--p 0028b000 fc:03 268080                     /builddir/build/BUILD/ruby-2.5.0/miniruby
aaaad7474000-aaaad7475000 rw-p 00290000 fc:03 268080                     /builddir/build/BUILD/ruby-2.5.0/miniruby
aaaad7475000-aaaad7486000 rw-p 00000000 00:00 0 
aaaada978000-aaaadac4e000 rw-p 00000000 00:00 0                          [heap]
ffffbd9cd000-ffffbdcb3000 r--s 00000000 fc:03 268080                     /builddir/build/BUILD/ruby-2.5.0/miniruby
ffffbdcb3000-ffffbdcc6000 r-xp 00000000 fc:03 8000563                    /usr/lib64/libgcc_s-8-20180210.so.1
ffffbdcc6000-ffffbdce2000 ---p 00013000 fc:03 8000563                    /usr/lib64/libgcc_s-8-20180210.so.1
ffffbdce2000-ffffbdce3000 r--p 0001f000 fc:03 8000563                    /usr/lib64/libgcc_s-8-20180210.so.1
ffffbdce3000-ffffbdce4000 rw-p 00020000 fc:03 8000563                    /usr/lib64/libgcc_s-8-20180210.so.1
ffffbdce4000-ffffbdce5000 ---p 00000000 00:00 0 
ffffbdce5000-ffffbde06000 rw-p 00000000 00:00 0 
ffffbde06000-ffffbdf60000 r-xp 00000000 fc:03 8003000                    /usr/lib64/libc-2.27.so
ffffbdf60000-ffffbdf72000 ---p 0015a000 fc:03 8003000                    /usr/lib64/libc-2.27.so
ffffbdf72000-ffffbdf76000 r--p 0015c000 fc:03 8003000                    /usr/lib64/libc-2.27.so
ffffbdf76000-ffffbdf78000 rw-p 00160000 fc:03 8003000                    /usr/lib64/libc-2.27.so
ffffbdf78000-ffffbdf7c000 rw-p 00000000 00:00 0 
ffffbdf7c000-ffffbe02d000 r-xp 00000000 fc:03 8003004                    /usr/lib64/libm-2.27.so
ffffbe02d000-ffffbe04b000 ---p 000b1000 fc:03 8003004                    /usr/lib64/libm-2.27.so
ffffbe04b000-ffffbe04c000 r--p 000bf000 fc:03 8003004                    /usr/lib64/libm-2.27.so
ffffbe04c000-ffffbe04d000 rw-p 000c0000 fc:03 8003004                    /usr/lib64/libm-2.27.so
ffffbe04d000-ffffbe06d000 r-xp 00000000 fc:03 8003334                    /usr/lib64/libcrypt.so.1.1.0
ffffbe06d000-ffffbe07c000 ---p 00020000 fc:03 8003334                    /usr/lib64/libcrypt.so.1.1.0
ffffbe07c000-ffffbe07d000 r--p 0001f000 fc:03 8003334                    /usr/lib64/libcrypt.so.1.1.0
ffffbe07d000-ffffbe086000 rw-p 00000000 00:00 0 
ffffbe086000-ffffbe089000 r-xp 00000000 fc:03 8003002                    /usr/lib64/libdl-2.27.so
ffffbe089000-ffffbe0a5000 ---p 00003000 fc:03 8003002                    /usr/lib64/libdl-2.27.so
ffffbe0a5000-ffffbe0a6000 r--p 0000f000 fc:03 8003002                    /usr/lib64/libdl-2.27.so
ffffbe0a6000-ffffbe0a7000 rw-p 00010000 fc:03 8003002                    /usr/lib64/libdl-2.27.so
ffffbe0a7000-ffffbe117000 r-xp 00000000 fc:03 8003324                    /usr/lib64/libgmp.so.10.3.2
ffffbe117000-ffffbe135000 ---p 00070000 fc:03 8003324                    /usr/lib64/libgmp.so.10.3.2
ffffbe135000-ffffbe137000 r--p 0007e000 fc:03 8003324                    /usr/lib64/libgmp.so.10.3.2
ffffbe137000-ffffbe138000 rw-p 00080000 fc:03 8003324                    /usr/lib64/libgmp.so.10.3.2
ffffbe138000-ffffbe151000 r-xp 00000000 fc:03 8003012                    /usr/lib64/libpthread-2.27.so
ffffbe151000-ffffbe167000 ---p 00019000 fc:03 8003012                    /usr/lib64/libpthread-2.27.so
ffffbe167000-ffffbe168000 r--p 0001f000 fc:03 8003012                    /usr/lib64/libpthread-2.27.so
ffffbe168000-ffffbe169000 rw-p 00020000 fc:03 8003012                    /usr/lib64/libpthread-2.27.so
ffffbe169000-ffffbe16d000 rw-p 00000000 00:00 0 
ffffbe16d000-ffffbe18c000 r-xp 00000000 fc:03 8002993                    /usr/lib64/ld-2.27.so
ffffbe191000-ffffbe195000 rw-p 00000000 00:00 0 
ffffbe19a000-ffffbe19b000 r--p 00000000 00:00 0                          [vvar]
ffffbe19b000-ffffbe19c000 r-xp 00000000 00:00 0                          [vdso]
ffffbe19c000-ffffbe19d000 r--p 0001f000 fc:03 8002993                    /usr/lib64/ld-2.27.so
ffffbe19d000-ffffbe19e000 rw-p 00020000 fc:03 8002993                    /usr/lib64/ld-2.27.so
ffffbe19e000-ffffbe19f000 rw-p 00000000 00:00 0 
ffffefa63000-fffff0262000 rw-p 00000000 00:00 0                          [stack]
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

... snip ...
~~~

So it should be possible to debug the issue ...

Comment 3 Jakub Jelinek 2018-02-14 14:51:39 UTC

Are no errors reported if you build it with -fsanitize=undefined?

Comment 4 Vít Ondruch 2018-02-14 15:11:46 UTC

https://koji.fedoraproject.org/koji/taskinfo?taskID=25044102

~~~
generating prelude.c
./miniruby -I./lib -I. -I.ext/common  ./tool/generic_erb.rb -I. -c -o prelude.c \
	./template/prelude.c.tmpl ./prelude.rb ./gem_prelude.rb 
./tool/generic_erb.rb:39: [BUG] Segmentation fault at 0x0000fffffe9ce570
ruby 2.5.0p0 (2017-12-25 revision 61468) [aarch64-linux]
-- Control frame information -----------------------------------------------
c:0004 p:0054 s:0027 E:001360 BLOCK  ./tool/generic_erb.rb:39 [FINISH]
c:0003 p:---- s:0022 e:000021 CFUNC  :map
c:0002 p:0135 s:0018 E:002480 EVAL   ./tool/generic_erb.rb:36 [FINISH]
c:0001 p:0000 s:0003 E:0006f0 (none) [FINISH]
-- Ruby level backtrace information ----------------------------------------
./tool/generic_erb.rb:36:in `<main>'
./tool/generic_erb.rb:36:in `map'
./tool/generic_erb.rb:39:in `block in <main>'
-- C level backtrace information -------------------------------------------
addr2line.c:263:19: runtime error: load of misaligned address 0xffff7e78a0ff for type 'unsigned int', which requires 4 byte alignment
0xffff7e78a0ff: note: pointer points here
 18 00 00 00 4d  00 00 00 02 00 47 00 00  00 04 01 fb 0e 0d 00 01  01 01 01 00 00 00 01 00  00 01 2f
             ^ 
addr2line.c:275:19: runtime error: load of misaligned address 0xffff7e78a105 for type 'unsigned int', which requires 4 byte alignment
0xffff7e78a105: note: pointer points here
 00 00 02 00 47 00 00  00 04 01 fb 0e 0d 00 01  01 01 01 00 00 00 01 00  00 01 2f 75 73 72 2f 69  6e
             ^ 
./miniruby(0x95b278) [0xaaaae1efb278]
./miniruby(0x95b350) [0xaaaae1efb350]
./miniruby(0x590814) [0xaaaae1b30814]
./miniruby(0x7d1610) [0xaaaae1d71610]
linux-vdso.so.1(0xffff7f9f266c) [0xffff7f9f266c]
[0xfffffe9ce570]
~~~
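
The two UBSan reports above flag 4-byte loads from addresses that are not multiples of 4. As a minimal sketch, the check can be reproduced with plain arithmetic on the addresses copied from the log. The byte values are the four bytes under the caret in the first report, and little-endian byte order is an assumption based on the aarch64-linux target.

```python
import struct

# Addresses copied from the two "load of misaligned address" reports.
ADDRESSES = [0xFFFF7E78A0FF, 0xFFFF7E78A105]
ALIGNMENT = 4  # 'unsigned int' requires 4-byte alignment according to the report


def misalignment(address: int, alignment: int = ALIGNMENT) -> int:
    """Return how many bytes `address` sits past the previous aligned boundary."""
    return address % alignment


# The four bytes under the caret in the first report: 4d 00 00 00.
raw = bytes([0x4D, 0x00, 0x00, 0x00])
# struct decodes byte by byte, so it works at any offset (little-endian assumed).
value = struct.unpack_from("<I", raw, 0)[0]

for addr in ADDRESSES:
    print(hex(addr), "is", misalignment(addr), "byte(s) past a 4-byte boundary")
print("unsigned int stored at the first address:", value)
```

This only confirms what the sanitizer printed; it says nothing about whether the misaligned loads are related to the crash itself.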

Comment 5 Jakub Jelinek 2018-02-14 15:14:47 UTC

So, miniruby is buggy.  Might not be the cause of it, but might be related.

Comment 6 Vít Ondruch 2018-02-14 15:41:55 UTC

If I am not mistaken, the addr2line code is executed from the SIGSEGV handler, and the handler is definitely buggy on aarch64 [1]. But the segfault should never happen in the first place, so the addr2line code should never be reached. Therefore I believe the issue is elsewhere.


[1] https://bugs.ruby-lang.org/issues/13758

Comment 7 Dave Malcolm 2018-02-14 21:55:23 UTC

I was able to reproduce the issue on an aarch64 box (gcc116 in the GCC Compile Farm) using a build of today's gcc 8.

By recompiling vm.c with different options and relinking miniruby with the various vm.o, I can get different behaviors from the miniruby invocation:
  ./miniruby -I./lib -I. -I.ext/common ./tool/generic_erb.rb -c -o transdb.h ./template/transdb.h.tmpl ./enc/trans enc/trans

With -fstack-protector:

-O2 and -O1:
  crash with:
    *** stack smashing detected ***: ./miniruby terminated

-O0:
  runs to completion

Without -fstack-protector:

-O2:
  crashes with:
    ./tool/generic_erb.rb:39: [BUG] Illegal instruction at 0x00000000215b5968

-O1:
  crashes with:
    ./tool/generic_erb.rb:39: [BUG] Segmentation fault at 0x0000000000000000

-O0:
  runs to completion

In every failing case, running under the debugger shows that the problem is happening within the 10th call to vm_call_opt_call, and the stack is corrupt at the end of the call.

It's a tail call of:
2065        return rb_vm_invoke_proc(ec, proc, argc, argv, calling->block_handler);

(gdb) call rb_backtrace_print_as_bugreport
-- Ruby level backtrace information ----------------------------------------
./tool/generic_erb.rb:36:in `<main>'
./tool/generic_erb.rb:36:in `map'
./tool/generic_erb.rb:39:in `block in <main>'

Not sure if this is an existing ruby bug that's been revealed by the newer compiler, or a compiler bug.

Any thoughts?

Comment 8 Dave Malcolm 2018-02-14 21:56:54 UTC

I tried running under valgrind, but it's a fairly ancient version on that machine; it emits (with vm.o built with -O1 -fstack-protector):

==31267== Memcheck, a memory error detector
==31267== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==31267== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==31267== Command: ./miniruby -I./lib -I. -I.ext/common ./tool/generic_erb.rb -c -o transdb.h ./template/transdb.h.tmpl ./enc/trans enc/trans
==31267== 
==31267== Warning: client switching stacks?  SP change: 0xffefffbf0 --> 0xffe8020b0
==31267==          to suppress, use: --max-stackframe=8379200 or greater
==31267== Invalid write of size 1
==31267==    at 0x5594E4: reserve_stack (thread_pthread.c:715)
==31267==    by 0x55CD27: ruby_init_stack (thread_pthread.c:748)
==31267==    by 0x41A13F: main (main.c:40)
==31267==  Address 0xffe8020b0 is on thread 1's stack
==31267==  in frame #0, created by reserve_stack (thread_pthread.c:670)
==31267== 
==31267== Warning: client switching stacks?  SP change: 0xffe8020b0 --> 0xffefffbf0
==31267==          to suppress, use: --max-stackframe=8379200 or greater
disInstr(arm64): unhandled instruction 0x5E61D821
disInstr(arm64): 0101'1110 0110'0001 1101'1000 0010'0001
==31267== valgrind: Unrecognised instruction at address 0x484c90.
==31267==    at 0x484C90: getrusage_time (gc.c:8601)
==31267==    by 0x48C58F: Init_heap (gc.c:2408)
==31267==    by 0x47A6CB: ruby_setup (eval.c:56)
==31267==    by 0x47BE9F: ruby_init (eval.c:78)
==31267==    by 0x41A143: main (main.c:41)
==31267== Your program just tried to execute an instruction that Valgrind
==31267== did not recognise.  There are two possible reasons for this.
==31267== 1. Your program has a bug and erroneously jumped to a non-code
==31267==    location.  If you are running Memcheck and you just saw a
==31267==    warning about a bad jump, it's probably your program's fault.
==31267== 2. The instruction is legitimate but Valgrind doesn't handle it,
==31267==    i.e. it's Valgrind's fault.  If you think this is the case or
==31267==    you are not sure, please let us know and we'll try to fix it.
==31267== Either way, Valgrind will now raise a SIGILL signal which will
==31267== probably kill your program.
==31267== 
==31267== Process terminating with default action of signal 4 (SIGILL)
==31267==  Illegal opcode at address 0x484C90
==31267==    at 0x484C90: getrusage_time (gc.c:8601)
==31267==    by 0x48C58F: Init_heap (gc.c:2408)
==31267==    by 0x47A6CB: ruby_setup (eval.c:56)
==31267==    by 0x47BE9F: ruby_init (eval.c:78)
==31267==    by 0x41A143: main (main.c:41)
==31267== 
==31267== HEAP SUMMARY:
==31267==     in use at exit: 1,488,184 bytes in 59 blocks
==31267==   total heap usage: 81 allocs, 22 frees, 1,509,192 bytes allocated
==31267== 
==31267== LEAK SUMMARY:
==31267==    definitely lost: 0 bytes in 0 blocks
==31267==    indirectly lost: 0 bytes in 0 blocks
==31267==      possibly lost: 1,446,336 bytes in 26 blocks
==31267==    still reachable: 41,848 bytes in 33 blocks
==31267==         suppressed: 0 bytes in 0 blocks
==31267== Rerun with --leak-check=full to see details of leaked memory
==31267== 
==31267== For counts of detected and suppressed errors, rerun with: -v
==31267== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The illegal instruction reported there is in:
  8600          if (try_clock_gettime && clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) == 0) {
  8601              return ts.tv_sec + ts.tv_nsec * 1e-9;
  8602          }
which may be just a symptom of an ancient Valgrind.
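
Two figures in the Valgrind log can be cross-checked by hand, which helps confirm that the log is internally consistent. The sketch below is purely arithmetic on values copied from the log: it recomputes the stack-pointer jump behind the `--max-stackframe` hint and re-renders the unhandled instruction word in the grouped binary form that the `disInstr` line uses. The helper name `nibble_groups` is made up for this illustration.

```python
# Stack-pointer values copied from the "client switching stacks?" warnings.
SP_HIGH = 0xFFEFFFBF0
SP_LOW = 0xFFE8020B0

# Size of the stack-pointer jump that Valgrind warned about.
sp_jump = SP_HIGH - SP_LOW
print("SP jump in bytes:", sp_jump)


def nibble_groups(word: int) -> str:
    """Render a 32-bit word as four byte groups of two nibbles each."""
    bits = format(word, "032b")
    nibbles = [bits[i:i + 4] for i in range(0, 32, 4)]
    byte_groups = ["'".join(nibbles[i:i + 2]) for i in range(0, 8, 2)]
    return " ".join(byte_groups)


# Instruction word from "disInstr(arm64): unhandled instruction 0x5E61D821".
print(nibble_groups(0x5E61D821))
```

The jump works out to 8379200 bytes, the same value Valgrind suggests for `--max-stackframe`, so that warning reflects one large, matched stack-pointer move in `reserve_stack` and its reversal.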

Comment 9 Jakub Jelinek 2018-02-14 22:06:17 UTC

(In reply to Dave Malcolm from comment #7)
> I was able to reproduce the issue on an aarch64 box (gcc116 in the GCC
> Compile Farm) using a build of today's gcc 8.
> 
> By recompiling vm.c with different options and relinking miniruby with the
> various vm.o, I can get different behaviors from the miniruby invocation:
>   ./miniruby -I./lib -I. -I.ext/common ./tool/generic_erb.rb -c -o transdb.h
> ./template/transdb.h.tmpl ./enc/trans enc/trans
> 
> With -fstack-protector:
> 
> -O2 and -O1:
>   crash with:
>     *** stack smashing detected ***: ./miniruby terminated
> 
> -O0:
>   runs to completion
> 
> Without -fstack-protector:
> 
> -O2:
>   crashes with:
>     ./tool/generic_erb.rb:39: [BUG] Illegal instruction at 0x00000000215b5968
> 
> -O1:
>   crashes with:
>     ./tool/generic_erb.rb:39: [BUG] Segmentation fault at 0x0000000000000000
> 
> -O0:
>   runs to completion

Can you bisect which *.o file matters (perhaps first try the one with vm_call_opt_call), by trying all -O2 objects but one -O0 one (if that works) and all -O0 ones but one -O2 (if that fails)?
Attach here preprocessed source + full command line?

Next step would be to bisect which compiler svn revision broke it, perhaps we can find out something from that change or from looking at dump diffs.
I have a couple of aarch64 cross-compilers in bisect seed on my ws, but nothing substantive, but it shouldn't take too long to bisect a single issue.

I know the two bisections aren't the nicest approach, but for wrong-code issues they often give a better starting point than spending hours in the debugger.
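
The object-swapping scheme suggested above can be sketched as a small driver. This is a minimal sketch, not the procedure actually used in this report: `build_and_run` is a hypothetical callback that would rebuild and relink miniruby with the given per-file flags and return True when the test invocation succeeds. The file names and the toy failure rule in the demo are invented for illustration.

```python
from typing import Callable, Dict, List


def one_at_a_time(
    sources: List[str],
    base_opt: str,
    probe_opt: str,
    build_and_run: Callable[[Dict[str, str]], bool],
) -> Dict[str, bool]:
    """Build once per source file, with only that file at `probe_opt`.

    Every other file stays at `base_opt`. The result maps each probed
    file to whether the resulting binary passed.
    """
    results = {}
    for probe in sources:
        flags = {src: (probe_opt if src == probe else base_opt) for src in sources}
        results[probe] = build_and_run(flags)
    return results


# Toy stand-in for a real build: pretend the binary only works
# when the (hypothetical) file culprit.c is compiled at -O0.
def fake_build_and_run(flags: Dict[str, str]) -> bool:
    return flags["culprit.c"] == "-O0"


SOURCES = ["a.c", "b.c", "culprit.c"]

# All -O0 except one file at -O2: which single file breaks the build?
only_one_optimized = one_at_a_time(SOURCES, "-O0", "-O2", fake_build_and_run)
# All -O2 except one file at -O0: which single file rescues the build?
only_one_downgraded = one_at_a_time(SOURCES, "-O2", "-O0", fake_build_and_run)

print(only_one_optimized)
print(only_one_downgraded)
```

Running both directions matters because a miscompile may need several optimized files to show up, in which case the two tables will not simply mirror each other.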

Comment 10 Dave Malcolm 2018-02-15 02:27:16 UTC

I tried hacking up Makefile to change the default flags, and then experimented with rebuilding with various optimization flags.

* Everything "-O0 -fstack-protector": works

* Everything "-O1 -fstack-protector": "*** stack smashing detected ***: ./miniruby terminated"

* Everything "-O0 -fstack-protector" apart from vm.c at "-O1 -fstack-protector": works

* Everything "-O0 -fstack-protector" apart from each file, one at a time, at "-O2 -fstack-protector": works

* Everything "-O0 -fstack-protector" apart from main.c at "-O1 -fstack-protector": works
* Everything "-O0 -fstack-protector" apart from dmydln.c at "-O1 -fstack-protector": works
* Everything "-O0 -fstack-protector" apart from miniinit.c at "-O1 -fstack-protector": works
* Everything "-O0 -fstack-protector" apart from dmyext.c at "-O1 -fstack-protector": works

* Everything "-O1 -fstack-protector" apart from each file, one at a time, at "-O0 -fstack-protector": varied results.
  * Most have "*** stack smashing detected ***: ./miniruby terminated" (specifically: each of: main.c, dmydln.c miniinit.c dmyext.c miniprelude.c bignum.c class.c compar.c compile.c complex.c cont.c debug.c debug_counter.c dir.c dln_find.c encoding.c enumerator.c error.c eval.c file.c gc.c hash.c inits.c io.c iseq.c load.c marshal.c math.c node.c numeric.c object.c pack.c parse.c proc.c process.c random.c range.c rational.c re.c regcomp.c regenc.c regerror.c regexec.c regparse.c regsyntax.c ruby.c safe.c signal.c sprintf.c st.c strftime.c string.c struct.c symbol.c thread.c time.c transcode.c util.c variable.c version.c vm_backtrace.c vm_dump.c vm_trace.c enc/ascii.c enc/us_ascii.c enc/unicode.c enc/utf_8.c enc/trans/newline.c ./missing/explicit_bzero.c ./missing/setproctitle.c ./missing/strlcat.c ./missing/strlcat.c ./missing/strlcpy.c addr2line.c)

  * It works when any one of three files is individually downgraded to -O0: array.c, enum.c, or vm.c (with the other two still at -O1).

Comment 11 Dave Malcolm 2018-02-15 19:27:04 UTC

FWIW I experimented by moving vm_call_opt_call into its own source file (vm_insnhelper-2.c), but compiling with just that file at -O0 (everything else at -O1), it still fails.

(In reply to Dave Malcolm from comment #8)
> I tried running under valgrind, but it's a fairly ancient version on that
> machine; it emits (with vm.o built with -O1 -fstack-protector):

I built an up-to-date valgrind on the machine, which was able to handle the insns from comment #8.

With that, connecting to valgrind from gdb shows the problem here:

==28265== Invalid read of size 8
==28265==    at 0x5984A8: vm_call_opt_call (vm_insnhelper-2.c:111)
==28265==  Address 0x1ffeffc9e8 is on thread 1's stack
==28265==  8568 bytes below stack pointer

(gdb) disassemble 
Dump of assembler code for function vm_call_opt_call:
   0x0000000000598354 <+0>:	stp	x29, x30, [sp,#-96]!
   0x0000000000598358 <+4>:	mov	x29, sp
   0x000000000059835c <+8>:	str	x0, [x29,#56]
   0x0000000000598360 <+12>:	str	x1, [x29,#48]
   0x0000000000598364 <+16>:	str	x2, [x29,#40]
   0x0000000000598368 <+20>:	str	x3, [x29,#32]
   0x000000000059836c <+24>:	str	x4, [x29,#24]
   0x0000000000598370 <+28>:	adrp	x0, 0x68c000
   0x0000000000598374 <+32>:	ldr	x0, [x0,#264]
   0x0000000000598378 <+36>:	ldr	x1, [x0]
   0x000000000059837c <+40>:	str	x1, [x29,#88]
   0x0000000000598380 <+44>:	mov	x1, #0x0                   	// #0
   0x0000000000598384 <+48>:	ldr	x0, [x29,#32]
   0x0000000000598388 <+52>:	ldr	w0, [x0,#8]
   0x000000000059838c <+56>:	and	w0, w0, #0x1
   0x0000000000598390 <+60>:	cmp	w0, #0x0
   0x0000000000598394 <+64>:	cset	w0, ne
   0x0000000000598398 <+68>:	and	w0, w0, #0xff
   0x000000000059839c <+72>:	and	x0, x0, #0xff
   0x00000000005983a0 <+76>:	cmp	x0, #0x0
   0x00000000005983a4 <+80>:	b.eq	0x5983b4 <vm_call_opt_call+96>
   0x00000000005983a8 <+84>:	ldr	x1, [x29,#40]
   0x00000000005983ac <+88>:	ldr	x0, [x29,#48]
   0x00000000005983b0 <+92>:	bl	0x5980f4 <vm_caller_setup_arg_splat>
   0x00000000005983b4 <+96>:	ldr	x0, [x29,#32]
   0x00000000005983b8 <+100>:	ldr	w0, [x0,#8]
   0x00000000005983bc <+104>:	and	w0, w0, #0x80
   0x00000000005983c0 <+108>:	cmp	w0, #0x0
   0x00000000005983c4 <+112>:	cset	w0, ne
   0x00000000005983c8 <+116>:	and	w0, w0, #0xff
   0x00000000005983cc <+120>:	and	x0, x0, #0xff
   0x00000000005983d0 <+124>:	cmp	x0, #0x0
   0x00000000005983d4 <+128>:	b.eq	0x5983e8 <vm_call_opt_call+148>
   0x00000000005983d8 <+132>:	ldr	x2, [x29,#32]
   0x00000000005983dc <+136>:	ldr	x1, [x29,#40]
   0x00000000005983e0 <+140>:	ldr	x0, [x29,#48]
   0x00000000005983e4 <+144>:	bl	0x598230 <vm_caller_setup_arg_kw>
   0x00000000005983e8 <+148>:	ldr	x0, [x29,#40]
   0x00000000005983ec <+152>:	ldr	w0, [x0,#16]
   0x00000000005983f0 <+156>:	str	w0, [x29,#68]
   0x00000000005983f4 <+160>:	ldrsw	x0, [x29,#68]
   0x00000000005983f8 <+164>:	lsl	x0, x0, #3
   0x00000000005983fc <+168>:	add	x0, x0, #0xf
   0x0000000000598400 <+172>:	lsr	x0, x0, #4
   0x0000000000598404 <+176>:	lsl	x0, x0, #4
   0x0000000000598408 <+180>:	sub	sp, sp, x0
   0x000000000059840c <+184>:	mov	x0, sp
   0x0000000000598410 <+188>:	add	x0, x0, #0xf
   0x0000000000598414 <+192>:	lsr	x0, x0, #4
   0x0000000000598418 <+196>:	lsl	x0, x0, #4
   0x000000000059841c <+200>:	str	x0, [x29,#72]
   0x0000000000598420 <+204>:	ldr	x0, [x29,#40]
   0x0000000000598424 <+208>:	ldr	x0, [x0,#8]
   0x0000000000598428 <+212>:	ldr	x0, [x0,#32]
   0x000000000059842c <+216>:	str	x0, [x29,#80]
   0x0000000000598430 <+220>:	ldr	x0, [x29,#48]
   0x0000000000598434 <+224>:	ldr	x1, [x0,#8]
   0x0000000000598438 <+228>:	ldrsw	x0, [x29,#68]
   0x000000000059843c <+232>:	lsl	x0, x0, #3
   0x0000000000598440 <+236>:	neg	x0, x0
   0x0000000000598444 <+240>:	add	x1, x1, x0
   0x0000000000598448 <+244>:	ldrsw	x0, [x29,#68]
   0x000000000059844c <+248>:	lsl	x0, x0, #3
   0x0000000000598450 <+252>:	mov	x2, x0
   0x0000000000598454 <+256>:	ldr	x0, [x29,#72]
   0x0000000000598458 <+260>:	bl	0x418fa0 <memcpy@plt>
   0x000000000059845c <+264>:	ldr	x0, [x29,#48]
   0x0000000000598460 <+268>:	ldr	x1, [x0,#8]
   0x0000000000598464 <+272>:	ldrsw	x0, [x29,#68]
   0x0000000000598468 <+276>:	lsl	x0, x0, #3
   0x000000000059846c <+280>:	mov	x2, #0xfffffffffffffff8    	// #-8
   0x0000000000598470 <+284>:	sub	x0, x2, x0
   0x0000000000598474 <+288>:	add	x1, x1, x0
   0x0000000000598478 <+292>:	ldr	x0, [x29,#48]
   0x000000000059847c <+296>:	str	x1, [x0,#8]
   0x0000000000598480 <+300>:	ldr	x0, [x29,#40]
   0x0000000000598484 <+304>:	ldr	x0, [x0]
   0x0000000000598488 <+308>:	mov	x4, x0
   0x000000000059848c <+312>:	ldr	x3, [x29,#72]
   0x0000000000598490 <+316>:	ldr	w2, [x29,#68]
   0x0000000000598494 <+320>:	ldr	x1, [x29,#80]
   0x0000000000598498 <+324>:	ldr	x0, [x29,#56]
   0x000000000059849c <+328>:	bl	0x592a14 <rb_vm_invoke_proc>
   0x00000000005984a0 <+332>:	adrp	x1, 0x68c000
   0x00000000005984a4 <+336>:	ldr	x1, [x1,#264]
=> 0x00000000005984a8 <+340>:	ldr	x2, [x29,#88]
   0x00000000005984ac <+344>:	ldr	x1, [x1]
   0x00000000005984b0 <+348>:	eor	x1, x2, x1
   0x00000000005984b4 <+352>:	cmp	x1, #0x0
   0x00000000005984b8 <+356>:	b.eq	0x5984c0 <vm_call_opt_call+364>
   0x00000000005984bc <+360>:	bl	0x419760 <__stack_chk_fail@plt>
   0x00000000005984c0 <+364>:	mov	sp, x29
   0x00000000005984c4 <+368>:	ldp	x29, x30, [sp],#96
   0x00000000005984c8 <+372>:	ret
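
The Valgrind message and the marked instruction in the disassembly above can be combined to recover the register values at the point of the invalid read. The sketch below is only arithmetic on numbers copied from this comment. It assumes the flagged access is the `ldr x2, [x29,#88]` at 0x5984a8, which is the address Valgrind reports.

```python
# Values copied from the Valgrind report and the disassembly.
READ_ADDRESS = 0x1FFEFFC9E8   # "Address 0x1ffeffc9e8 is on thread 1's stack"
BYTES_BELOW_SP = 8568         # "8568 bytes below stack pointer"
CANARY_OFFSET = 88            # from "ldr x2, [x29,#88]"

# The load address is x29 + 88, so x29 follows directly.
x29 = READ_ADDRESS - CANARY_OFFSET
# Valgrind says the address lies 8568 bytes below sp.
sp = READ_ADDRESS + BYTES_BELOW_SP

print("x29 =", hex(x29))
print("sp  =", hex(sp))
print("sp - x29 =", sp - x29, "bytes")
```

The prologue sets x29 equal to sp, and sp only moves downwards afterwards (the `sub sp, sp, x0` for the variable-sized array), so inside this function x29 would normally be at or above sp. Here x29 comes out 8656 bytes below sp, which suggests that x29 and sp no longer agree after the call to rb_vm_invoke_proc returns. That is one reading of the numbers, not a confirmed diagnosis.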

Comment 12 Dave Malcolm 2018-02-15 19:28:45 UTC

Created attachment 1396723 [details]
Preprocessed source of vm.c (all as one file)

Compilation command line was:
  gcc -O1 -g -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -grecord-gcc-switches -fstack-protector -fPIC  -D_FORTIFY_SOURCE=2 -fstack-protector -fno-strict-overflow -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT   -I. -I.ext/include/aarch64-linux -I./include -I. -I./enc/unicode/10.0.0 -o vm.o -c vm.c

Comment 13 Dave Malcolm 2018-02-15 19:31:39 UTC

It crashes (and valgrind complains) if everything is compiled with -O1.

The crash and the valgrind complaint go away if vm.o is compiled with -O0 and everything else with -O1.

I'm not sure how to debug further.

Comment 14 Vít Ondruch 2018-02-16 08:56:06 UTC

I reported this to Ruby upstream. We will see if that helps.

Comment 15 Jun Aruga 2018-02-16 10:24:46 UTC

Thanks for working on this.

If "-O0" works, could you update ruby.spec along the lines below (or in any other way that lets the build pass) as a temporary fix for both rawhide and f27 until this issue is fixed?
I want to update Fedora Ruby modularity, which needs to build rpms/ruby internally.
A temporary fix is better than the current situation, in which we cannot build it at all.

ruby.spec

```
%ifarch aarch64
optflags="-O0"
%else
optflags="-O2"
%endif

make %{?_smp_mflags} COPY="cp -p" Q= optflags="${optflags}"
```

Comment 16 Dan Horák 2018-02-16 10:37:51 UTC

I would do something like this in the ruby.spec

@@ -543,6 +543,11 @@ cp -a %{SOURCE6} .
 %build
 autoconf
 
+%ifarch aarch64
+# workaround for rhbz#1545239
+%global optflags %(echo %{optflags} | sed 's/-O2 /-O0 /')
+%endif
+
 %configure \
         --with-rubylibprefix='%{ruby_libdir}' \
         --with-archlibdir='%{_libdir}' \
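
The `sed` expression in the proposed workaround only matches `-O2` when it is followed by a space, and without a `g` flag it replaces only the first match. A minimal Python sketch of the same substitution follows. The flag strings are made-up examples, not the real value of `%{optflags}`, and `downgrade_o2` is a hypothetical helper name.

```python
import re


def downgrade_o2(optflags: str) -> str:
    """Mimic sed 's/-O2 /-O0 /': replace the first '-O2 ' (note the space)."""
    return re.sub(r"-O2 ", "-O0 ", optflags, count=1)


# Illustrative flag strings only.
print(downgrade_o2("-O2 -g -pipe -Wall"))  # -O2 followed by a space: replaced
print(downgrade_o2("-g -pipe -O2"))        # -O2 at the very end: left untouched
```

The second case shows the limitation of the pattern: if `-O2` happened to be the last flag in the macro, the workaround would silently do nothing.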

Comment 17 Vít Ondruch 2018-02-16 12:43:13 UTC

(In reply to Jun Aruga from comment #15)
Sorry, not going to do that. That would just hide this issue.

Comment 18 Jakub Jelinek 2018-02-16 17:32:23 UTC

On vm.c with -O1 gcc bisection shows http://gcc.gnu.org/r254815, which just made -fomit-frame-pointer the default.  Let me see if it works with explicit -fomit-frame-pointer in older revisions.

Comment 19 Dave Malcolm 2018-02-16 17:50:06 UTC

Thanks.  Adding -fno-omit-frame-pointer to the build of vm.o (at -O1) fixes the miniruby crash.

Comment 20 Dave Malcolm 2018-02-16 18:06:40 UTC

(In reply to Jakub Jelinek from comment #18)
> Let me see if it works with explicit -fomit-frame-pointer in older revisions.

I tried that with r246616 (GNU C11 (GCC) version 7.0.1 20170331), and miniruby dies in the same way.

So it looks like a pre-existing issue with -fomit-frame-pointer that the change of default in gcc 8 (from r254815) has exposed.

Comment 21 Jakub Jelinek 2018-02-16 18:11:54 UTC

It is actually much older, I get the same crash if vm.c is compiled with
-mlittle-endian -mabi=lp64 -g -grecord-gcc-switches -O1 -Wall -Werror=format-security -fexceptions -fPIC -fstack-protector -fno-strict-overflow -fexcess-precision=standard -fomit-frame-pointer
with http://gcc.gnu.org/r204770, so already GCC 4.9 behaves that way too.

Note ruby uses -fno-omit-frame-pointer already, but only on mingw32.

Comment 22 Dave Malcolm 2018-02-16 21:43:52 UTC

I was able to reproduce the crash by recompiling vm.c at "-O0 -fno-inline" by adding
  __attribute__ ((optimize("omit-frame-pointer")))
to the top of vm.c

With that, I bisected vm.c to find where the above attribute triggered the crash.
The bisection ended within the #include "vm_insnhelper.c"

miniruby crashes with
  __attribute__ ((optimize("omit-frame-pointer")))
at vm_insnhelper.c line 1719
i.e. immediately *before* the definition of "call_cfunc_m1"

It works with
  __attribute__ ((optimize("omit-frame-pointer")))
at vm_insnhelper.c line 1725
i.e. immediately *after* the definition of "call_cfunc_m1"

Given that, I tried wrapping "call_cfunc_m1" with:

#pragma GCC push_options
#pragma GCC optimize ("omit-frame-pointer")

/* ... */

#pragma GCC pop_options

but doing so didn't trigger the crash on its own.



So I bisected an end-point for the optimization;
with:

#pragma GCC push_options
#pragma GCC optimize ("omit-frame-pointer")

at vm_insnhelper.c line 1719 onwards, immediately *before* definition of "call_cfunc_m1".

With
  #pragma GCC pop_options
at vm.c line 1167 (immediately *before* definition of "rb_vm_invoke_proc"),
miniruby works.

Moving the 
  #pragma GCC pop_options
to vm.c line 1184, immediately *after* definition of "rb_vm_invoke_proc"
causes miniruby to crash.

So the optimization seems to cause the crash somewhere in that range.
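
For readers unfamiliar with the bisection technique used above, here is a minimal sketch of how the GCC-specific pragmas and the per-function attribute scope an optimization option. The function names are hypothetical stand-ins, not code from vm.c; the only point is which definitions fall inside the bracketed region.

```c
/* Everything between push_options and pop_options is compiled with the
   extra option; definitions after pop_options revert to the options
   given on the command line. These pragmas are GCC-specific. */
#pragma GCC push_options
#pragma GCC optimize ("omit-frame-pointer")

/* Hypothetical stand-in for a function inside the bisected range. */
int inside_region(int a, int b)
{
    return a + b;
}

#pragma GCC pop_options

/* Hypothetical stand-in for a function outside the range: compiled
   with the command-line options only. */
int outside_region(int a, int b)
{
    return a * b;
}

/* The same option can be applied to a single function with the
   attribute form. */
__attribute__ ((optimize("omit-frame-pointer")))
int attributed(int a)
{
    return a - 1;
}
```

Moving the pop_options line up or down past one definition at a time changes which functions lose their frame pointer, which is how the start and end points of the range can be bisected.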

Comment 23 Dave Malcolm 2018-02-16 23:16:20 UTC

On attempting to debug this further I discovered that aarch64 seems to have some extra complexity here:

On attempting to compile without -fomit-frame-pointer, it hits:

aarch64_override_options_after_change_1 (opts=0x22d44a0 <global_options>) at ../../src/gcc/config/aarch64/aarch64.c:10497
(gdb) list
10486     /* If the frame pointer is enabled, set it to a special value that behaves
10487        similar to frame pointer omission.  If we don't do this all leaf functions
10488        will get a frame pointer even if flag_omit_leaf_frame_pointer is set.
10489        If flag_omit_frame_pointer has this special value, we must force the
10490        frame pointer if not in a leaf function.  We also need to force it in a
10491        leaf function if flag_omit_frame_pointer is not set or if LR is used.  */
10492     if (opts->x_flag_omit_frame_pointer == 0)
10493       opts->x_flag_omit_frame_pointer = 2;

and this gets used in aarch64_layout_frame:

  /* Emit a frame chain if the frame pointer is enabled.
     If -momit-leaf-frame-pointer is used, do not use a frame chain
     in leaf functions which do not use LR.  */
  if (flag_omit_frame_pointer == 2
      && !(flag_omit_leaf_frame_pointer && crtl->is_leaf
	   && !df_regs_ever_live_p (LR_REGNUM)))
    cfun->machine->frame.emit_frame_chain = true;
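
To make the two excerpts easier to follow, here is a small model of the decision they encode. This is only an illustrative sketch: the function and parameter names are hypothetical, and it mirrors nothing beyond the two conditions quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the override: a request to keep the frame pointer (0) is
   rewritten to the special value 2; other values pass through. */
int normalize_omit_frame_pointer(int flag)
{
    assert(flag >= 0 && flag <= 2);
    return flag == 0 ? 2 : flag;
}

/* Models the check in aarch64_layout_frame: a frame chain is emitted
   only for the special value 2, and is skipped for leaf functions that
   do not use LR when leaf frame pointers are omitted. */
bool emits_frame_chain(int omit_frame_pointer, bool omit_leaf_frame_pointer,
                       bool is_leaf, bool lr_live)
{
    return omit_frame_pointer == 2
           && !(omit_leaf_frame_pointer && is_leaf && !lr_live);
}
```

In this model, emits_frame_chain(normalize_omit_frame_pointer(0), true, false, false) is true, while passing 1 (frame pointer omitted) never yields a frame chain.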

Comment 24 Dave Malcolm 2018-02-19 21:34:35 UTC

Created attachment 1398043 [details]
Preprocessed source of vm.c (all as one file), with __attribute__ ((optimize("omit-frame-pointer")))

By bisecting the push/pop_options pragmas, I was able to narrow down the set of functions in vm.c that need
  __attribute__ ((optimize("omit-frame-pointer")))
at -O0 in order to trigger the crash.

The list of functions is as follows:
  call_cfunc_m1
  call_cfunc_0
  vm_call_cfunc_with_frame
  vm_call_cfunc
  vm_call_method_each_type
  vm_call_method
  vm_call_general
  vm_yield_with_cfunc
  vm_exec_core
  vm_call0
  vm_call0_cfunc_with_frame
  vm_call0_cfunc
  rb_call0
  rb_call
  rb_yield_0
  rb_yield_1
  rb_yield
  rb_iterate0
  rb_iterate
  iterate_method
  rb_block_call
  eval_string_with_cref
  eval_string
  rb_f_eval
  invoke_block
  invoke_block_from_c_bh
  invoke_block_from_c_proc
  vm_invoke_proc

Here's an updated version of the vm.i, which has these attributes, and triggers the crash at -O0.  I also removed various "inline", removed some unused static fns, etc.

I'm using this line to compile it:
  gcc -O0 -g -Wall -Werror=format-security -fPIC  -fstack-protector -fno-strict-overflow -fvisibility=hidden -fexcess-precision=standard -o vm.o -c vm.i -fno-inline

Comment 25 Fedora End Of Life 2018-02-20 15:39:07 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 26 Dave Malcolm 2018-02-20 15:41:06 UTC

Created attachment 1398269 [details]
Generated asm for source and compile flags in comment #24

Comment 27 Dave Malcolm 2018-02-21 19:35:09 UTC

With -fomit-frame-pointer on *everything*, and hacking out the call to rb_thread_create_timer_thread in Init_Thread (to keep this single-threaded for simplicity), the bug appears to be a problem with setjmp/longjmp.

x29 (the frame pointer) is corrupted deep within the 15th call to vm_exec_core within the 10th call to vm_call_opt_call.

The write of the bogus value to x29 occurs here:

0x000000000057d9bc in rb_ec_tag_jump (ec=0x46b050 <all_iter_i>, st=RUBY_TAG_NONE) at vm.i:10459
10459     __builtin_longjmp(((ec->tag->buf)), (1));
2: /x $x29 = 0x7fffff8080

(gdb) disassemble 
Dump of assembler code for function rb_ec_tag_jump:
   0x000000000057d988 <+0>:     stp     x29, x30, [sp,#-32]!
   0x000000000057d98c <+4>:     mov     x29, sp
   0x000000000057d990 <+8>:     str     x0, [x29,#24]
   0x000000000057d994 <+12>:    str     w1, [x29,#20]
   0x000000000057d998 <+16>:    ldr     x0, [x29,#24]
   0x000000000057d99c <+20>:    ldr     x0, [x0,#24]
   0x000000000057d9a0 <+24>:    ldr     w1, [x29,#20]
   0x000000000057d9a4 <+28>:    str     w1, [x0,#336]
   0x000000000057d9a8 <+32>:    ldr     x0, [x29,#24]
   0x000000000057d9ac <+36>:    ldr     x0, [x0,#24]
   0x000000000057d9b0 <+40>:    add     x0, x0, #0x10
   0x000000000057d9b4 <+44>:    ldr     x1, [x0,#8]
   0x000000000057d9b8 <+48>:    ldr     x29, [x0]
=> 0x000000000057d9bc <+52>:    ldr     x0, [x0,#16]
   0x000000000057d9c0 <+56>:    mov     sp, x0
   0x000000000057d9c4 <+60>:    br      x1

when called from rb_iterate0, where the bogus x29 value has been fetched from the jmp_buf at +48.
 
A watchpoint on that memory shows it being set to the bogus value here in rb_iterate0:

   0x0000000000595e50 <+96>:    str     x0, [sp,#264]
   0x0000000000595e54 <+100>:   ldr     x0, [sp,#232]
   0x0000000000595e58 <+104>:   ldr     x0, [x0,#24]
   0x0000000000595e5c <+108>:   str     x0, [sp,#592]
   0x0000000000595e60 <+112>:   add     x0, sp, #0x108
   0x0000000000595e64 <+116>:   add     x0, x0, #0x10
   0x0000000000595e68 <+120>:   add     x1, sp, #0x260
   0x0000000000595e6c <+124>:   str     x1, [x0]
=> 0x0000000000595e70 <+128>:   adrp    x1, 0x595000 <raise_method_missing+544>

28300       struct rb_vm_tag _tag;
28301       _tag.state = RUBY_TAG_NONE;
28302       _tag.tag = ((VALUE)RUBY_Qundef);
28303       _tag.prev = _ec->tag;
28304       ;
28305       state = (__builtin_setjmp((_tag.buf)) ? rb_ec_tag_state((_ec))
28306                                             : ((void)(_ec->tag = &_tag), 0));

If I'm reading this right, the __builtin_longjmp in rb_ec_tag_jump (called from rb_iterate0) is attempting to restore x29 from the jmp_buf, but the __builtin_setjmp in rb_iterate0 isn't actually saving x29 there. Hence x29 gets corrupted at the longjmp, deep in the callstack, leading to a crash when vm_call_opt_call later tries to use x29.

I'm now poring over RTL dumps...

Comment 28 Dave Malcolm 2018-02-21 21:31:12 UTC

After 234r.expand,
  rb_ec_tag_jump: the x29-part of the longjmp looks like this:
    (insn 15 14 16 2 (clobber (mem:BLK (reg/f:DI 29 x29) [0  A8])) "vm.i":10459 -1 (nil))
    (insn 16 15 17 2 (set (reg/f:DI 29 x29)
        (mem:DI (reg:DI 95) [187  S8 A8])) "vm.i":10459 -1
       (nil))

  rb_iterate0: the x29 part of the setjmp looks like this:
    (code_label/s 39 38 40 4 2337 (nil) [3 uses])
    (note 40 39 41 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
    (insn 41 40 42 4 (use (reg/f:DI 29 x29)) "vm.i":28305 -1
     (nil))
    (insn 42 41 43 4 (set (reg/f:DI 85 virtual-stack-vars)
        (reg/f:DI 29 x29)) "vm.i":28305 -1
     (nil))
    (insn 43 42 44 4 (use (reg/f:DI 29 x29)) "vm.i":28305 -1
     (nil))
    (insn 44 43 45 4 (clobber (reg/f:DI 29 x29)) "vm.i":28305 -1
     (nil))

i.e. the longjmp has this set of x29:
    (insn 16 15 17 2 (set (reg/f:DI 29 x29)
        (mem:DI (reg:DI 95) [187  S8 A8])) "vm.i":10459 -1
       (nil))

and the setjmp has this write of it:
    (insn 42 41 43 4 (set (reg/f:DI 85 virtual-stack-vars)
        (reg/f:DI 29 x29)) "vm.i":28305 -1
     (nil))


After 235r.vregs, that write becomes:
    (insn 192 41 43 4 (set (reg/f:DI 64 sfp)
        (reg/f:DI 29 x29)) "vm.i":28305 -1
     (nil))

which becomes this, in 277r.ira:

    (insn 192 41 43 3 (set (reg/f:DI 64 sfp)
        (reg/f:DI 29 x29)) "vm.i":28305 47 {*movdi_aarch64}
     (nil))

but then becomes this in 278r.reload:
    (note 192 41 43 3 NOTE_INSN_DELETED)

with the dump showing:
  deleting insn with uid = 192.

Comment 29 Dave Malcolm 2018-02-22 18:51:52 UTC

I've filed this upstream with gcc as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521 and created a minimal reproducer (see that upstream bug report).

Comment 30 Dave Malcolm 2018-02-22 19:26:12 UTC

I notice that in our builds, RUBY_SETJMP is using "__builtin_setjmp", rather than "setjmp".

This seems to come from the "configure" check; the rpm build log has:
  "checking for setjmp type... __builtin_setjmp"

Is this the default for upstream Ruby?

A GCC upstream developer notes:
> To me any use of __builtin_setjmp/__builtin_longjmp is almost always incorrect.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c3)

Comment 31 Dan Horák 2018-02-22 20:07:26 UTC

Deferring the question to Vít; he is the Ruby guy.

Comment 32 Vít Ondruch 2018-02-23 10:18:38 UTC

@Dave thx for analysis. I don't understand the details, but it was interesting to watch your progress.

(In reply to Dave Malcolm from comment #30)
> Is this the default for upstream Ruby?

Yes, this is default. Not sure why. I queried upstream about that [1].

However, there is a "--with-setjmp-type=setjmp" configure option. With it, the build seems to pass on aarch64 [2]. I'm not sure what the impact is, though. Does __builtin_setjmp have better performance (I am referring to "saving/restoring fewer registers")?



[1] https://bugs.ruby-lang.org/issues/14480#note-4
[2] https://koji.fedoraproject.org/koji/taskinfo?taskID=25249450

Comment 33 Florian Weimer 2018-02-24 18:09:31 UTC

(In reply to Vít Ondruch from comment #32)
> However, there  is "--with-setjmp-type=setjmp" configuration option. If it
> is used, it seems the build is passing on aarch64 [2]. Not sure what is the
> impact though. Has __builtin_setjmp better performance (I am referring to
> "saving/restoring fewer registers")?

I would assume so, but if it is saving and restoring too few registers, it doesn't really help that it's faster, does it?

I'm going to try a ruby build with --with-setjmp-type=setjmp on aarch64, in the hope that this will unblock us.

Comment 34 Jeff Law 2018-02-26 16:01:13 UTC

In general I would not expect applications to make direct use of the builtin setjmp/longjmp routines within GCC.  They should instead be using the OS provided setjmp/longjmp.

The builtin setjmp/longjmp are primarily for the use of the exception handling system on a small number of targets that do not have sufficient unwinding mechanisms.
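
As an illustration of that recommendation, the tag-chain pattern visible in comment #27 can be written against the standard <setjmp.h> interface. This is a minimal sketch under stated assumptions, not Ruby's code: struct tag, current_tag, tag_jump, run_protected and maybe_throw are all hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-region tag: a jump buffer, a link to
   the enclosing tag, and the state passed along with the jump. The
   state is volatile because it is written between setjmp and longjmp. */
struct tag {
    jmp_buf buf;
    struct tag *prev;
    volatile int state;
};

/* Innermost active tag; NULL when no protected region is running. */
static struct tag *current_tag = NULL;

/* Records the state in the innermost tag and jumps back to it.
   Must only be called while a protected region is active. */
static void tag_jump(int state)
{
    assert(current_tag != NULL);
    current_tag->state = state;
    longjmp(current_tag->buf, 1);
}

/* Runs fn(arg) under a fresh tag. Returns 0 if fn returned normally,
   otherwise the state handed to tag_jump. */
int run_protected(void (*fn)(int), int arg)
{
    struct tag t;
    t.state = 0;
    t.prev = current_tag;
    if (setjmp(t.buf) == 0) {
        current_tag = &t;
        fn(arg);
    }
    current_tag = t.prev;   /* pop the tag on both paths */
    return t.state;
}

/* Example callee: jumps out when given a nonzero state. */
void maybe_throw(int state)
{
    if (state != 0)
        tag_jump(state);
}
```

Here run_protected(maybe_throw, 0) returns 0 and run_protected(maybe_throw, 7) returns 7. The library setjmp saves the full set of callee-saved registers the platform ABI requires, which is the property the builtin variant was missing for x29 in this bug.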

Comment 35 Dave Malcolm 2018-02-26 16:30:10 UTC

The bug was worked around upstream today by commit r257984 (by changing the default on aarch64 back to -fno-omit-frame-pointer i.e. keep the frame pointer):
  https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=257984

That workaround/fix is not yet in Fedora's gcc rpms.

Comment 36 Vít Ondruch 2018-02-26 21:48:54 UTC

(In reply to Dave Malcolm from comment #30)
> This seems to come from the "configure" check; the rpm build log has:
>   "checking for setjmp type... __builtin_setjmp"
> 
> Is this the default for upstream Ruby?


So this is upstream response [1] to the question above:

Yes, it is, as far as I know since r15871 [2] ruby-core:16086 [3].


[1] https://bugs.ruby-lang.org/issues/14480#note-6
[2] https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/15871
[3] http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/16086?15980-16545+split-mode-vertical

Comment 37 Jun Aruga 2018-03-26 17:44:51 UTC

I have a question.
Is this issue a random error?

I ask because the Koji build linked below succeeded in rawhide.
https://apps.fedoraproject.org/koschei/package/ruby

But I tried building modules/ruby, which builds rpms/ruby internally, 2 times.

I got the error below both times.

```
https://koji.fedoraproject.org/koji/taskinfo?taskID=25990404
https://kojipkgs.fedoraproject.org//work/tasks/410/25990410/build.log
./miniruby -I./lib -I. -I.ext/common  ./enc/make_encmake.rb --builtin-encs="enc/ascii.o enc/us_ascii.o enc/unicode.o enc/utf_8.o" --builtin-transes="enc/trans/newline.o" --module  enc.mk
make: *** [uncommon.mk:967: prelude.c] Aborted (core dumped)
```

Comment 38 Vít Ondruch 2018-03-27 05:53:50 UTC

(In reply to Jun Aruga from comment #37)
> Because I saw below koji build is success in rawhide.
> https://apps.fedoraproject.org/koschei/package/ruby
> 
> But I tried to build modules/ruby building rpms/ruby internally 2 times.

Your version is older than the Rawhide version ...

Comment 39 Jun Aruga 2018-03-27 08:01:17 UTC

Hmm, interesting.
ruby-2.5.0-89.module_1539+dd658596.src.rpm was used to build Ruby module when I built it yesterday, against rawhide latest version: ruby-2.5.0-91.

Comment 40 Jun Aruga 2018-03-27 09:39:19 UTC

I had to add an (empty) commit to modules/ruby to make it pick up the latest RPM packages.
That turns out to be the specified behavior:
https://pagure.io/fm-orchestrator/issue/900

Comment 41 Vít Ondruch 2018-04-05 07:43:15 UTC

What is the status here? Are the GCC changes already in F28+ GCC? If the previous is true, should we keep the "setjmp" for AArch64? I am asking, since it seems that Ruby upstream is not going to change anything [1], so we should somehow officially resolve/close the issue.



[1] https://bugs.ruby-lang.org/issues/14480#note-10

Comment 42 Dave Malcolm 2018-04-06 16:37:53 UTC

(In reply to Dave Malcolm from comment #35)
> The bug was worked around upstream [2018-02-26] by commit r257984 (by changing the
> default on aarch64 back to -fno-omit-frame-pointer i.e. keep the frame
> pointer):
>   https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=257984
> 
> That workaround/fix is not yet in Fedora's gcc rpms.

It is now.

The pertinent upstream bug here is:
  PR target/84521
which is fixed in our gcc 8 rpms as of:
  8.0.1-0.17.
via this commit:
  https://src.fedoraproject.org/rpms/gcc/c/2a24f771d1af1b6fbb76b10ef357d0c9cd143b09?branch=master
and these builds:
  gcc-8.0.1-0.17.fc28:
    https://koji.fedoraproject.org/koji/buildinfo?buildID=1056623
  gcc-8.0.1-0.17.fc29:
    https://koji.fedoraproject.org/koji/buildinfo?buildID=1056618

Reading:
  https://bodhi.fedoraproject.org/updates/FEDORA-2018-24505f7c35
I see that "This update has been pushed to stable." and it contains gcc-8.0.1-0.20.fc28

Hence I believe that the bug is effectively fixed for both Fedora 28 and rawhide.

I'm marking this one as CLOSED CURRENTRELEASE; please feel free to reopen if you run into this issue with a more recent build of gcc.

Comment 43 Dave Malcolm 2018-04-06 16:39:32 UTC

(In reply to Jun Aruga from comment #37)
> I have a question.
> Is this issue a random error?
> 
> Because I saw below koji build is success in rawhide.
> https://apps.fedoraproject.org/koschei/package/ruby
> 
> But I tried to build modules/ruby building rpms/ruby internally 2 times.
> 
> I got below error 2 times.
> 
> ```
> https://koji.fedoraproject.org/koji/taskinfo?taskID=25990404
> https://kojipkgs.fedoraproject.org//work/tasks/410/25990410/build.log
> ./miniruby -I./lib -I. -I.ext/common  ./enc/make_encmake.rb
> --builtin-encs="enc/ascii.o enc/us_ascii.o en
> c/unicode.o enc/utf_8.o" --builtin-transes="enc/trans/newline.o" --module 
> enc.mk
> make: *** [uncommon.mk:967: prelude.c] Aborted (core dumped)
> ```

Looking at
  https://kojipkgs.fedoraproject.org//work/tasks/410/25990410/root.log
I see that this was with:
  8.0.1-0.16.fc28
which didn't have the fix.

Comment 44 Dave Malcolm 2018-04-06 16:49:51 UTC

(In reply to Vít Ondruch from comment #41)
> What is the status here? Are the GCC changes already in F28+ GCC?

I believe so; see comment #42.

> If the previous is true, should we keep the "setjmp" for AArch64? I am asking,
> since it seems that Ruby upstream is not going to change anything [1], so we
> should somehow officially resolve/close the issue.
> 
> 
> 
> [1] https://bugs.ruby-lang.org/issues/14480#note-10

Note that the workaround in the gcc rpm is papering over the issue, albeit a long-standing one: that __builtin_setjmp on aarch64 doesn't properly save the frame pointer, leading to clobbering of the frame pointer when __builtin_longjmp is used, hence leading to issues when used in conjunction with -fomit-frame-pointer.

If upstream Ruby want to use __builtin_setjmp as a performance optimization, that's up to them, I guess, but it's relying on none of the code ever using or interacting with -fomit-frame-pointer (until PR target/84521 is properly fixed).

I don't know if that answers your question; hope this is constructive.

Comment 45 Vít Ondruch 2018-04-09 08:46:00 UTC

(In reply to Dave Malcolm from comment #44)
> I don't know if that answers your question;

Neither do I ;) I forwarded your remark to upstream ticket and I'll stick with "setjmp" for AArch64 unless somebody has some convincing arguments against (but that would be discussion for different ticket).

Appreciate your help with this matter!

Comment 46 Jakub Jelinek 2018-04-09 08:55:24 UTC

Well, at this point there is no reason to treat AArch64 any differently than other targets.  So, either __builtin_setjmp is dangerous and shouldn't be used anywhere (it is), or it is ok everywhere (if you are prepared for further issues in the future).

Comment 47 Vít Ondruch 2018-04-10 07:11:49 UTC

(In reply to Jakub Jelinek from comment #46)
Ok, that is convincing. Going to drop the AArch64 special treatment. At least the .spec file will be simpler and the behavior similar to upstream defaults. Thx.

Comment 48 Fedora Update System 2018-04-11 09:56:18 UTC

ruby-2.5.1-92.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-dd8162c004

Comment 49 Fedora Update System 2018-04-11 22:58:59 UTC

ruby-2.5.1-92.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-dd8162c004

Comment 50 Fedora Update System 2018-04-15 02:38:26 UTC

ruby-2.5.1-92.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
