Bug 2049371 - gcc: segfault with inlining on ppc64le
Summary: gcc: segfault with inlining on ppc64le
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: 36
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker 2045261
Reported: 2022-02-02 04:21 UTC by Jerry James
Modified: 2023-05-25 16:51 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-25 16:51:54 UTC
Type: Bug
Embargoed:


Attachments:
Compressed preprocessed source of foreign.c (91.41 KB, application/x-xz)
2022-02-02 04:21 UTC, Jerry James

Description Jerry James 2022-02-02 04:21:56 UTC
Created attachment 1858534
Compressed preprocessed source of foreign.c

Description of problem:
The clisp package segfaults during one of its tests, on ppc64le only.  This started with the introduction of GCC 12.  If one file, foreign.c, is built with -fno-inline-small-functions, the segfault does not happen.

I have not been able to debug the issue with gdb.  Clisp uses libsigsegv, and even with "handle SIGSEGV pass nostop", gdb reports that clisp segfaults and exits immediately on startup.  Valgrind reports that the segfault occurs while traversing a list of foreign objects that are to be freed after a foreign function call.  Somehow a bad pointer is on the list: according to Valgrind, it is not stack-allocated, not recently freed, and not adjacent to any other allocated block.

The upstream source file is foreign.d, which is converted to foreign.c with clisp's own D-to-C converter.  It is normally built with these flags:

gcc -I/usr/include/libsvm  -I/builddir/build/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/src -I/builddir/build/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/build/gllib -I/builddir/build/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/src/gllib -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power8 -mtune=power8 -fasynchronous-unwind-tables -fstack-clash-protection -Wa,--noexecstack -no-integrated-cpp -W -Wswitch -Wcomment -Wpointer-arith -Wreturn-type -Wmissing-declarations -Wimplicit -Wno-sign-compare -Wno-format-nonliteral -Wno-shift-negative-value -O2 -fwrapv -fno-strict-aliasing -DNO_ASM -DENABLE_UNICODE -DDYNAMIC_FFI -DDYNAMIC_MODULES -I. -c foreign.c

Clisp is built with LTO disabled due to some handwritten assembly language files.

Any help figuring out what is going on will be much appreciated.

Version-Release number of selected component (if applicable):
gcc-12.0.1-0.4.fc36.ppc64le

How reproducible:
Always.

Steps to Reproduce:
1. Build clisp on ppc64le in Rawhide

Actual results:
The readline test segfaults in the foreign code.

Expected results:
The test passes, as it does on every other architecture, and when built with GCC 11.

Additional info:

Comment 1 Jakub Jelinek 2022-02-03 17:04:02 UTC
Which test fails, and how do I invoke it (preferably just that single test rather than the whole testsuite)?
If it is foreign.c, I can e.g. try to bisect which gcc revision made it fail and work from there, but for that it is better to have as few commands as possible for verification.

Comment 2 Jerry James 2022-02-04 05:19:00 UTC
It is the readline module test that segfaults.  Note that I have added a patch to the clisp package (Patch5: clisp-no-inline.patch) to work around this, so you will have to remove that.  Actually, I decided to narrow things down a bit more.  I added __attribute__((noinline)) to all of the short functions and then started removing them.  I ended with this patch, which is sufficient to avoid the segfault:

--- src/foreign.d.orig	2021-06-28 14:32:42.000000000 -0600
+++ src/foreign.d	2022-02-03 21:52:22.932176743 -0700
@@ -2417,7 +2417,7 @@ local void count_walk_post (object fvd,
 {
   unused(fvd); unused(obj); unused(walk);
 }
-local maygc void convert_to_foreign_needs (object fvd, object obj,
+local maygc __attribute__((noinline)) void convert_to_foreign_needs (object fvd, object obj,
                                            struct foreign_layout *sas)
 {
   struct walk_lisp walk
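
For readers unfamiliar with the attribute used in this patch, here is a minimal C sketch of the same kind of workaround.  Everything in it (struct layout, compute_needs, total_size) is a hypothetical stand-in and is not taken from clisp; it only shows how __attribute__((noinline)) keeps one small helper out of its caller while the rest of the file is still optimized normally.

```c
#include <stddef.h>

/* Hypothetical stand-in for a small layout record; not clisp's
   struct foreign_layout. */
struct layout { size_t size; size_t align; };

/* noinline asks GCC not to inline this function into its callers,
   even at -O2, so the call remains a real call. */
static __attribute__((noinline)) void
compute_needs (size_t count, size_t elem_size, struct layout *out)
{
  out->size = count * elem_size;
  out->align = elem_size;
}

/* Caller that would otherwise be a candidate for inlining the helper. */
size_t
total_size (size_t count, size_t elem_size)
{
  struct layout l = { 0, 0 };
  compute_needs (count, elem_size, &l);
  return l.size;
}
```

Compared with building the whole file with -fno-inline-small-functions, the attribute narrows the change to a single function, which is what makes it useful for isolating the call that matters.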

To run the readline module test, from the build directory do this:

cd build
./clisp -E UTF-8 -Emisc 1:1 -Epathname 1:1 -norc -C -i tests/tests -x "(ext:exit (plusp (or (run-some-tests :dirlist '("readline/") :srcdir \"../modules/\" :outdir \"./\") 0)))"

Comment 3 Jakub Jelinek 2022-02-04 16:18:22 UTC
Bisected to https://gcc.gnu.org/r12-6416: with r12-6415 the test passes, with r12-6416 it segfaults.
Unfortunately it is a register allocator change, and it changes quite a lot in the callback and C_foreign_call_out functions.
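
As an illustration of the bisection idea only, here is a small C sketch of a binary search for the first failing revision.  It assumes revisions can be treated as consecutive integers and that every revision after the first bad one also fails; first_bad_revision, is_bad_fn and example_is_bad are hypothetical names, and a real predicate would build that gcc revision and run the reproducer.

```c
#include <stdbool.h>

/* Hypothetical predicate: true if the build at this revision index
   reproduces the failure. */
typedef bool (*is_bad_fn) (int rev);

/* Binary search for the first bad revision in (good, bad].
   Precondition: `good` passes, `bad` fails, and the failure is
   monotonic between them. */
int
first_bad_revision (int good, int bad, is_bad_fn is_bad)
{
  while (bad - good > 1)
    {
      int mid = good + (bad - good) / 2;
      if (is_bad (mid))
        bad = mid;    /* failure already present at mid */
      else
        good = mid;   /* mid still passes */
    }
  return bad;
}

/* Illustration only: pretends everything from index 6416 onward fails,
   mirroring "r12-6415 passes, r12-6416 segfaults". */
bool
example_is_bad (int rev)
{
  return rev >= 6416;
}
```

Each step halves the range, which is why a single fast verification command matters more than the number of revisions in between.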

Comment 4 Jakub Jelinek 2022-02-04 16:41:46 UTC
When I run
gdb --args /home/jakub/rpmbuild/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/build/base/lisp.run -B /home/jakub/rpmbuild/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/build -M /home/jakub/rpmbuild/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/build/base/lispinit.mem -N /home/jakub/rpmbuild/BUILD/clisp-de01f0f47bb44d3a0f9e842464cf2520b238f356/build/locale -E UTF-8 -Emisc 1:1 -Epathname 1:1 -norc -C -i tests/tests -x "(ext:exit (plusp (or (run-some-tests :dirlist '(readline/) :srcdir \"../modules/\" :outdir \"./\") 0)))"
rather than through the clisp wrapper (command copied and tweaked from strace -s 1024 -v output), I see a segfault at:
#0  walk_foreign_pointers (fvd=<optimized out>, data=data@entry=0x40026, walk=0x7fffffff33a8, walk@entry=0x7fffffff3478) at ../src/foreign.d:2033
2033	        if (*(void**)data == NULL)
where address 0x40026 isn't valid.
Backtrace is
#0  walk_foreign_pointers (fvd=<optimized out>, data=data@entry=0x40026, walk=0x7fffffff33a8, walk@entry=0x7fffffff3478) at ../src/foreign.d:2033
#1  0x00000000101af854 in free_foreign (data=0x40026, fvd=<optimized out>) at ../src/foreign.d:2176
#2  C_foreign_call_out (argcount=<optimized out>, rest_args_pointer=0x380000005a0) at ../src/foreign.d:4236
#3  0x0000000010057110 in funcall_subr (fun=0x1800000db68, args_on_stack=1) at ../src/eval.d:5251
#4  0x000000001005df10 in eval_ffunction (ffun=<optimized out>) at ../src/eval.d:4023
#5  eval1 (form=<optimized out>, form@entry=0x90000136af0) at ../src/eval.d:3142
#6  0x0000000010060668 in eval (form=0x90000136af0) at ../src/eval.d:3000
#7  0x000000001006de6c in C_progn () at ../src/control.d:296
#8  0x000000001005dfdc in eval_fsubr (args=<optimized out>, fun=<optimized out>) at ../src/eval.d:3298
#9  eval1 (form=<optimized out>, form@entry=0x90000137740) at ../src/eval.d:3135
#10 0x0000000010060668 in eval (form=0x90000137740) at ../src/eval.d:3000
#11 0x000000001006148c in funcall_iclosure (closure=0x98000062758, args_pointer=<optimized out>, args_pointer@entry=0x380000004d8, argcount=<optimized out>, argcount@entry=1)
    at ../src/eval.d:2778
#12 0x000000001005ef44 in eval_closure (closure=<optimized out>) at ../src/eval.d:3970
#13 eval1 (form=<optimized out>, form@entry=0x90000137990) at ../src/eval.d:3125
#14 0x0000000010060668 in eval (form=0x90000137990) at ../src/eval.d:3000
#15 0x00000000100729cc in C_let () at ../src/control.d:701
#16 C_let () at ../src/control.d:672
#17 0x000000001005dfdc in eval_fsubr (args=<optimized out>, fun=<optimized out>) at ../src/eval.d:3298
#18 eval1 (form=<optimized out>, form@entry=0x90000137870) at ../src/eval.d:3135
#19 0x0000000010060668 in eval (form=0x90000137870) at ../src/eval.d:3000
#20 0x0000000010060728 in eval_5env (form=<optimized out>, var_env=<optimized out>, fun_env=<optimized out>, block_env=<optimized out>, go_env=<optimized out>, 
    decl_env=<optimized out>) at ../src/eval.d:1118
#21 0x0000000010077630 in C_eval () at ../src/control.d:2196
#22 0x0000000010066fd0 in interpret_bytecode_ (closure=0x102af938 <back_trace>, codeptr=0x13800004d618, byteptr=<optimized out>) at ../src/eval.d:6834
#23 0x000000001006ce64 in funcall_closure (closure=<optimized out>, args_on_stack=<optimized out>) at ../src/eval.d:5659
#24 0x0000000010066e5c in interpret_bytecode_ (closure=0x980000604d0, codeptr=0x13800004dba0, byteptr=<optimized out>) at ../src/eval.d:6822
#25 0x000000001006ce64 in funcall_closure (closure=<optimized out>, args_on_stack=<optimized out>) at ../src/eval.d:5659
#26 0x0000000010069d3c in interpret_bytecode_ (closure=0x98000061660, codeptr=0xad, byteptr=<optimized out>) at ../src/eval.d:6883
#27 0x000000001006ce64 in funcall_closure (closure=<optimized out>, args_on_stack=<optimized out>) at ../src/eval.d:5659
#28 0x00000000100670b4 in interpret_bytecode_ (closure=0x98000061c68, codeptr=0x5c, byteptr=<optimized out>) at ../src/eval.d:6816
#29 0x000000001005f4a8 in eval_closure (closure=<optimized out>) at ../src/eval.d:3924
#30 eval1 (form=<optimized out>, form@entry=0x900001268b0) at ../src/eval.d:3125
#31 0x0000000010060668 in eval (form=0x900001268b0) at ../src/eval.d:3000
#32 0x000000001006e420 in C_or () at ../src/control.d:2507
#33 0x000000001005dfdc in eval_fsubr (args=<optimized out>, fun=<optimized out>) at ../src/eval.d:3298
#34 eval1 (form=<optimized out>, form@entry=0x900001268a0) at ../src/eval.d:3135
#35 0x0000000010060668 in eval (form=0x900001268a0) at ../src/eval.d:3000
#36 0x000000001005ed80 in eval_subr (fun=0x1800000bde0) at ../src/eval.d:3419
#37 eval1 (form=<optimized out>, form@entry=0x90000126890) at ../src/eval.d:3118
#38 0x0000000010060668 in eval (form=0x90000126890) at ../src/eval.d:3000
#39 0x000000001005e964 in eval_subr (fun=0x18000001bc8) at ../src/eval.d:3462
#40 eval1 (form=<optimized out>, form@entry=0x90000126880) at ../src/eval.d:3118
#41 0x0000000010060668 in eval (form=0x90000126880) at ../src/eval.d:3000
#42 0x000000001014be4c in C_read_eval_print () at ../src/debug.d:409
#43 0x0000000010057220 in funcall_subr (fun=0x180000024c0, args_on_stack=<optimized out>) at ../src/eval.d:5256
#44 0x0000000010066d74 in interpret_bytecode_ (closure=0x98000062430, codeptr=0x138000039e5c, byteptr=<optimized out>) at ../src/eval.d:6828
#45 0x000000001006ce64 in funcall_closure (closure=<optimized out>, args_on_stack=<optimized out>) at ../src/eval.d:5659
#46 0x0000000010076bd0 in C_driver () at ../src/control.d:2008
#47 0x0000000010066fd0 in interpret_bytecode_ (closure=0x102af938 <back_trace>, codeptr=0x138000039e2c, byteptr=<optimized out>) at ../src/eval.d:6834
#48 0x000000001006ce64 in funcall_closure (closure=<optimized out>, args_on_stack=<optimized out>) at ../src/eval.d:5659
#49 0x0000000010052ae0 in main_actions (p=0x102af998 <argv2>) at ../src/spvw.d:4033
#50 0x0000000010026a68 in main (argc=<optimized out>, argv=0x7fffffffeb88) at ../src/spvw.d:4321

Unfortunately, neither of the above-mentioned routines appears in the backtrace.
C_foreign_call_out is called 31 times before the segfault, callback twice.

Now, you said making convert_to_foreign_needs noinline cures it.  That function is called by C_exec_on_stack (which isn't called by anything in the TU) and by C_foreign_call_out, so most likely it is that call.

Comment 5 Jakub Jelinek 2022-02-04 16:44:06 UTC
Ah, actually C_foreign_call_out is in the backtrace (#2).

Comment 6 Jakub Jelinek 2022-02-04 16:53:02 UTC
But callback is called from it:
#0  0x00000000101aaec8 in callback (data=0x1, alist=0x7fffffff28a0) at ../src/foreign.d:4617
#1  0x00007ffff7c82140 in callback_receiver () from /lib64/libffcall.so.0
#2  0x00007ffff7ea89cc in _rl_dispatch_subseq (key=<optimized out>, map=<optimized out>, got_subseq=<optimized out>) at ../readline.c:887
#3  0x00007ffff7ea8fe8 in _rl_dispatch (key=<optimized out>, map=<optimized out>) at ../readline.c:833
#4  0x00007ffff7ea9e70 in readline_internal_char () at ../readline.c:645
#5  0x00007ffff7eb7764 in readline_internal_charloop () at ../readline.c:694
#6  readline_internal () at ../readline.c:706
#7  readline (prompt=<optimized out>) at ../readline.c:385
#8  0x00007ffff7c81da4 in avcall_call () from /lib64/libffcall.so.0
#9  0x00000000101af550 in C_foreign_call_out (argcount=<optimized out>, rest_args_pointer=0x380000005a0) at ../src/foreign.d:4216
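
The nesting in this backtrace (a call-out that enters a library, which calls back into the program before the call-out returns) can be sketched in C as follows.  All names are hypothetical and only mirror the shape of the trace; this is not clisp or libffcall code and makes no claim about where the bad pointer comes from.

```c
#include <stddef.h>

/* Hypothetical callback type, standing in for the trampoline target. */
typedef void (*callback_fn) (void *data);

/* Stand-in for a library routine (readline in the trace) that invokes
   a user-supplied callback before it returns. */
static void
library_call (callback_fn cb, void *data)
{
  cb (data);
}

/* Stand-in for the callback: it runs while the outer call-out is
   still active on the stack. */
static void
on_event (void *data)
{
  int *counter = data;
  (*counter)++;
}

/* Stand-in for the call-out: `to_free` is live across the library
   call, so it must hold the same value after the callback has run. */
int
call_out (void)
{
  int events = 0;
  void *to_free = &events;
  library_call (on_event, &events);
  return (to_free == (void *) &events) ? events : -1;
}
```

In this sketch call_out returns the number of callback invocations only if its local pointer is unchanged after the nested call, which is the property the code after the avcall_call site relies on when it walks the data to be freed.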

Comment 7 Ben Cotton 2022-02-08 20:15:43 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 8 Ben Cotton 2023-04-25 16:52:42 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 9 Ludek Smid 2023-05-25 16:51:54 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

