Bug 1570571

Summary: Segmentation Fault when trying to compile kernel 4.16.3-300.fc28.aarch64
Product: [Fedora] Fedora Reporter: Pierre-Francois RENARD <pfrenard>
Component: gcc Assignee: Jakub Jelinek <jakub>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 28 CC: airlied, aoliva, bskeggs, davejohansen, dmalcolm, ewk, fweimer, gary.buhrmaster, hdegoede, ichavero, itamar, jakub, jarodwilson, jglisse, john.j5live, jonathan, josef, jwakely, kernel-maint, labbott, law, linville, mchehab, mjg59, mpolacek, msebor, nickc, pbrobinson, steved, tgl
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Linux   
Whiteboard:
Fixed In Version: gcc-8.1.1-1.fc28
Last Closed: 2018-07-10 11:45:53 UTC Type: Bug
Attachments (description: flags):
Output of zconf.tab.o: none
ccHedFD8.out from comment 10: none
preprocessed_source: none

Description Pierre-Francois RENARD 2018-04-23 09:18:00 UTC
Description of problem:
When trying to compile the kernel, gcc hits a segmentation fault and asks me to submit a bug report :)

Version-Release number of selected component (if applicable):
kernel 4.16.3-300.fc28.aarch64



How reproducible:
each time

Steps to Reproduce:
1. cd /usr/src/kernels/4.16.3-300.fc28.aarch64
2. make

Actual results:
  HOSTCC  scripts/kconfig/zconf.tab.o
during RTL pass: loop2_invariant
In file included from scripts/kconfig/zconf.tab.c:2487:
scripts/kconfig/symbol.c: In function ‘sym_check_sym_deps’:
scripts/kconfig/symbol.c:1261:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cchUArFl.out file, please attach this to your bugreport.
make[2]: *** [scripts/Makefile.host:107: scripts/kconfig/zconf.tab.o] Error 1
make[1]: *** [Makefile:514: silentoldconfig] Error 2
make: *** No rule to make target 'include/config/auto.conf', needed by 'include/config/kernel.release'.  Stop.


Expected results:


Additional info:
Versions of a few relevant packages:

autoconf-2.69-27.fc28.noarch
automake-1.15.1-5.fc28.noarch
bison-3.0.4-9.fc28.aarch64
flex-2.6.1-7.fc28.aarch64
gcc-8.0.1-0.20.fc28.aarch64
kernel-4.16.3-300.fc28.aarch64
kernel-core-4.16.3-300.fc28.aarch64
kernel-devel-4.16.3-300.fc28.aarch64
kernel-headers-4.16.3-300.fc28.aarch64
kernel-modules-4.16.3-300.fc28.aarch64
kernel-tools-4.16.0-1.fc28.aarch64
kernel-tools-libs-4.16.0-1.fc28.aarch64
libgcc-8.0.1-0.20.fc28.aarch64

Comment 1 Laura Abbott 2018-04-23 15:53:44 UTC
Does this happen repeatedly? We haven't seen reports on our build servers and compiler segfaults are often a good indication of hardware problems.

Comment 2 Pierre-Francois RENARD 2018-04-23 19:24:09 UTC
Yes I can reproduce it each time I try it.
I also changed the Raspberry Pi (first a 3B, then a 3B+).
This is the same issue:

/usr/src/kernels/4.16.3-300.fc28.aarch64# make
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  YACC    scripts/kconfig/zconf.tab.c
  LEX     scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
during RTL pass: loop2_invariant
In file included from scripts/kconfig/zconf.tab.c:2487:
scripts/kconfig/symbol.c: In function ‘sym_check_sym_deps’:
scripts/kconfig/symbol.c:1261:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cc61KVj6.out file, please attach this to your bugreport.
make[2]: *** [scripts/Makefile.host:107: scripts/kconfig/zconf.tab.o] Error 1
make[1]: *** [Makefile:514: silentoldconfig] Error 2
make: *** No rule to make target 'include/config/auto.conf', needed by 'include/config/kernel.release'.  Stop.

Comment 3 Peter Robinson 2018-04-23 21:32:17 UTC
(In reply to RENARD from comment #2)
> Yes I can reproduce it each time I try it.
> I also changed the Raspberry Pi (first a 3B, then a 3B+).
> This is the same issue:

How much swap have you got allocated?

Comment 4 Pierre-Francois RENARD 2018-04-24 06:20:26 UTC
(In reply to Peter Robinson from comment #3)
> (In reply to RENARD from comment #2)
> > Yes I can reproduce it each time I try it.
> > I also changed the Raspberry Pi (first a 3B, then a 3B+).
> > This is the same issue:
> 
> How much swap have you got allocated?
720 MB.
During compilation, while cc1 is running, there is more than 600 MB of free memory (and nearly all swap is free).

Comment 5 Laura Abbott 2018-04-24 13:15:36 UTC
This is a gcc bug. Moving to gcc. The command that fails is 

  gcc -Wp,-MD,scripts/kconfig/.zconf.tab.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89    -DCURSES_LOC="<curses.h>" -DLOCALE  -Iscripts/kconfig -c -o scripts/kconfig/zconf.tab.o scripts/kconfig/zconf.tab.c

Comment 6 Marek Polacek 2018-04-24 13:19:54 UTC
Can we please get the preprocessed source file?

Preprocessed source stored into /tmp/cchUArFl.out file, please attach this to your bugreport.

Comment 7 Peter Robinson 2018-04-24 13:20:38 UTC
Created attachment 1426060
Output of zconf.tab.o

Output of zconf.tab.o when run with:
$ gcc -E -Wp,-MD,scripts/kconfig/.zconf.tab.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89    -DCURSES_LOC="<curses.h>" -DLOCALE  -Iscripts/kconfig -c -o scripts/kconfig/zconf.tab.o scripts/kconfig/zconf.tab.c

Comment 8 Jakub Jelinek 2018-04-24 13:55:16 UTC
And can you reproduce the ICE if you run the compiler on this file?
gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -S -xc the_file_from_#c7
?
It compiles just fine for me.

Comment 9 Jakub Jelinek 2018-04-24 14:14:51 UTC
Neither in a cross-compiler I've tried first, nor on aarch64 native:
rpm -q gcc; gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -S -xc rh1570571.i; echo $?
gcc-8.0.1-0.20.fc28.aarch64
In file included from scripts/kconfig/zconf.tab.c:2485:
scripts/kconfig/confdata.c: In function ‘conf_write’:
scripts/kconfig/confdata.c:773:19: warning: ‘%s’ directive writing likely 7 or more bytes into a region of size between 1 and 4097 [-Wformat-overflow=]
scripts/kconfig/confdata.c:773:19: note: assuming directive output of 7 bytes
scripts/kconfig/confdata.c:773:2: note: ‘sprintf’ output 1 or more bytes (assuming 4104) into a destination of size 4097
scripts/kconfig/confdata.c:776:20: warning: ‘.tmpconfig.’ directive writing 11 bytes into a region of size between 1 and 4097 [-Wformat-overflow=]
scripts/kconfig/confdata.c:776:3: note: ‘sprintf’ output between 13 and 4119 bytes into a destination of size 4097
0

Comment 10 Peter Robinson 2018-04-24 14:21:04 UTC
(In reply to Jakub Jelinek from comment #9)
> Neither in a cross-compiler I've tried first, nor on aarch64 native:

So it works fine for me in a cross compiler and on a Mustang (Cortex-A57 based), but the reporter has an RPi (Cortex-A53) and I can recreate it on a Pine64 (also A53). Not sure if that makes a difference.

Comment 11 Peter Robinson 2018-04-24 14:22:10 UTC
(In reply to Jakub Jelinek from comment #8)
> And can you reproduce the ICE if you run the compiler on this file?
> gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
> -std=gnu89 -S -xc the_file_from_#c7
> ?

Yes (will attach the out file):

$ gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -S -xc scripts/kconfig/zconf.tab.o
during RTL pass: loop2_invariant
In file included from scripts/kconfig/zconf.tab.c:2487:
scripts/kconfig/symbol.c: In function ‘sym_check_sym_deps’:
scripts/kconfig/symbol.c:1261:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccHedFD8.out file, please attach this to your bugreport.

Comment 12 Peter Robinson 2018-04-24 14:23:58 UTC
Created attachment 1426086
ccHedFD8.out from comment 10

attached output from compile test

Comment 13 Pierre-Francois RENARD 2018-04-24 14:24:19 UTC
Created attachment 1426087
preprocessed_source

Comment 14 Marek Polacek 2018-04-24 14:40:23 UTC
Latest trunk still doesn't crash for me even with -mcpu=cortex-a53 -mtune=cortex-a53.

Comment 15 Peter Robinson 2018-04-24 14:44:29 UTC
(In reply to Marek Polacek from comment #14)
> Latest trunk still doesn't crash for me even with -mcpu=cortex-a53
> -mtune=cortex-a53.

What hardware? As I mentioned above building on a Cortex-a57 and cross compiling on x86_64 works for me too, but I can recreate the issue on a Pine64. Latest as pushed to the mirrors:
binutils-2.29.1-20.fc28.aarch64
libasan-8.0.1-0.20.fc28.aarch64
libstdc++-8.0.1-0.20.fc28.aarch64
libgomp-8.0.1-0.20.fc28.aarch64
libubsan-8.0.1-0.20.fc28.aarch64
gcc-8.0.1-0.20.fc28.aarch64
libatomic-8.0.1-0.20.fc28.aarch64
libgcc-8.0.1-0.20.fc28.aarch64
cpp-8.0.1-0.20.fc28.aarch64

Comment 16 Jakub Jelinek 2018-04-24 14:53:20 UTC
So is there some Pine64 hw we can ssh in and see it?  Still it would be very strange, generally gcc ICEs are easily reproducible no matter on which exact hw you run it or if cross-compiling or not.

Tried even compiling with --param ggc-min-expand=0 --param ggc-min-heapsize=0 and it succeeded too, so it is unlikely to be garbage collection related.

Comment 17 Peter Robinson 2018-04-24 14:55:45 UTC
> So is there some Pine64 hw we can ssh in and see it?  Still it would be very

No idea, this one is sitting on my desk next to me. Not sure if you're located in an office, but if so I would bet someone has an RPi3 lying around that they could loan you with an aarch64 install on it.

Comment 19 Marek Polacek 2018-04-24 15:11:56 UTC
(In reply to Peter Robinson from comment #15)
> (In reply to Marek Polacek from comment #14)
> > Latest trunk still doesn't crash for me even with -mcpu=cortex-a53
> > -mtune=cortex-a53.
> 
> What hardware? As I mentioned above building on a Cortex-a57 and cross
> compiling on x86_64 works for me too, but I can recreate the issue on a
> Pine64.

That was
Processor	: AArch64 Processor rev 1 (aarch64)
Features	: fp asimd evtstrm 
CPU implementer	: 0x50
CPU architecture: AArch64
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 1

Hardware	: APM X-Gene Mustang board

Comment 20 Jakub Jelinek 2018-04-25 07:35:09 UTC
Ok, reproduced on Jonathan's Pine64, but it looks weird, any known CPU hw issues or kernel bugs?
gdb --args /usr/libexec/gcc/aarch64-redhat-linux/8/cc1 -quiet -mlittle-endian -mabi=lp64 -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -std=gnu90 -fomit-frame-pointer /tmp/zconf.tab.i -o /tmp/zconf.tab.s2

cc1 binary has code like:
   0x0000000000d2bffc <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+524>:	adrp	x6, 0x16c9000 <targetm+2584>
   0x0000000000d2c000 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+528>:	stp	x2, x3, [sp, #56]
   0x0000000000d2c004 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+532>:	ldr	x0, [x6, #496]
   0x0000000000d2c008 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+536>:	str	x4, [sp, #72]
   0x0000000000d2c00c <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+540>:	blr	x0
   0x0000000000d2c010 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+544>:	mov	x19, x0
where 0x16c9000 + 496 is 0x16c91f0, &targetm.starting_frame_offset, and that contains
0x16c91f0 <targetm+3080>:	0x000000000124be50
unmodified all the time (the targetm object is in .data section, but contains lots of function pointers that are never modified).
Now, if I put a breakpoint at *0x0000000000d2c00c and/or *0x0000000000d2c008, the first time $x0 contains the expected 0x000000000124be50 value,
but randomly either the second time, or e.g. the 8th time (or sometimes never) $x0 contains the value 0x16d74a0 instead, which is not an address of a function, but rather an address in the .data section, inside the aarch64_types_binop_qualifiers array.  If this happens, $x6 is still correctly 0x16c9000
and *(long *)0x16c91f0 still contains 0x000000000124be50, just $x0 is incorrect.  There are no branches to 0xd2c008 in the code.
Now, if I put a breakpoint on *0x0000000000d2c004 too, I can never reproduce the problem, it always compiles fine, which suggests that wild branching (say, an indirect call through corruption) to 0xd2c008 is unlikely.
The cc1 binary is a single-threaded, position-dependent binary; it installs a couple of signal handlers, but only for the fatal signals:
toplev.c:      signal (SIGSEGV, crash_signal);
toplev.c:      signal (SIGILL, crash_signal);
toplev.c:      signal (SIGBUS, crash_signal);
toplev.c:      signal (SIGABRT, crash_signal);
toplev.c:      signal (SIGIOT, crash_signal);
toplev.c:      signal (SIGFPE, crash_signal);
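
The address arithmetic in the disassembly above can be checked with a short C sketch. It assumes the usual AArch64 semantics (adrp yields the 4 KiB page of the instruction address plus a page count, and ldr with an immediate adds that immediate to the base register). The helper names are made up for illustration, and the page delta 0x99e is derived here from the two addresses gdb printed; it is not quoted from the binary.

```c
#include <assert.h>
#include <stdint.h>

/* ADRP: 4 KiB page of the instruction address plus a signed page count.
   (Hypothetical helper; models only the address computation.) */
static uint64_t adrp_target(uint64_t pc, int64_t page_delta)
{
    return (pc & ~UINT64_C(0xfff)) + ((uint64_t)page_delta << 12);
}

/* LDR Xt, [Xn, #imm]: effective address is the base register plus imm. */
static uint64_t ldr_address(uint64_t base, uint64_t imm)
{
    return base + imm;
}

/* Re-derive the numbers quoted in comment 20. */
static void check_comment20_addresses(void)
{
    /* adrp x6 at 0xd2bffc resolves to page 0x16c9000 (delta assumed 0x99e). */
    uint64_t x6 = adrp_target(UINT64_C(0xd2bffc), 0x99e);
    assert(x6 == UINT64_C(0x16c9000));

    /* ldr x0, [x6, #496] reads the hook pointer from 0x16c91f0. */
    uint64_t slot = ldr_address(x6, 496);
    assert(slot == UINT64_C(0x16c91f0));

    /* gdb labels the page <targetm+2584> and the slot <targetm+3080>;
       both must imply the same start address for targetm. */
    assert(x6 - 2584 == slot - 3080);
}
```

Calling check_comment20_addresses() passes silently when the numbers are consistent, which they are: 2584 + 496 = 3080, so the load really does read the targetm.starting_frame_offset slot.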

Comment 21 Jakub Jelinek 2018-04-25 07:36:59 UTC
The source code for the above snippet is:
          if (x == frame_pointer_rtx)
            {
              if (FRAME_GROWS_DOWNWARD)
                {
                  high_bound = targetm.starting_frame_offset ();
                  low_bound  = high_bound - get_frame_size ();
                }
              else
                {
                  low_bound  = targetm.starting_frame_offset ();
                  high_bound = low_bound + get_frame_size ();
                }
            }
(the high_bound = targetm.starting_frame_offset () line).
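
A self-contained sketch of the pattern in that snippet: a hook is read from a table of function pointers and called indirectly, and its result seeds the frame bounds. Everything named here (struct target_hooks, demo_targetm, frame_bounds_for, the offset 16) is hypothetical and far simpler than GCC's real targetm and rtx_addr_can_trap_p_1.

```c
#include <stdint.h>

/* Hypothetical stand-in for GCC's target hook table; the real targetm
   has many more members. */
struct target_hooks {
    int64_t (*starting_frame_offset)(void);
};

/* Assumed demo hook: pretend the frame starts at offset 16. */
static int64_t demo_starting_frame_offset(void)
{
    return 16;
}

/* Lives in .data and is never modified, like the real table. */
static struct target_hooks demo_targetm = { demo_starting_frame_offset };

struct frame_bounds {
    int64_t low;
    int64_t high;
};

/* Mirrors the if/else from the snippet above for a frame of frame_size bytes. */
static struct frame_bounds
frame_bounds_for(const struct target_hooks *hooks, int64_t frame_size,
                 int grows_downward)
{
    struct frame_bounds b;
    /* Load the pointer from the table, then call through it; this is the
       ldr x0 / blr x0 pair seen in the disassembly. */
    int64_t start = hooks->starting_frame_offset();

    if (grows_downward) {
        b.high = start;
        b.low = start - frame_size;
    } else {
        b.low = start;
        b.high = start + frame_size;
    }
    return b;
}
```

If the register used for the indirect call holds anything other than the hook's address, as with the bad $x0 value described in comment 20, the call lands on a non-function address and the process faults.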

Comment 22 Peter Robinson 2018-04-25 07:55:21 UTC
(In reply to Jakub Jelinek from comment #20)
> Ok, reproduced on Jonathan's Pine64, but it looks weird, any known CPU hw
> issues or kernel bugs?

None I'm aware of, but probably better answered by ARM people. The cpu core revisions are the same across Pine64/RPi3/RPi3+ :
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4

Although the Pine64 has a few more feature flags:
RPI: Flags: fp asimd evtstrm crc32 cpuid
Pine64: Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
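
To make the difference explicit, here is a small C sketch that lists the flags present in one space-separated list but missing from another. The function names are illustrative only; the two strings in the usage note are the flag lists quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if flag appears as a whole space-separated token in list. */
static int has_flag(const char *list, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = list;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return 1;
        p += 1;
    }
    return 0;
}

/* Writes the flags found in b but not in a into out, separated by single
   spaces, and returns how many there are. Lists longer than 255 bytes
   are truncated in this sketch. */
static int extra_flags(const char *a, const char *b, char *out, size_t outsz)
{
    char copy[256];
    int count = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    strncpy(copy, b, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (!has_flag(a, tok)) {
            size_t used = strlen(out);
            snprintf(out + used, outsz - used, "%s%s", count ? " " : "", tok);
            count++;
        }
    }
    return count;
}
```

Calling extra_flags("fp asimd evtstrm crc32 cpuid", "fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid", out, sizeof out) leaves "aes pmull sha1 sha2" in out: the Pine64 adds only the crypto extensions.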

Comment 23 Peter Robinson 2018-04-29 19:27:35 UTC
I suspect https://bugzilla.redhat.com/show_bug.cgi?id=1562522 is a similar/related problem. We've also had a couple of other similar reports on IRC about building on the RPi.

Comment 24 Tom Lane 2018-05-06 04:46:53 UTC
I got a bunch of gcc segfaults with the same "during RTL pass: loop2_invariant" symptom while trying to compile Postgres on a Raspberry Pi 3B+, using current F28 aarch64.  It is not 100% reproducible (in fact, sometimes gcc itself tells me that; other times, retrying the compile succeeds; but some of the .c files fail pretty repeatably).

It smells like a kernel problem to me, especially since I also see noticeable flakiness at the GNOME/UI level --- for instance, it took me several reboots to get the machine's networking configured :-(.  I initially suspected I had a flaky SD card or some such, but Raspbian works perfectly on the same hardware.

Comment 25 Gary Buhrmaster 2018-05-06 15:10:04 UTC
FWIW, the gcc segfault, at least for my test case, seems to be gone when using the gcc 8.1 build from Koji (build 1078521): https://koji.fedoraproject.org/koji/buildinfo?buildID=1078521

Comment 26 Peter Robinson 2018-05-06 23:03:10 UTC
Yes, I got most of the way through the same kernel build that previously failed almost immediately. The core kernel built; storage ran out most of the way through the modules build, but the compiler is working.

I'd be interested to know the upstream fix.

Comment 27 Tom Lane 2018-05-06 23:15:37 UTC
Yeah, 8.1.1-1 works for me too.

I do notice that it seems to compile over 2x slower than the 6.3.0 armv6 gcc included with the Raspbian build I was comparing to ... is that expected?

Postgres build on Raspbian (Debian 9.4):

$ time make -j4 -s
...
All of PostgreSQL successfully made. Ready to install.

real    11m9.048s
user    35m7.695s
sys     1m27.023s

Identical test case on same hardware, F28 + gcc 8.1.1:

$ time make -j4 -s
...
All of PostgreSQL successfully made. Ready to install.

real    24m11.778s
user    70m46.477s
sys     4m38.827s
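
The "over 2x" figure can be checked from the two timings above with a few lines of C. This is only a sketch of the arithmetic; the helper names are made up.

```c
/* Convert a "XmY.YYYs" figure from time(1) into seconds. */
static double to_seconds(double minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times longer the second run took than the baseline. */
static double slowdown(double baseline_s, double other_s)
{
    return other_s / baseline_s;
}

/* Wall-clock ratio, F28 + gcc 8.1.1 versus Raspbian. */
static double real_slowdown(void)
{
    return slowdown(to_seconds(11, 9.048), to_seconds(24, 11.778));
}

/* User CPU time ratio for the same two runs. */
static double user_slowdown(void)
{
    return slowdown(to_seconds(35, 7.695), to_seconds(70, 46.477));
}
```

With the numbers quoted in this comment, real time comes out at roughly 2.2x and user time at roughly 2.0x.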

Comment 28 Jakub Jelinek 2018-05-06 23:23:45 UTC
There was no fix for this, my guess is that the gcc binary just changed enough that this kernel or hw bug is latent.

Comment 29 Peter Robinson 2018-05-06 23:28:35 UTC
> I do notice that it seems to compile over 2x slower than the 6.3.0 armv6 gcc
> included with the Raspbian build I was comparing to ... is that expected?

Yup, it's more than likely because the cpufreq and clock scaling drivers aren't upstream yet, so we can't run at full speed. It's completely unrelated to the compiler or this bug.

Comment 30 Tom Lane 2018-05-07 00:31:12 UTC
(In reply to Peter Robinson from comment #29)
> Yup, it's more than likely because the cpufreq and clock scaling drivers
> aren't upstream yet so we can't run at full speed and completely unrelated
> to the compiler or this bug

Confirmed, some other non-compile-bound tests are also circa 2x slower.

Do you know offhand of BZs I can watch for progress on those issues?

Comment 31 Peter Robinson 2018-05-07 07:06:11 UTC
> Do you know offhand of BZs I can watch for progress on those issues?

No, we don't have any; follow along in the upstream kernel/RPi community.