Bug 1570571
| Field | Value |
|---|---|
| Summary | Segmentation Fault when trying to compile kernel 4.16.3-300.fc28.aarch64 |
| Product | [Fedora] Fedora |
| Component | gcc |
| Status | CLOSED CURRENTRELEASE |
| Severity | unspecified |
| Priority | unspecified |
| Version | 28 |
| Hardware | aarch64 |
| OS | Linux |
| Reporter | Pierre-Francois RENARD <pfrenard> |
| Assignee | Jakub Jelinek <jakub> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | airlied, aoliva, bskeggs, davejohansen, dmalcolm, ewk, fweimer, gary.buhrmaster, hdegoede, ichavero, itamar, jakub, jarodwilson, jglisse, john.j5live, jonathan, josef, jwakely, kernel-maint, labbott, law, linville, mchehab, mjg59, mpolacek, msebor, nickc, pbrobinson, steved, tgl |
| Fixed In Version | gcc-8.1.1-1.fc28 |
| Last Closed | 2018-07-10 11:45:53 UTC |
| Type | Bug |
Description
Pierre-Francois RENARD
2018-04-23 09:18:00 UTC
Does this happen repeatedly? We haven't seen reports on our build servers, and compiler segfaults are often a good indication of hardware problems.

Yes, I can reproduce it each time I try it. I also changed the Raspberry Pi (first one 3B, second one 3B+). This is the same issue:

    /usr/src/kernels/4.16.3-300.fc28.aarch64# make
    HOSTCC scripts/basic/fixdep
    HOSTCC scripts/kconfig/conf.o
    YACC scripts/kconfig/zconf.tab.c
    LEX scripts/kconfig/zconf.lex.c
    HOSTCC scripts/kconfig/zconf.tab.o
    during RTL pass: loop2_invariant
    In file included from scripts/kconfig/zconf.tab.c:2487:
    scripts/kconfig/symbol.c: In function ‘sym_check_sym_deps’:
    scripts/kconfig/symbol.c:1261:1: internal compiler error: Segmentation fault
     }
     ^
    Please submit a full bug report, with preprocessed source if appropriate.
    See <http://bugzilla.redhat.com/bugzilla> for instructions.
    Preprocessed source stored into /tmp/cc61KVj6.out file, please attach this to your bugreport.
    make[2]: *** [scripts/Makefile.host:107: scripts/kconfig/zconf.tab.o] Error 1
    make[1]: *** [Makefile:514: silentoldconfig] Error 2
    make: *** No rule to make target 'include/config/auto.conf', needed by 'include/config/kernel.release'. Stop.

(In reply to RENARD from comment #2)
> Yes, I can reproduce it each time I try it.
> I also changed the Raspberry Pi (first one 3B, second one 3B+).
> This is the same issue:

How much swap have you got allocated?

(In reply to Peter Robinson from comment #3)
> (In reply to RENARD from comment #2)
> > Yes, I can reproduce it each time I try it.
> > I also changed the Raspberry Pi (first one 3B, second one 3B+).
> > This is the same issue:
>
> How much swap have you got allocated?

720 MB. During the compilation process, cc1 is running and there is more than 600 MB of free memory (and nearly all swap is free).

This is a gcc bug. Moving to gcc.
The command that fails is:

    gcc -Wp,-MD,scripts/kconfig/.zconf.tab.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -DCURSES_LOC="<curses.h>" -DLOCALE -Iscripts/kconfig -c -o scripts/kconfig/zconf.tab.o scripts/kconfig/zconf.tab.c

Can we please get the preprocessed source file?

    Preprocessed source stored into /tmp/cchUArFl.out file, please attach this to your bugreport.

Created attachment 1426060
Output of zconf.tab.o

Output of zconf.tab.o when run with:

    $ gcc -E -Wp,-MD,scripts/kconfig/.zconf.tab.o.d -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -DCURSES_LOC="<curses.h>" -DLOCALE -Iscripts/kconfig -c -o scripts/kconfig/zconf.tab.o scripts/kconfig/zconf.tab.c
And can you reproduce the ICE if you run the compiler on this file?

    gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -S -xc the_file_from_#c7

It compiles just fine for me. Neither in a cross-compiler I've tried first, nor on aarch64 native:

    rpm -q gcc; gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -S -xc rh1570571.i; echo $?
    gcc-8.0.1-0.20.fc28.aarch64
    In file included from scripts/kconfig/zconf.tab.c:2485:
    scripts/kconfig/confdata.c: In function ‘conf_write’:
    scripts/kconfig/confdata.c:773:19: warning: ‘%s’ directive writing likely 7 or more bytes into a region of size between 1 and 4097 [-Wformat-overflow=]
    scripts/kconfig/confdata.c:773:19: note: assuming directive output of 7 bytes
    scripts/kconfig/confdata.c:773:2: note: ‘sprintf’ output 1 or more bytes (assuming 4104) into a destination of size 4097
    scripts/kconfig/confdata.c:776:20: warning: ‘.tmpconfig.’ directive writing 11 bytes into a region of size between 1 and 4097 [-Wformat-overflow=]
    scripts/kconfig/confdata.c:776:3: note: ‘sprintf’ output between 13 and 4119 bytes into a destination of size 4097
    0

(In reply to Jakub Jelinek from comment #9)
> Neither in a cross-compiler I've tried first, nor on aarch64 native:

So it works fine for me in a cross compiler and on a Mustang (Cortex-A57 based), but the reporter has an RPi (Cortex-A53) and I can recreate it on a Pine64 (also A53). Not sure if that makes a difference.

(In reply to Jakub Jelinek from comment #8)
> And can you reproduce the ICE if you run the compiler on this file?
> gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
> -std=gnu89 -S -xc the_file_from_#c7
> ?

Yes (will attach the out file):

    $ gcc -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 -S -xc scripts/kconfig/zconf.tab.o
    during RTL pass: loop2_invariant
    In file included from scripts/kconfig/zconf.tab.c:2487:
    scripts/kconfig/symbol.c: In function ‘sym_check_sym_deps’:
    scripts/kconfig/symbol.c:1261:1: internal compiler error: Segmentation fault
     }
     ^
    Please submit a full bug report, with preprocessed source if appropriate.
    See <http://bugzilla.redhat.com/bugzilla> for instructions.
    Preprocessed source stored into /tmp/ccHedFD8.out file, please attach this to your bugreport.

Created attachment 1426086
ccHedFD8.out from comment 10

attached output from compile test

Created attachment 1426087
preprocessed_source
Latest trunk still doesn't crash for me even with -mcpu=cortex-a53 -mtune=cortex-a53.

(In reply to Marek Polacek from comment #14)
> Latest trunk still doesn't crash for me even with -mcpu=cortex-a53
> -mtune=cortex-a53.

What hardware? As I mentioned above, building on a Cortex-A57 and cross compiling on x86_64 works for me too, but I can recreate the issue on a Pine64. Latest as pushed to the mirrors:

    binutils-2.29.1-20.fc28.aarch64
    libasan-8.0.1-0.20.fc28.aarch64
    libstdc++-8.0.1-0.20.fc28.aarch64
    libgomp-8.0.1-0.20.fc28.aarch64
    libubsan-8.0.1-0.20.fc28.aarch64
    gcc-8.0.1-0.20.fc28.aarch64
    libatomic-8.0.1-0.20.fc28.aarch64
    libgcc-8.0.1-0.20.fc28.aarch64
    cpp-8.0.1-0.20.fc28.aarch64

So is there some Pine64 hw we can ssh in and see it? Still, it would be very strange; generally gcc ICEs are easily reproducible no matter which exact hw you run on, or whether you are cross-compiling or not.

Tried even compiling with --param ggc-min-expand=0 --param ggc-min-heapsize=0 and it succeeded too, so it is unlikely to be garbage collection related.

> So is there some Pine64 hw we can ssh in and see it? Still it would be very

No idea. This one is sitting on my desk next to me. I'm not sure if you're located in an office, but if so I would bet someone has an RPi3 lying around, with an aarch64 install on it, that they could loan you.
(In reply to Peter Robinson from comment #15)
> (In reply to Marek Polacek from comment #14)
> > Latest trunk still doesn't crash for me even with -mcpu=cortex-a53
> > -mtune=cortex-a53.
>
> What hardware? As I mentioned above building on a Cortex-a57 and cross
> compiling on x86_64 works for me too, but I can recreate the issue on a
> Pine64.

That was

    Processor : AArch64 Processor rev 1 (aarch64)
    Features : fp asimd evtstrm
    CPU implementer : 0x50
    CPU architecture: AArch64
    CPU variant : 0x0
    CPU part : 0x000
    CPU revision : 1
    Hardware : APM X-Gene Mustang board

Ok, reproduced on Jonathan's Pine64, but it looks weird, any known CPU hw issues or kernel bugs?

    gdb --args /usr/libexec/gcc/aarch64-redhat-linux/8/cc1 -quiet -mlittle-endian -mabi=lp64 -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -std=gnu90 -fomit-frame-pointer /tmp/zconf.tab.i -o /tmp/zconf.tab.s2

cc1 binary has code like:

    0x0000000000d2bffc <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+524>: adrp x6, 0x16c9000 <targetm+2584>
    0x0000000000d2c000 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+528>: stp x2, x3, [sp, #56]
    0x0000000000d2c004 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+532>: ldr x0, [x6, #496]
    0x0000000000d2c008 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+536>: str x4, [sp, #72]
    0x0000000000d2c00c <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+540>: blr x0
    0x0000000000d2c010 <rtx_addr_can_trap_p_1(const_rtx, poly_int64, poly_int64, machine_mode, bool)+544>: mov x19, x0

where 0x16c9000 + 496 is 0x16c91f0, &targetm.starting_frame_offset, and that contains

    0x16c91f0 <targetm+3080>: 0x000000000124be50

unmodified all the time (the targetm object is in the .data section, but contains lots of function pointers that are never modified).
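The address arithmetic in the disassembly above can be double-checked with a short sketch. It is only an illustration of how an `adrp` page base plus the `ldr` immediate gives the address of the function-pointer slot; the helper names `adrp_ldr_target` and `symbol_base` are hypothetical and not part of gdb or GCC. The numbers are the ones gdb printed in this comment.

```python
PAGE_SIZE = 0x1000  # adrp produces a 4 KiB-aligned page address


def adrp_ldr_target(page_base: int, imm: int) -> int:
    """Address read by `ldr xN, [xM, #imm]` after `adrp xM, page_base`."""
    if page_base % PAGE_SIZE != 0:
        raise ValueError("adrp result must be 4 KiB aligned")
    return page_base + imm


def symbol_base(address: int, offset: int) -> int:
    """Start of a symbol, given an address gdb annotates as <symbol+offset>."""
    return address - offset


if __name__ == "__main__":
    # adrp x6, 0x16c9000 followed by ldr x0, [x6, #496]
    slot = adrp_ldr_target(0x16C9000, 496)
    print(hex(slot))

    # gdb labels the page base as <targetm+2584> and the slot as <targetm+3080>;
    # both annotations should point back to the same start of targetm.
    print(hex(symbol_base(0x16C9000, 2584)), hex(symbol_base(slot, 3080)))
```

Both gdb annotations resolve to the same base for `targetm`, which agrees with the statement that the load reads the `starting_frame_offset` slot.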
Now, if I put a breakpoint at *0x0000000000d2c00c and/or *0x0000000000d2c008, the first time $x0 contains the expected 0x000000000124be50 value, but randomly either the second time, or e.g. the 8th time (or sometimes never), $x0 contains the value 0x16d74a0 instead, which is not an address of a function, but rather an address in the .data section, inside the aarch64_types_binop_qualifiers array. If this happens, $x6 is still correctly 0x16c9000 and *(long *)0x16c91f0 still contains 0x000000000124be50; just $x0 is incorrect. There are no branches to 0xd2c008 in the code.

Now, if I put a breakpoint on *0x0000000000d2c004 too, I can never reproduce the problem; it always compiles fine, which suggests wild branching (say an indirect call through corruption) to 0xd2c008 is unlikely.

The cc1 binary is a single-threaded, position-dependent binary. It installs a couple of signal handlers, but only for the fatal signals:

    toplev.c: signal (SIGSEGV, crash_signal);
    toplev.c: signal (SIGILL, crash_signal);
    toplev.c: signal (SIGBUS, crash_signal);
    toplev.c: signal (SIGABRT, crash_signal);
    toplev.c: signal (SIGIOT, crash_signal);
    toplev.c: signal (SIGFPE, crash_signal);

The source code for the above snippet is:

    if (x == frame_pointer_rtx)
      {
        if (FRAME_GROWS_DOWNWARD)
          {
            high_bound = targetm.starting_frame_offset ();
            low_bound = high_bound - get_frame_size ();
          }
        else
          {
            low_bound = targetm.starting_frame_offset ();
            high_bound = low_bound + get_frame_size ();
          }
      }

(the high_bound = targetm.starting_frame_offset () line).

(In reply to Jakub Jelinek from comment #20)
> Ok, reproduced on Jonathan's Pine64, but it looks weird, any known CPU hw
> issues or kernel bugs?

None I'm aware of, but probably better answered by ARM people.
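For readers unfamiliar with the quoted GCC snippet, the bounds computation around the crashing call can be paraphrased as follows. This is a loose Python transcription under stated assumptions, not GCC code: `frame_bounds` is a hypothetical name, and the starting frame offset and frame size are passed in as plain integers rather than obtained from `targetm.starting_frame_offset ()` and `get_frame_size ()`.

```python
def frame_bounds(frame_grows_downward: bool,
                 starting_frame_offset: int,
                 frame_size: int) -> tuple[int, int]:
    """Return (low_bound, high_bound) of the frame, mirroring the quoted snippet."""
    if frame_grows_downward:
        # The frame extends below the starting offset.
        high_bound = starting_frame_offset
        low_bound = high_bound - frame_size
    else:
        # The frame extends above the starting offset.
        low_bound = starting_frame_offset
        high_bound = low_bound + frame_size
    return low_bound, high_bound


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the bug report.
    print(frame_bounds(True, 0, 64))
    print(frame_bounds(False, 16, 64))
```

In both branches the first step is the indirect call through `targetm.starting_frame_offset`, which is the `blr x0` instruction where the wrong register value was observed.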
The CPU core revisions are the same across Pine64/RPi3/RPi3+:

    Vendor ID: ARM
    Model: 4
    Model name: Cortex-A53
    Stepping: r0p4

Although the Pine64 has a few more feature flags:

    RPI: Flags: fp asimd evtstrm crc32 cpuid
    Pine64: Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

I suspect https://bugzilla.redhat.com/show_bug.cgi?id=1562522 is a similar/related problem. We've also had a couple of other similar reports on IRC with building on the RPi.

I got a bunch of gcc segfaults with the same "during RTL pass: loop2_invariant" symptom while trying to compile Postgres on a Raspberry Pi 3B+, using current F28 aarch64. It is not 100% reproducible (in fact, sometimes gcc itself tells me that; other times, retrying the compile succeeds; but some of the .c files fail pretty repeatably). It smells like a kernel problem to me, especially since I also see noticeable flakiness at the GNOME/UI level --- for instance, it took me several reboots to get the machine's networking configured :-(. I initially suspected I had a flaky SD card or some such, but Raspbian works perfectly on the same hardware.

FWIW, the gcc segfault, at least for my test case, seems to be gone when using gcc 8.1 from koji:

gcc 8.1 build 1078521
https://koji.fedoraproject.org/koji/buildinfo?buildID=1078521

Yes, I got the vast majority of the way through the same kernel build that previously failed basically straight away. The core kernel build completed; storage ran out most of the way through the modules build, but it's working. I'd be interested to know the upstream fix.

Yeah, 8.1.1-1 works for me too. I do notice that it seems to compile over 2x slower than the 6.3.0 armv6 gcc included with the Raspbian build I was comparing to ... is that expected?

Postgres build on Raspbian (Debian 9.4):

    $ time make -j4 -s
    ...
    All of PostgreSQL successfully made. Ready to install.

    real 11m9.048s
    user 35m7.695s
    sys 1m27.023s

Identical test case on same hardware, F28 + gcc 8.1.1:

    $ time make -j4 -s
    ...
    All of PostgreSQL successfully made. Ready to install.

    real 24m11.778s
    user 70m46.477s
    sys 4m38.827s

There was no fix for this; my guess is that the gcc binary just changed enough that this kernel or hw bug is latent.

> I do notice that it seems to compile over 2x slower than the 6.3.0 armv6 gcc
> included with the Raspbian build I was comparing to ... is that expected?

Yup, it's more than likely because the cpufreq and clock scaling drivers aren't upstream yet, so we can't run at full speed, and it's completely unrelated to the compiler or this bug.

(In reply to Peter Robinson from comment #29)
> Yup, it's more than likely because the cpufreq and clock scaling drivers
> aren't upstream yet, so we can't run at full speed, and it's completely
> unrelated to the compiler or this bug.

Confirmed, some other non-compile-bound tests are also circa 2x slower. Do you know offhand of BZs I can watch for progress on those issues?
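The "over 2x slower" figure can be checked against the two `time` outputs quoted earlier in the thread. The following is a minimal sketch; the helper names `to_seconds` and `slowdown` are hypothetical, and the only inputs are the timings reported for the Raspbian and F28 builds.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time` value such as '11m9.048s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings as reported in the thread.
RASPBIAN = {"real": "11m9.048s", "user": "35m7.695s", "sys": "1m27.023s"}
FEDORA = {"real": "24m11.778s", "user": "70m46.477s", "sys": "4m38.827s"}


def slowdown(slow: dict, fast: dict) -> dict:
    """Per-category ratio of the slow run to the fast run."""
    return {key: to_seconds(slow[key]) / to_seconds(fast[key]) for key in fast}


if __name__ == "__main__":
    for key, ratio in slowdown(FEDORA, RASPBIAN).items():
        print(f"{key}: {ratio:.2f}x")
```

The wall-clock and user-time ratios both come out slightly above 2, in line with the "circa 2x" observation for non-compile-bound tests.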
> Do you know offhand of BZs I can watch for progress on those issues?
No, we don't have them, follow along in the upstream kernel/rpi community.