Bug 1058991 - gcc PCH bug causes segfaults on aarch64
Summary: gcc PCH bug causes segfaults on aarch64
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARM64, F-ExcludeArch-aarch64
 
Reported: 2014-01-28 23:14 UTC by Brendan Conoboy
Modified: 2014-02-04 15:39 UTC
CC List: 6 users

Fixed In Version: gcc-4.8.2-14.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-04 15:39:28 UTC
Type: Bug
Embargoed:


Attachments
define TRY_EMPTY_VM_SPACE on aarch64 (468 bytes, patch)
2014-01-30 00:17 UTC, Kyle McMartin


Links
GNU Compiler Collection, bug 60010

Description Brendan Conoboy 2014-01-28 23:14:28 UTC
Description of problem:

Building large packages such as java-1.7.0-openjdk and wxGTK generally results in an internal compiler error (ICE).


Version-Release number of selected component (if applicable):
This appears to happen with any gcc 4.8, from Fedora 19 through rawhide.  The kernel records a message like the following:

[362391.181971] cc1plus[12825]: unhandled level 3 translation fault (11) at 0x7fae518488, esr 0x92000007
[362391.191200] pgd = ffffffc342e62000
[362391.194682] [7fae518488] *pgd=00000041a9c66003, *pmd=00000042ded58003, *pte=0000000000000000

[362391.204835] Pid: 12825, comm:              cc1plus
[362391.209733] CPU: 7    Not tainted  (3.8.0-mustang_sw_1.08.12-beta_rc.jkkm4 #9)
[362391.217019] PC is at 0xd30ee0
[362391.220116] LR is at 0xc6f474
[362391.223166] pc : [<0000000000d30ee0>] lr : [<0000000000c6f474>] pstate: 60000000
[362391.230656] sp : 0000007ff6a2b700
[362391.234048] x29: 0000007ff6a2b700 x28: 0000000024636b68 
[362391.239491] x27: 0000000000000000 x26: 0000000000e271e8 
[362391.244904] x25: 0000000000000000 x24: 0000000024658650 
[362391.250346] x23: 0000007facf26000 x22: 0000000000000003 
[362391.255757] x21: 0000000000001d22 x20: 0000007fae518450 
[362391.261193] x19: 0000000000001d22 x18: 000000000000000f 
[362391.266605] x17: 0000000000001d22 x16: 0000000000000000 
[362391.272049] x15: 0000000024603650 x14: 0000000000000fe0 
[362391.277460] x13: 0000000024603650 x12: 0000007ff6a2b6d0 
[362391.282905] x11: 00000000fffffff0 x10: 0000000024603200 
[362391.288343] x9 : fefefefefeff736b x8 : 746c756166206e6f 
[362391.293753] x7 : 0000007ff6a2b6d0 x6 : 0000000024603643 
[362391.299190] x5 : 00000000246041e0 x4 : 000000000071f118 
[362391.304600] x3 : 0000000000001d22 x2 : 0000007fae518450 
[362391.310036] x1 : 0000000000001d22 x0 : 0000007fae518450 

Here are two example builds of wxGTK which failed:
http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2225239
...
g++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make: *** [basedll_convauto.o] Error 4
...

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2225210
...
g++ -c -o basedll_fs_arc.o -I./.pch/wxprec_basedll -D__WXGTK__     -DWXBUILDING      -I./src/regex  -DwxUSE_GUI=0 -DWXMAKINGDLL_BASE -DwxUSE_BASE=1 -fPIC -DPIC -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -I/builddir/build/BUILD/wxGTK-2.8.12/lib/wx/include/gtk2-unicode-release-2.8 -I./include -pthread -I/usr/include/gtk-2.0 -I/usr/lib64/gtk-2.0/include -I/usr/include/pango-1.0 -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/libdrm -I/usr/include/libpng16 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libpng16 -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -pthread -I/usr/include/gstreamer-0.10 -I/usr/include/libxml2 -I/usr/include/gconf/2 -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -DWX_PRECOMP -pthread -Wall -Wundef -Wno-ctor-dtor-privacy -g -O0 -pthread -I/usr/include/libgnomeprintui-2.2 -I/usr/include/libgnomeprint-2.2 -I/usr/include/libxml2 -I/usr/include/libgnomecanvas-2.0 -I/usr/include/gail-1.0 -I/usr/include/libart-2.0 -I/usr/include/gtk-2.0 -I/usr/lib64/gtk-2.0/include -I/usr/include/pango-1.0 -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/libdrm -I/usr/include/libpng16 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libpng16 -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/freetype2 -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -fno-stack-protector -fno-strict-aliasing ./src/common/fs_arc.cpp
g++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make: *** [basedll_fs_arc.o] Error 4
...

Note that during these failed builds the value of /proc/sys/kernel/randomize_va_space was "2". If it is set to 0, the build succeeds:

http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=183839

How reproducible:
It happens every time, but not in the same place every time.

Steps to Reproduce:
1. Set /proc/sys/kernel/randomize_va_space to 2
2. Build wxGTK with rawhide
3. Boom
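Step 1 depends on the current ASLR setting, so it can help to check it before (and after) a rebuild. The helper below is a minimal, hypothetical sketch (the name read_randomize_va_space is not from any real tool); it only reads the sysctl file, since changing it requires root, typically via `sysctl kernel.randomize_va_space=2`.

```c
#include <stdio.h>

/* Hypothetical helper: return the current value of
 * /proc/sys/kernel/randomize_va_space (normally 0, 1 or 2),
 * or -1 if the file cannot be opened or parsed. */
static int read_randomize_va_space(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}
```

A value of 2 corresponds to the failing configuration in this report; 0 corresponds to the configuration in which the build succeeded.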

Actual results:

ICE

Expected results:

Successful build.

Additional info:

This issue was originally discovered building java-1.7.0-openjdk, but wxGTK produces the error faster and doesn't pull Java into the picture.

Comment 1 Jakub Jelinek 2014-01-28 23:22:20 UTC
First of all, that looks like a kernel bug; whatever a normal process does shouldn't result in such messages.

Second, aarch64 presumably should add its own define to gcc/config/host-linux.c, but you really want to file/discuss this upstream: I have no idea what address would be appropriate for that, or what the memory layout on aarch64 is.
If it is added upstream, I can consider backporting it.
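The kind of per-architecture define suggested here can be sketched as follows. This is an illustration, not the contents of host-linux.c. The macro name TRY_EMPTY_VM_SPACE comes from the attached patch's title. The aarch64 value is the one tested in comment 3 and is assumed here; it is not necessarily what was committed upstream. The fallback of 0 reflects comment 3's observation that aarch64 was ending up with an mmap at 0. Branches for other architectures are omitted, and pch_address_hint is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a per-architecture PCH address hint. Other architectures'
 * branches are omitted; the aarch64 value is an assumption taken from
 * comment 3, not a quote of upstream GCC. */
#if defined(__aarch64__) && defined(__LP64__)
# define TRY_EMPTY_VM_SPACE 0x100000000ULL
#else
/* No architecture-specific value: a hint of 0, i.e. "no preference",
 * which is what aarch64 effectively used before the patch. */
# define TRY_EMPTY_VM_SPACE 0
#endif

/* Hypothetical helper: the address that would be requested from mmap
 * for the PCH area. NULL means the kernel chooses the address. */
static void *pch_address_hint(void)
{
    return (void *) (uintptr_t) TRY_EMPTY_VM_SPACE;
}
```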

Comment 2 Kyle McMartin 2014-01-29 23:45:55 UTC
The kernel message is just the result of the SIGSEGV. It means we had a valid translation for the vaddr through two levels of the page table, but not the third. It probably won't generate such descriptive fault messages in production, when print-fatal-signals is off, but since it's such a new port, such messages are a bit instructive.
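The esr value in the kernel message (0x92000007) can be decoded by hand to confirm this reading. The sketch below assumes the ARMv8 ESR_ELx layout for a data abort: the exception class (EC) is in bits [31:26], IL is in bit 25, WnR is in ISS bit 6, and the data fault status code (DFSC) is in bits [5:0]. The names decode_esr and struct esr_info are hypothetical. For 0x92000007 the decoder gives EC 0x24 (data abort from a lower exception level, i.e. user space), a read access, and DFSC 0x07, which is a level 3 translation fault. That matches the `*pte=0000000000000000` entry in the log.

```c
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value, interpreted as a data abort. */
struct esr_info {
    unsigned ec;           /* exception class, bits [31:26] */
    unsigned il;           /* instruction length, bit [25] */
    unsigned wnr;          /* ISS bit [6]: 1 = write, 0 = read */
    unsigned dfsc;         /* data fault status code, bits [5:0] */
    int translation_level; /* 0..3 for a translation fault, else -1 */
};

/* Hypothetical decoder; the fields are only meaningful when ec is a
 * data abort class (0x24 from a lower EL, 0x25 from the same EL). */
static struct esr_info decode_esr(uint32_t esr)
{
    struct esr_info info;
    info.ec = (esr >> 26) & 0x3f;
    info.il = (esr >> 25) & 0x1;
    info.wnr = (esr >> 6) & 0x1;
    info.dfsc = esr & 0x3f;
    /* DFSC 0b0001LL means "translation fault at level LL". */
    info.translation_level =
        ((info.dfsc & 0x3c) == 0x04) ? (int) (info.dfsc & 0x3) : -1;
    return info;
}
```

Calling decode_esr(0x92000007) returns ec 0x24, il 1, wnr 0, dfsc 0x07 and translation_level 3.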

In any event, thanks for the hint at looking at host-linux.c, I think I've got a theory as to why this is occurring as a result of it!

Comment 3 Kyle McMartin 2014-01-30 00:17:45 UTC
Created attachment 857311
define TRY_EMPTY_VM_SPACE on aarch64

http://gcc.gnu.org/bugzilla//show_bug.cgi?id=45979
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14940

looks like the same issue as in these PRs...

I've attached a ``fix''. It looks like PCH basically relies on our mmap behaving effectively like MAP_FIXED and on the requested address being unused. On AArch64 we're mapping executables at 4MB, so the attempt to mmap at 0 would fail for two reasons (we also disallow mmap of the first page or so, per CONFIG_MMAP_MIN_ADDR).

Anyway, things look hunky dory in my testing when using 0x100000000, as x86_64 and others do.
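The diagnosis above depends on the fact that, without MAP_FIXED, the address passed to mmap is only a hint. The sketch below is not GCC's code. It uses a hypothetical helper, mmap_hint_honored, that maps anonymous memory at a given hint and reports whether the kernel placed the mapping exactly there. A hint of 0 (NULL) means "no preference", so the kernel always chooses the address itself, and with randomize_va_space set to 2 that address can change from run to run. A nonzero hint such as 0x100000000 is usually honored when that range is free, but the kernel is not obliged to honor it.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of private anonymous memory using
 * hint as a plain address hint (no MAP_FIXED), check whether the kernel
 * placed the mapping exactly there, then unmap it again.
 * Returns 1 if the hint was honored, 0 if the mapping landed elsewhere,
 * and -1 if mmap failed outright. */
static int mmap_hint_honored(uintptr_t hint, size_t len)
{
    void *want = (void *) hint;
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return -1;

    int honored = (got == want);
    munmap(got, len);
    return honored;
}
```

In practice mmap_hint_honored(0, 1 << 20) returns 0 on Linux, because a NULL hint always leaves the placement to the kernel. The result for 0x100000000 depends on what is already mapped in the process.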

Comment 5 Peter Robinson 2014-02-04 15:39:28 UTC
Now patched locally and sent upstream

