Bug 1721553

Summary: "one or more PCH files were found, but they were invalid"
Product: Red Hat Enterprise Linux 8
Reporter: Vít Ondruch <vondruch>
Component: ruby
Assignee: ruby maint <ruby-maint>
Status: CLOSED WONTFIX
QA Contact: RHEL CS Apps Subsystem QE <rhel-cs-apps-subsystem-qe>
Severity: unspecified
Priority: unspecified
Version: 8.0
CC: ahajkova, anto.trande, fweimer, gsgatlin, jakub, jaruga, jhughes, law, nerijus, ohudlick, pvalena, samuel-rhbugs, sergio, smakarov
Target Milestone: rc
Keywords: Triaged
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-02-01 07:41:32 UTC
Type: Bug

Description Vít Ondruch 2019-06-18 14:24:08 UTC
Description of problem:
Ruby 2.6 introduced a JIT that uses GCC in the background. However, I am getting the "one or more PCH files were found, but they were invalid" error when trying to run the build locally in Mock. Here is a reproducer using the Ruby test suite:

~~~
$ cd /builddir/build/BUILD/ruby-2.6.3/

$ make test-all TESTS=test/ruby/test_rubyvm_mjit.rb 
Run options: "--ruby=./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems" --excludes-dir=./test/excludes --name=!/memory_leak/

# Running tests:

[1/4] TestRubyVMMJIT#test_pause = 0.24 s
  1) Failure:
TestRubyVMMJIT#test_pause [/builddir/build/BUILD/ruby-2.6.3/test/ruby/test_rubyvm_mjit.rb:32]:
unexpected stdout:
```
truefalsefalse```

stderr:
```
/tmp/_ruby_mjit_p712u0.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp712u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
/tmp/_ruby_mjit_p712u1.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp712u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
/tmp/_ruby_mjit_p712u2.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp712u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
/tmp/_ruby_mjit_p712u3.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp712u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
/tmp/_ruby_mjit_p712u4.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp712u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
Successful MJIT finish
```.
<5> expected but was
<0>.

Finished tests in 0.937667s, 4.2659 tests/s, 24.5290 assertions/s.  
4 tests, 23 assertions, 1 failures, 0 errors, 0 skips

ruby -v: ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
make: *** [uncommon.mk:761: yes-test-all] Error 1
~~~

The easiest reproducer I was able to come up with is:

~~~
$ rpm -q ruby-devel
ruby-devel-2.6.3-105.module+el8.1.0+3379+255f1bbf.x86_64

$ rm /tmp/*

$ ls /tmp/

$ /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -nodefaultlibs -nostdlib -o /tmp/_ruby_mjit_hp606u0.h.gch /usr/include/rb_mjit_min_header-2.6.3.h

$ echo '#include "/tmp/_ruby_mjit_hp606u0.h"' > /tmp/test.c

$ /usr/bin/gcc /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
/tmp/test.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp606u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
~~~

Please note that all the compilation flags are as submitted by the Ruby JIT. Without "-Wfatal-errors", the output is not any better:

~~~
$ /usr/bin/gcc /usr/bin/gcc -m64 -w -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
/tmp/test.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp606u0.h"
                                     ^
/tmp/test.c:1:37: error: use -Winvalid-pch for more information
/tmp/test.c:1:10: fatal error: /tmp/_ruby_mjit_hp606u0.h: No such file or directory
 #include "/tmp/_ruby_mjit_hp606u0.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
~~~

Trying to use "-Winvalid-pch" as suggested:

~~~
$ /usr/bin/gcc -m64 -w -Winvalid-pch -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
/tmp/test.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp606u0.h"
                                     ^
/tmp/test.c:1:10: fatal error: /tmp/_ruby_mjit_hp606u0.h: No such file or directory
 #include "/tmp/_ruby_mjit_hp606u0.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
~~~

That does not help. I would expect something more meaningful.

But as it turns out, this appears to resolve the issue:

~~~
$ touch /tmp/_ruby_mjit_hp606u0.h

$ /usr/bin/gcc -m64 -w -Winvalid-pch -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
~~~

Ultimately, using GDB, I think the issue is here:

~~~
Run till exit from #0  cpp_get_options (pfile=pfile@entry=0x555557329790) at ../../libcpp/directives.c:2592
0x000055555593fc5d in c_common_valid_pch (pfile=0x555557329790, name=0x55555733ef70 "/tmp/_ruby_mjit_hp98u0.h.gch", fd=3) at ../../gcc/c-family/c-pch.c:298
298	      if (cpp_get_options (pfile)->warn_invalid_pch)
Value returned is $1 = (cpp_options *) 0x555557329be8
(gdb) list
293	     pointers, but no support for that exists at present.
294	     Since we have the same executable, it should only be necessary to
295	     check one function.  */
296	  if (v.pch_init != &pch_init)
297	    {
298	      if (cpp_get_options (pfile)->warn_invalid_pch)
299		cpp_error (pfile, CPP_DL_WARNING,
300			   "%s: had text segment at different address", name);
301	      return 2;
302	    }
(gdb) c
Continuing.
/tmp/test.c:1:36: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp98u0.h"
                                    ^
compilation terminated due to -Wfatal-errors.
[Inferior 2 (process 362) exited with code 01]
~~~

But I have no idea what it actually means and how it happens.

Please note that this just works without any issues on Fedora Rawhide with gcc 9.1.1.


Version-Release number of selected component (if applicable):
$ rpm -q gcc
gcc-8.3.1-4.4.el8.x86_64


Actual results:
The Ruby JIT tests fail in Mock with "one or more PCH files were found, but they were invalid" errors.


Expected results:
The Ruby JIT tests pass.


Additional info:
There are two additional things:

1) The test suite fails when run locally in Mock, while it passes just fine in Brew. I really don't understand what the difference could be, because all the package versions are of the same or very similar NVR. Could the Mock version be the culprit? That does not seem plausible, but just in case, this is my Mock version:

~~~
$ rpm -q mock
mock-1.4.16-1.fc31.noarch
~~~

2) From the GCC code:

https://github.com/gcc-mirror/gcc/blob/gcc-8_3_0-release/gcc/c-family/c-pch.c#L299

I'd expect "-Winvalid-pch" to provide more insight, i.e. precisely the message from the "cpp_error" call, but the message is not shown. So why is "-Winvalid-pch" suggested? Where is the error message lost?

Comment 1 Pavel Valena 2019-06-20 16:14:36 UTC
I can confirm this issue in a non-Mock environment:

```
[root@ci-vm-10-0-137-26 1mt]# rm -rf /tmp/*
[root@ci-vm-10-0-137-26 1mt]# /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -nodefaultlibs -nostdlib -o /tmp/_ruby_mjit_hp606u0.h.gch /usr/include/rb_mjit_min_header-2.6.3.h
[root@ci-vm-10-0-137-26 1mt]# echo '#include "/tmp/_ruby_mjit_hp606u0.h"' > /tmp/test.c
[root@ci-vm-10-0-137-26 1mt]# /usr/bin/gcc /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
/tmp/test.c:1:37: error: one or more PCH files were found, but they were invalid
 #include "/tmp/_ruby_mjit_hp606u0.h"
                                     ^
compilation terminated due to -Wfatal-errors.
[root@ci-vm-10-0-137-26 1mt]# uname -a
Linux ci-vm-10-0-137-26.hosted.upshift.rdu2.redhat.com 4.18.0-107.el8.x86_64 #1 SMP Fri Jun 14 13:46:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@ci-vm-10-0-137-26 1mt]# rpm -q gcc
gcc-8.3.1-4.4.el8.x86_64

```

And also the 'fix':
```
[root@ci-vm-10-0-137-26 1mt]# touch /tmp/_ruby_mjit_hp606u0.h
[root@ci-vm-10-0-137-26 1mt]# /usr/bin/gcc -m64 -w -Winvalid-pch -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
[root@ci-vm-10-0-137-26 1mt]# 

```

Comment 2 Jakub Jelinek 2019-06-20 16:25:05 UTC
Perhaps if you don't use -w (and maybe add that -Winvalid-pch), it will actually tell you what was wrong with the PCH file.

Comment 3 Vít Ondruch 2019-06-24 12:44:05 UTC
(In reply to Jakub Jelinek from comment #2)
> Perhaps if you don't use -w (and maybe add that -Winvalid-pch), it will
> actually tell you what was wrong with the PCH file.

Ah, so removing all the "-w" options helped to get the output from the source location I pointed to in my initial comment:

~~~
$ /usr/bin/gcc -m64 -Winvalid-pch -fPIC -shared -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
/tmp/test.c:1:37: warning: /tmp/_ruby_mjit_hp606u0.h.gch: had text segment at different address
 #include "/tmp/_ruby_mjit_hp606u0.h"
                                     ^
/tmp/test.c:1:37: error: one or more PCH files were found, but they were invalid
/tmp/test.c:1:10: fatal error: /tmp/_ruby_mjit_hp606u0.h: No such file or directory
 #include "/tmp/_ruby_mjit_hp606u0.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
~~~

Thanks for the tip. Nevertheless, it does not help me understand what the issue actually is.
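
The flag change that exposed the hidden warning can be sketched as a small helper. This is only an illustration, assuming the compiler command line is available as a single string; `diagnostic_cmdline` is a hypothetical name and is not part of Ruby's MJIT or of GCC.

```python
import shlex


def diagnostic_cmdline(cmdline: str) -> list:
    """Drop every -w (which suppresses all warnings) and request PCH diagnostics."""
    args = [arg for arg in shlex.split(cmdline) if arg != "-w"]
    if "-Winvalid-pch" not in args:
        # Insert right after the compiler path so the remaining order is kept.
        args.insert(1, "-Winvalid-pch")
    return args


cmd = "/usr/bin/gcc -m64 -w -fPIC -shared -w -pipe -O3 -o /tmp/test.o /tmp/test.c -c"
print(" ".join(diagnostic_cmdline(cmd)))
```

The helper only rewrites the argument list; it does not run the compiler.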

Comment 4 Jakub Jelinek 2019-06-24 12:55:04 UTC
The "had text segment at different address" error is about having the PCH file generated with a compiler that is different from the compiler that is trying to read it. The test checks whether a function pointer saved in the PCH file matches the corresponding function pointer in the cc1 or cc1plus binary.  If it is different, any saved function pointers would need relocation, and gcc isn't prepared to do that.
So, the possibilities are either that the compiler is built with PIE (but that is not the case of gcc in the distro), or, say, that cc1plus was used to write it and cc1 to read it, or vice versa, or that a slightly different gcc revision was used to save it vs. read it, etc.
The commands e.g. in #c1 can't be trusted; /usr/bin/gcc /usr/bin/gcc ... would surely error out.
So e.g. use -v on each command to see which binary has really been used to save it and restore it, and make sure you don't ship the *.gch files in packages and that, if you upgrade the compiler, you remove any of them you have around.
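
The function-pointer comparison described here can be modelled in a few lines. This is a toy sketch, not GCC's code: the addresses, the `PchHeader` class, and the helper names are invented for illustration, and only the comparison itself mirrors the `v.pch_init != &pch_init` test in the `c-pch.c` excerpt quoted in the description.

```python
from dataclasses import dataclass

# Hypothetical offset of the marker function inside the compiler's text segment.
PCH_INIT_OFFSET = 0x1234


@dataclass
class PchHeader:
    pch_init: int  # address of the marker function when the PCH was written


def write_pch(load_base: int) -> PchHeader:
    """Record where the marker function lived in the writing process."""
    return PchHeader(pch_init=load_base + PCH_INIT_OFFSET)


def pch_is_usable(header: PchHeader, load_base: int) -> bool:
    """Accept the dump only if the marker function is at the same address."""
    # On a mismatch GCC takes the branch shown in the excerpt and returns 2.
    return header.pch_init == load_base + PCH_INIT_OFFSET


# Same load base when writing and reading: the saved pointer still matches.
print(pch_is_usable(write_pch(0x400000), 0x400000))  # True

# Different load base in the reading process: the dump is rejected.
print(pch_is_usable(write_pch(0x55D000000000), 0x560100000000))  # False
```

In this model the binary and the header are identical in both cases; the only thing that changes between the accepted and the rejected run is the load base.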

Comment 5 Vít Ondruch 2019-06-24 13:12:02 UTC
(In reply to Jakub Jelinek from comment #4)
> The "had text segment at different address" error is about having the PCH
> file generated with a compiler that is different from the compiler that is
> trying to read it,

That would be a much better explanation for the warning than what it currently says.

> So e.g. use -v on each command to see really which binary has been used to
> save it and restore it,

So this is once again the initial reproducer, this time with "-v" option:

~~~
$ rm /tmp/*

$ echo '#include "/tmp/_ruby_mjit_hp606u0.h"' > /tmp/test.c

$ /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -nodefaultlibs -nostdlib -o /tmp/_ruby_mjit_hp606u0.h.gch /builddir/build/BUILD/ruby-2.6.3/rb_mjit_header.h -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC) 
COLLECT_GCC_OPTIONS='-m64' '-w' '-Wfatal-errors' '-fPIC' '-shared' '-w' '-pipe' '-O3' '-nodefaultlibs' '-nostdlib' '-o' '/tmp/_ruby_mjit_hp606u0.h.gch' '-v' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/8/cc1 -quiet -v /builddir/build/BUILD/ruby-2.6.3/rb_mjit_header.h -quiet -dumpbase rb_mjit_header.h -m64 -mtune=generic -march=x86-64 -auxbase rb_mjit_header -O3 -Wfatal-errors -w -w -version -fPIC -o /tmp/ccpLGsa0.s --output-pch= /tmp/_ruby_mjit_hp606u0.h.gch
GNU C17 (GCC) version 8.3.1 20190507 (Red Hat 8.3.1-4) (x86_64-redhat-linux)
	compiled by GNU C version 8.3.1 20190507 (Red Hat 8.3.1-4), GMP version 6.1.2, MPFR version 3.1.6-p2, MPC version 1.0.2, isl version isl-0.16.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/8/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-redhat-linux/8/include
 /usr/local/include
 /usr/include
End of search list.
GNU C17 (GCC) version 8.3.1 20190507 (Red Hat 8.3.1-4) (x86_64-redhat-linux)
	compiled by GNU C version 8.3.1 20190507 (Red Hat 8.3.1-4), GMP version 6.1.2, MPFR version 3.1.6-p2, MPC version 1.0.2, isl version isl-0.16.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 14815f09a6b4b1c3695d30c51b76fd95

$ /usr/bin/gcc -m64 -Winvalid-pch -fPIC -shared -pipe -O3 -o /tmp/test.o /tmp/test.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib -v
Using built-in specs.
Reading specs from /usr/lib/rpm/redhat/redhat-hardened-ld
COLLECT_GCC=/usr/bin/gcc
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC) 
COLLECT_GCC_OPTIONS='-Winvalid-pch' '-fPIC' '-shared' '-pipe' '-O3' '-o' '/tmp/test.o' '-c' '-m64' '-specs=/usr/lib/rpm/redhat/redhat-hardened-ld' '-nostartfiles' '-nodefaultlibs' '-nostdlib' '-v' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/8/cc1 -quiet -v /tmp/test.c -quiet -dumpbase test.c -m64 -mtune=generic -march=x86-64 -auxbase-strip /tmp/test.o -O3 -Winvalid-pch -version -fPIC -o - |
 as -v --64 -o /tmp/test.o
GNU assembler version 2.30 (x86_64-redhat-linux) using BFD version version 2.30-57.el8
GNU C17 (GCC) version 8.3.1 20190507 (Red Hat 8.3.1-4) (x86_64-redhat-linux)
	compiled by GNU C version 8.3.1 20190507 (Red Hat 8.3.1-4), GMP version 6.1.2, MPFR version 3.1.6-p2, MPC version 1.0.2, isl version isl-0.16.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/8/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-redhat-linux/8/include
 /usr/local/include
 /usr/include
End of search list.
GNU C17 (GCC) version 8.3.1 20190507 (Red Hat 8.3.1-4) (x86_64-redhat-linux)
	compiled by GNU C version 8.3.1 20190507 (Red Hat 8.3.1-4), GMP version 6.1.2, MPFR version 3.1.6-p2, MPC version 1.0.2, isl version isl-0.16.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 14815f09a6b4b1c3695d30c51b76fd95
/tmp/test.c:1:37: warning: /tmp/_ruby_mjit_hp606u0.h.gch: had text segment at different address
 #include "/tmp/_ruby_mjit_hp606u0.h"
                                     ^
/tmp/test.c:1:37: error: one or more PCH files were found, but they were invalid
/tmp/test.c:1:10: fatal error: /tmp/_ruby_mjit_hp606u0.h: No such file or directory
 #include "/tmp/_ruby_mjit_hp606u0.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
~~~

The header precompilation as well as the compilation appears to use "/usr/libexec/gcc/x86_64-redhat-linux/8/cc1".
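
Since the same `cc1` binary is used for both steps, one further thing worth checking is how that binary itself was linked. The sketch below reads the `e_type` field of an ELF header; the helper names are hypothetical. Note that `ET_DYN` is also used for shared libraries, so for an executable such as `cc1` it only indicates that the file is position independent.

```python
import struct

ET_EXEC = 2  # executable linked at a fixed address
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def is_position_independent(path: str) -> bool:
    """True if the ELF file at `path` has type ET_DYN."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN


# Synthetic 64-bit little-endian header with e_type = ET_DYN.
sample = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_DYN)
print(elf_type(sample) == ET_DYN)  # True
```

On an affected system one could call `is_position_independent("/usr/libexec/gcc/x86_64-redhat-linux/8/cc1")`, using the path reported by `-v` above.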

> and make sure you don't ship the *.gch files in packages or if you upgrade compiler you remove any of them you have around.

As far as I can tell, there is only one *.gch file on the system and that is the one file generated in the first step:

~~~
# find / -name *.gch
/tmp/_ruby_mjit_hp606u0.h.gch
~~~

Comment 6 Vít Ondruch 2019-06-24 13:19:58 UTC
A similar-looking Ruby issue with Gentoo's GCC was reported here:

https://bugs.ruby-lang.org/issues/15513

And here is the fix in Ruby:

https://bugs.ruby-lang.org/projects/ruby-trunk/repository/git/revisions/ec3cdb34ce6f19be7a6d887357fd1763f97992d1/diff

Which refers to this bug:

https://gitweb.gentoo.org/proj/gcc-patches.git/tree/7.3.0/gentoo/13_all_default-ssp-fix.patch

Not sure if this might be helpful.

Comment 10 Florian Weimer 2019-07-04 16:23:33 UTC
(In reply to Vít Ondruch from comment #0)
> 1) The test suite fails testing locally in mock, while it passes just fine
> in Brew. I really don't understand what could be the difference, because all
> the packages versions are of the same or very similar NVR. Could be the mock
> version culprit?

No, this happens because the builders disable ASLR, via something like “setarch -R”.  This means that the mapping addresses are consistent across a single build, which is sufficient to get the Ruby test suite to pass.
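
To try the same thing locally, a command can be wrapped in `setarch <arch> -R`. The sketch below is an assumption about how one might do that, not a description of what the builders actually run; it assumes util-linux `setarch` is installed, and the helper names are hypothetical.

```python
import platform
import subprocess


def no_aslr_argv(command: list, machine: str = "") -> list:
    """Prefix a command with `setarch <arch> -R` (-R is --addr-no-randomize)."""
    return ["setarch", machine or platform.machine(), "-R", *command]


def run_without_aslr(command: list) -> int:
    """Run the command with address randomization disabled; return its exit status."""
    return subprocess.run(no_aslr_argv(command)).returncode


print(no_aslr_argv(["make", "test-all", "TESTS=test/ruby/test_rubyvm_mjit.rb"], "x86_64"))
```

Calling `run_without_aslr([...])` with the `make test-all` command from the description would then run the test suite with consistent mapping addresses. Some sandboxed environments may refuse the underlying personality change.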

Comment 11 Marek Polacek 2019-07-11 18:50:18 UTC
Hardening was added by removing %undefine _hardened_build because of #1624114.  So it was an intentional change.  I never liked it but apparently the trend was to harden everything.  :/

Comment 12 Pavel Valena 2019-10-10 10:55:32 UTC
Not sure whether it helps, but I've also encountered the issue when building for epel-8-x86_64 in my COPR repo:

https://copr.fedorainfracloud.org/coprs/pvalena/ruby/build/1051665/

Comment 13 Gary Gatling 2019-11-17 16:01:43 UTC
I am also getting this error when trying to compile MAME (https://www.mamedev.org/) on RHEL 8. I do not see the error when using devtoolset-7 on RHEL 7 or when compiling on Fedora. It seems to be a gcc issue on RHEL 8.

https://github.com/mamedev/mame/issues/5914

Comment 14 Johnny Hughes 2019-11-26 17:00:06 UTC
We are also getting this error when trying to compile Ruby for CentOS 8.1.1911:

https://koji.mbox.centos.org/pkgs/work/tasks/9706/69706/build.log

Comment 20 Sergio Basto 2020-02-02 20:26:43 UTC
Hi,
What is a PCH file?

I see this on Copr with epel8 [1]:

[1]
cc1plus: warning: /builddir/build/BUILD/VirtualBox-6.1.2/obj/obj/VBoxAPIWrap/pch/VBoxAPIWrap-precomp_gcc.h.gch: had text segment at different address
cc1plus: error: one or more PCH files were found, but they were invalid
cc1plus: fatal error: /builddir/build/BUILD/VirtualBox-6.1.2/obj/obj/VBoxAPIWrap/pch/VBoxAPIWrap-precomp_gcc.h: No such file or directory


https://copr-be.cloud.fedoraproject.org/results/sergiomb/vboxfor23/epel-8-x86_64/01217024-VirtualBox/

Comment 21 Vít Ondruch 2020-02-12 13:58:13 UTC
(In reply to Sergio Monteiro Basto from comment #20)
> what is an PCH file ? 

pre-compiled headers

Comment 22 Vít Ondruch 2020-02-12 14:07:03 UTC
(In reply to Marek Polacek from comment #11)
> Hardening was added by removing %undefine _hardened_build because of
> #1624114.

IMHO, the ticket should have been CLOSED WONTFIX. The only reason for hardening GCC in that ticket is "others are doing it, so do it as well", IOW it was just "trend" as you call it. I don't see any discussion why it should not be done. All the scan failures should be considered false positives for GCC.

Also, I don't quite understand why GCC bothers with the address so much.

Comment 23 Jakub Jelinek 2020-02-12 14:17:00 UTC
(In reply to Vít Ondruch from comment #22)
> Also, I don't quite understand why GCC bothers with the address so much.

PCH is essentially a GCC memory dump, and some objects maintained in the GC (and thus dumped) do contain function or variable addresses.  With ASLR they would need to be relocated to the different addresses, but that is something the GCC PCH doesn't do ATM (it does some relocation processing only on pointers to GC objects, as that is what is tracked in the GTY infrastructure).  The function/variable addresses aren't tracked there, and it is hard to guess if a particular value in memory is a function/variable address, or just some value that happens to compare equal to some address.
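
The guessing problem can be illustrated with a toy model. Nothing here reflects GCC's actual data structures: the address range, the shift, and `guess_relocate` are invented for illustration. The sketch shifts every dumped word that looks like a text-segment address, which is the only strategy available when nothing records which words really are pointers.

```python
# Hypothetical text segment of the process that wrote the dump.
OLD_TEXT = range(0x400000, 0x500000)
# Hypothetical distance by which the text segment moved in the reading process.
TEXT_SHIFT = 0x1000


def guess_relocate(words: list) -> list:
    """Naively shift every word whose value falls inside the old text segment."""
    return [w + TEXT_SHIFT if w in OLD_TEXT else w for w in words]


callback_addr = 0x401230  # really a function pointer saved in the dump
plain_number = 0x401230   # an ordinary integer with the same bit pattern

dump = [callback_addr, plain_number, 42]
print([hex(w) for w in guess_relocate(dump)])  # ['0x402230', '0x402230', '0x2a']
```

The function pointer ends up correct, but the ordinary integer is silently changed as well, because the two words are indistinguishable in the dump.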

Comment 27 Vít Ondruch 2020-02-12 15:42:08 UTC
(In reply to Jakub Jelinek from comment #23)
> With ASLR they would need to be relocated to the different addresses, but that is something
> the GCC PCH doesn't do ATM.

So is this something GCC upstream is considering? I don't know anything about GCC and I might be naive, but it sounds to me as if this would be the right solution.

Comment 28 Jakub Jelinek 2020-02-12 15:46:13 UTC
No.  As I said, nothing in the GC type descriptions tracks anything that could be used to find out whether something is a function or variable pointer in the binary.

Comment 29 Jeff Law 2020-02-12 18:30:02 UTC
Folks, we need to keep in mind that while historically GCC hasn't been an attack surface, things like common criteria certifications may require us to enable hardening options, even in cases where there's a measurable cost.  Additionally, as folks are moving to using GCC as a JIT, it suddenly becomes a much more interesting attack vector.

There's a meeting on the 21st to discuss this across the key teams (Ruby, Security & GCC).  Let's not make any decisions here, but instead wait until that meeting.  If you need an invite, I'm sure we can get you one.

Comment 30 Sergio Basto 2020-02-12 22:13:33 UTC
(In reply to Vít Ondruch from comment #22)
> (In reply to Marek Polacek from comment #11)
> > Hardening was added by removing %undefine _hardened_build because of
> > #1624114.
> 
> IMHO, the ticket should have been CLOSED WONTFIX. The only reason for
> hardening GCC in that ticket is "others are doing it, so do it as well", IOW
> it was just "trend" as you call it. I don't see any discussion why it should
> not be done. All the scan failures should be considered false positives for
> GCC.
> 
> Also, I don't quite understand why GCC bothers with the address so much.

But how do I build my package on EPEL 8 if I get [1]?

[1]
cc1plus: error: one or more PCH files were found, but they were invalid

Comment 31 Gary Gatling 2020-02-26 15:07:16 UTC
Just curious if anything was decided at the meeting on the 21st.

Comment 32 Vít Ondruch 2020-02-26 15:54:58 UTC
(In reply to Gary Gatling from comment #31)
> Just curious if anything was decided at meeting on 21st.

No. The meeting was postponed to the 28th.

Comment 33 Jun Aruga 2020-02-27 17:13:04 UTC
As a reference, I created the reproducer scripts with GitHub, Travis CI, and Fedora/RHEL(UBI 7/8) containers.

https://github.com/junaruga/ruby-jit-test
https://travis-ci.org/junaruga/ruby-jit-test/builds/655863378

* Ruby 2.7 (rpms/ruby) on Fedora rawhide: OK (gcc-10.0.1-0.8.fc33.x86_64, redhat-rpm-config-153-1.fc33.noarch)
* Ruby 2.6 (rpms/ruby) on Fedora 31: OK (gcc-9.2.1-1.fc31.x86_64, redhat-rpm-config-142-1.fc31.noarch)
* Ruby 2.6 (rpms/ruby) on Fedora 30: OK (gcc-9.2.1-1.fc30.x86_64, redhat-rpm-config-132-1.fc30.noarch)
* RHSCL rh-ruby26 on RHEL 7.7: OK (gcc-4.8.5-39.el7.x86_64, redhat-rpm-config-9.1.0-88.el7.noarch)
* Module Ruby 2.6 on RHEL 8.1: ERROR (gcc-8.3.1-4.5.el8.x86_64, redhat-rpm-config-120-1.el8.noarch)

Here is a command to easily reproduce it in your environment.

```
$ sudo yum -y install gcc redhat-rpm-config ruby-devel
$ TMP="$(pwd)" ruby --disable-gems --jit-verbose=2 --jit-save-temps --jit-min-calls=1 --jit-wait -e '1.times { puts "Hello" }'
```

Ref: https://travis-ci.org/junaruga/ruby-jit-test/jobs/655863383#L672
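
The OK/ERROR classification in the list above could be automated along these lines. This is a sketch under assumptions: it assumes the compiler diagnostics end up on the `ruby` process's stderr, as they do in the test suite output in the description, and `has_pch_error` and `jit_status` are hypothetical helper names.

```python
import os
import subprocess

PCH_ERROR = "one or more PCH files were found, but they were invalid"

# Flags copied from the reproducer command above.
RUBY_JIT_CMD = [
    "ruby", "--disable-gems", "--jit-verbose=2", "--jit-save-temps",
    "--jit-min-calls=1", "--jit-wait", "-e", '1.times { puts "Hello" }',
]


def has_pch_error(stderr_text: str) -> bool:
    """True if the captured output contains the invalid-PCH error."""
    return PCH_ERROR in stderr_text


def jit_status() -> str:
    """Run the reproducer with TMP set to the current directory and classify it."""
    env = dict(os.environ, TMP=os.getcwd())
    result = subprocess.run(RUBY_JIT_CMD, env=env, capture_output=True, text=True)
    return "ERROR" if has_pch_error(result.stderr) else "OK"
```

Calling `jit_status()` requires `ruby`, `gcc`, and `ruby-devel` to be installed, as in the `yum` line of the reproducer.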

Comment 34 Jeff Law 2020-03-13 16:01:57 UTC
So here's the promised summary of the meeting from Feb 28th.


To recap the technical issue.  Ruby has recently enabled a JIT mode which
ultimately calls "gcc" under the hood.  Pre-compiled headers (PCH) are used in
this context to provide improved compile-time performance.  This is failing in
RHEL 8 as shown in this BZ.

The failure in RHEL 8 is due to GCC being compiled as a Position Independent
Executable (PIE).  PIE is fundamentally incompatible with GCC's PCH
implementation.  GCC's PCH implementation is essentially a memory dump after
reading the header files that can be read back in to restore the compiler's
state.  PIE scrambles addresses and as a result nothing from the memory dump is
where it's expected to be when GCC reads the PCH back in.

PIE for all executables was introduced as part of an across-the-board effort to
improve our security stance for RHEL 8.  It was initially driven by common
criteria, but it's not solely for common criteria.

So the first question we needed to answer was whether or not we could turn off
PIE for GCC itself.  How would that affect our common criteria certification and
ultimately our ability to sell into government contracts?  Mark Thacker (product
security) and Brian Goholler (RHEL product management) made it clear that PIE 
was critical for RHEL 8, not just for common criteria, but as a general selling
point to customers.  They could not in good conscience support disabling PIE for
GCC.

That essentially pushed the discussion towards changing Ruby.  There are 3
options: simply not use a JIT at all in Ruby; continue using GCC as a JIT, but
without PCH support; or use Clang/LLVM as a JIT for Ruby.  It seemed that Vit
should own making a decision for Ruby in RHEL rather than the tools team.

Vit indicated that not using the JIT at all could be controversial and
potentially put Red Hat in a poor position messaging-wise, plus concerns about
the performance.

We all agreed that disabling PCH while still using GCC as the JIT would work, but
it would be significantly slower, perhaps slow enough that the no-JIT option
would be preferred.

Vlad and Vit discussed the Clang/LLVM JIT implementation, which is significantly
different than GCC.  The Clang/LLVM approach (pickle the abstract syntax tree)
does not suffer from the same fundamental problems as the dump/restore of memory
approach that GCC uses.  Additionally they indicated that it's likely more
upstream developers are using Clang/LLVM as a JIT rather than GCC.

Brian chimed in with a concern WRT container sizes that would arise from using a
JIT in Ruby.  It's a real concern, one that Vit and the Ruby team will have to
evaluate with product management.

Vit was unaware of the general direction the tools team is taking WRT Clang/LLVM
going forward.  In simplest terms we want to start removing the roadblocks to
using Clang/LLVM -- the choice of compiler technology should flow from the
upstream project.  This direction is not something we have advertised yet, but
the situation with Ruby is a good reason to start advertising that change in
direction.  Josh expressed concerns with how the proposal would go over in Fedora
and the desire to avoid crazy outcomes.  Again, the core idea should be that
compiler selection should really be driven by upstreams.  Vit would (quite
reasonably) like to see us start that discussion with FESCO or on the Fedora
devel list -- it's a very reasonable ask.

Mark and Brian both expressed concerns WRT security feature parity -- it's
something we definitely have to continue to push on.  My primary concern has been
getting to parity between LLVM/Clang and GCC, but they're looking at Go & Rust as
well.  They made it clear that RHEL projects need to continue to follow RHEL
security policies regardless of the compiler technologies used by those projects.

Separately from the Friday meeting the tools team discussed PCH this week as
well.  We believe there's a security issue that needs to be addressed around PCH.
Namely that a malicious PCH could be created to gain control of GCC -- think
about the embedded function pointers in the memory dump.  An attacker could
change those in fun and interesting ways.  We are considering turning off the PCH
reader for RHEL GCC to avoid the security vulnerability and we are also considering
turning off PCH generation since any PCH generated would be unusable due to PIE.
Jeff owns opening a BZ for that issue.

Accordingly, I'm transferring this BZ to the Ruby team.

Comment 35 Jun Aruga 2020-03-17 17:02:40 UTC
Jeff, thank you for the summary. Very helpful.

> We all agreed that disabling PCH, but using GCC's JIT would work, but it would be
> significantly slower, perhaps slow enough that the no-JIT option would be
> preferred.

I agree with disabling PCH. The fact that PCH is disabled for security reasons, and the resulting slower JIT, could be documented in the Ruby section of the RHEL 8 release notes once I have finished this task.

> Vit indicated that not using the JIT at all could be controversial and
> potentially put Red Hat in a poor position messaging-wise, plus concerns about
> the performance.

I agree with the potential risk. In other words, I assume customers might choose security over performance when they have to choose one of them. When they want performance in a Ruby application, implementing the critical part as a Ruby C extension, or as a separate C, C++, or Go program, could be the right approach in terms of application design.

> Additionally they indicated that it's likely more
> upstream developers are using Clang/LLVM as a JIT rather than GCC.

Could you tell us the source of the information that "it's likely more upstream developers are using Clang/LLVM as a JIT rather than GCC"? I was not aware of that. Looking at the Ruby project's Travis CI, gcc cases are tested more than Clang/LLVM ones.

Comment 36 Vít Ondruch 2020-03-18 08:34:01 UTC
I just opened Ruby upstream ticket to discuss this with upstream:

https://bugs.ruby-lang.org/issues/16694

Comment 37 Sergio Basto 2020-03-19 03:15:08 UTC
Just to remind you that I got a similar issue with VirtualBox [1], which doesn't use any Ruby, so my first guess is that it's not a Ruby problem.


[1]
https://copr.fedorainfracloud.org/coprs/sergiomb/vboxfor23/package/VirtualBox/
https://download.copr.fedorainfracloud.org/results/sergiomb/vboxfor23/epel-8-x86_64/01311590-VirtualBox/
https://download.copr.fedorainfracloud.org/results/sergiomb/vboxfor23/epel-8-x86_64/01311590-VirtualBox/build.log.gz

cc1plus: warning: /builddir/build/BUILD/VirtualBox-6.1.4/obj/obj/VBoxAPIWrap/pch/VBoxAPIWrap-precomp_gcc.h.gch: had text segment at different address
cc1plus: error: one or more PCH files were found, but they were invalid
cc1plus: fatal error: /builddir/build/BUILD/VirtualBox-6.1.4/obj/obj/VBoxAPIWrap/pch/VBoxAPIWrap-precomp_gcc.h: No such file or directory
compilation terminated.
kmk: *** [/usr/share/kBuild/footer-pass2-compiling-targets.kmk:277: /builddir/build/BUILD/VirtualBox-6.1.4/obj/obj/VBoxAPIWrap/gen/CloudNetworkWrap.o] Error 1
kmk: *** Waiting for unfinished jobs....

Comment 38 Vít Ondruch 2020-03-19 08:33:43 UTC
(In reply to Sergio Basto from comment #37)
> Just to remind you that I got a similar issue with VirtualBox [1], which
> doesn't use any Ruby, so my first guess is that it's not a Ruby problem.

I think you should apply these parts of Jeff's reply to your case:

(In reply to Jeff Law from comment #34)
> The failure in RHEL 8 is due to GCC being compiled as a Position Independent
> Executable (PIE).  PIE is fundamentally incompatible with GCC's PCH
> implementation.  GCC's PCH implementation is essentially a memory dump after
> reading the header files that can be read back in to restore the compiler's
> state.  PIE scrambles addresses and as a result nothing from the memory dump
> is where it's expected to be when GCC reads the PCH back in.
> 
> PIE for all executable was introduced as part of an across-the-board effort
> to improve our security stance for RHEL 8.  It was initially driven by common
> criteria, but it's not solely for common criteria.

That means that GCC cannot reasonably handle PCH when GCC is hardened. Nevertheless, GCC hardening will be kept on RHEL, because it is perceived as having higher value for users. Moreover, there are also possible security issues in GCC due to PCH, so PCH will probably be disabled completely.

Other than that, I think your case is a bit different, because there is only a build-time impact, not a runtime impact as there is for Ruby. It should be possible to disable PCH somehow in the VB build scripts. You can also try to build VB using Clang, as was suggested to me. That said, I think you should open a separate ticket to address your case.
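
For anyone who wants to check whether a given GCC can read back its own precompiled headers, here is a minimal sketch, not an official reproducer. It mirrors what MJIT does in the logs in this bug: a header is compiled to a `.h.gch` file under a different name, and the source then includes the `.h` name, for which only the `.gch` exists. The function name `check_pch` and all file names are made up for illustration.

```shell
# Hypothetical helper: reports whether gcc accepts a PCH it has just generated.
# The body runs in a subshell (note the parentheses), so cd and the trap stay local.
check_pch() (
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found"; exit 0; }
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$workdir"' EXIT
    cd "$workdir" || exit 1

    # The real header; as MJIT does, compile it to a .gch under another name.
    printf '#include <stdio.h>\n' > min_header.h
    printf '#include "pch_header.h"\nint main(void) { puts("ok"); return 0; }\n' > main.c
    gcc -w -o pch_header.h.gch min_header.h 2>gen.log ||
        { echo "PCH generation failed"; cat gen.log; exit 0; }

    # pch_header.h itself does not exist, so this only compiles if the
    # .gch file is accepted as valid.
    if gcc -Winvalid-pch -c main.c -o main.o 2>use.log; then
        echo "PCH accepted"
    else
        echo "PCH rejected"
        cat use.log
    fi
)

check_pch
```

If the PCH is rejected, the captured compiler messages are printed after the status line, which makes it easy to compare them with the errors quoted in this bug.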

Comment 39 Jeff Law 2020-03-19 19:27:26 UTC
Vit is absolutely correct here.

Whoever is building vbox is going to have to go through a similar evaluation as Ruby.  ISTM the possibilities are: drop PCH but keep using GCC, or switch to clang.

Comment 40 Jakub Jelinek 2020-03-19 19:32:52 UTC
One can also invoke the compiler with setarch x86_64 -R; if both the PCH compilation and the PCH use are done that way, it should work.

Comment 41 Jun Aruga 2020-03-23 15:39:03 UTC
> One can also invoke the compiler with setarch x86_64 -R; if both the PCH compilation and the PCH use are done that way, it should work.

The `setarch x86_64 -R` approach works without error in my x86_64 mock environment with RHEL 8 and Ruby 2.6.

```
$ ruby -v
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]

$ TMP="$(pwd)" setarch "$(arch)" -R ruby --disable-gems --jit-verbose=2 --jit-save-temps --jit-min-calls=1 --jit-wait -e '1.times { puts "Hello" }'
MJIT: CC defaults to /usr/bin/gcc
MJIT: tmp_dir is /builddir/work
Creating precompiled header
Starting process: /usr/bin/gcc /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -nodefaultlibs -nostdlib -o /builddir/work/_ruby_mjit_hp1627u0.h.gch /usr/include/rb_mjit_min_header-2.6.3.h
start compilation: block in <main>@-e:1 -> /builddir/work/_ruby_mjit_p1627u0.c
Starting process: /usr/bin/gcc /usr/bin/gcc -m64 -w -Wfatal-errors -fPIC -shared -w -pipe -O3 -o /builddir/work/_ruby_mjit_p1627u0.o /builddir/work/_ruby_mjit_p1627u0.c -c -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
Starting process: /usr/bin/gcc /usr/bin/gcc -shared -Wfatal-errors -fPIC -shared -w -pipe -O3 -o /builddir/work/_ruby_mjit_p1627u0.so /builddir/work/_ruby_mjit_p1627u0.o -lgcc -m64 -Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -nostartfiles -nodefaultlibs -nostdlib
JIT success (57.5ms): block in <main>@-e:1 -> /builddir/work/_ruby_mjit_p1627u0.c
Hello
Stopping worker thread
Successful MJIT finish

$ ls
_ruby_mjit_hp1627u0.h.gch  _ruby_mjit_p1627u0.c  _ruby_mjit_p1627u0.o  _ruby_mjit_p1627u0.so
```

I added a new case "RHEL 8.1 Module Ruby 2.6 on `setarch [arch] -R`" to my Ruby JIT testing repository mentioned above:
https://github.com/junaruga/ruby-jit-test

The performance of "RHEL 8.1 Ruby 2.6" is at the same level as that of "Fedora 30 Ruby 2.6".
See https://travis-ci.org/github/junaruga/ruby-jit-test/jobs/665918733#L662

Comment 42 Jeff Law 2020-03-23 15:51:02 UTC
setarch turns off a security feature; we should not be encouraging packages to use it to work around this issue.

Comment 43 Sergio Basto 2020-03-27 00:32:43 UTC
Hi,
I understand that it is not the same problem. Disabling hardening is a good tip, but it will not solve the problem. Why does it happen only on EL 8? It works on F29+ and EL 7. Are you suggesting that it could be the GCC version, or some other security feature specific to EL 8?

Thanks for all the information.

Comment 44 Vít Ondruch 2020-03-27 07:50:21 UTC
(In reply to Sergio Basto from comment #43)
> Hi,
> I understand that it is not the same problem. Disabling hardening is a good
> tip, but it will not solve the problem. Why does it happen only on EL 8? It
> works on F29+ and EL 7. Are you suggesting that it could be the GCC version,
> or some other security feature specific to EL 8?
> 
> Thanks for all the information.

There were some private comments which would provide more information (@Jakub, could you please make them visible? I don't think there is anything super secret). But the answer you are looking for is basically in comment #11. There is also similar information in the third and following paragraphs of comment 34. Long story short, GCC in RHEL 8 is hardened while it is not hardened anywhere else.
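
To see whether the compiler on a particular system is built as a PIE, one can look at the ELF type of the compiler proper. The following is a rough sketch under some assumptions: that `readelf` from binutils is installed, that `cc1` (the C compiler behind the `gcc` driver) is the relevant binary, and that an ELF type of `DYN` on an executable indicates PIE. The helper name `elf_kind` is made up for illustration.

```shell
# Hypothetical helper: classify the "Type:" line of `readelf -h` output.
# DYN is what a PIE (or a shared object) reports; EXEC is a traditional
# fixed-address executable.
elf_kind() {
    case "$1" in
        *DYN*)  echo "pie-or-shared" ;;
        *EXEC*) echo "non-pie" ;;
        *)      echo "unknown" ;;
    esac
}

# Usage sketch: inspect the C compiler proper (cc1) behind the gcc driver.
# If gcc or readelf is missing, the result is simply "unknown".
cc1_path=$(gcc -print-prog-name=cc1 2>/dev/null)
type_line=$(readelf -h "$cc1_path" 2>/dev/null | grep 'Type:')
echo "cc1 ($cc1_path): $(elf_kind "$type_line")"
```

For C++ builds such as VirtualBox, the same check would apply to `cc1plus` instead of `cc1`.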

Comment 45 Jun Aruga 2020-05-13 12:49:15 UTC
I confirmed that the JIT of a clang-built Ruby works well on both Fedora and RHEL 8. On Fedora, the clang-built Ruby is a bit faster than the gcc-built Ruby. You can see the benchmark result at https://github.com/junaruga/ruby-jit-test/blob/master/doc/benchmark.md .

But there is a difference between upstream and Fedora/RHEL Ruby.

In upstream Ruby, a C-extension RubyGem package is built with the same C/C++ compiler that was used to build Ruby, like this.

```
$ /mnt/share/rhel-8.2.0/ruby-2.7.1-clang/bin/gem install nio4r --verbose --user-install

$ cat /builddir/.gem/ruby/2.7.0/gems/nio4r-2.5.2/ext/nio4r/Makefile
...
CC = clang
CXX = clang++
...
```

But in Fedora/RHEL Ruby, even when Ruby is built with clang, the C-extension RubyGem packages have already been built with gcc.
So, I am planning to check if it works with Copr.
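
As a small aid for that kind of check, here is a sketch of how to see which compiler a generated extension Makefile ended up with. The helper name `makefile_compilers` is hypothetical; it only assumes the `CC = ...` / `CXX = ...` line format shown in the nio4r Makefile excerpt above.

```shell
# Hypothetical helper: print the CC and CXX assignments from a generated
# extension Makefile. Variables that merely start with CC (for example
# CC_WRAPPER) or other variables such as CFLAGS are not matched.
makefile_compilers() {
    grep -E '^(CC|CXX)[[:space:]]*=' "$1"
}

# Usage sketch; the path is the one from the nio4r example:
# makefile_compilers /builddir/.gem/ruby/2.7.0/gems/nio4r-2.5.2/ext/nio4r/Makefile
#
# The compiler that Ruby itself recorded at build time can be read from RbConfig:
# ruby -rrbconfig -e 'puts RbConfig::CONFIG["CC"]'
```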

Comment 46 Sergio Basto 2020-06-05 17:15:21 UTC
(In reply to Vít Ondruch from comment #44)
> (In reply to Sergio Basto from comment #43)
> > Hi,
> > I understand that it is not the same problem. Disabling hardening is a good
> > tip, but it will not solve the problem. Why does it happen only on EL 8? It
> > works on F29+ and EL 7. Are you suggesting that it could be the GCC version,
> > or some other security feature specific to EL 8?
> > 
> > Thanks for all the information.
> 
> There were some private comments which would provide more information
> (@Jakub, could you please make them visible? I don't think there is anything
> super secret). But the answer you are looking for is basically in comment
> #11. There is also similar information in the third and following
> paragraphs of comment 34. Long story short, GCC in RHEL 8 is hardened while
> it is not hardened anywhere else.

Hi, first of all, thank you for all the information.

Meanwhile, we were able to solve the VirtualBox problem by building it with the "pre-compiled headers disabled" option [1].

I understand that this is not a problem in GCC; usually "hardening" problems are more about the code that we want to compile than about GCC. But, IMO, the GCC team should help in resolving this kind of problem.


[1]
https://pkgs.rpmfusion.org/cgit/free/VirtualBox.git/commit/?id=534430ee02d571456506126482919f390eb40d4f

Comment 50 RHEL Program Management 2021-02-01 07:41:32 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.