Bug 1822934

Summary: g++ 4.8.5 runs out of memory whereas same works with old g++
Product: Red Hat Enterprise Linux 8 Reporter: Piyush Bhoot <pbhoot>
Component: gcc Assignee: Marek Polacek <mpolacek>
gcc sub component: system-version QA Contact: qe-baseos-tools-bugs
Status: CLOSED WONTFIX Docs Contact:
Severity: urgent    
Priority: unspecified CC: ahajkova, bgollahe, chhudson, fweimer, jakub, jwright, law, mattn, mpolacek, ohudlick, pandrade, sipoyare, vmukhame
Version: 8.4   
Target Milestone: rc   
Target Release: 8.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-12 16:27:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description               Flags
Minimal reproducer        none
Latest reproducer         none
preprocessor file 4.4.7   none
preprocessed file 4.8     none
new code                  none

Description Piyush Bhoot 2020-04-10 14:37:24 UTC
Created attachment 1677803 [details]
Minimal reproducer

Description of problem:
gcc version 4.8.5-39 runs out of memory while compiling the attached code, whereas the same code compiles with the older gcc 4.4.7-3.

Version-Release number of selected component (if applicable):
gcc 4.8.5-39

How reproducible:
Untar the attached tar file and use the compile command given below. With the old g++ (4.4.7-3) compilation succeeds. With 4.8.5-39 it fails.

Steps to Reproduce:

1. tar zxvf gcc_data.tgz
2. cd gcc_data

Using -fstack-reuse=named_vars or -fstack-reuse=none does not allow the latest code to compile:

==============================================================
$ g++ -march=nocona -mfpmath=sse  -include inclall.h  -o s1823_1.o -shared -O0 -fno-strict-aliasing -fstack-reuse=named_vars -fexceptions -fsigned-char -Wno-invalid-offsetof -fpic -fpermissive -DNZDEBUG=0 -DGENCODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DFOR_SPU -I./include s4_7.cpp 
==============================================================
Or
==============================================================
$ g++ -march=nocona -mfpmath=sse  -include inclall.h  -o s1823_1.o -shared -O0 -fno-strict-aliasing -fstack-reuse=none -fexceptions -fsigned-char -Wno-invalid-offsetof -fpic -fpermissive -DNZDEBUG=0 -DGENCODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DFOR_SPU -I./include s4_7.cpp 
==============================================================
Actual results:

$ g++ -march=nocona -mfpmath=sse  -include inclall.h  -o s1823_1.o -shared -O0 -fno-strict-aliasing -fstack-reuse=named_vars -fexceptions -fsigned-char -Wno-invalid-offsetof -fpic -fpermissive -DNZDEBUG=0 -DGENCODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DFOR_SPU -I./include s4_7.cpp 
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.


Expected results:


$ g++ -march=nocona -mfpmath=sse  -include inclall.h  -o s1823_1.o -shared -O0 -fno-strict-aliasing -fstack-reuse=named_vars -fexceptions -fsigned-char -Wno-invalid-offsetof -fpic -fpermissive -DNZDEBUG=0 -DGENCODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DFOR_SPU -I./include s4_7.cpp 
$ echo $?
0

Additional info:
Use of -fstack-reuse=named_vars or -fstack-reuse=none was suggested in a similar bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1787575

Opening a fresh bug because the customer is looking for a workaround as a priority, since the earlier suggestions do not work in this situation.

Comment 2 Marek Polacek 2020-04-14 15:05:32 UTC
This compiles on my box with trunk, but it ate almost 60GB RAM.  The function GenPlan is just so insanely huge (over 300,000 lines!).  Compiling with -fno-exceptions doesn't seem like an option because the code has throw std::bad_alloc().

I don't think this can be (safely) fixed in GCC 4.8 or GCC 8.

Comment 22 Piyush Bhoot 2020-06-19 11:59:27 UTC
Created attachment 1698079 [details]
Latest reproducer

In view of today's upcoming call, I am passing on the customer's comment and reproducer as is.

Comment 28 Piyush Bhoot 2020-06-26 12:15:03 UTC
Created attachment 1698914 [details]
preprocessor file 4.4.7

Comment 29 Piyush Bhoot 2020-06-26 12:17:26 UTC
Created attachment 1698915 [details]
preprocessed file 4.8

Comment 30 Jeff Law 2020-06-29 19:31:41 UTC
So as I mentioned in the meeting, I've been investigating -O1 plus some additional -fno-<pass> arguments as a mitigation strategy.  In simplest terms -O1 introduces some optimization, but in general it bypasses optimizations which tend to blow up on large codes (such as we find in the testcases from Netezza).  The hope was that the light optimization provided by -O1 would dramatically reduce the number of blocks, edges and SSA_NAMEs which in turn would dramatically reduce the compile time memory usage, particularly in the coalescing phase of the out-of-ssa pass.

The good news is that -O1 does provide the benefits we hoped for.  The CFG and SSA_NAMEs are dramatically reduced and the peak memory requirements drop from ~50G to around 7G for the testcase in this BZ.
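For anyone wanting to collect peak-memory figures of this kind, one option is to run the compile under GNU time and read the "Maximum resident set size" line of its verbose report. The snippet below is only a sketch: the helper name peak_rss_kb and the log file name time.log are made up for illustration, it assumes GNU time is installed as /usr/bin/time, and it is not necessarily how the numbers above were measured.

```shell
# Hypothetical helper: read a GNU "time -v" report on stdin and print the
# peak resident set size in kilobytes.
peak_rss_kb() {
    sed -n 's/^[[:space:]]*Maximum resident set size (kbytes): \([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Intended use (commented out because it needs the reproducer and GNU time):
#   /usr/bin/time -v -o time.log g++ -O1 <other flags> s4_7.cpp
#   peak_rss_kb < time.log
```

Writing the report to a separate file with -o keeps it apart from the compiler's own diagnostics on stderr.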

The bad news is that -O1 triggers significant nonlinear behavior for the original testcase in this BZ as well as for the testcase from BZ1787575.  The nonlinear behavior is part of the IPA pipeline, and unfortunately the knobs to control this aspect of the IPA pipeline do not exist in gcc-4.8.

--


The other option I looked at is -Og.  I hadn't mentioned it earlier because I thought -Og wasn't introduced until gcc-4.9.  However it was actually introduced in gcc-4.8.  The advantage -Og has is that it bypasses the IPA pipeline.  The downside is it's not quite as good at reducing the size of the CFG and SSA_NAME tables.  As a result its peak memory usage on s4_7 is 9G (vs 7G for -O1).  That may be enough to avoid the OOM killer for Netezza's customers in the immediate term and give Netezza some time to address their code generator.  -Og also seems to perform reasonably well on the other tests I checked (s8_1 and s1823_1).

So the immediate recommendation I have for Netezza is to replace "-O0" with "-Og -fno-move-loop-invariants" and see if that's sufficient to work around this problem in the immediate term.
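The substitution suggested above could be sketched in shell roughly as follows. The argument list is copied from the -fstack-reuse=named_vars command in the description; the variable names ARGS and NEW_ARGS are only illustrative, and the script prints the resulting command rather than running it, so it can be checked against the real build first.

```shell
# Arguments from the original report (the -fstack-reuse=named_vars variant).
ARGS="-march=nocona -mfpmath=sse -include inclall.h -o s1823_1.o -shared -O0 -fno-strict-aliasing -fstack-reuse=named_vars -fexceptions -fsigned-char -Wno-invalid-offsetof -fpic -fpermissive -DNZDEBUG=0 -DGENCODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DFOR_SPU -I./include s4_7.cpp"

# Rebuild the list, swapping the -O0 token for the suggested pair of flags.
NEW_ARGS=""
for arg in $ARGS; do
    case "$arg" in
        -O0) arg="-Og -fno-move-loop-invariants" ;;
    esac
    NEW_ARGS="${NEW_ARGS:+$NEW_ARGS }$arg"
done

# Print the command for review instead of running it.
echo "g++ $NEW_ARGS"
```

Matching whole tokens, rather than doing a plain text substitution, avoids touching any other argument that happens to contain the string "-O0".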


Also note, we do not have the "s730_7" test, so no evaluation could be done with it.  Given c#7, I would expect that test to probably still fail, which is another strong signal that the Netezza code generator needs to be fixed.

Comment 34 Piyush Bhoot 2020-07-24 15:39:08 UTC
Created attachment 1702353 [details]
new code

Comment 36 Jeff Law 2020-10-12 16:27:58 UTC
As has been expressed to Netezza several months ago, this is not something we're going to fix in a gcc-4.8 era compiler.  Even the -Og workaround isn't always going to work -- which is a symptom of the underlying issue -- the size of the generated functions being passed to the compiler needs to be fixed.  Trying to run GCC on functions of this size on highly memory constrained systems just isn't going to work out in the long run.