Bug 503816

Summary: ICE from c++ after eating all memory
Product: [Fedora] Fedora
Reporter: Dan Horák <dan>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 11
CC: aoliva, jakub
Hardware: s390x   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-06-23 19:16:09 EDT
Bug Blocks: 467765    
Attachments:
failing file (flags: none)

Description Dan Horák 2009-06-02 16:28:03 EDT
Created attachment 346314 [details]
failing file

cc1plus eats all available memory and gets killed with

c++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

when compiling a file on s390x. It occurred while building scribus on Fedora 11 on s390x; the build is https://s390.koji.fedoraproject.org/koji/taskinfo?taskID=72412 (search build.log for "scribus134format.o").

g++ (GCC) 4.4.0 20090506 (Red Hat 4.4.0-4)

The preprocessed file is attached; the command used to compile it is
g++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -march=z9-109 -mtune=z10  -O2 -Wall -fPIC -fPIC -o scribus134format.o -c scribus134format.i

With plain "g++ -o scribus134format.o -c scribus134format.i" the compile succeeds.

I don't have a minimal test case available yet. A copy of the buildroot is stored on the builder for further investigation.

Comment 1 Dan Horák 2009-06-09 16:50:16 EDT
While creating the minimal test case I found the following:
- the option causing the abnormal memory usage is -O2
- g++ run times are about 5 seconds (without -O2) vs. 5 minutes (with -O2)
- cc1plus memory consumption is about 300 MB vs. 1.8 GB (as seen in "top")
The numbers are from a reduced source file, for which the compile succeeds.

The builder has 2GB RAM + 0.5GB swap
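
The time and memory comparison above was read off "top" by hand. A minimal sketch of how the same two numbers could be collected automatically is shown below; it is an illustration in Python, not something used in this report. The helper name `measure` is hypothetical, and the sketch assumes Linux, where `ru_maxrss` is reported in kilobytes and covers the largest waited-for descendant (which for g++ would normally be cc1plus).

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd; return (exit status, wall seconds, peak child RSS in KiB)."""
    start = time.monotonic()
    status = subprocess.run(cmd).returncode
    elapsed = time.monotonic() - start
    # RUSAGE_CHILDREN holds the peak RSS of the largest child waited for
    # so far by this process; on Linux the unit is kilobytes.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return status, elapsed, peak_kib


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For this bug it would be
    # replaced by the g++ command line, once with and once without -O2.
    print(measure([sys.executable, "-c", "pass"]))
```

Because the peak is cumulative over all children, each variant should be measured in a fresh process to keep the two runs from masking each other.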

Comment 2 Dan Horák 2009-06-10 08:16:41 EDT
Hm, I was able to compile the original file successfully after all; it took 13.5 minutes (user time) with only a few MB of memory left free, but it did build. I still think something is wrong when 140 KB / 3600 lines of C++ code using the Qt library doesn't reliably build with 2.5 GB of memory.

Comment 3 Jakub Jelinek 2009-06-15 15:25:31 EDT
The source isn't that small; e.g. the loadFile method has 13784 basic blocks.
Apparently most of the time is spent in var tracking: without -g, or with -fno-var-tracking, cc1plus (a cross compiler from x86-64) topped out at 800 MB and was also much faster.
With var tracking enabled it spends a huge amount of time there. I haven't tried to find out how many variables are tracked, but probably many.

Comment 4 Jakub Jelinek 2009-06-18 03:17:44 EDT
I've gathered some statistics on this testcase:
--- var-tracking.c.xx	2009-03-04 12:12:08.000000000 +0100
+++ var-tracking.c	2009-06-18 08:54:34.000000000 +0200
@@ -2143,6 +2143,31 @@ compute_bb_dataflow (basic_block bb)
   return changed;
 }
 
+void
+print_vta_stats (void)
+{
+  static int cnt;
+  basic_block bb;
+  size_t size, elements, collisions;
+  fprintf (stderr, "VTA step %d", ++cnt);
+  fprintf (stderr, " attrs_pool %zd (%zd %zd)", attrs_pool->block_size * attrs_pool->blocks_allocated, attrs_pool->elts_allocated, attrs_pool->elts_free);
+  fprintf (stderr, " var_pool %zd (%zd %zd)", var_pool->block_size * var_pool->blocks_allocated, var_pool->elts_allocated, var_pool->elts_free);
+  fprintf (stderr, " loc_chain_pool %zd (%zd %zd)", loc_chain_pool->block_size * loc_chain_pool->blocks_allocated, loc_chain_pool->elts_allocated, loc_chain_pool->elts_free);
+  size = 0;
+  elements = 0;
+  collisions = 0;
+  FOR_EACH_BB (bb)
+    {
+      size += htab_size (VTI (bb)->in.vars);
+      elements += htab_elements (VTI (bb)->in.vars);
+      collisions += htab_collisions (VTI (bb)->in.vars);
+      size += htab_size (VTI (bb)->out.vars);
+      elements += htab_elements (VTI (bb)->out.vars);
+      collisions += htab_collisions (VTI (bb)->out.vars);
+    }
+  fprintf (stderr, " htab %zd (%zd, %zd, %zd)\n", size * sizeof (void *), size, elements, collisions);
+}
+
 /* Find the locations of variables in the whole function.  */
 
 static void
@@ -2185,6 +2210,9 @@ vt_find_locations (void)
       in_pending = in_worklist;
       in_worklist = sbitmap_swap;
 
+if (n_basic_blocks > 5000)
+print_vta_stats ();
+
       sbitmap_zero (visited);
 
       while (!fibheap_empty (worklist))

The patched compiler printed:

VTA step 1 attrs_pool 32776 (1024 1022) var_pool 25608 (64 62) loc_chain_pool 32776 (1024 1022) htab 1543248 (192906, 0, 0)
VTA step 2 attrs_pool 721072 (22528 269) var_pool 5889840 (14720 44) loc_chain_pool 917728 (28672 875) htab 597983248 (74747906, 41276836, 22763)
VTA step 3 attrs_pool 983280 (30720 110) var_pool 8783544 (21952 3224) loc_chain_pool 1311040 (40960 5862) htab 1006302720 (125787840, 67735133, 7017)
VTA step 4 attrs_pool 2064888 (64512 688) var_pool 9014016 (22528 391) loc_chain_pool 1343816 (41984 105) htab 1527233520 (190904190, 93031713, 1047)
VTA step 5 attrs_pool 2589304 (80896 22) var_pool 10729752 (26816 3658) loc_chain_pool 1606024 (50176 6183) htab 1790226768 (223778346, 103611403, 325)
VTA step 6 attrs_pool 4064224 (126976 94) var_pool 11011440 (27520 4342) loc_chain_pool 1638800 (51200 7202) htab 1790226768 (223778346, 107777443, 325)
VTA step 7 attrs_pool 4064224 (126976 94) var_pool 11011440 (27520 3369) loc_chain_pool 1638800 (51200 5767) htab 1790226768 (223778346, 109835059, 325)
VTA step 8 attrs_pool 32776 (1024 1019) var_pool 25608 (64 59) loc_chain_pool 32776 (1024 1019) htab 618576 (77322, 0, 0)
VTA step 9 attrs_pool 884952 (27648 969) var_pool 2227896 (5568 53) loc_chain_pool 360536 (11264 678) htab 70594384 (8824298, 4875009, 8525)
VTA step 10 attrs_pool 1212712 (37888 983) var_pool 3252216 (8128 1069) loc_chain_pool 491640 (15360 1746) htab 159846368 (19980796, 9505894, 2727)
VTA step 11 attrs_pool 1737128 (54272 56) var_pool 3380256 (8448 165) loc_chain_pool 524416 (16384 236) htab 173239072 (21654884, 12905301, 460)
VTA step 12 attrs_pool 1835456 (57344 262) var_pool 3841200 (9600 308) loc_chain_pool 622744 (19456 1293) htab 174586496 (21823312, 14869139, 181)
VTA step 13 attrs_pool 2327096 (72704 538) var_pool 4148496 (10368 1061) loc_chain_pool 655520 (20480 2219) htab 174619360 (21827420, 15376847, 181)
VTA step 14 attrs_pool 2327096 (72704 538) var_pool 4148496 (10368 394) loc_chain_pool 655520 (20480 1397) htab 184610016 (23076252, 16048639, 181)

Steps 1-7 are in the largest function (loadFile). They show that the 3 alloc pools are uninterestingly small and that essentially all VTA memory (1.7 GB) is in the hash tables.
There are 13784 basic blocks in loadFile and each basic block has 2 hash tables (in.vars and out.vars), so on average each hash table has 3984 occupied elements and 8117 allocated elements.
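
The averages above can be re-derived from the step 7 line of the output. The following is only a small Python sketch of that arithmetic; the helper name `htab_stats` and the constant names are made up for illustration, and the numbers are copied from the output and the basic block count quoted in this comment.

```python
import re

# Last loadFile line from the output above (step 7).
LINE = ("VTA step 7 attrs_pool 4064224 (126976 94) var_pool 11011440 (27520 3369) "
        "loc_chain_pool 1638800 (51200 5767) "
        "htab 1790226768 (223778346, 109835059, 325)")

BASIC_BLOCKS = 13784        # basic blocks in loadFile
TABLES = 2 * BASIC_BLOCKS   # in.vars and out.vars for every block


def htab_stats(line):
    """Return (bytes, allocated slots, occupied elements, collisions)."""
    m = re.search(r"htab (\d+) \((\d+), (\d+), (\d+)\)", line)
    return tuple(int(g) for g in m.groups())


htab_bytes, slots, elements, _ = htab_stats(LINE)
avg_occupied = elements // TABLES     # occupied elements per table
avg_allocated = slots // TABLES       # allocated slots per table
pointer_size = htab_bytes // slots    # bytes per slot, i.e. sizeof (void *)

# Total bytes held by the three alloc pools, for comparison with htab_bytes.
pool_bytes = sum(int(n) for _, n in re.findall(r"(\w+_pool) (\d+)", LINE))

print(avg_occupied, avg_allocated, pointer_size, pool_bytes)
```

The pools add up to roughly 16.7 MB against about 1.79 GB of hash table slots, which is the imbalance described above.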

Comment 5 Jakub Jelinek 2009-06-18 04:20:29 EDT
I've gathered one more statistic: for each bb accounted for in print_vta_stats, I checked whether
dataflow_set_different (&VTI (bb)->in, &VTI (bb)->out) holds. In the loadFile function, out of the 13779 bbs processed, the first step obviously had 0 differences,
the next step had 4083, and the remaining steps had between 2239 and 2250 basic blocks where in and out actually differed. That's just 16% at the point where memory consumption goes through the roof, which raises the question of how many bbs also have no difference (!dataflow_set_different) between the out sets of their predecessors and their own in set.
Perhaps we could use refcounted copy-on-write hash tables instead of emptying and refilling them all the time. While a hash table is shared (refcount > 1), we'd use only NO_INSERT lookups; only once we find that we want to modify it would we allocate a htab_t (or pick one from a free list) with its own refcount, vars_copy the original htab_t into it, set the refcount to 1, and then start doing normal INSERT lookups in it.
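
The copy-on-write idea can be sketched roughly as follows. This is an illustrative Python model only, not the GCC change: the class names `SharedTable` and `DataflowSet` are hypothetical, a Python dict stands in for htab_t, and the free list mentioned above is left out. One assumption beyond the text: when a set takes its private copy, it also drops its reference to the shared table.

```python
class SharedTable:
    """A table payload plus the number of sets currently pointing at it."""

    def __init__(self, items=None):
        self.items = dict(items or {})   # private copy of the entries
        self.refcount = 1


class DataflowSet:
    """Hypothetical stand-in for one basic block's in or out variable set."""

    def __init__(self):
        self.table = SharedTable()

    def share_from(self, other):
        # Copying a set is only a refcount bump; no per-element work.
        self.table.refcount -= 1
        self.table = other.table
        self.table.refcount += 1

    def lookup(self, key):
        # Read-only access (the NO_INSERT case) never unshares the table.
        return self.table.items.get(key)

    def insert(self, key, value):
        # First write to a shared table: switch to a private copy
        # (assumed here to release the reference to the shared one).
        if self.table.refcount > 1:
            self.table.refcount -= 1
            self.table = SharedTable(self.table.items)
        self.table.items[key] = value


a = DataflowSet()
a.insert("x", "reg1")
b = DataflowSet()
b.share_from(a)              # a and b now point at one table
shared = b.table is a.table
b.insert("y", "reg2")        # b unshares; a is left untouched
```

In this model, blocks whose in and out sets never differ keep pointing at the same table, so the per-block cost is a pointer and a counter rather than a full copy.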

Comment 6 Jakub Jelinek 2009-06-23 19:16:09 EDT
Please try gcc-4.4.0-10 in rawhide.

Comment 7 Dan Horák 2009-07-29 06:31:24 EDT
I have updated the buildroot with

gcc-c++-4.4.0-4.s390x.rpm
binutils-2.19.51.0.14-29.1.fc12.s390x.rpm

and this time the build crashes with

{standard input}: Assembler messages:
{standard input}:224817: Warning: end of file not at end of a line; newline inserted
{standard input}:226237: Error: unknown pseudo-op: `.stri'
c++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make[2]: *** [scribus/plugins/fileloader/scribus134format/CMakeFiles/scribus134format.dir/scribus134format.o] Error 1
make[1]: *** [scribus/plugins/fileloader/scribus134format/CMakeFiles/scribus134format.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

but that is most likely because of the memory requirements of a parallel build (make -j2). With a sequential build memory is very tight, but the build succeeds.