Bug 427700 - "Error: operand out of range" while building openbabel python bindings
Summary: "Error: operand out of range" while building openbabel python bindings
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: ppc64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL: http://koji.fedoraproject.org/koji/ta...
Whiteboard:
Depends On:
Blocks: FE-ExcludeArch-ppc64, F-ExcludeArch-ppc64
Reported: 2008-01-06 21:46 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2008-06-13 13:12 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-07 21:27:59 UTC
Type: ---
Embargoed:


Attachments
ppc64 build log from koji (gzipped) (33.12 KB, application/x-gzip)
2008-01-06 21:46 UTC, Dominik 'Rathann' Mierzejewski


Links
System ID Private Priority Status Summary Last Updated
GNU Compiler Collection 34708 0 None None None Never

Description Dominik 'Rathann' Mierzejewski 2008-01-06 21:46:16 UTC
Description of problem:
openbabel fails to build with gcc-4.3 on ppc64.

Version-Release number of selected component (if applicable):
gcc-4.3.0-0.4

How reproducible:
Always

Steps to Reproduce:
1. koji build --scratch dist-f9-gcc43 cvs://cvs.fedoraproject.org/cvs/pkgs?rpms/openbabel/devel#HEAD

Actual results:
[...]
openbabel_perl.cpp:113918: warning: unused variable 'items'
{standard input}: Assembler messages:
{standard input}:1216558: Error: operand out of range (0x0000000000008000 is not between 0xffffffffffff8000 and 0x0000000000007ffc)
(many repeated similar lines)

Comment 1 Dominik 'Rathann' Mierzejewski 2008-01-06 21:46:16 UTC
Created attachment 290922
ppc64 build log from koji (gzipped)

Comment 2 Dominik 'Rathann' Mierzejewski 2008-01-06 23:19:57 UTC
For the record: builds fine on i386, x86_64 and ppc.

Comment 3 Jakub Jelinek 2008-01-07 21:27:59 UTC
This means .toc1 section overflow - on ppc64 .toc1 for one CU can be at most
64KB in size.  The generated source is really huge; even with g++ 4.1 the .toc1
size is over 55KB, so close to the limit.  On top of that comes an issue in the
inlining heuristic's function size estimation (filed as http://gcc.gnu.org/PR34708),
combined with the newly added -finline-small-functions, which is the default at -O2.
The inlining estimation causes the size of SWIG_Perl_ErrorType to be estimated
incorrectly, as much smaller than it really is, so it is then inlined thousands of
times, in each case needing a jump table which eats one .toc1 entry.
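
As a rough illustration of the limit (a sketch, not code from GCC or binutils):
the range in the assembler message looks like a signed 16-bit displacement that
must be a multiple of 4, and assuming each .toc1 entry is one 8-byte pointer, a
64KB window holds 8192 entries.  The names ds_offset_encodable and
max_toc_entries are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one .toc1 entry is a single 64-bit pointer, i.e. 8 bytes. */
enum { TOC_ENTRY_SIZE = 8 };
static_assert(TOC_ENTRY_SIZE == sizeof(uint64_t), "entry is one 64-bit word");

/* Range copied from the assembler message: 0xffffffffffff8000 is -0x8000
   as a signed 64-bit value, and the upper bound is 0x7ffc.  The multiple-of-4
   requirement is inferred from that upper bound. */
static bool ds_offset_encodable(int64_t offset)
{
    return offset >= -0x8000 && offset <= 0x7ffc && offset % 4 == 0;
}

/* Number of 8-byte entries that fit in the whole 64KB window
   (0x8000 bytes below the base pointer plus 0x8000 bytes above it). */
static int max_toc_entries(void)
{
    return (0x8000 + 0x8000) / TOC_ENTRY_SIZE;
}
```

With these definitions, ds_offset_encodable(0x8000) is false, which is exactly
the offset the assembler rejected, while ds_offset_encodable(0x7ff8) is true.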

In the meantime, the best change to make this build with the current 4.3 is,
IMHO, to add __attribute__((noinline)) to SWIG_Perl_ErrorType to prevent it
from being inlined.
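
A minimal sketch of what that attribute looks like on a switch-based helper.
The function error_type_name is a hypothetical stand-in for
SWIG_Perl_ErrorType, and the codes and strings are illustrative, not copied
from the SWIG-generated source.

```c
#include <assert.h>   /* only needed for checks like the one in the note below */
#include <string.h>

#if defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE /* no-op on compilers without the GNU attribute */
#endif

/* Hypothetical stand-in for SWIG_Perl_ErrorType.  Marking it noinline keeps
   a single copy of the switch (and of whatever jump table the compiler
   builds for it) instead of one copy per call site. */
static NOINLINE const char *error_type_name(int code)
{
    switch (code) {
    case -2: return "IOError";
    case -4: return "IndexError";
    case -5: return "TypeError";
    case -9: return "ValueError";
    default: return "RuntimeError";
    }
}
```

The attribute only affects code generation; a check such as
assert(strcmp(error_type_name(-5), "TypeError") == 0) passes with or without it.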

Comment 4 Dominik 'Rathann' Mierzejewski 2008-01-07 22:50:12 UTC
Thanks for the quick response. I did as you suggested, but it still fails, just
a little bit further.

http://koji.fedoraproject.org/koji/getfile?taskID=332336&name=build.log

I'd rather wait until this is fixed upstream. I don't like working around
compiler bugs.

Comment 5 Jakub Jelinek 2008-01-07 23:11:31 UTC
In this case it is not really a compiler bug, perhaps not very good inlining
decision.  With -O3 you will overflow .toc1 even with 4.1.  And when reaching
target limitations, packages that want to build just need to take some steps to
help the build along, which can be splitting the huge source into several smaller
ones, or aggregating some string literal addresses into arrays, etc.
BTW, another alternative which would make inlining SWIG_Perl_ErrorType actually
a win would be to rewrite SWIG_Perl_ErrorType to reference a static array,
indexed by code+13, with all the error codes and only handle the cases where code+13
is out of that static array bounds.  Then no jump table is needed (even 4.1 uses
switch here, just doesn't inline the function containing it).  An optimization
which would do this automatically was submitted for GCC some time ago, but it
still has some nits to be worked on and as such probably won't make it into 4.3.
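
The table-based rewrite described above might look roughly like this.  It is a
sketch under assumptions: error_type_name is a hypothetical stand-in for
SWIG_Perl_ErrorType, the 13 strings are illustrative placeholders for the codes
-13 to -1, and "RuntimeError" as the fallback is assumed.

```c
#include <assert.h>
#include <string.h>

/* One string per error code, indexed by code + 13 (so code -13 is slot 0
   and code -1 is slot 12).  Names are illustrative placeholders. */
static const char *const error_names[] = {
    "NullReferenceError", "MemoryError",   "AttributeError", "SystemError",
    "ValueError",         "SyntaxError",   "OverflowError",  "ZeroDivisionError",
    "TypeError",          "IndexError",    "RuntimeError",   "IOError",
    "RuntimeError"
};
static_assert(sizeof error_names / sizeof error_names[0] == 13,
              "table must cover codes -13 to -1");

/* Hypothetical replacement for the switch: a bounds check plus one array
   load, so no jump table is needed at any call site it is inlined into. */
static const char *error_type_name(int code)
{
    if (code < -13 || code > -1)
        return "RuntimeError";      /* assumed fallback for unknown codes */
    return error_names[code + 13];
}
```

The range is checked on code itself before adding 13, so extreme values of code
cannot overflow the index computation.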

Comment 6 Dominik 'Rathann' Mierzejewski 2008-01-19 20:23:16 UTC
(In reply to comment #5)
> In this case it is not really a compiler bug, perhaps not very good inlining
> decision.

That's a bug in my book.

> With -O3 you will overflow .toc1 even with 4.1.

Maybe, but we don't use -O3 in Fedora.

Anyway, thanks for the suggestions. I've submitted this to openbabel developers.


Comment 7 Kevin Kofler 2008-05-08 15:30:38 UTC
I'm looking into this because it's blocking updating kdeedu to the 4.1 
snapshots. I tried -fno-inline-small-functions and also -fno-inline-functions, 
neither seems to help. :-(

Comment 8 Kevin Kofler 2008-05-08 15:36:55 UTC
BTW it now fails in the Python bindings, not the Perl ones. Jakub, if you think 
this is a different issue, we can open a different bug.

Comment 9 Kevin Kofler 2008-05-08 15:50:40 UTC
Build log with -fno-inline-functions -fno-inline-small-functions:
http://koji.fedoraproject.org/koji/getfile?taskID=600751&name=build.log

Comment 10 Kevin Kofler 2008-05-08 16:26:10 UTC
I tried adding -fno-inline too, but that just makes the section overflow even 
more:
http://koji.fedoraproject.org/koji/getfile?taskID=600792&name=build.log

Comment 11 Kevin Kofler 2008-05-09 00:39:57 UTC
I finally got this to build for F10 (not tried F9 yet) after lots of trial and 
error, using these 2 hacks:
http://cvs.fedoraproject.org/viewcvs/rpms/openbabel/devel/openbabel.spec?r1=1.18&r2=1.32
(note the huge number of revisions between the original and the attempt which 
finally worked).

The SWIG switch "-fastdispatch" makes the code faster and smaller (and with 
fewer TOC1 entries) at the expense of error message quality when you pass a bad 
parameter to an overloaded function from Python (not a real issue IMHO and 
better than not having the binding available at all!).

The GCC switch "-mno-sum-in-toc" saves TOC1 entries at the expense of speed. It 
would probably be better to use the GCC switch only for that one file, but 
setting it globally at least gets this thing to build and doesn't need makefile 
hackery.
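
For context, GCC's documentation describes -mno-sum-in-toc as computing the sum
of an address and a constant at run time instead of placing that sum in the
TOC.  The sketch below shows the kind of expression involved; the names are
hypothetical, and whether these particular functions would each get their own
entry depends on the compiler version and options, which was not checked here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global table; its address is the kind of value that may be
   loaded from the TOC on ppc64. */
int table[16];

/* Each function returns "address of table plus a constant".  With sums
   allowed in the TOC, every distinct sum may occupy its own entry; with
   -mno-sum-in-toc the base address is loaded once and the constant is
   added at run time, trading an extra instruction for fewer entries. */
int *fourth_entry(void) { return &table[3]; }
int *tenth_entry(void)  { return &table[9]; }
```

The observable behaviour is the same either way: tenth_entry() - fourth_entry()
is 6 elements regardless of how the addresses are materialised.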

Unfortunately, this isn't a permanent solution though: I've seen they've added 
even more stuff to their Python binding in their SVN repository after beta 4, 
and I only brought this barely below the TOC1 limit, so I fully expect this to 
blow up again in the near future. :-(

Comment 12 Kevin Kofler 2008-05-29 21:33:23 UTC
As predicted, the problem is back with beta 5, which has an even larger Python 
binding. :-( 353 toc1 entries too many, despite the above tricks. I have no 
idea how to make those fit.

Comment 13 Kevin Kofler 2008-06-06 18:15:33 UTC
Looks like this was finally fixed upstream:
http://openbabel.svn.sourceforge.net/viewvc/openbabel?view=rev&revision=2535

I'm backporting that fix to the openbabel package. (I guess I should really 
become an official comaintainer of that package.)

Comment 14 David Woodhouse 2008-06-13 08:39:14 UTC
I never did understand why we're limited to a single TOC in each relocatable
object file. Each function has its own function descriptor with its own TOC
pointer, after all -- I don't see why they all have to point to the same place.
Why can't each have their _own_ TOC, if necessary?

In fact, if you use -mminimal-toc you get something _similar_ to that, but GCC
uses an extra register as a pointer to its '.toc1' and similar sections, and
loads that pointer from the 'real' TOC instead of just putting it in the
function descriptor. Which seems a bit strange and wastes a register.

When I hit TOC size problems with the ppc64 ocaml back end and nobody could
answer the above, I just stopped using the 'real' TOC altogether, and pointed
each function's descriptor at its own local TOC instead. It seems to work fine,
although strictly speaking it breaks the ABI because the TOC pointer in the
function descriptor is supposed to point to the TOC section.

If we can't do that, can we not at least have the compiler keep track of how big
it's making the TOC, and start enabling -mno-fp-in-toc, -mno-sum-in-toc, or
-mminimal-toc automatically?

Comment 15 Kevin Kofler 2008-06-13 13:12:06 UTC
-mminimal-toc is already used, the problem is that this just means that instead 
of global .toc entries, we have per-compilation-unit .toc1 entries, and it's 
that .toc1 section which overflowed because the compilation unit was too big. 
(OpenBabel upstream "fixed" it by splitting the bindings so there are separate 
compilation units. This was nontrivial because, due to the way SWIG works, this 
means the parts also have to be separate Python extensions, in a separate 
Python namespace - you can't split a single generated binding into multiple 
compilation units. What they did is use the Python "import" statement to bring 
these back into one namespace.)

