Bug 84500

Summary: linker corrupts result when building crafty
Product: Red Hat Enterprise Linux 2.1
Reporter: Larry Troan <ltroan>
Component: binutils
Assignee: Jakub Jelinek <jakub>
Status: CLOSED WONTFIX
Severity: low
Priority: low
Version: 2.1
CC: ichute, lwoodman, tao
Hardware: ia64   
OS: Linux   
Whiteboard: 2.1U4 (sev 4)
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:25:22 UTC
Bug Blocks: 106715    

Description Larry Troan 2003-02-18 02:22:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc1) Gecko/20020424

Description of problem:
When building crafty with the Intel Compiler on Itanium at the -O2 level of
optimization, we have discovered that the linker is emitting an incorrect
executable based on correct object files.  The data I received from our IPF code
generator team is the following:

Through debugging the bad crafty executable, we were able to determine what is
going wrong.

The problem is that the data for knight_value_w, an array of ints, is being
corrupted in the link step.  In the .data section of the object file (main.o) we
have the correct values:

00000000000028f0 <knight_value_w>:
   28f0:       f7 ff ff ff fd ff       [BBB]       data8 0x1ffefffffff
   28f6:       ff ff fd ff ff ff                   data8 0x1fffff7ffff
   28fc:       fe ff ff ff                         data8 0x1fffffffdff
   2900:       fe ff ff ff fd ff       [-f-]       data8 0x1ffefffffff
   2906:       ff ff fd ff ff ff                   data8 0x1fffff7ffff
   290c:       f7 ff ff ff                         data8 0x1ffffffefff
   2910:       fb ff ff ff fd ff       [-d-]       data8 0x1ffefffffff
   2916:       ff ff ff ff ff ff                   data8 0x1ffffffffff
       ...

However, the .data section in a failing crafty executable contains this:

60000000000058b0 <knight_value_w>:
60000000000058b0:       f7 ff ff ff fd ff       [BBB]       data8 0x1ffefffffff
60000000000058b6:       ff ff fd ff ff ff                   data8 0x1fffff7ffff
60000000000058bc:       fe ff ff ff                         data8 0x1fffffffdff
60000000000058c0:       ff ff ff ff fd ff       [-f-]       data8 0x1ffefffffff
60000000000058c6:       ff ff ff ff ff ff                   data8 0x1ffffffffff
60000000000058cc:       f7 ff ff ff                         data8 0x1ffffffefff
60000000000058d0:       fb ff ff ff fd ff       [-d-]       data8 0x1ffefffffff
60000000000058d6:       ff ff ff ff ff ff                   data8 0x1ffffffffff
       ...

Note that the mismatching data is at 2900 and 58c0.
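
The comparison above can be checked mechanically. The following is a minimal sketch, not part of the original report: it takes the raw bytes exactly as they appear in the two dumps, decodes them as little-endian 4-byte ints (an assumption based on knight_value_w being described as an array of ints on ia64 Linux), and lists the elements that differ. The names OBJ_HEX, EXE_HEX, decode_ints and diff_ints are made up for illustration.

```python
import struct

# Raw bytes of knight_value_w copied from the main.o dump (starting at 0x28f0).
OBJ_HEX = """
f7 ff ff ff fd ff
ff ff fd ff ff ff
fe ff ff ff
fe ff ff ff fd ff
ff ff fd ff ff ff
f7 ff ff ff
fb ff ff ff fd ff
ff ff ff ff ff ff
"""

# The same range copied from the failing executable (starting at 0x...58b0).
EXE_HEX = """
f7 ff ff ff fd ff
ff ff fd ff ff ff
fe ff ff ff
ff ff ff ff fd ff
ff ff ff ff ff ff
f7 ff ff ff
fb ff ff ff fd ff
ff ff ff ff ff ff
"""


def decode_ints(hex_text):
    """Decode a whitespace-separated hex dump as little-endian 32-bit signed ints."""
    raw = bytes.fromhex("".join(hex_text.split()))
    count = len(raw) // 4
    return list(struct.unpack("<%di" % count, raw[: count * 4]))


def diff_ints(obj_values, exe_values):
    """Return (index, object_value, executable_value) for every mismatching element."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(obj_values, exe_values))
        if a != b
    ]


if __name__ == "__main__":
    for index, good, bad in diff_ints(decode_ints(OBJ_HEX), decode_ints(EXE_HEX)):
        # Element i sits 4*i bytes past the start of the symbol in each file.
        print("element %d: main.o=%d (0x%x)  executable=%d (0x...%x)"
              % (index, good, 0x28f0 + 4 * index, bad, 0x58b0 + 4 * index))
```

Under those assumptions, the first eight elements in main.o decode to -9, -3, -3, -2, -2, -3, -3, -9. In the failing image, elements 4 and 6 (at 0x2900 and 0x2908 in the object) both read -1.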

It does not look like we are dealing with a compiler problem.  There are no
relocations in this section, so the linker should be outputting exactly what the
.o is feeding into it.  Our ELF object analyzer tool was run over all the .o
files for crafty, and it reported no relocation errors, so it certainly seems
like the compiler is not at fault.

At this point, we feel it is time for a linker expert to look at the problem and
discern why the link step is altering the data.  We have included a list of 
observations we have made in our effort to debug this in case they are of any use.  

- A bad crafty image will fail on any system it is copied to.  This is true for
  images that are linked static and for those that are linked shared.
- The bad images have been produced on two Itanium 2 systems running RH7.2.
- The .o files from a failing build (compile + link) always produce a good
  crafty image when linked on an Itanium RH7.2 system.  Both systems have GNU
  ld version 2.11.90.0.8 (with BFD 2.11.90.0.8) installed.
- Linking the .o files on the Itanium 2 RH7.2 system with the ld and libraries
  brought from the Itanium RH7.2 system usually produces a failing crafty image.
- This failure is hit-and-miss.  A link command that results in a bad image may
  result in a good image the next time it is tried.  To the best of our knowledge,
  all links that result in a bad image have the problem described above.

An additional note states the following:

We have attached the image crafty-fail.tar.gz, which contains everything needed
to reproduce this.  To reproduce the failure:
gunzip crafty-fail.tar.gz
tar xvf crafty-fail.tar
sh link.sh
This will create an executable, crafty.exe.  To run the executable type:
crafty.exe < crafty.in > myoutput
diff myoutput crafty.out
A bad executable will result in differences, a good one will not.  The problem
is that the data for knight_value_w (an array of ints) is bad after the link. 
Before the link we have the correct values (from main.o):

00000000000028f0 <knight_value_w>:
   28f0:       f7 ff ff ff fd ff       [BBB]       data8 0x1ffefffffff
   28f6:       ff ff fd ff ff ff                   data8 0x1fffff7ffff
   28fc:       fe ff ff ff                         data8 0x1fffffffdff
   2900:       fe ff ff ff fd ff       [-f-]       data8 0x1ffefffffff
   2906:       ff ff fd ff ff ff                   data8 0x1fffff7ffff
   290c:       f7 ff ff ff                         data8 0x1ffffffefff
   2910:       fb ff ff ff fd ff       [-d-]       data8 0x1ffefffffff
   2916:       ff ff ff ff ff ff                   data8 0x1ffffffffff
       ...

After the link we have (the data at 58c0 is incorrect):

60000000000058b0 <knight_value_w>:
60000000000058b0:       f7 ff ff ff fd ff       [BBB]       data8 0x1ffefffffff
60000000000058b6:       ff ff fd ff ff ff                   data8 0x1fffff7ffff
60000000000058bc:       fe ff ff ff                         data8 0x1fffffffdff
60000000000058c0:       ff ff ff ff fd ff       [-f-]       data8 0x1ffefffffff
60000000000058c6:       ff ff ff ff ff ff                   data8 0x1ffffffffff
60000000000058cc:       f7 ff ff ff                         data8 0x1ffffffefff
60000000000058d0:       fb ff ff ff fd ff       [-d-]       data8 0x1ffefffffff
60000000000058d6:       ff ff ff ff ff ff                   data8 0x1ffffffffff
       ...

I (Joe Goodman at Intel) have included a bad executable in the tarball,
crafty.bad.  Note that this failure does not consistently reproduce and may be
sensitive to hardware.  This problem has only been reproduced on an Itanium 2
system running RH Adv. Server 2.1.
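
Because the failure is intermittent, one way to quantify it is to repeat the link many times and count how many distinct output images appear. The following is a rough sketch, not something from the original report. It assumes the link is otherwise deterministic, so identical inputs should give byte-identical output. The helper names file_digest and tally_link_outputs are hypothetical; only "sh link.sh" and crafty.exe come from the reproduction steps above.

```python
import hashlib
import os
import subprocess
from collections import Counter


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def tally_link_outputs(link_cmd, output_name, runs=20, cwd="."):
    """Run `link_cmd` in `cwd` `runs` times and count each distinct output image.

    Returns a Counter mapping output digest -> number of runs that produced it.
    More than one key means the link step is not reproducible.
    """
    counts = Counter()
    for _ in range(runs):
        subprocess.run(link_cmd, cwd=cwd, check=True)
        counts[file_digest(os.path.join(cwd, output_name))] += 1
    return counts


# Example use in the unpacked crafty-fail directory (names from the report):
#   counts = tally_link_outputs(["sh", "link.sh"], "crafty.exe", runs=50)
#   for digest, n in counts.most_common():
#       print(n, digest)
```

Hashing only shows whether the images differ. The diff of crafty.exe output against crafty.out described above is still what tells a good image from a bad one.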

We would be happy to have a member of the IPF code generator team work with
someone at Red Hat to help isolate this problem if required.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes (hit and miss)
Suggest linker expert peruse the data to see if he/she can figure out if it's a
linker problem.


ISSUE TRACKER 11525 opened by Intel as severity 2.

Comment 1 Larry Troan 2003-02-18 02:31:54 UTC
Cannot attach crafty-fail.tar.gz because 6738KB is too large (1000KB is the
maximum size of a Bugzilla attachment). Will try to break up the file and attach
the pieces...
------
In the meantime refer to Issue Tracker #11525 top attachment (bottom one is
older and replaced by top entry).

Comment 2 Larry Troan 2003-03-10 13:39:35 UTC
Reducing priority/severity to reflect Issue Tracker change to Severity 4. This
bug is now targeted for AS2.1 Q3 per Jason's note in Issue Tracker.

Comment 3 Jakub Jelinek 2003-04-04 16:04:24 UTC
Can anyone please fetch the tarball from IT and store it somewhere I can
access it (devserv, porkchop, ...)? I have no access to IT.

Comment 5 Larry Troan 2003-07-14 18:30:57 UTC
FROM ISSUE TRACKER........
Event posted 06-17-2003 02:52pm by bernds with duration of 0.00
Do we have an IPF machine which I can access to try this on?



Comment 6 Larry Troan 2003-07-14 18:32:26 UTC
There are 5 Tiger4 machines in Raleigh in the QA lab. I don't know how many
Tiger4 boxes Boston has.... Don't think we have any Tiger2 boxes inside Red
Hat. We also have another IHV IPF box here in Raleigh.

Comment 7 Bernd Schmidt 2003-07-21 18:14:21 UTC
Hostnames?

Comment 8 Larry Troan 2003-09-04 17:43:33 UTC
Jeremy Katz has an Intel Tiger 2 box in his office. Don't know if it's connected
to the network.

Centennial Q/A Lab has 5 Tiger 4's but they're not listed in
http://master.perf.redhat.com/.  jkt will set one up with 2.1+updates and
contact you...

Comment 9 Suzanne Hillman 2003-10-03 21:01:29 UTC
This needed to be in the blocker list for U3, and not depending on it.

Comment 11 RHEL Program Management 2007-10-19 19:25:22 UTC
This bug is filed against RHEL2.1, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products.  Since
this bug does not meet that criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your
support representative.  You may be asked to provide detailed
information on how this bug is affecting you.