Bug 24524 - Bogus linking with object files generated by g77
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: binutils
Version: 7.0
Hardware: i386
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: David Lawrence
Reported: 2001-01-21 22:10 UTC by Alfredo Ferrari
Modified: 2005-10-31 22:00 UTC

Doc Type: Bug Fix
Last Closed: 2001-02-11 12:14:47 UTC

Attachments
examples showing the bugs (1.34 KB, text/plain)
2001-01-21 22:11 UTC, Alfredo Ferrari
It supersedes the previous attachment (the problem description was incomplete) (1.52 KB, text/plain)
2001-01-22 09:13 UTC, Alfredo Ferrari
same as previous ones, but containing the script dobugs as well (1.88 KB, text/plain)
2001-01-24 09:23 UTC, Alfredo Ferrari
updated test cases with map files from HP-UX, IBM-AIX, Compaq TRU UNIX native compilers (see additional comments) (139.56 KB, text/plain)
2001-02-04 18:19 UTC, Alfredo Ferrari

Description Alfredo Ferrari 2001-01-21 22:10:02 UTC
I provide a few Fortran test files, a description and a small script to
show two very serious bugs occurring when object files produced by g77 and
gcc are linked together through libraries. The two bugs effectively
prevent *ANY* use of RH7 together with the standard mathematical libraries
used here at CERN and in many other scientific labs around the world.

The first problem is related to a crazy behaviour of the linker when
an object file contains a common with the same name as a routine present in
one of the libraries used for linking, even though that routine has no need
to be linked (example ssplit.f). In practice, every Fortran code must
use common names that differ from every routine name found
in any user or system library named in the link command, even
though such routines should not be used at all in the linking process.
This is a very tough and non-standard restriction which immediately broke
a couple of codes (out of two trials...). It was working properly under
RH6.2. Installing compat-egcs-g77 and compiling/linking with it does
not solve the issue, which perhaps points to a linker bug rather than a
compiler bug.
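
One way to spot such a clash before linking is to compare nm listings of the
object file and of the library. The sketch below assumes the usual g77/f2c
naming, in which both COMMON /split/ and SUBROUTINE SPLIT end up as the linker
symbol split_. The helper name find_clashes is hypothetical, and the sample
listings are written by hand for illustration, not captured from a real run.

```shell
#!/bin/sh
# find_clashes OBJ_LISTING LIB_LISTING   (hypothetical helper, not part of binutils)
# Both arguments are files holding nm output. Prints every symbol that is
# a common ("C") in the first listing and a function ("T") in the second.
find_clashes() {
  awk '
    NF < 2 { next }                      # skip blank lines and "member.o:" headers
    FNR == NR {                          # first file: remember the commons
      if ($(NF - 1) == "C") common[$NF] = 1
      next
    }
    $(NF - 1) == "T" && ($NF in common) { print $NF }
  ' "$1" "$2"
}

# Hand-written sample listings (illustrative only, not real nm output).
dir=$(mktemp -d)
printf '%s\n' '00000008 C split_' '         U flrndm_' > "$dir/ssplit.nm"
printf '%s\n' 'flbugs.o:' '00000000 T flrndm_' '' \
              'spbugs.o:' '00000000 T split_' > "$dir/libbugs.nm"

find_clashes "$dir/ssplit.nm" "$dir/libbugs.nm"
rm -rf "$dir"
```

On a real build the two listings would come from "nm ssplit.o" and
"nm libbugs.a"; any name the helper prints is a candidate for the clash
described above.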

The second one is even more dangerous. When generic mathematical functions
are used (e.g. log10) the compiler translates them into the correct specific
function (e.g. d_lg10 for a double precision variable), which should then be
loaded from the proper system library (libg2c, I believe, according to the
map file). Under RH7, if a user library contains an entry matching the
INTERNAL compiler name (d_lg10 in this case), it is picked up instead of the
intrinsic function. This is grossly wrong, since of course the user has no
control over which internal names are used and can be completely unaware
that the same name has been used in some other library he has to link
against. I mean, it would obviously be correct to pick up log10 from a user
library if it did contain a log10 function, while it is incorrect to
pick up d_lg10, which was never referenced as such. RH6.2 was behaving
correctly in this respect. Again, both gcc/g77-2.96 and egcs-1.1.2 behave
wrongly on an RH7 system, perhaps again pointing to a linker issue.
The example is lgtest.f.

Both problems occurred with real, huge programs used by hundreds of users at
CERN and around the world as soon as I tried to build them under RH7...
The examples are simple, stripped-down tests for your convenience. A third,
less serious problem is documented with the loadcm.f example. I reported
it as well since it could be related to the same bugs (look at the map
file).

The following files can be found in ldbug.tar.gz, available via anonymous
ftp at pcslbt07.cern.ch:/pub.

bugcm.spiega:

- If a common exists in the object file, block data units for the same
  common are loaded from every referenced library even though no explicit
  EXTERNAL statement or user directive tells the linker to do so
  (loadcm.f) (incorrect!!)

- If a common exists in the object file and a routine with the same
  name exists in any referenced library, a clash occurs and the routine
  is linked even though it does not need to be (ssplit.f) (incorrect!!)

- If a routine exists in the object file, a common with the same
  name exists in any referenced library, and no routine using that
  common has to be linked, no clash occurs (ssplit1.f) (correct!!)

- If a routine exists both in the object file and in one of the referenced
  libraries, the one in the object file is loaded (ssplit2.f and ssplit3.f)
  (correct!!)

- If a common exists in the object file, a common with the same
  name exists in any referenced library, and no library routine using that
  common has to be linked, no clash occurs (ssplit4.f) (correct!!)

- If the compiler translates a generic mathematical function into the
  specific name of the relevant intrinsic function of the system library,
  and a routine with that specific name happens to exist in a referenced
  library, it gets linked instead of the intrinsic one even though that
  name is NEVER referenced in the original code (lgtest.f) (incorrect!!)

dobugs:

#!/bin/sh
# F77=i386-glibc21-linux-g77
  F77=g77
  ${F77} -c *bugs.f
  gcc -c *bugs.c
  ar rv libbugs.a *bugs.o
  ranlib libbugs.a
#
  echo
  echo ${F77} -c --no-silent loadcm.f
  ${F77} -c --no-silent loadcm.f
  echo ${F77} -v -o loadcm -Xlinker -Map -Xlinker loadcm.map loadcm.o -L. -lbugs
  ${F77} -v -o loadcm -Xlinker -Map -Xlinker loadcm.map loadcm.o -L. -lbugs
#
  echo
  echo ${F77} -c --no-silent ssplit.f
  ${F77} -c --no-silent ssplit.f
  echo ${F77} -v -o ssplit -Xlinker -Map -Xlinker ssplit.map ssplit.o -L. -lbugs
  ${F77} -v -o ssplit -Xlinker -Map -Xlinker ssplit.map ssplit.o -L. -lbugs
#
  echo
  echo ${F77} -c --no-silent ssplit1.f
  ${F77} -c --no-silent ssplit1.f
  echo ${F77} -v -o ssplit1 -Xlinker -Map -Xlinker ssplit1.map ssplit1.o -L. -lbugs
  ${F77} -v -o ssplit1 -Xlinker -Map -Xlinker ssplit1.map ssplit1.o -L. -lbugs
#
  echo
  echo ${F77} -c --no-silent ssplit2.f
  ${F77} -c --no-silent ssplit2.f
  echo ${F77} -v -o ssplit2 -Xlinker -Map -Xlinker ssplit2.map ssplit2.o -L. -lbugs
  ${F77} -v -o ssplit2 -Xlinker -Map -Xlinker ssplit2.map ssplit2.o -L. -lbugs
#
  echo
  echo ${F77} -v -c --no-silent ssplit3.f
  ${F77} -v -c --no-silent ssplit3.f
  echo ${F77} -v -o ssplit3 -Xlinker -Map -Xlinker ssplit3.map ssplit3.o -L. -lbugs
  ${F77} -v -o ssplit3 -Xlinker -Map -Xlinker ssplit3.map ssplit3.o -L. -lbugs
#
  echo
  echo ${F77} -v -c --no-silent ssplit4.f
  ${F77} -v -c --no-silent ssplit4.f
  echo ${F77} -v -o ssplit4 -Xlinker -Map -Xlinker ssplit4.map ssplit4.o -L. -lbugs
  ${F77} -v -o ssplit4 -Xlinker -Map -Xlinker ssplit4.map ssplit4.o -L. -lbugs
#
  echo
  echo ${F77} -v -c --no-silent lgtest.f
  ${F77} -v -c --no-silent lgtest.f
  echo ${F77} -v -o lgtest -Xlinker -Map -Xlinker lgtest.map lgtest.o -L. -lbugs
  ${F77} -v -o lgtest -Xlinker -Map -Xlinker lgtest.map lgtest.o -L. -lbugs
#
exit 0

bdbugs.f:

      block data bdbugs

      common /ctitle/ abab(100)
      data abab / 100*1. /
      end

cmbugs.f:

      subroutine nosplit(a)

      common / split / b(10)
      do i = 1,10
         b(i) = b(i) / a
      enddo
      return
      end

flbugs.f:

      real function flrndm(a)
      flrndm=1.
      return
      end

blbugs.f:

      subroutine glup(a)
      common / balanc / b
      a=a+b
      return
      end

spbugs.f:

      subroutine split(a)
      a=a/10.
      return
      end

labugs.c:

/* 
   This file contains routines taken from the lapack linear algebra library 
   and used  in MLPfit
*/


#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<math.h>

/* MLP_lapack.h is a copy of the LAPACK 'f2c.h' */
/* #include "mlp_lapack.h" */

#define log10e 0.43429448190325182765

/* double d_lg10(doublereal *x) */
double d_lg10(double *x)
{
  /* return( log10e * log(*x) ); */
  return (-(*x));
}

loadcm.f:

      program loadcm

      common /ctitle/ abab(100)
      read(*,*)abab(1),abab(2)
      write(*,*)abab(1),abab(2)
      stop
      end

ssplit.f:

      program ssplit

      common /split/ a,b
      read(*,*)a,b
      call uffa
      write(*,*)a,b
      stop
      end

      subroutine uffa
      common /split/ a,b
      aa=flrndm(aa)
      bb=flrndm(bb)
      a=a+aa
      b=b+bb
      return
      end


ssplit2.f:

      program ssplit2

      read(*,*)a,b
      call uffa(a,b)
      write(*,*)a,b
      stop
      end

      subroutine uffa(a,b)
      aa=flrndm(aa)
      bb=flrndm(bb)
      a=a+aa
      b=b+bb
      return
      end

      real function flrndm(a)
      flrndm = 0.5
      return
      end

ssplit3.f:

      program ssplit3

      read(*,*)a,b
      call uffa(a,b)
      write(*,*)a,b
      stop
      end

      subroutine uffa(a,b)
      aa=flrndm(aa)
      bb=flrndm(bb)
      a=a+aa
      b=b+bb
      return
      end

ssplit4.f:

      program ssplit4

      common / balanc / a,b,n,m
      read(*,*)a,b
      call uffa(a,b)
      write(*,*)a,b
      stop
      end

      subroutine uffa(a,b)
      aa=flrndm(aa)
      bb=flrndm(bb)
      a=a+aa
      b=b+bb
      return
      end

lgtest.f:

      program lgtest

      double precision a,b
      read(*,*) a
      b=log10(a)
      write(*,*) b
      stop
      end

Comment 1 Alfredo Ferrari 2001-01-21 22:11:39 UTC
Created attachment 7948
examples showing the bugs

Comment 2 Alfredo Ferrari 2001-01-22 09:13:19 UTC
Created attachment 7970
It supersedes the previous attachment (the problem description was incomplete)

Comment 3 Alfredo Ferrari 2001-01-24 09:23:22 UTC
Created attachment 8128
same as previous ones, but containing the script dobugs as well

Comment 4 Andre Rubbia 2001-01-27 16:27:35 UTC
As a big user of FORTRAN, I can only recommend that such bugs be fixed before
any serious Fortran user switches to RH7. I was myself affected by problems
with Fortran in RH7. We (unfortunately) had to switch back to RH6.2 to get our
work done.



Comment 5 Paola Sala 2001-01-31 11:38:16 UTC
I work heavily with Fortran programs, and I was awaiting the new compiler for
serious improvements. Now I'm obliged to stick to Red Hat 6.2 because this kind
of bug makes it impossible to switch to RH7. I'm quite puzzled to see that this
bug still has NEW status 10 days after submission.

Comment 6 federico.carminati 2001-02-01 09:18:34 UTC
CERN has a few thousand PCs running Linux and we are using large FORTRAN
applications. We are in contact with several hundred research institutes around
the world for High Energy Physics, many of them using Linux. If the g77 bug is
not corrected we cannot move to RH7. This is of the highest urgency for us.
Please help! Thanks,
           Federico Carminati
           ALICE Experiment Software Framework Coordinator

Comment 7 Alfredo Ferrari 2001-02-02 13:25:48 UTC
The bugs are still there in Fisher as well.... Is Red Hat looking into the
Bugzilla entries or are they there just for fun? After almost 15 days the
bugs are still "new", not even assigned, despite the obvious severity of the
issue, which is practically preventing any use of RH7 (and maybe RH7.1, since
Fisher is as buggy as RH7) for complex Fortran scientific calculations.

Perhaps Linus is right claiming that "RH7==broken gcc"...

Comment 8 Jakub Jelinek 2001-02-02 18:31:06 UTC
I can see the first issue and will look at what ld is doing there (gcc has
nothing to do with this), but the behaviour in the second issue is consistent
on all of RHL 6.1, RHL 6.2 and RHL 7.0. The linker works on symbols; it has no
additional information such as whether a symbol is a Fortran, C or C++ symbol.
I don't know why Fortran chose such crappy mangling (e.g. g++ 3.0 now uses
only names starting with _Z so they should not clash with C functions
(the _ prefix is reserved)), but it has used it for quite a long time and there
would need to be strong reasons to change it.
You can always pass -lg2c before other libraries on the command line so
Fortran code will pick its stuff from libg2c.a.
I'd be really interested in an ssplit.map which picks them from libg2c if you
don't pass -lg2c explicitly on the command line (neither 6.1 with binutils
2.9.1.* nor 6.2 with binutils 2.9.5* nor 7.0 with binutils 2.10.* picked it
from libg2c).
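
The workaround suggested in this comment, naming -lg2c ahead of the user
libraries, can be sketched as a small helper that assembles the link line.
The function name link_line is hypothetical; the program and library names
are the ones used by the dobugs script. The command is only printed, not run.

```shell
#!/bin/sh
# link_line PROGRAM LIB...   (hypothetical helper)
# Prints a g77 link command in which -lg2c comes before every user library,
# so that, as suggested in this comment, Fortran support symbols such as
# d_lg10 are taken from libg2c rather than from a user library.
link_line() {
  prog=$1
  shift
  line="g77 -o $prog $prog.o -L. -lg2c"
  for lib in "$@"; do
    line="$line -l$lib"
  done
  printf '%s\n' "$line"
}

link_line lgtest bugs
```

The design point is only the ordering: archives are searched in command-line
order, so a library named earlier gets the first chance to satisfy a symbol.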

Comment 9 Jakub Jelinek 2001-02-03 18:45:59 UTC
Discussion about the first issue can be seen at
http://sources.redhat.com/ml/binutils/1999-12/msg00015.html
and the following quite long thread.
From what I understood from this, the change was made because of
requests from the Fortran camp and also to match the behaviour of the Solaris
and HP native linkers. Currently the way it works is that if you
have a common symbol, then an object is fetched from an archive if
it contains a non-common definition of that symbol, so that the actual
definition can be brought in, which makes a lot of sense.
Common symbols are a special case of global symbols, so you have to
be careful about namespace issues and never use e.g. the same name
for a function and data or a function and a common. That's the same for
C or Fortran.
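
The archive rule described in this comment can be written down as a small
decision function. This is only a model of the behaviour reported in this bug,
not ld's actual code; the function name pulls_member and the kind labels
(common, defined, function, data) are hypothetical.

```shell
#!/bin/sh
# pulls_member OBJECT_KIND MEMBER_KIND   (hypothetical model, not ld source)
# OBJECT_KIND: how the symbol appears in the object file on the command line
#              (common | defined)
# MEMBER_KIND: how an archive member provides the same symbol
#              (function | data | common)
# Prints "yes" if, under the rule described in this comment, the member is
# fetched from the archive, and "no" otherwise.
pulls_member() {
  case "$1:$2" in
    common:data)     echo yes ;;  # block data for the common (loadcm.f)
    common:function) echo yes ;;  # the disputed case (ssplit.f)
    common:common)   echo no  ;;  # both sides are only commons (ssplit4.f)
    defined:*)       echo no  ;;  # object file already defines it (ssplit1.f, ssplit2.f)
    *)               echo "unknown kinds: $1 $2" >&2; return 1 ;;
  esac
}

# Print the decision for the cases discussed in this bug.
for pair in "common data" "common function" "common common" "defined function"; do
  set -- $pair
  printf '%-8s %-9s -> %s\n' "$1" "$2" "$(pulls_member "$1" "$2")"
done
```

The "common function" row is the one the reporter objects to; the other rows
match the cases marked correct, or the block data case, in bugcm.spiega.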

Comment 10 Alfredo Ferrari 2001-02-03 23:07:12 UTC
Dear Jakub: sorry, but the loading of subroutines/functions in place
of commons (example ssplit.f) is not the subject of the thread you
pointed me to. That thread is related to my last point, the one about
automatic loading of block data. Probably the patch for that created the other
problem, which is really a no-go. I checked on two HP-UX machines:

HP-UX hpica1e B.10.20 A 9000/780 2016300929 two-user license

HP-UX cveet188 B.11.00 U 9000/800 13704501 unlimited-user license

and both link ssplit.f correctly; see the following map file:

/opt/langtools/lib/crt0.o:
ssplit.o:
/disk1/users/alfredo/bug/libbugs.a(flbugs.o):
/usr/lib/libdld.2:
/usr/lib/libisamstub.1:
/usr/lib/libcl.sl:
/opt/fortran/lib/libisamstub.a(isamstub.o):
/usr/lib/libdld.2:
/usr/lib/libc.sl:
/usr/lib/milli.a(dyncallU.o):
/usr/lib/milli.a:
/opt/langtools/lib/crt0.o:
ssplit.o:
/disk1/users/alfredo/bug/libbugs.a:
/opt/fortran/lib/libisamstub.a:
/opt/langtools/lib/crt0.o:
/disk1/users/alfredo/bug/libbugs.a:
/opt/langtools/lib/crt0.o:

where clearly spbugs is not loaded. So the statement that this behaviour was
requested to make the GNU linker behave like the HP and Solaris native ones is
surely wrong, at least for the HP one (but I am ready to bet it is wrong for
Solaris as well). The thread indicated a DIFFERENT thing: that block
data (library elements containing data for a common symbol) were to be
loaded automagically. Apart from the fact that this is HIGHLY questionable and
is not the behaviour of many native compilers/linkers, including IBM-AIX,
Digital UNIX and OpenVMS, the way it was implemented probably caused the absurd
bug that we are experiencing, namely that routines are loaded in place of
commons... Again, this is wrong and IT IS NOT the behaviour of any native
compiler/linker I tested, including HP-UX, IBM-AIX, Digital UNIX (at least up
to version 4.0f) and OpenVMS.
I have machines running these systems and I checked once more just now; if
you like I can send the maps from all of them. I do not have a Solaris
machine at hand but I can check on Monday. So please correct this! Meanwhile it
would be HIGHLY useful to have a patch for binutils that undoes the patch
discussed in that thread, if it is really the source of this problem; I can
manage to produce a patched rpm for us while waiting for an official fix.

Assuming this will be quickly corrected, and going back to block data (data
for common symbols): I don't agree at all with the choice to load them
automagically. I am speaking about Fortran, and I can give you examples where
this can cause unpredictable behaviour, e.g. if you explicitly give on the
linker command line an object file with a block data with different values
(a typical situation if you want to override the default values given in a
library). Which data get loaded, those of the user or those of the library?
Block data can be named in Fortran, and it is nonsense that they get loaded
even when their name is never referenced! You could have 10 different block
data with different names for the same common(s) with different values and
select the one you want to use either by explicitly linking it or with an
EXTERNAL statement... but if the linker loads all of them it's a mess!!! I'll
try to make a simple example and understand what really goes on under RH7 in a
similar situation.


Now the last point, the d_lg10 symbol: you are right, the "wrong" behaviour was
already there, but I missed it because the map file shows it in a
somewhat different way (in the real-life problem the symbol is contained in a
library module with a different name, and it was not apparent that in the past
too it was loaded just because of this; I realized it when inspecting
RH7-generated map files for the other problems). So this is not RH7-specific,
but there is still a lot of danger in it. Since the compiler/linker never
complains that the user is using a "reserved" internal symbol, it is easy to
end up in situations where e.g. C programmers producing a general-purpose
library use symbols without knowing they are reserved for the intrinsics of
Fortran or other languages, and then the wrong module gets loaded when linking
against libraries which are supposed to do completely different things. Any
suggestions?


            Thanks a lot for your help/patience

Comment 11 Alfredo Ferrari 2001-02-04 18:19:00 UTC
Created attachment 8939
updated test cases with map files from HP-UX, IBM-AIX, Compaq TRU UNIX native compilers (see additional comments)

Comment 12 Alfredo Ferrari 2001-02-04 18:56:29 UTC
I included a new attachment. There are now a few other test cases (tstbd...)
showing possible problems with block data linking.

I ran the examples on HP-UX, IBM-AIX and Compaq TRU64 UNIX (or whatever
the official name is). I do not have a Solaris machine in my group; I'll look
for one on Monday (surely there are many on the CERN site).
The system and compiler versions tested are:

HP-UX cveet188 B.11.00 U 9000/800 13704501 unlimited-user license
HP FORTRAN 77  Ver: B.11.00

AIX rsplus07 3 4 001360964C00
AIX XL Fortran Compiler  Version 03.02.0005.0004 

OSF1 alf3.cern.ch V4.0 1229 alpha
DIGITAL Fortran 77 V5.2-171

Linux pceet030.cern.ch 2.2.16-3smp #1 SMP Mon Jun 19 19:00:35 EDT 2000 i686 unknown
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

Linux pceet215.cern.ch 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
g77 version 2.96 20000731 (Red Hat Linux 7.0) (from FSF-g77 version 0.5.26
20000731 (Red Hat Linux 7.0)) (actually gcc...2.96-71 from Fisher)

The map files for all systems are included (suffixes _ux, _aix, _du, _62, and
no suffix for RH7).

Let me summarize the status:

a) The bug exposed by the example ssplit.f (loading of routines in place of
commons), which could be due to the change discussed in the thread indicated by
Jakub, is clearly a very serious bug specific to RH7 among the tested
systems. Again, I bet that Solaris will not show the bug either (otherwise all
CERN Solaris users would have been unable to link against the standard CERN
libraries). From the map files you can see that all systems but RH7
ignore the routine split. IBM-AIX issues a warning in the map file that the
library contains another definition of the same symbol, but correctly skips it.
The other systems do not even issue a message. I hope everybody is
now convinced that this problem has to be fixed.

b) Automagically loading block data: here every system behaves differently
(nice!), with the exception of RH6.2 and Compaq TRU64 UNIX, which behave the
same and, IMHO, in the more reasonable way. Running tstdb1, tstdb2, tstdb3 I got
           RH7    RH6.2    AIX    UX   Compaq TRU64
tstdb1     1.0     1.0     1.0    3.0      1.0
tstdb2     ---     2.0     1.0    3.0      2.0
tstdb3     ---     3.0     1.0    3.0      3.0

--- means that the linker refused to link because of duplicate symbols.
A quick look at the tstdb... sources should convince anyone that the RH6.2 and
Compaq behaviour is the only reasonable one. RH7 at least refuses to link; AIX
and UX give results which are total nonsense... Anyway, I have expressed my
opinion on this, further supported by the remark that breaking with past RH
releases is the major pain, since it can break scripts, linking procedures,
etc. However, one could live with it provided users are warned with gigantic
warnings that RH7 is breaking with the past on this (and with the behaviour of
many native compilers) in a somewhat unnatural way, and provided the
"collateral damage" (=> point a)) is repaired, of course.

c) Possible clashes between internal Fortran intrinsic symbols and user
libraries. Jakub pointed out correctly that this possibility was already
there before RH7. I was unable to create similar situations on the other systems
since I do not know the internal symbols used for the intrinsic mathematical
procedures (the Fortran standard of course dictates only the symbols exposed to
the users). I feel that those used by g77 are a bit too naive to avoid possible
clashes, but I understand that this cannot be easily changed.



Comment 13 Jakub Jelinek 2001-02-05 19:42:22 UTC
Here are Solaris 8 test results:
ld -V
ld: Software Generation Utilities - Solaris-ELF (4.0)
loadcm: takes in bdbugs due to ctitle common (like 7.0 and probably AIX)
tstdb1: 1.0 (like 7.0)
tstdb2: fails to link due to multiple definitions (like 7.0)
tstdb3: fails to link due to multiple definitions (like 7.0)
ssplit: does not take in spbugs (unlike 7.0)
lgtest: takes in labugs due to d_lg10 (like 7.0)
So I'll try to do something about ssplit (probably some rule that the library
symbol must be a non-function, or even a data object, or even a data object of
the same size; I will play on Solaris to see what its exact rule is) and will
leave the rest as is, ok? GNU ld has always tried to be as close to Solaris ld
as possible AFAIK (see e.g. the new -z options in current GNU ld, which are
taken from Solaris).

Comment 14 Marcus Camen 2001-02-10 23:56:36 UTC
BTW: SuSE 7.1 shows the same behavior as RedHat 7 or Solaris:
tstbd1: 1.0
tstbd2: --- does not link
tstbd3: --- does not link

SuSE 7.1 comes with g77-2.25.2 and glibc-2.2
I'm a sysadmin and physicist, and we use the CERNLIBS heavily. So I really
don't know what to do....

Comment 15 Jakub Jelinek 2001-02-11 12:14:43 UTC
The fix for the ssplit case is in binutils-2.10.1.0.7-1 in rawhide
(built on Friday, dunno if it is on ftp already) and in CVS binutils
(note it is not yet in H.J. Lu's latest binutils (2.10.1.0.7)).
It does not matter at all which gcc SuSE uses, the relevant package is binutils.

