Bug 147532

Summary: Compatibility between k2.4 and AS 2.1
Product: Red Hat Enterprise Linux 2.1 Reporter: Greg Gudenburr <ggudenbu>
Component: glibc Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 2.1 CC: drepper, fweimer
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-02-22 08:39:29 UTC Type: ---

Description Greg Gudenburr 2005-02-08 21:06:54 UTC
Description of problem:
We have an application and library which is built on "Linux mashie
2.4.20-20.7smp #1 SMP Mon Aug 18 14:46:14 EDT 2003 i686 unknown". When
this library is combined with a compile and link on AS2.1 "Linux
mallet 2.4.9-e.57smp #1 SMP Thu Dec 2 20:51:12 EST 2004 i686 unknown"
the application crashes in libc in a print routine.  

This application runs on all our other platforms (Solaris, HP, AIX,
OSF, Windows, OpenVMS) without problems. We have tried using Purify on
Solaris with the application and no problems have been reported.
Valgrind reports:
In this example, the program did a 1-byte read at address 0xA, in code
called from _IO_vfprintf (../sysdeps/i386/i486/bits/string.h:530).
==19207==  Address 0xA is not stack'd, malloc'd or (recently) free'd
We also got four errors about the use of an uninitialised value. This is
reported when a program uses a value which hasn't been initialised.
==19207== Conditional jump or move depends on uninitialised value(s)
==19207==    at 0x1B9B59C3: _IO_vfprintf (in /lib/i686/libc-2.2.4.so)
==19207==    by 0x1B9BE03B: _IO_printf (printf.c:33)
==19207==    by 0x804B91C: main (emex.c:1157)

Also if the application is completely built on AS2.1 it seems to work
fine, and/or if the optimizer is turned off on the mixed builds.



Version-Release number of selected component (if applicable):


How reproducible:
I have not been able to reduce the application down to a small test case.

Steps to Reproduce:
1. I can send the entire product if you like.
  
Actual results:
core dump

Expected results:
no core dump


Additional info:

Comment 1 Joe Orton 2005-02-09 15:19:17 UTC
It will probably be necessary to see the code calling printf to
diagnose this further.

Comment 2 Jakub Jelinek 2005-02-09 15:42:11 UTC
By "library built on another system", do you mean a shared library (lib*.so*)
or an ar library (lib*.a)?
Binary compatibility guarantees are solely for programs and shared libraries;
compiling and linking must all happen on the same system. So only the former
is guaranteed to work; with the latter you are on your own.

As for "Conditional jump or move depends on uninitialised value" warnings
by valgrind, these have a very high number of false positives, be it because of
inlined strlen or similar functions or with various bitfield operations.

If you aren't mixing object files compiled on different systems (just shared
libraries), then we really need a small self-contained testcase to do something
with this.

Comment 3 Greg Gudenburr 2005-02-09 15:48:58 UTC
We are building a lib*.a and then compiling and linking with the .a on another
system. So what you are saying is that we have to have a build for every version
of Linux if we want to do this?  This is the only operating system I have ever
heard of having this requirement.

Greg

Comment 4 Greg Gudenburr 2005-02-09 15:50:35 UTC
One more comment: if I write the hello world of C programs and compile and
link it on a version of Red Hat, did I understand you to say that I cannot do
this because Red Hat is not binary compatible between versions?

Comment 5 Jakub Jelinek 2005-02-09 16:00:57 UTC
If the other system has a sufficiently different glibc version, then chances are
high that one or more of the system interfaces changed.  For programs and shared
libraries glibc handles this via symbol versioning.  For example, in glibc 2.2
struct shmid_ds changed and a new symbol version was introduced for shmctl, which
uses this structure.  Older programs and shared libraries that were linked against
older glibc versions keep using the shmctl function that provides
compatibility, while new programs/shared libraries use the shmctl that
uses the new structure.  Symbol versions are assigned at link time, so if you
take object files that use shmctl and were compiled against glibc < 2.2 and
link them on a glibc 2.2+ system, the program will misbehave.  The same applies
to many other symbols.
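
To make the struct-change case above concrete, here is a minimal sketch in C. The demo_ds_old and demo_ds_new layouts and the demo_ctl_new function are hypothetical stand-ins invented for illustration; they are not the real glibc definitions of struct shmid_ds or shmctl. The sketch shows code that expects the old layout reading bytes that were written in the new layout, which is roughly the situation when old object files get bound to the new symbol version at link time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts, NOT the real glibc struct shmid_ds.
 * "old" is what an object file compiled against older headers expects. */
struct demo_ds_old {
    unsigned short mode;   /* narrow field in the old headers */
    long           size;
};

/* "new" is what the function bound at link time actually fills in. */
struct demo_ds_new {
    unsigned long  mode;   /* field widened in the new headers */
    unsigned long  seq;    /* field added in the new headers */
    long           size;
};

/* Stands in for the new library function: it writes the new layout. */
static void demo_ctl_new(void *buf)
{
    struct demo_ds_new ds = { .mode = 0600, .seq = 7, .size = 4096 };
    memcpy(buf, &ds, sizeof ds);
}

/* Stands in for code compiled against the old headers: it reads the same
 * bytes through the old layout and returns the "size" it believes it saw. */
long demo_read_size_as_old(void)
{
    /* The union is large enough for either layout, so the demo itself
     * never writes out of bounds. */
    union { struct demo_ds_old o; struct demo_ds_new n; } u;
    memset(&u, 0, sizeof u);
    demo_ctl_new(&u);
    return u.o.size;   /* offset of "size" differs between the layouts */
}
```

On common ABIs, where long and unsigned long have the same size, demo_read_size_as_old() should return the value stored in seq rather than the real size. Nothing crashes in this sketch, but with pointers or larger structures the same kind of mismatch can corrupt memory.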

Within a single RHEL version compatibility is maintained even for object files,
but if you e.g. compile your program on RHAS 2.1 and link it on RHEL 3, then it
might work, or might not, or even might not link at all.

Comment 6 Greg Gudenburr 2005-02-09 16:34:07 UTC
So do you post a compatibility matrix, or do I assume I have to compile and
link on every version?  This sure seems to raise the cost of ownership. If it
is at all possible, can you call me?

Greg

Comment 7 Jakub Jelinek 2005-02-09 16:38:10 UTC
Well, preferably just create a shared library instead of a .a library; then
you can build it just once and use it everywhere.
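
As a rough sketch of that suggestion, a library source file might look like the following. The file name foo.c, the function foo_copy_upper, and the build commands in the comment are assumptions for illustration only; -fPIC and -shared are the usual gcc options for producing a shared object.

```c
/* foo.c: hypothetical library source, used here only as an illustration.
 *
 * Build it as a shared library on the oldest system you support:
 *     gcc -fPIC -shared -o libfoo.so foo.c
 * Then link the application against it on any newer system:
 *     gcc -o app app.c -L. -lfoo
 * (The dynamic loader must also be able to find libfoo.so at run time.)
 *
 * The C runtime symbols used below are bound when libfoo.so is linked,
 * on the same system whose headers were used to compile it. */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated upper-case copy of src, or NULL if the
 * allocation fails. The caller releases the result with free(). */
char *foo_copy_upper(const char *src)
{
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;
    strcpy(dst, src);
    for (char *p = dst; *p != '\0'; ++p)
        *p = (char)toupper((unsigned char)*p);
    return dst;
}
```

With a static libfoo.a, by contrast, the malloc and strcpy references stay unresolved in the .o files until the application is linked on the other system.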

Comment 8 Greg Gudenburr 2005-02-18 19:35:40 UTC
Hi,
   I am not Greg.....but I am confused....mainly because whilst the above 
thread is interesting...it does not map to our situation. I will try to explain 
it a little better...

We build a library called foo.a (it is really static) on a box called H1. This 
library simply contains c objects and is vanilla higher level application 
code...simply using c runtime apis (say strcpy, malloc, free etc). Now we 
deploy foo.a to a box called H2 and use it to build an application called 
foo.exe. 

Now H1 is an earlier version of Redhat than H2. We crash deep in the C runtime 
on H2 when we run our application. This is what I do not understand. Now some 
more interesting points....if we build foo.exe on H1 it works on H1 and H2. If 
we build on H2 the resulting foo.exe will run on H1 too (somewhat surprising). 

To be honest I am lost with your points above...can you "draw me a picture" and 
explain it in simpler terms? Thanks for your help.

Comment 9 Ulrich Drepper 2005-02-22 00:49:49 UTC
Jakub explained it sufficiently, you just don't pick the right words out of his
comment, it seems.

Binary compatibility is only ever guaranteed for DSOs (shared libraries) and
executables.  This is why building foo.exe and then using it on either system works.

What is not guaranteed to work is to compile files, which in some way or another
reference system data structures etc (by including system headers), and then use
the object files (an .a file is nothing but a collection of .o files) on another
system with a different version of the OS.  This cannot work since now the
definitions from the headers seen during compilation and the symbols used in
linking are different.
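
One way to see the two sides of this mismatch is to record the glibc version the headers declared at compile time and compare it with the glibc in use at run time. The sketch below assumes a glibc system; the helper names foo_compiled_glibc and foo_describe_glibc are hypothetical, while __GLIBC__, __GLIBC_MINOR__ and gnu_get_libc_version() are provided by glibc. A matching version number is only a hint and does not prove that object files are compatible.

```c
#include <gnu/libc-version.h>  /* glibc-specific: gnu_get_libc_version() */
#include <stdio.h>

/* Hypothetical helper meant to live in the static library. The macros are
 * expanded when this file is compiled, so the result records the glibc
 * headers the library's object files were compiled against. */
int foo_compiled_glibc(void)
{
    return __GLIBC__ * 1000 + __GLIBC_MINOR__;
}

/* Hypothetical helper for the application: writes a one-line report into
 * buf and returns the snprintf() result. */
int foo_describe_glibc(char *buf, size_t len)
{
    int v = foo_compiled_glibc();
    return snprintf(buf, len,
                    "library compiled against glibc %d.%d, running on glibc %s",
                    v / 1000, v % 1000, gnu_get_libc_version());
}
```

If the library was compiled on one release and the report is printed by an application linked on another, the two version numbers will differ, which is the case described above as not guaranteed to work.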

To build a library which can be used across OS versions, the symbols must be
bound on the same system that was used for compiling.  This means: create a DSO.
This DSO is then usable on any system.

All AS2.1 and RHEL versions are upward compatible.  Any DSO and executable
created on an older version works on a newer version.

There are a couple of things you might want to look at:

~ Red Hat's software development environment which supports compilation
environments for AS2.1, RHEL3, and RHEL4 on the same RHEL4 machine

~ http://people.redhat.com/drepper/no_static_linking.html

If you want to understand more about the technical issues, read
http://people.redhat.com/drepper/dsohowto.pdf and
http://people.redhat.com/drepper/symbol-versioning.

Anyway, this bug should be closed as NOTABUG.  If you have more questions, ask
now, otherwise we assume you understand the issue.