Bug 593675 - [5.4] Unexpected failure of resolving a locally-defined symbol.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: glibc
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Andreas Schwab
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 604191 604192 604193
Reported: 2010-05-19 13:29 UTC by Alan Matsuoka
Modified: 2018-10-27 12:13 UTC (History)

Fixed In Version: glibc-2.5-52
Doc Type: Bug Fix
Doc Text:
Under certain circumstances, unloading a module could leave the remaining modules' symbol search list in an inconsistent state. Consequent to this inconsistency, symbol lookups could spuriously fail to find the symbol. This update corrects this: module unloading no longer produces inconsistent state in the symbol search list.
Clone Of:
Environment:
Last Closed: 2011-01-14 00:05:05 UTC
Target Upstream Version:
Embargoed:


Attachments
reproducer.tar.gz (941 bytes, application/x-gzip)
2010-05-19 13:30 UTC, Alan Matsuoka
reproducer_v2.tar.gz (964 bytes, application/x-gzip)
2010-05-19 13:33 UTC, Alan Matsuoka
sosreport-localhost-824332-cb491f.tar.bz2 (833.43 KB, application/x-bzip2)
2010-05-19 13:58 UTC, Alan Matsuoka


Links
System                 | ID             | Private | Priority | Status       | Summary                              | Last Updated
Red Hat Product Errata | RHBA-2011:0109 | 0       | normal   | SHIPPED_LIVE | glibc bug fix and enhancement update | 2011-01-12 17:29:09 UTC

Description Alan Matsuoka 2010-05-19 13:29:56 UTC
Description of Problem:
We hit a failure to resolve a symbol that is defined locally in a library,
in a complex case that I'll describe below.
If the dynamic linker, glibc, or something else has a bug in it, please fix it.
If we did something wrong when writing the program, please point it out.

We wrote a program that consists of an executable file and four libraries.
The executable file a.out requires two libraries, libA and libX.
The library libA requires another library libB.
The library libB requires another library libC.

The program operates in the following steps.

+------+      +-------+      +------+    +------+      +-----------------------------------+
| libX |  (2) | a.out | (1)  | libA |----| libB |------| libC  (5)        (7)              |
|      | <=== |  (8)  | ===> |      |    |      | <=== | atexit(libC_fini)--> _libC_fini() |
+------+  (3) +-------+ (6)  +------+    +------+  (4) +-----------------------------------+

(1) a.out calls dlopen() for libA.
    libB and libC shall be loaded, too.
(2) a.out calls dlopen() for libX.
(3) a.out calls dlclose() for libX.
(4) libC calls dlopen() for libB.
(5) libC calls atexit() to register libC_fini().
(6) a.out calls dlclose() for libA.
    libB and libC are no longer needed, so both libraries should be unloaded.
(7) libC_fini() should be called when libC is unloaded.
(8) a.out exits.
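
For reference, the a.out side of the sequence above might look roughly like the following. This is a sketch only: the reproducer source is not quoted in this report, so the function names (open_step, close_step, run_sequence), the RTLD_LAZY flag, and the library paths are assumptions. Steps (4) and (5) happen inside libC and are not shown.

```c
/* Hypothetical sketch of the a.out side: steps (1), (2), (3), (6), (8).
 * Names, flags and paths are assumptions, not the reproducer's code. */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* dlopen() one library, print the step label, report failures. */
static void *open_step(int step, const char *path)
{
    void *handle;

    assert(path != NULL);
    printf("%d)main:dlopen  %s\n", step, path);
    handle = dlopen(path, RTLD_LAZY);   /* flag choice is an assumption */
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}

/* dlclose() one library; returns 0 on success, -1 otherwise. */
static int close_step(int step, const char *path, void *handle)
{
    if (handle == NULL)
        return -1;                      /* nothing was opened */
    printf("%d)main:dlclose %s\n", step, path);
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose(%s) failed: %s\n", path, dlerror());
        return -1;
    }
    return 0;
}

/* Runs the open/close order described in the report.
 * Returns 0 if every step succeeded, -1 otherwise. */
int run_sequence(const char *lib_a, const char *lib_x)
{
    void *a = open_step(1, lib_a);      /* (1) pulls in libB and libC */
    void *x = open_step(2, lib_x);      /* (2) */
    int rc = 0;

    if (close_step(3, lib_x, x) != 0)   /* (3) */
        rc = -1;
    /* (4) and (5) run inside libC at this point in the reproducer. */
    if (close_step(6, lib_a, a) != 0)   /* (6) */
        rc = -1;
    printf("8)main:finish main\n");     /* (8) */
    return rc;
}
```

A driver would call run_sequence("./libA.so", "./libX.so") from main(). On glibc versions that keep dlopen() in libdl, which includes RHEL 5, link with -ldl.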

But when we run the program, two unexpected things happen:

 a) step (7) is executed _after_ step (8)
 b) libC cannot resolve the locally-defined symbol _libC_fini when executing (7)

Version-Release number of selected component:
- Red Hat Enterprise Linux Version Number: 5
- Release Number: 4
- Architecture: x86_64
- Kernel Version: 2.6.18-164.el5
- Related Package Version: gcc-4.1.2-46.el5
- Related Middleware / Application: None

Drivers or hardware or architecture dependency:
None.

How reproducible:
always.

Steps to Reproduce:
1) Extract the reproducer.
   $ tar zxvf reproducer.tar.gz

2) Compile it.
   $ cd reproducer
   $ make

3) Run it.
   $ ./run.sh

Actual Results:
 $ ./run.sh
 1)main:dlopen  libA.so
 2)main:dlopen  libX.so
 3)main:dlclose libX.so
 4)libC:dlopen  libB.so
 5)libC:atexit(libC_fini)
 6)main:dlclose libA.so
 8)main:finish main
 ./main: symbol lookup error: ./libC.so: undefined symbol: _libC_fini

Expected Results:
 $ ./run.sh
 1)main:dlopen  libA.so
 2)main:dlopen  libX.so
 3)main:dlclose libX.so
 4)libC:dlopen  libB.so
 5)libC:atexit(libC_fini)
 6)main:dlclose libA.so
 7)libC:finish - atexit()
 8)main:finish main

Summary of actions taken to resolve issue:
None.

Location of diagnostic data:
None.

Hardware configuration:
Model: PRIMERGY RX300 S5
CPU Info: Xeon(R) 2.27GHz x2
Memory Info: 3GB
Hardware Component Information: None
Configuration Info: None
Guest Configuration Info: None

Business Impact:
We have a middleware application which hit this problem and does not work well.
If we cannot ship it due to this problem, it will cause financial damage to our software business.


Target Release: 5.6

Errata Request: async errata for 5.5, 5.4, and 5.3 LLSS

Hotfix Request: None.

Additional Info:
Sosreport and reproducer are attached.
> 3)
>
> Why does calling libC_fini() succeed even though calling _libC_fini() fails?
>
> Ans:
> For the same reasons as above.

This answer does not satisfy us.
It only answers "Why does calling _libC_fini() fail?", but here we want to know
"Why does calling libC_fini() succeed?" instead. These are completely different questions.
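
One possible reading of the difference, offered as an assumption and not confirmed anywhere in this report: atexit() stores the address of libC_fini() at step (5), so calling it later needs no symbol lookup, whereas a call from libC_fini() to an exported _libC_fini() is normally bound lazily through the PLT on first use, which is when the search scope matters. The sketch below shows what the libC side might look like under that assumption. The function libC_init and its parameter are hypothetical, as the reproducer source is not quoted here.

```c
/* Hypothetical sketch of the libC side: steps (4), (5), (7).
 * Not the reproducer's code; libC_init is an invented entry point. */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Exported helper. In a shared object built with -fPIC and default
 * visibility, a call to it normally goes through the PLT and is
 * resolved by name on first use (lazy binding). */
void _libC_fini(void)
{
    printf("7)libC:finish - atexit()\n");
}

/* Exit handler. Only its address is handed to atexit(), so invoking
 * it later does not require looking up the name libC_fini. */
void libC_fini(void)
{
    _libC_fini();   /* first call: this is where a lazy lookup would occur */
}

/* Steps (4) and (5). Returns the result of atexit(): 0 on success. */
int libC_init(const char *lib_b_path)
{
    printf("4)libC:dlopen  libB.so\n");
    /* The handle is deliberately kept open, as in the described scenario. */
    if (dlopen(lib_b_path, RTLD_LAZY) == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib_b_path, dlerror());

    printf("5)libC:atexit(libC_fini)\n");
    return atexit(libC_fini);
}
```

When this code is built into a shared object, glibc ties the atexit() registration to that object, so the handler is expected to run when the object is unloaded, or at process exit if it is still loaded by then.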

> 4)
>
> Here is another program "reproducer_v2.tar.gz" that works well without any
> "undefined symbol" error. We provide it, too.
> The difference between "reproducer" and "reproducer_v2" is whether to call
> dlopen()/dlclose for libX.so or not. The "reproducer_v2" does not call them.
> According to your explanation, it is a problem if "reproducer_v2" works well
> without any problem, because the RTLD_GLOBAL flag is not specified in it.
> Why does this work well?
>
> Ans:
> I'm looking further into this

Here is some additional information on the differences between them.

If a.out does not call dlopen()/dlclose() for libX.so ["reproducer_v2" case]:
- When libA.so is closed, dlclose() updates the l_scope list of libC.so.
- The l_scope list of libC.so (in its link_map structure) still contains the scope for libC.so itself after a.out calls dlclose() for libA.so.

If a.out calls dlopen()/dlclose() for libX.so ["reproducer" case]:
- When libA.so is closed, dlclose() doesn't update the l_scope list of libC.so.
- There is no scope for libC.so in the l_scope list of any library (even libC.so itself) after a.out calls dlclose() for libA.so.

So obviously there is something wrong in dlclose().

Comment 1 Alan Matsuoka 2010-05-19 13:30:38 UTC
Created attachment 415105 [details]
reproducer.tar.gz

Comment 2 Alan Matsuoka 2010-05-19 13:33:01 UTC
Created attachment 415106 [details]
reproducer_v2.tar.gz

Comment 4 Alan Matsuoka 2010-05-19 13:58:31 UTC
Created attachment 415117 [details]
sosreport-localhost-824332-cb491f.tar.bz2

Comment 18 Jaromir Hradilek 2010-07-19 15:58:59 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Under certain circumstances, unloading a module could leave the remaining modules' symbol search list in an inconsistent state. Consequent to this inconsistency, symbol lookups could spuriously fail to find the symbol. This update corrects this: module unloading no longer produces inconsistent state in the symbol search list.

Comment 34 errata-xmlrpc 2011-01-14 00:05:05 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0109.html

