Bug 517001 - dlopen/dlclose of im-scim.so causes segfault
Summary: dlopen/dlclose of im-scim.so causes segfault
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Andreas Schwab
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 514720 515350
Depends On:
Blocks:
Reported: 2009-08-12 09:41 UTC by Mamoru TASAKA
Modified: 2009-09-29 05:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-28 15:03:02 UTC
Type: ---
Embargoed:


Attachments
test program (464 bytes, text/plain)
2009-08-12 09:41 UTC, Mamoru TASAKA
gdb log for this test case (8.00 KB, text/plain)
2009-08-12 09:43 UTC, Mamoru TASAKA
gdb log for this test case (again) (3.82 KB, text/plain)
2009-08-12 10:05 UTC, Mamoru TASAKA

Description Mamoru TASAKA 2009-08-12 09:41:48 UTC
Created attachment 357137 [details]
test program

Description of problem:
The attached test program segfaults.


Version-Release number of selected component (if applicable):
scim-gtk-1.4.9-2.fc12.i686

How reproducible:
100%

Steps to Reproduce:
1. Compile the attached test program with -ldl -g.
2. Run it.
  
Actual results:
The test program segfaults.

Expected results:
The test program should exit without segfaulting.

Additional info:
It seems that some bad exit handler is executed (the gdb log
is not very helpful, though).
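
For readers without access to the attachment, the kind of test described here can be sketched as follows. This is an assumption about what such a program does, not a copy of the attached file; the helper name load_and_unload is hypothetical.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared object and unload it again right away.
   Returns 0 on success, -1 if dlopen fails, -2 if dlclose fails. */
int load_and_unload(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose: %s\n", dlerror());
        return -2;
    }
    return 0;
}
```

A main that calls load_and_unload on the installed im-scim.so module and then returns would exercise the same dlopen/dlclose sequence. If an exit handler is indeed involved, the crash would be expected when the process exits rather than inside the helper.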

Comment 1 Mamoru TASAKA 2009-08-12 09:43:30 UTC
Created attachment 357138 [details]
gdb log for this test case

Comment 2 Mamoru TASAKA 2009-08-12 09:46:18 UTC
Note:

I suspect that this bug is the root cause of bug 515350
and bug 514720.

Comment 3 Mamoru TASAKA 2009-08-12 10:05:46 UTC
Created attachment 357145 [details]
gdb log for this test case (again)

(Please use this gdb log)

Comment 4 Jens Petersen 2009-08-26 05:36:50 UTC
But normally scim-gtk is not installed?

Comment 5 Jens Petersen 2009-08-26 05:40:53 UTC
So this is not specific to rawhide?

Comment 6 Jens Petersen 2009-08-26 05:51:09 UTC
(If not then I would suggest to report this upstream.)

Comment 7 Mamoru TASAKA 2009-08-26 07:36:37 UTC
(In reply to comment #4)
> But normally scim-gtk is not installed?  

This is a bug report against scim. Whether scim-gtk is installed
by default or not does not matter here.

(In reply to comment #5)
> So this is not specific to rawhide?  

I don't know.

(In reply to comment #6)
> (If not then I would suggest to report this upstream.)  

It is not so easy to determine if this is a bug also in
upstream scim or specific to Fedora because Fedora's scim
contains many patches.

Comment 8 Akira TAGOH 2009-08-27 01:50:02 UTC
(In reply to comment #5)
> So this is not specific to rawhide?  

I can reproduce this issue on rawhide only; the test code works fine on F-11, for example.

Comment 9 Mamoru TASAKA 2009-08-27 05:47:41 UTC
Well, I unpacked the F-11 scim{-libs,-gtk}-1.4.8-3.fc11.i586 packages
on my rawhide machine and ran the test code against them, and it does NOT
segfault.
However, when I recompile scim-1.4.8-3.fc11 on my rawhide machine,
it DOES seem to segfault.

By the way, when I recompile scim-1.4.9-2.fc12 on my rawhide machine
with 's/-O2/-O0/', it does NOT segfault, but with -O1 it does.

Comment 10 Peng Huang 2009-08-27 06:05:53 UTC
So the root problem is probably in gcc or ld.
I created a workaround for this problem: I changed the link arguments so that im-scim.so can no longer be unloaded. Please try:
https://koji.fedoraproject.org/koji/taskinfo?taskID=1637285
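
One way to check that a workaround of this kind took effect is to ask the dynamic loader whether the module is still resident after dlclose. The following is a minimal sketch, assuming a dlopen implementation that supports RTLD_NOLOAD (as glibc does); the helper name resident_after_close is hypothetical and the code is not part of the scim patch.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: report whether `path` is still mapped after a
   dlopen/dlclose pair.
   Returns 1 if still resident, 0 if unloaded, -1 if it could not be loaded. */
int resident_after_close(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL)
        return -1;
    dlclose(handle);

    /* RTLD_NOLOAD never loads anything: it only returns a handle if the
       object is already present in the process. */
    void *probe = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (probe == NULL)
        return 0;
    dlclose(probe); /* drop the reference the probe took */
    return 1;
}
```

For a module that cannot be unloaded the helper should return 1. A return value of 0 means the object really was removed by dlclose.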

Comment 11 Mamoru TASAKA 2009-08-27 06:45:56 UTC
I tried 1.4.9-3.fc12 and the test program no longer segfaults.

Comment 12 Mamoru TASAKA 2009-08-27 06:46:49 UTC
*** Bug 514720 has been marked as a duplicate of this bug. ***

Comment 13 Mamoru TASAKA 2009-08-27 06:47:53 UTC
*** Bug 515350 has been marked as a duplicate of this bug. ***

Comment 14 Mamoru TASAKA 2009-08-27 13:58:08 UTC
CC-ing the gcc maintainer.

Jakub, could you investigate what the real cause is?

Comment 15 Peng Huang 2009-08-28 06:01:56 UTC
Moving this bug to gcc.

Comment 16 Jakub Jelinek 2009-09-22 16:11:58 UTC
This clearly has nothing to do with gcc; so far it looks like a glibc bug to me.

im-scim.so is dlopened and has a DT_NEEDED entry for libscim-1.0.so.8.
In the LD_DEBUG=all output I see:
...
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/tmp/x [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libdl.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libc.so.6 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/ld-linux.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]
      4525:     binding file /usr/lib/libscim-1.0.so.8 [0] to /usr/lib/gtk-2.0/immodules/im-scim.so [0]: normal symbol `_ZN4scim7PointerINS_10ConfigBaseEED1Ev' [LIBSCIM_1.0]
...
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/tmp/x [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libdl.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libc.so.6 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/ld-linux.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]
      4525:     binding file /usr/lib/gtk-2.0/immodules/im-scim.so [0] to /usr/lib/gtk-2.0/immodules/im-scim.so [0]: normal symbol `_ZN4scim7PointerINS_10ConfigBaseEED1Ev'
...
      4525:     file=/usr/lib/gtk-2.0/immodules/im-scim.so [0];  destroying link map
but note that libscim-1.0.so.8 wasn't unloaded (presumably STB_GNU_UNIQUE in action).  __cxa_atexit was called twice with the function _ZN4scim7PointerINS_10ConfigBaseEED1Ev (which resolved to the im-scim.so copy; libscim-1.0.so.8 has its own copy too): the first time with libscim-1.0.so.8's __dso_handle, the second time with im-scim.so's __dso_handle.
When im-scim.so was unloaded, __cxa_finalize removed the second registration of that destructor, but because libscim-1.0.so.8 wasn't unloaded until exit, exit tries to call the im-scim.so copy of _ZN4scim7PointerINS_10ConfigBaseEED1Ev, which no longer exists.  The questions are:
1) Why isn't a relocation dependency generated?
2) How could im-scim.so be unloaded when libscim-1.0.so.8, which has a relocation
   dependency on it, couldn't be unloaded?

You need to remove the DF_1_NODELETE flag from im-scim.so to reproduce this.
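
The handler bookkeeping described in this comment can be illustrated with a toy model. This is only a sketch: the types and the model_* functions are hypothetical stand-ins and do not reflect glibc's real data structures. Each entry records which object's copy of the function was bound and which object's __dso_handle was passed at registration.

```c
#include <stddef.h>

/* Toy model only; these are not glibc's real data structures. */
enum dso { DSO_LIBSCIM, DSO_IM_SCIM };

struct exit_entry {
    enum dso code_owner;   /* object whose copy of the function was bound */
    enum dso handle_owner; /* object whose __dso_handle was passed */
    int live;              /* 1 while the handler is still registered */
};

#define MAX_ENTRIES 8
struct exit_entry exit_entries[MAX_ENTRIES];
size_t exit_entry_count;

/* Models __cxa_atexit(func, arg, dso_handle); returns 0 on success. */
int model_cxa_atexit(enum dso code_owner, enum dso handle_owner)
{
    if (exit_entry_count == MAX_ENTRIES)
        return -1;
    struct exit_entry *e = &exit_entries[exit_entry_count++];
    e->code_owner = code_owner;
    e->handle_owner = handle_owner;
    e->live = 1;
    return 0;
}

/* Models __cxa_finalize(dso_handle): only handlers registered with the
   unloading object's handle are retired. */
void model_cxa_finalize(enum dso handle_owner)
{
    for (size_t i = 0; i < exit_entry_count; i++)
        if (exit_entries[i].handle_owner == handle_owner)
            exit_entries[i].live = 0;
}

/* Counts live handlers whose code sits in an object that has been unmapped. */
size_t dangling_handlers(enum dso unloaded)
{
    size_t n = 0;
    for (size_t i = 0; i < exit_entry_count; i++)
        if (exit_entries[i].live && exit_entries[i].code_owner == unloaded)
            n++;
    return n;
}
```

Registering the destructor twice, both times bound to the im-scim.so copy but once with each handle, and then finalizing DSO_IM_SCIM leaves one dangling handler in the model. That remaining entry corresponds to the call that exit later makes into the unmapped module.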

Comment 17 Jakub Jelinek 2009-09-25 09:10:07 UTC
Self-contained testcase:

#!/bin/sh
sed 's/_TAB_/\t/g' > Makefile <<\EOF
CXXFLAGS += -fpic -O2
n1: n1.o n2.so
_TAB_$(CC) -o n1 n1.c -ldl
n2.so: n2.o n3.so n4.so
_TAB_$(CXX) -shared -o $@ $< ./n3.so ./n4.so
n3.so: n3.o n3.map
_TAB_$(CXX) -shared -o $@ $< -Wl,--version-script,n3.map
#_TAB_$(CXX) -shared -o $@ $<
n4.so: n4.o
_TAB_$(CXX) -shared -o $@ $<
clean:
_TAB_rm -f *.o *~ *core *.so n1
EOF
cat > n1.c <<\EOF
#include <dlfcn.h>
int
main (void)
{
  void *handle = dlopen ("./n2.so", RTLD_LAZY);
  if (handle)
    dlclose (handle);
  return 0;
}
EOF
cat > n2.C <<\EOF
#include <stdlib.h>
inline void foo (void)
{
}
__attribute__((constructor))
void ctor (void)
{
  atexit (foo);
}
EOF
cat > n3.C <<\EOF
#include <stdlib.h>
inline void foo (void)
{
}
inline int bar (void)
{
  static int barvar;
  return ++barvar;
}
int (*barp) (void) = bar;
__attribute__((constructor))
void ctor (void)
{
  atexit (foo);
}
EOF
cat > n3.map <<\EOF
N3 {
  global:
    _ZZ3barvE6barvar; barp; _Z3foov;
  local:
    *;
};
EOF
cat > n4.C <<\EOF
inline int bar (void)
{
  static int barvar;
  return ++barvar;
}
int (*barp2) (void) = bar;
EOF

This needs to be compiled with the F12 gcc, so that _ZZ3barvE6barvar is STB_GNU_UNIQUE.

If _Z3foov isn't versioned in n3.so, it works just fine, presumably because a relocation dependency is added (or, if that happens after the _ZZ3barvE6barvar lookup has marked n3.so as DF_1_NODELETE, the undef_map is simply marked DF_1_NODELETE too).  I think the problem is that
dl-reloc.c (RESOLVE_MAP) has:
             int flags = DL_LOOKUP_ADD_DEPENDENCY;                            \
             if ((version) != NULL && (version)->hash != 0)                   \
               {                                                              \
                 v = (version);                                               \
                 flags = 0;                                                   \
               }                                                              \
             _lr = _dl_lookup_symbol_x (strtab + (*ref)->st_name, l, (ref),   \
                                        scope, v, _tc, flags, NULL);          \
In the testcase, version != NULL && version->hash != 0 holds, so no relocation dependency is added, even though the symbol resolves to a completely different library.
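
The flag selection quoted above can be isolated in a small model to show when the dependency request is dropped. This is a sketch under stated assumptions: MODEL_ADD_DEPENDENCY and struct model_version are illustrative stand-ins for the glibc definitions, and the second function shows only one possible direction for a fix, which this report does not confirm as the change that went into glibc.

```c
#include <stddef.h>

/* Stand-in for glibc's DL_LOOKUP_ADD_DEPENDENCY; the value is illustrative. */
#define MODEL_ADD_DEPENDENCY 1

/* Stand-in for the version record passed to RESOLVE_MAP. */
struct model_version {
    unsigned long hash;
};

/* Flag selection as quoted from dl-reloc.c: versioned lookups lose the flag. */
int lookup_flags_as_quoted(const struct model_version *version)
{
    int flags = MODEL_ADD_DEPENDENCY;
    if (version != NULL && version->hash != 0)
        flags = 0;
    return flags;
}

/* Hypothetical alternative: request the dependency for every lookup. */
int lookup_flags_always_dependency(const struct model_version *version)
{
    (void)version; /* the version still selects the symbol, not the flags */
    return MODEL_ADD_DEPENDENCY;
}
```

With a versioned reference such as _Z3foov in the testcase, the quoted logic yields flags of 0, so the lookup never asks for a relocation dependency on the object that provides the definition.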

Comment 18 Andreas Schwab 2009-09-25 14:22:10 UTC
When DL_LOOKUP_ADD_DEPENDENCY was introduced, not all callers of _dl_lookup_versioned_symbol were properly adjusted.

Comment 19 Andreas Schwab 2009-09-28 15:03:02 UTC
Fixed in 2.10.90-24.

Comment 20 Mamoru TASAKA 2009-09-28 19:01:02 UTC
Actually I tried to remove Patch33 in devel scim.spec and
I don't see this segfault any more.
So I think it is better to remove Patch33 workaround on scim.spec
and rebuild scim.

Comment 21 Peng Huang 2009-09-29 05:07:53 UTC
(In reply to comment #20)
> Actually I tried to remove Patch33 in devel scim.spec and
> I don't see this segfault any more.
> So I think it is better to remove Patch33 workaround on scim.spec
> and rebuild scim.  

I have removed the workaround patch added in scim-1.4.9-3.

