Bug 517001

Summary:

dlopen/dlclose of im-scim.so causes segfault

Product:

[Fedora] Fedora

Reporter:

Mamoru TASAKA <mtasaka>

Component:

glibc

Assignee:

Andreas Schwab <schwab>

Status:

CLOSED RAWHIDE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Priority:

high

Version:

rawhide

CC:

drepper, i18n-bugs, jakub, petersen, phuang, schwab, tagoh, zaitcev

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2009-09-28 15:03:02 UTC

Attachments:

Description	Flags
test program	none
gdb log for this test case	none
gdb log for this test case (again)	none

Description Mamoru TASAKA 2009-08-12 09:41:48 UTC

Created attachment 357137 [details]
test program

Description of problem:
The attached test program causes segfault


Version-Release number of selected component (if applicable):
scim-gtk-1.4.9-2.fc12.i686

How reproducible:
100%

Steps to Reproduce:
1. Compile the attached test program with -ldl -g
2. Execute the resulting binary
  
Actual results:
The test program causes segfault

Expected results:
Shouldn't segfault

Additional info:
It seems that some nasty exit handler is executed (gdb log
is not useful, though)

Comment 1 Mamoru TASAKA 2009-08-12 09:43:30 UTC

Created attachment 357138 [details]
gdb log for this test case

Comment 2 Mamoru TASAKA 2009-08-12 09:46:18 UTC

Note:

I suspect this bug is the root cause of bug 515350 and
bug 514720.

Comment 3 Mamoru TASAKA 2009-08-12 10:05:46 UTC

Created attachment 357145 [details]
gdb log for this test case (again)

(Please use this gdb log)

Comment 4 Jens Petersen 2009-08-26 05:36:50 UTC

But normally scim-gtk is not installed?

Comment 5 Jens Petersen 2009-08-26 05:40:53 UTC

So this is not specific to rawhide?

Comment 6 Jens Petersen 2009-08-26 05:51:09 UTC

(If not then I would suggest to report this upstream.)

Comment 7 Mamoru TASAKA 2009-08-26 07:36:37 UTC

(In reply to comment #4)
> But normally scim-gtk is not installed?  

This is a bug report against scim. Whether scim-gtk is installed
by default or not does not matter here.

(In reply to comment #5)
> So this is not specific to rawhide?  

I don't know.

(In reply to comment #6)
> (If not then I would suggest to report this upstream.)  

It is not so easy to determine if this is a bug also in
upstream scim or specific to Fedora because Fedora's scim
contains many patches.

Comment 8 Akira TAGOH 2009-08-27 01:50:02 UTC

(In reply to comment #5)
> So this is not specific to rawhide?  

I can reproduce this issue on rawhide only; the test code works fine on F-11.

Comment 9 Mamoru TASAKA 2009-08-27 05:47:41 UTC

Well, I unpacked F-11 scim{-libs,-gtk}-1.4.8-3.fc11.i586
on my rawhide machine and tried the testing code and it does NOT
segfault.
However when I recompile scim-1.4.8-3.fc11 on my rawhide machine,
it DOES seem to segfault.

By the way when I recompile scim-1.4.9-2.fc12 on my rawhide machine
with 's/-O2/-O0/', it does NOT segfault, however with -O1 it segfaults.

Comment 10 Peng Huang 2009-08-27 06:05:53 UTC

So the root problem is probably in gcc or ld.
I created a workaround for this problem: I changed the link arguments so that im-scim.so can no longer be unloaded. Please try
https://koji.fedoraproject.org/koji/taskinfo?taskID=1637285

Comment 11 Mamoru TASAKA 2009-08-27 06:45:56 UTC

I tried 1.4.9-3.fc12 and test program does not segfault anymore.

Comment 12 Mamoru TASAKA 2009-08-27 06:46:49 UTC

*** Bug 514720 has been marked as a duplicate of this bug. ***

Comment 13 Mamoru TASAKA 2009-08-27 06:47:53 UTC

*** Bug 515350 has been marked as a duplicate of this bug. ***

Comment 14 Mamoru TASAKA 2009-08-27 13:58:08 UTC

CC-ing to gcc maintainer.

Jakub, would you investigate what is the real cause?

Comment 15 Peng Huang 2009-08-28 06:01:56 UTC

Moving this bug to gcc.

Comment 16 Jakub Jelinek 2009-09-22 16:11:58 UTC

This has clearly nothing to do with gcc, looks like a glibc bug to me so far.

im-scim.so is dlopened, has DT_NEEDED on libscim-1.0.so.8.
In LD_DEBUG=all I see:
...
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/tmp/x [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libdl.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libc.so.6 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/ld-linux.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]
      4525:     binding file /usr/lib/libscim-1.0.so.8 [0] to /usr/lib/gtk-2.0/immodules/im-scim.so [0]: normal symbol `_ZN4scim7PointerINS_10ConfigBaseEED1Ev' [LIBSCIM_1.0]
...
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/tmp/x [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libdl.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/libc.so.6 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/lib/ld-linux.so.2 [0]
      4525:     symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev;  lookup in file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]
      4525:     binding file /usr/lib/gtk-2.0/immodules/im-scim.so [0] to /usr/lib/gtk-2.0/immodules/im-scim.so [0]: normal symbol `_ZN4scim7PointerINS_10ConfigBaseEED1Ev'
...
      4525:     file=/usr/lib/gtk-2.0/immodules/im-scim.so [0];  destroying link map
but note that libscim-1.0.so.8 wasn't unloaded (presumably STB_GNU_UNIQUE in action).  __cxa_atexit was called twice with the _ZN4scim7PointerINS_10ConfigBaseEED1Ev function (which resolved to the im-scim.so copy; libscim-1.0.so.8 has its own copy too): the first time with libscim-1.0.so.8's __dso_handle, the second time with im-scim.so's __dso_handle.
When im-scim.so was unloaded, __cxa_finalize removed the second dtor registration for that function, but since libscim-1.0.so.8 isn't unloaded until exit, exit then tries to call _ZN4scim7PointerINS_10ConfigBaseEED1Ev in im-scim.so, which is no longer mapped.  The questions are:
1) why isn't a relocation dependency generated?
2) how could im-scim.so be unloaded when libscim-1.0.so.8, which has a relocation
   dependency on it, couldn't be unloaded?

You need to remove the DF_1_NODELETE flag from im-scim.so to reproduce...

Comment 17 Jakub Jelinek 2009-09-25 09:10:07 UTC

Self-contained testcase:

#!/bin/sh
sed 's/_TAB_/\t/g' > Makefile <<\EOF
CXXFLAGS += -fpic -O2
n1: n1.o n2.so
_TAB_$(CC) -o n1 n1.c -ldl
n2.so: n2.o n3.so n4.so
_TAB_$(CXX) -shared -o $@ $< ./n3.so ./n4.so
n3.so: n3.o n3.map
_TAB_$(CXX) -shared -o $@ $< -Wl,--version-script,n3.map
#_TAB_$(CXX) -shared -o $@ $<
n4.so: n4.o
_TAB_$(CXX) -shared -o $@ $<
clean:
_TAB_rm -f *.o *~ *core *.so n1
EOF
cat > n1.c <<\EOF
#include <dlfcn.h>
int
main (void)
{
  void *handle = dlopen ("./n2.so", RTLD_LAZY);
  if (handle)
    dlclose (handle);
  return 0;
}
EOF
cat > n2.C <<\EOF
#include <stdlib.h>
inline void foo (void)
{
}
__attribute__((constructor))
void ctor (void)
{
  atexit (foo);
}
EOF
cat > n3.C <<\EOF
#include <stdlib.h>
inline void foo (void)
{
}
inline int bar (void)
{
  static int barvar;
  return ++barvar;
}
int (*barp) (void) = bar;
__attribute__((constructor))
void ctor (void)
{
  atexit (foo);
}
EOF
cat > n3.map <<\EOF
N3 {
  global:
    _ZZ3barvE6barvar; barp; _Z3foov;
  local:
    *;
};
EOF
cat > n4.C <<\EOF
inline int bar (void)
{
  static int barvar;
  return ++barvar;
}
int (*barp2) (void) = bar;
EOF

This needs to be compiled with the F12 gcc, so that _ZZ3barvE6barvar is STB_GNU_UNIQUE.

If _Z3foov isn't versioned in n3.so, it works just fine, presumably because a relocation dependency is added (or, if that happens after the _ZZ3barvE6barvar lookup that marks n3.so as DF_1_NODELETE, the undef_map is simply marked DF_1_NODELETE too).  I think the problem is that
dl-reloc.c (RESOLVE_MAP) has:
             int flags = DL_LOOKUP_ADD_DEPENDENCY;                            \
             if ((version) != NULL && (version)->hash != 0)                   \
               {                                                              \
                 v = (version);                                               \
                 flags = 0;                                                   \
               }                                                              \
             _lr = _dl_lookup_symbol_x (strtab + (*ref)->st_name, l, (ref),   \
                                        scope, v, _tc, flags, NULL);          \
In the testcase, version != NULL && version->hash != 0, so no relocation dependency is added, even though the symbol resolves to a completely different library.

Comment 18 Andreas Schwab 2009-09-25 14:22:10 UTC

When DL_LOOKUP_ADD_DEPENDENCY was introduced, not all callers of _dl_lookup_versioned_symbol were properly adjusted.

Comment 19 Andreas Schwab 2009-09-28 15:03:02 UTC

Fixed in 2.10.90-24.

Comment 20 Mamoru TASAKA 2009-09-28 19:01:02 UTC

Actually I tried to remove Patch33 in devel scim.spec and
I don't see this segfault any more.
So I think it is better to remove Patch33 workaround on scim.spec
and rebuild scim.

Comment 21 Peng Huang 2009-09-29 05:07:53 UTC

(In reply to comment #20)
> Actually I tried to remove Patch33 in devel scim.spec and
> I don't see this segfault any more.
> So I think it is better to remove Patch33 workaround on scim.spec
> and rebuild scim.  

I have removed the workaround patch added in scim-1.4.9-3.