Bug 517001
| Field | Value |
|---|---|
| Summary | dlopen/dlclose of im-scim.so causes segfault |
| Product | [Fedora] Fedora |
| Component | glibc |
| Status | CLOSED RAWHIDE |
| Severity | high |
| Priority | high |
| Version | rawhide |
| Hardware | All |
| OS | Linux |
| Reporter | Mamoru TASAKA <mtasaka> |
| Assignee | Andreas Schwab <schwab> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | drepper, i18n-bugs, jakub, petersen, phuang, schwab, tagoh, zaitcev |
| Doc Type | Bug Fix |
| Last Closed | 2009-09-28 15:03:02 UTC |

Created attachment 357138 [details]
gdb log for this test case
Note: I guess the main cause of bug 515350 and bug 514720 is this bug.

Created attachment 357145 [details]
gdb log for this test case (again)
(Please use this gdb log)

But normally scim-gtk is not installed?

So this is not specific to rawhide?

(If not then I would suggest to report this upstream.)

(In reply to comment #4)
> But normally scim-gtk is not installed?
This is a bug report against scim. Whether scim-gtk is installed by default or not does not matter here.

(In reply to comment #5)
> So this is not specific to rawhide?
I don't know.

(In reply to comment #6)
> (If not then I would suggest to report this upstream.)
It is not so easy to determine if this is a bug also in upstream scim or specific to Fedora because Fedora's scim contains many patches.

(In reply to comment #5)
> So this is not specific to rawhide?
I can see this issue on rawhide only, but the testing code works fine on F-11 say.

Well, I unpacked F-11 scim{-libs,-gtk}-1.4.8-3.fc11.i586
on my rawhide machine and tried the testing code and it does NOT
segfault.
However when I recompile scim-1.4.8-3.fc11 on my rawhide machine,
it DOES seem to segfault.
By the way when I recompile scim-1.4.9-2.fc12 on my rawhide machine
with 's/-O2/-O0/', it does NOT segfault, however with -O1 it segfaults.
So the root problem is probably in gcc or ld.

I created a workaround for this problem: I changed the link arguments so that im-scim.so cannot be unloaded. Please try https://koji.fedoraproject.org/koji/taskinfo?taskID=1637285

I tried 1.4.9-3.fc12 and the test program does not segfault anymore.

*** Bug 514720 has been marked as a duplicate of this bug. ***

*** Bug 515350 has been marked as a duplicate of this bug. ***

CC-ing the gcc maintainer. Jakub, would you investigate what the real cause is?

Moving this bug to gcc.

This clearly has nothing to do with gcc; it looks like a glibc bug to me so far.
im-scim.so is dlopened, has DT_NEEDED on libscim-1.0.so.8.
In LD_DEBUG=all I see:
...
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/tmp/x [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/lib/libdl.so.2 [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/lib/libc.so.6 [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/lib/ld-linux.so.2 [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]
4525: binding file /usr/lib/libscim-1.0.so.8 [0] to /usr/lib/gtk-2.0/immodules/im-scim.so [0]: normal symbol `_ZN4scim7PointerINS_10ConfigBaseEED1Ev' [LIBSCIM_1.0]
...
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/tmp/x [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/lib/libdl.so.2 [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/lib/libc.so.6 [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/lib/ld-linux.so.2 [0]
4525: symbol=_ZN4scim7PointerINS_10ConfigBaseEED1Ev; lookup in file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]
4525: binding file /usr/lib/gtk-2.0/immodules/im-scim.so [0] to /usr/lib/gtk-2.0/immodules/im-scim.so [0]: normal symbol `_ZN4scim7PointerINS_10ConfigBaseEED1Ev'
...
4525: file=/usr/lib/gtk-2.0/immodules/im-scim.so [0]; destroying link map
but note that libscim-1.0.so.8 wasn't unloaded (presumably STB_GNU_UNIQUE in action). __cxa_atexit was called twice with _ZN4scim7PointerINS_10ConfigBaseEED1Ev function (which resolved to the im-scim.so copy, libscim-1.0.so.8 has its own too), the first time with libscim-1.0.so.8's __dso_handle, the second time with im-scim.so's __dso_handle.
When im-scim.so was unloaded, __cxa_finalize removed the second dtor registration for that function, but since libscim-1.0.so.8 wasn't unloaded until exit, exit tries to call _ZN4scim7PointerINS_10ConfigBaseEED1Ev from im-scim.so, which is no longer mapped.

The questions are:
1) why isn't a relocation dependency generated?
2) how could im-scim.so be unloaded when libscim-1.0.so.8, which has a relocation dependency on it, couldn't be unloaded?
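
The sequence described above can be replayed in a toy model. This is only a sketch: the names (`model_cxa_atexit`, `model_unload`, `replay_scenario`) are hypothetical and are not glibc internals. Each exit-handler entry records which module contains the handler's code and which module's `__dso_handle` it was registered with; unloading a module drops only the entries registered with that module's handle.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not glibc internals. */
enum module { LIBSCIM = 0, IM_SCIM = 1, NUM_MODULES };

struct exit_entry {
    enum module code_in;    /* module whose text contains the handler */
    enum module dso_handle; /* module that registered the handler */
    int active;
};

#define MAX_ENTRIES 8
static struct exit_entry registry[MAX_ENTRIES];
static size_t n_entries;
static int loaded[NUM_MODULES];

/* Models __cxa_atexit (fn, arg, dso_handle). */
static void model_cxa_atexit(enum module code_in, enum module dso_handle)
{
    assert(n_entries < MAX_ENTRIES);
    registry[n_entries++] = (struct exit_entry){ code_in, dso_handle, 1 };
}

/* Models dlclose: the __cxa_finalize step drops only the entries
   registered with this module's handle, then the module is unmapped. */
static void model_unload(enum module m)
{
    for (size_t i = 0; i < n_entries; i++)
        if (registry[i].dso_handle == m)
            registry[i].active = 0;
    loaded[m] = 0;
}

/* Models exit: counts live handlers whose code is no longer mapped. */
static int count_dangling_handlers(void)
{
    int dangling = 0;
    for (size_t i = 0; i < n_entries; i++)
        if (registry[i].active && !loaded[registry[i].code_in])
            dangling++;
    return dangling;
}

/* first_handler_code: the module that libscim's registration resolved to. */
int replay_scenario(enum module first_handler_code)
{
    n_entries = 0;
    loaded[LIBSCIM] = loaded[IM_SCIM] = 1;
    model_cxa_atexit(first_handler_code, LIBSCIM); /* from libscim-1.0.so.8 */
    model_cxa_atexit(IM_SCIM, IM_SCIM);            /* from im-scim.so */
    model_unload(IM_SCIM);
    return count_dangling_handlers();
}
```

In this model, `replay_scenario(IM_SCIM)` leaves one dangling handler, which corresponds to the call into unmapped code at exit; `replay_scenario(LIBSCIM)`, where libscim's registration points at its own copy, leaves none.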
You need to remove DF_1_NODELETE flag from im-scim.so to reproduce...
Self-contained testcase:
#!/bin/sh
sed 's/_TAB_/\t/g' > Makefile <<\EOF
CXXFLAGS += -fpic -O2
n1: n1.o n2.so
_TAB_$(CC) -o n1 n1.c -ldl
n2.so: n2.o n3.so n4.so
_TAB_$(CXX) -shared -o $@ $< ./n3.so ./n4.so
n3.so: n3.o n3.map
_TAB_$(CXX) -shared -o $@ $< -Wl,--version-script,n3.map
#_TAB_$(CXX) -shared -o $@ $<
n4.so: n4.o
_TAB_$(CXX) -shared -o $@ $<
clean:
_TAB_rm -f *.o *~ *core *.so n1
EOF
cat > n1.c <<\EOF
#include <dlfcn.h>
int
main (void)
{
  void *handle = dlopen ("./n2.so", RTLD_LAZY);
  if (handle)
    dlclose (handle);
  return 0;
}
EOF
cat > n2.C <<\EOF
#include <stdlib.h>
inline void foo (void)
{
}
__attribute__((constructor))
void ctor (void)
{
  atexit (foo);
}
EOF
cat > n3.C <<\EOF
#include <stdlib.h>
inline void foo (void)
{
}
inline int bar (void)
{
  static int barvar;
  return ++barvar;
}
int (*barp) (void) = bar;
__attribute__((constructor))
void ctor (void)
{
  atexit (foo);
}
EOF
cat > n3.map <<\EOF
N3 {
  global:
    _ZZ3barvE6barvar; barp; _Z3foov;
  local:
    *;
};
EOF
cat > n4.C <<\EOF
inline int bar (void)
{
  static int barvar;
  return ++barvar;
}
int (*barp2) (void) = bar;
EOF
Needs to be compiled with F12 gcc, so that _ZZ3barvE6barvar is STB_GNU_UNIQUE.
If _Z3foov isn't versioned in n3.so, it works just fine, supposedly because a relocation dependency is added (or, if that happens after the _ZZ3barvE6barvar lookup which marks n3.so as DF_1_NODELETE, just marks the undef_map as DF_1_NODELETE too).

I think the problem is that
dl-reloc.c (RESOLVE_MAP) has:
    int flags = DL_LOOKUP_ADD_DEPENDENCY; \
    if ((version) != NULL && (version)->hash != 0) \
      { \
        v = (version); \
        flags = 0; \
      } \
    _lr = _dl_lookup_symbol_x (strtab + (*ref)->st_name, l, (ref), \
                               scope, v, _tc, flags, NULL); \
In the testcase version != NULL && version->hash != 0 and so it doesn't add a relocation dependency, even when it resolves to a completely different library.
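
The effect of that branch can be isolated in a small model. This is a simplified sketch, not glibc code: `MODEL_ADD_DEPENDENCY`, `struct model_version`, and both functions are hypothetical stand-ins, and the rule "a dependency is recorded only when the flag is set and the definition lives in another object" is an assumption that condenses what the lookup does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions are glibc-internal. */
#define MODEL_ADD_DEPENDENCY 1

struct model_version { unsigned int hash; };

/* Mirrors the quoted branch: a versioned reference clears the flags. */
int model_lookup_flags(const struct model_version *version)
{
    int flags = MODEL_ADD_DEPENDENCY;
    if (version != NULL && version->hash != 0)
        flags = 0;
    return flags;
}

/* Simplifying assumption: a relocation dependency is recorded only when
   the flag is set and the symbol is defined in a different object. */
int model_records_dependency(const struct model_version *version,
                             int defined_in_other_object)
{
    return (model_lookup_flags(version) & MODEL_ADD_DEPENDENCY) != 0
           && defined_in_other_object;
}

/* Self-check of the two cases discussed in the report. */
void model_self_check(void)
{
    struct model_version versioned = { 0x1234u }; /* arbitrary nonzero hash */
    assert(model_records_dependency(NULL, 1) == 1);       /* unversioned ref */
    assert(model_records_dependency(&versioned, 1) == 0); /* versioned ref */
}
```

The unversioned case matches the observation that the testcase works when _Z3foov is not versioned in n3.so; the versioned case is the one where no dependency is recorded even though the definition comes from another library.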
When DL_LOOKUP_ADD_DEPENDENCY was introduced not all callers of _dl_lookup_versioned_symbol were properly adjusted. Fixed in 2.10.90-24.

Actually I tried to remove Patch33 in devel scim.spec and I don't see this segfault any more. So I think it is better to remove Patch33 workaround on scim.spec and rebuild scim.

(In reply to comment #20)
> Actually I tried to remove Patch33 in devel scim.spec and
> I don't see this segfault any more.
> So I think it is better to remove Patch33 workaround on scim.spec
> and rebuild scim.
I have removed the workaround patch added in scim-1.4.9-3.

Created attachment 357137 [details]
test program

Description of problem:
The attached test program causes segfault

Version-Release number of selected component (if applicable):
scim-gtk-1.4.9-2.fc12.i686

How reproducible:
100%

Steps to Reproduce:
1. Compile the attached test program with -ldl -g
2. execute

Actual results:
The test program causes segfault

Expected results:
Shouldn't segfault

Additional info:
It seems that some nasty exit handler is executed (gdb log is not useful, though)
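
The attached test program is not reproduced in this report. A minimal sketch of the dlopen/dlclose step it exercises might look like the following; the helper name `load_and_unload` is hypothetical, and the module path is supplied by the caller (the LD_DEBUG log in this report shows /usr/lib/gtk-2.0/immodules/im-scim.so).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared object and unload it again.
   Returns 0 on success and -1 if dlopen or dlclose reports an error. */
int load_and_unload(const char *path)
{
    assert(path != NULL);

    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return -1;
    }
    return 0;
}
```

A caller would invoke this from `main` with the module path and then return normally. In the scenario reported here the crash comes from an exit handler that runs at process exit, so a successful return from this helper does not by itself show that the bug is absent.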