Bug 1327623
| Summary: | replacing .so which was opened and closed, leads to segfault on next dlopen/dlsym | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Paulo Andrade <pandrade> |
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-tools-bugs |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.7 | CC: | aglotov, ashankar, fweimer, gagriogi, mnewsome, pfrankli, qguo |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-04-27 16:24:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
A small, simple reproducer is in the related sourceware bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=19773

We might revive a 2001 discussion if this requires kernel help: http://yarchive.net/comp/linux/map_copy.html

Making a "real" private copy (using read instead of mmap) of every shared object would use too many resources, all for the sake of handling a library being overwritten, now that MAP_DENYWRITE no longer exists because it was a DoS vector. This would not help when the file is actually changed, but maybe only the relocations could be "truly" private? inotify would be too late, as it would report the change only after the file was overwritten.

The issue is caused by a design decision in Linux. mmap'ed files will have their memory contents changed if the backing file is overwritten on disk. In this case, the file is being overwritten with an identical copy, and the side effect is that some address relocations are undone. While the Linux behavior looks wrong for loaded shared libraries that are overwritten, the feature required to make this special condition work would easily be "abused". In most cases it is indeed desired to have disk content changes reflected in memory; that is, mmap is just a fast way to access the file contents. Otherwise, just copy the file contents to a memory buffer. Shared libraries should only very seldom be changed on disk, so the mechanism used is very fast and cheap on resource usage. If disk changes were not to be mirrored in the memory image, the mapping should not use a backing file.

The solution is to use the same logic used when updating shared libraries, that is, to not use the 'cp' command. The simplest and safest way is to write a temporary file and use the 'mv' command. If you cannot find where the shared object is being changed, one quick workaround is to use the 'chattr +i' command.
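As a minimal sketch of the temporary-file-plus-rename approach described above (roughly what 'mv' does within one filesystem), the following C helper writes the new contents to a fresh inode and then renames it over the target, so the inode that running processes have mapped is never modified. The function name `replace_file_atomically`, the `.tmp.XXXXXX` suffix, and the 0755 mode are assumptions made for illustration; `mkstemp` is used here to pick a unique temporary name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: install `len` bytes from `data` at `target` without
 * writing into the inode that existing mappings of `target` refer to.
 * Returns 0 on success, -1 on error with errno set. */
int replace_file_atomically(const char *target, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;
    size_t left = len;
    int fd, n;

    assert(target != NULL && data != NULL);

    /* The temporary file must live in the same directory (same filesystem)
     * as the target, otherwise rename() cannot replace it atomically. */
    n = snprintf(tmp, sizeof tmp, "%s.tmp.XXXXXX", target);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(tmp);                 /* creates a brand new inode */
    if (fd < 0)
        return -1;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Assumed permissions for a shared object; flush data before renaming. */
    if (fchmod(fd, 0755) < 0 || fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* Atomically switch the name to the new inode. Processes that already
     * mapped the old inode keep it until they unmap it or exit. */
    if (rename(tmp, target) < 0)
        goto fail;
    return 0;

fail:
    {
        int saved = errno;
        if (fd >= 0)
            close(fd);
        unlink(tmp);
        errno = saved;
    }
    return -1;
}
```

A caller would pass the final path of the shared object and the new file contents. A file descriptor or mapping opened before the call keeps seeing the old bytes, which is the property that 'cp' over the existing file does not provide.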
A more complete solution is to check when, where, and why the shared object is being changed, and use the 'mv' command. In pure C, first 'creat' the new, temporary file, then 'rename' the target to another name (possibly a temporary backup name), then 'rename' the new file to the target, and finally 'unlink' the original file.

One quick way to detect when a file's contents are changed is to use the SystemTap script https://sourceware.org/systemtap/examples/io/inodewatch.stp

Another option is to use audit. See 'man audit.rules' for details.

Before opening a new bug report, is this scenario expected?
The top command does:
Libnuma_handle = dlopen("libnuma.so", RTLD_LAZY)
and at exit it does dlclose(Libnuma_handle)
A user has a segfault like this:
#0 _dl_fini () at dl-fini.c:249
249 + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
(gdb) bt
#0 _dl_fini () at dl-fini.c:249
#1 0x00007fdb9ecf8b69 in __run_exit_handlers (status=status@entry=0, listp=0x7fdb9f0856c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#2 0x00007fdb9ecf8bb7 in __GI_exit (status=status@entry=0) at exit.c:99
#3 0x0000000000405528 in bye_bye (str=str@entry=0x0) at top.c:564
#4 0x0000000000406aaf in sig_endpgm (dont_care_sig=<optimized out>) at top.c:625
#5 <signal handler called>
#6 0x00007fdb9fb52787 in munmap () at ../sysdeps/unix/syscall-template.S:81
#7 0x00007fdb9fb506ed in _dl_unmap (map=map@entry=0x7873e0) at ../sysdeps/x86_64/tlsdesc.c:139
#8 0x00007fdb9fb4e317 in _dl_close_worker (map=map@entry=0x7873e0) at dl-close.c:634
#9 0x00007fdb9fb4ecec in _dl_close (_map=0x7873e0) at dl-close.c:776
#10 0x00007fdb9fb48714 in _dl_catch_error (objname=0x7873c0, errstring=0x7873c8, mallocedp=0x7873b8, operate=0x7fdb9f08cfa0 <dlclose_doit>, args=0x7873e0) at dl-error.c:177
#11 0x00007fdb9f08d4ed in _dlerror_run (operate=operate@entry=0x7fdb9f08cfa0 <dlclose_doit>, args=0x7873e0) at dlerror.c:163
#12 0x00007fdb9f08cfcf in __dlclose (handle=<optimized out>) at dlclose.c:47
#13 0x00000000004054fa in bye_bye (str=str@entry=0x0) at top.c:557
#14 0x0000000000407ce3 in ioch (ech=0, cnt=127, buf=0x6193e0 <buf.9827> "") at top.c:996
#15 0x0000000000407f97 in iokey (action=action@entry=1) at top.c:1069
#16 0x00000000004034d9 in main (dont_care_argc=<optimized out>, argv=<optimized out>) at top.c:5731
Checking the logs, the numactl-libs package was updated while top had the
replaced library dlopen'ed.
The numactl-libs installation does not do anything special, so I presume it
follows the standard procedure of not overwriting the same inode.
Maybe a special condition with dlopen'ed shared objects?
(the issue commented above is for rhel7)
(In reply to Paulo Andrade from comment #18)
> #2 0x00007fdb9ecf8bb7 in __GI_exit (status=status@entry=0) at exit.c:99

It is not safe to call exit() from a signal handler because exit() is not AS-safe. This is a bug in top.c.

> #3 0x0000000000405528 in bye_bye (str=str@entry=0x0) at top.c:564
> #4 0x0000000000406aaf in sig_endpgm (dont_care_sig=<optimized out>) at top.c:625
> #5 <signal handler called>
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Signal handler called. You may only call AS-safe functions from a signal handler. You may call _exit()/_Exit() but not exit() (which runs destructors).

Thanks. You are right. This is somewhat of a variant of bz#1737552

Sorry for being too quick and not properly checking it. It is not uncommon to exit the 'top' command with Ctrl-C; apparently, when the package providing the dlopen'ed shared library is updated, the crash might be easy to reproduce.

(In reply to Paulo Andrade from comment #20)
> It is not uncommon to exit the 'top' command with Ctrl-C, just
> that apparently, when updating the package with the shared library
> it did dlopen, the crash might be easy to reproduce.

If you're shutting down the process you don't need to dlclose(), since the entire VMA will be unmapped by the kernel on exit, but I can understand how some generic code paths are called on shutdown. You have to take special measures in the signal handler, or set a global flag, return, and let the event loop shut down the process.

Yes. The code also reentered the 'bye_bye' function. What I understand is that 'q' was pressed, then it was too slow to exit and the user pressed Ctrl-C.

Issue reported at bz#1820357
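The "set a global flag, return, and let the event loop shut down the process" suggestion from the replies above can be sketched in C as follows. This is an illustration only, not the actual fix applied to top; the names `stop_requested`, `install_stop_handler`, `stop_was_requested`, and `run_until_stopped` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* sig_atomic_t is the only object type guaranteed safe to write from a
 * signal handler; volatile forces the main loop to re-read it. */
static volatile sig_atomic_t stop_requested = 0;

/* The handler does nothing except record the request. It must not call
 * exit(), dlclose(), or any other function that is not async-signal-safe. */
static void on_terminate(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Hypothetical helper: install the flag-setting handler for `signo`.
 * Returns 0 on success, -1 on error (as sigaction() does). */
int install_stop_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical accessor used by the main loop. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}

/* Hypothetical event loop: call step() until a stop was requested, then
 * return so the caller can clean up and call exit() in normal context. */
void run_until_stopped(void (*step)(void))
{
    assert(step != NULL);
    while (!stop_was_requested())
        step();
}
```

With this structure, cleanup such as dlclose() and the final exit() run after `run_until_stopped` returns, outside signal context, so a Ctrl-C arriving in the middle of a shutdown path cannot reenter it.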
This bug has also recently been reported upstream. For the moment, the workaround is to "chattr +i" the file, or to use "rm $path/$file; cp $otherpath/$file $path/$file".

When starting a Java environment, it works, but starting a second one copies an identical file over the first, which causes the first JVM to crash in a dlsym call on a corrupted symbol table.

#11 <signal handler called>
#12 check_match (sym=0xdbc0) at dl-lookup.c:134
(gdb) p *sym
Cannot access memory at address 0xdbc0
(gdb) frame 13
#13 0x0000003998609c82 in do_lookup_x (new_hash=1733714232, old_hash=0x7ff0bc9e7828, ref=<value optimized out>, result=0x7ff0bc9e7810, scope=<value optimized out>, i=0, flags=2, skip=0x0, undef_map=0x7ff1382e3070) at dl-lookup.c:251
251 sym = check_match (&symtab[symidx]);
(gdb) p symtab
$1 = (const Elf64_Sym *) 0x5718
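The underlying behavior in this report, a private file mapping observing an in-place overwrite of its backing file, can be probed with a small C sketch. The helper name `observe_inplace_overwrite` and the scratch path under /tmp are assumptions for illustration. POSIX leaves it unspecified whether a MAP_PRIVATE mapping sees later writes to the file; on Linux, pages that the process has not modified are typically shared with the page cache, so the second read is expected to show the new byte. This sketch uses pwrite and does not reproduce the truncation that 'cp' performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper. Creates a scratch file filled with 'A', maps it
 * MAP_PRIVATE (the way shared objects are mapped), overwrites the file in
 * place with 'B', and reports the first mapped byte before and after the
 * overwrite. Returns 0 on success, -1 on error. */
int observe_inplace_overwrite(char *before, char *after)
{
    char path[] = "/tmp/mapdemo.XXXXXX";   /* assumed scratch location */
    char block[4096];
    volatile char *map;
    int rc = -1;
    int fd;

    assert(before != NULL && after != NULL);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);          /* the file lives on until fd and mapping go away */

    memset(block, 'A', sizeof block);
    if (write(fd, block, sizeof block) != (ssize_t)sizeof block)
        goto out;

    map = mmap(NULL, sizeof block, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    *before = map[0];      /* faults the page in; reads 'A' */

    /* Overwrite the same inode in place, as a plain copy over the file does. */
    memset(block, 'B', sizeof block);
    if (pwrite(fd, block, sizeof block, 0) == (ssize_t)sizeof block) {
        *after = map[0];   /* typically 'B' on Linux; unspecified by POSIX */
        rc = 0;
    }
    munmap((void *)map, sizeof block);
out:
    close(fd);
    return rc;
}
```

If the new contents were instead written to a new inode and renamed into place, the mapping would keep referring to the old inode and the second read would still return 'A'.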