Bug 1327623 - replacing .so which was opened and closed, leads to segfault on next dlopen/dlsym
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Carlos O'Donell
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-15 13:39 UTC by Paulo Andrade
Modified: 2023-09-13 05:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-27 16:24:47 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Sourceware 19773 0 P2 RESOLVED replacing .so which was opened and closed, leads to segfault on next dlopen/dlsym 2020-04-02 18:21:44 UTC

Description Paulo Andrade 2016-04-15 13:39:48 UTC
This bug has also recently been reported upstream.

  For the moment, the workaround is to "chattr +i" the file, or
use "rm $path/$file; cp $otherpath/$file $path/$file".

  Starting one Java environment works, but starting a second one
copies an identical file over the shared object, which causes the
first JVM to crash in a dlsym call on a corrupted symbol table.

#11 <signal handler called>
#12 check_match (sym=0xdbc0) at dl-lookup.c:134
(gdb) p *sym
Cannot access memory at address 0xdbc0
(gdb) frame 13
#13 0x0000003998609c82 in do_lookup_x (new_hash=1733714232, old_hash=0x7ff0bc9e7828, ref=<value optimized out>, result=0x7ff0bc9e7810, 
    scope=<value optimized out>, i=0, flags=2, skip=0x0, undef_map=0x7ff1382e3070) at dl-lookup.c:251
251				sym = check_match (&symtab[symidx]);
(gdb) p symtab
$1 = (const Elf64_Sym *) 0x5718

Comment 1 Paulo Andrade 2016-04-15 13:41:03 UTC
Small/simple reproducer is in sourceware related bug report:
https://sourceware.org/bugzilla/show_bug.cgi?id=19773

Comment 6 Paulo Andrade 2016-04-15 18:26:45 UTC
If kernel help is required, we might revive a 2001 discussion:

http://yarchive.net/comp/linux/map_copy.html

Making a "real" private copy (using read instead of mmap) of every
shared object would use too many resources, all to guard against the
library being overwritten. MAP_DENYWRITE no longer exists because it
was a DoS vector.

This would not protect against the file contents actually changing,
but maybe only the relocations could be made "truly" private?

inotify would be too late, as it would report the change only
after the file was overwritten.

Comment 7 Paulo Andrade 2016-04-19 14:23:41 UTC
  The issue is caused by a design decision in Linux.

  The memory contents of an mmap'ed file change if the backing file is
overwritten on disk.

  In this case, the file is being overwritten with an identical copy, and
the side effect is that some address relocations are undone.

  While the Linux behavior looks wrong for loaded shared libraries that
are overwritten, the feature required to make this special case work
would easily be "abused". In most cases it is indeed desired to have
disk content changes reflected in memory; that is, mmap is just a fast
way to access the file contents. Otherwise, just copy the file contents
to a memory buffer.

  Shared libraries should only very seldom be changed on disk, so the
mechanism used is very fast and cheap on resource usage. If disk changes
were not to be mirrored in the memory image, the image should not use a
backing file.

  The solution is to use the same logic used when updating shared libraries,
that is, to not use the 'cp' command. The simplest and safest way is to write
a temporary file and then use the 'mv' command.
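
  The difference between the two update styles can be observed with a small
experiment. The following is only a sketch under stated assumptions: the
helper names (put_bytes, mapped_byte_after_update) and the one-page file are
made up for illustration, and it relies on Linux showing in-place file writes
through untouched pages of a MAP_PRIVATE mapping, which POSIX leaves
unspecified. It maps a file the way a loader would, replaces the contents
either in place or through a temporary file plus rename(), and reports what
the existing mapping sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { FILE_SIZE = 4096 };  /* illustrative size: one typical page */

/* Open `path` with `flags` and write FILE_SIZE bytes of `fill` into it. */
static int put_bytes(const char *path, int flags, char fill)
{
    char buf[FILE_SIZE];
    ssize_t n;
    int fd = open(path, flags, 0644);

    if (fd < 0)
        return -1;
    memset(buf, fill, sizeof buf);
    n = write(fd, buf, sizeof buf);
    if (close(fd) != 0 || n != (ssize_t)sizeof buf)
        return -1;
    return 0;
}

/* Returns the first byte seen through an existing private mapping of `path`
 * after the file content is replaced, or -1 on error.
 *   in_place != 0: rewrite the same inode (simplified stand-in for 'cp').
 *   in_place == 0: write a temporary file and rename() it (like 'mv'). */
int mapped_byte_after_update(const char *path, int in_place)
{
    char tmp[4096];
    char *map;
    int fd, len, result = -1;

    assert(path != NULL);
    if (put_bytes(path, O_WRONLY | O_CREAT | O_TRUNC, 'A') != 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping keeps the inode alive */
    if (map == MAP_FAILED)
        return -1;

    if (in_place) {
        /* Same inode, new bytes: the mapping is backed by this inode. */
        if (put_bytes(path, O_WRONLY, 'B') == 0)
            result = (unsigned char)map[0];
    } else {
        /* New inode under the old name: the mapping still refers to the
         * old inode, which lives on until it is unmapped. */
        len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
        if (len > 0 && len < (int)sizeof tmp &&
            put_bytes(tmp, O_WRONLY | O_CREAT | O_TRUNC, 'B') == 0 &&
            rename(tmp, path) == 0)
            result = (unsigned char)map[0];
    }
    munmap(map, FILE_SIZE);
    unlink(path);
    return result;
}
```

  On a typical Linux filesystem the in-place variant is expected to report
'B' (the mapping changed underneath its user) and the rename variant 'A'
(the mapping is untouched).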

  If you cannot find where the shared object is being changed, one quick
workaround is to use the 'chattr +i' command. A more complete solution
is to check when/where/why the shared object is being changed, and use
the 'mv' command, or if in pure C, first 'creat' the new, temporary file,
then 'rename' the target to another name (possibly to a temporary backup
name), then 'rename' the new file to the target, and finally 'unlink' the
original file.
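
  The pure C sequence described above can be sketched as follows. This is a
minimal illustration, not the code of any particular installer: replace_file
is a hypothetical helper name and the ".XXXXXX" temporary suffix is an
assumption. It relies on POSIX rename() atomically replacing an existing
target, which makes the intermediate rename to a backup name optional, and it
uses mkstemp() rather than creat() so the temporary name is unique.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace `target` with `len` bytes from `data` without ever writing into
 * the inode that running processes may have mapped. Returns 0 on success. */
int replace_file(const char *target, const void *data, size_t len, mode_t mode)
{
    char tmp[4096];
    const char *p = data;
    int fd, n;

    assert(target != NULL && data != NULL);

    /* The temporary file must be in the same directory (same filesystem)
     * as the target, otherwise rename() fails with EXDEV. */
    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", target);
    if (n < 0 || n >= (int)sizeof tmp)
        return -1;
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t written = write(fd, p, len);
        if (written < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += written;
        len -= (size_t)written;
    }
    /* mkstemp() creates the file with mode 0600; set the final mode and
     * flush the data before the new file becomes visible. */
    if (fchmod(fd, mode) != 0 || fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* Atomic switch: the name now points at a new inode. Existing mappings
     * keep the old inode until they are unmapped. */
    if (rename(tmp, target) != 0)
        goto fail;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

  A caller would pass the new library image and its permissions, for example
replace_file(path, image, image_len, 0755). The inode number of the target
changes on every call, which is what protects processes that still have the
old file mapped.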

  One quick way to detect when a file's contents are changed is to use
the SystemTap script https://sourceware.org/systemtap/examples/io/inodewatch.stp
Another option is to use audit. See 'man audit.rules' for details.

Comment 18 Paulo Andrade 2020-04-02 18:30:25 UTC
Before opening a new bug report, is this scenario expected?

The top command does:

Libnuma_handle = dlopen("libnuma.so", RTLD_LAZY)

and at exit it does dlclose(Libnuma_handle)

A user has a segfault like this:

 #0  _dl_fini () at dl-fini.c:249
249						+ l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
(gdb) bt
#0  _dl_fini () at dl-fini.c:249
#1  0x00007fdb9ecf8b69 in __run_exit_handlers (status=status@entry=0, listp=0x7fdb9f0856c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#2  0x00007fdb9ecf8bb7 in __GI_exit (status=status@entry=0) at exit.c:99
#3  0x0000000000405528 in bye_bye (str=str@entry=0x0) at top.c:564
#4  0x0000000000406aaf in sig_endpgm (dont_care_sig=<optimized out>) at top.c:625
#5  <signal handler called>
#6  0x00007fdb9fb52787 in munmap () at ../sysdeps/unix/syscall-template.S:81
#7  0x00007fdb9fb506ed in _dl_unmap (map=map@entry=0x7873e0) at ../sysdeps/x86_64/tlsdesc.c:139
#8  0x00007fdb9fb4e317 in _dl_close_worker (map=map@entry=0x7873e0) at dl-close.c:634
#9  0x00007fdb9fb4ecec in _dl_close (_map=0x7873e0) at dl-close.c:776
#10 0x00007fdb9fb48714 in _dl_catch_error (objname=0x7873c0, errstring=0x7873c8, mallocedp=0x7873b8, operate=0x7fdb9f08cfa0 <dlclose_doit>, args=0x7873e0) at dl-error.c:177
#11 0x00007fdb9f08d4ed in _dlerror_run (operate=operate@entry=0x7fdb9f08cfa0 <dlclose_doit>, args=0x7873e0) at dlerror.c:163
#12 0x00007fdb9f08cfcf in __dlclose (handle=<optimized out>) at dlclose.c:47
#13 0x00000000004054fa in bye_bye (str=str@entry=0x0) at top.c:557
#14 0x0000000000407ce3 in ioch (ech=0, cnt=127, buf=0x6193e0 <buf.9827> "") at top.c:996
#15 0x0000000000407f97 in iokey (action=action@entry=1) at top.c:1069
#16 0x00000000004034d9 in main (dont_care_argc=<optimized out>, argv=<optimized out>) at top.c:5731

Checking the logs, the numactl-libs package was updated while top had the
replaced library dlopen'ed.

  The numactl-libs installation does not do anything special, so I presume it
follows the standard procedure of not overwriting the same inode.

  Maybe a special condition with dlopen'ed shared objects?

(the issue commented above is for rhel7)

Comment 19 Carlos O'Donell 2020-04-02 19:34:46 UTC
(In reply to Paulo Andrade from comment #18)
> #2  0x00007fdb9ecf8bb7 in __GI_exit (status=status@entry=0) at exit.c:99

It is not safe to call exit() from a signal handler because exit() is not AS-safe. This is a bug in top.c.

> #3  0x0000000000405528 in bye_bye (str=str@entry=0x0) at top.c:564
> #4  0x0000000000406aaf in sig_endpgm (dont_care_sig=<optimized out>) at
> top.c:625
> #5  <signal handler called>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Signal handler called.

You may only call AS-safe functions from a signal handler.

You may call _exit()/_Exit() but not exit() (which runs destructors).
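
A handler that terminates the process directly could look roughly like the sketch below. This is not the code from top.c: the names (on_fatal_signal, install_exit_handler, exit_status_after_signal) and the exit code 130 are chosen here purely for illustration. The handler uses only write() and _exit(), both of which are on the POSIX list of async-signal-safe functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

enum { EXIT_INTERRUPTED = 130 };  /* illustrative exit code, not mandated */

static void on_fatal_signal(int sig)
{
    static const char msg[] = "interrupted\n";

    (void)sig;
    /* write() and _exit() are async-signal-safe. exit(), printf() and
     * dlclose() are not, and must not be called from here. */
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing safe to do about a failed write */
    }
    _exit(EXIT_INTERRUPTED);  /* skips atexit handlers and destructors */
}

int install_exit_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Demonstration helper: fork a child, deliver `signo` to it, and return the
 * child's exit status, or -1 on error. */
int exit_status_after_signal(int signo)
{
    int status;
    pid_t pid;

    assert(signo > 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_exit_handler(signo) != 0)
            _exit(1);
        raise(signo);
        _exit(2);  /* not reached if the handler ran */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because _exit() does not run atexit handlers, the dynamic loader's _dl_fini is never entered from signal context in this variant.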

Comment 20 Paulo Andrade 2020-04-02 19:46:45 UTC
Thanks. You are right. This is somewhat of a variant of bz#1737552
Sorry for being too quick and not properly checking it.

It is not uncommon to exit the 'top' command with Ctrl-C, just
that apparently, when updating the package with the shared library
it did dlopen, the crash might be easy to reproduce.

Comment 21 Carlos O'Donell 2020-04-02 19:51:12 UTC
(In reply to Paulo Andrade from comment #20)
> It is not uncommon to exit the 'top' command with Ctrl-C, just
> that apparently, when updating the package with the shared library
> it did dlopen, the crash might be easy to reproduce.

If you're shutting down the process you don't need to dlclose() since the entire VMA will be unmapped by the kernel on exit, but I can understand how some generic code paths are called on shutdown. You have to take special measures in the signal handler, or set a global flag, return, and let the event loop shut down the process.
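
The "set a global flag and return" approach might be sketched like this. The names (stop_requested, install_stop_handler, event_loop) are hypothetical and the loop body is a placeholder; it is not how top.c is structured. The handler only stores to a volatile sig_atomic_t, and all real shutdown work happens after the loop, in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* The only state shared between the handler and the main program. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int sig)
{
    (void)sig;
    stop_requested = 1;  /* async-signal-safe: a plain store, nothing else */
}

int install_stop_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Runs until a stop is requested or `max_iterations` events were handled.
 * Returns the number of iterations completed. */
int event_loop(int max_iterations)
{
    int done = 0;

    assert(max_iterations >= 0);
    while (!stop_requested && done < max_iterations) {
        /* ... wait for and handle one event here ... */
        done++;
    }
    /* Back in normal context: calling dlclose() or exit() after the loop
     * is fine, because no signal handler is on the stack. */
    return done;
}
```

With this structure a second Ctrl-C during a slow shutdown only sets the flag again, so the cleanup path is not re-entered.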

Comment 22 Paulo Andrade 2020-04-02 20:08:22 UTC
Yes. The code also re-entered the 'bye_bye' function.
What I understand is that 'q' was pressed, then top was too slow to exit
and the user pressed Ctrl-C.
Issue reported at bz#1820357

