Bug 1624387

Summary: dlopen(), dlsym() allocate 32 bytes never released
Product: [Fedora] Fedora
Reporter: Yann Droneaud <yann>
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: unspecified
Version: 28
CC: aoliva, arjun.is, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh
Hardware: Unspecified   
OS: Linux   
Last Closed: 2018-08-31 16:38:39 UTC
Type: Bug
Attachments:
- A test program to reproduce the issue (flags: none)
- Associated makefile (flags: none)

Description Yann Droneaud 2018-08-31 12:33:31 UTC
Created attachment 1480086
A test program to reproduce the issue

Description of problem:

Using dlopen() and/or dlsym() in the main thread of a program linked with -lpthread leads to 32 bytes that are never released, which valgrind reports:

==783108== HEAP SUMMARY:
==783108==     in use at exit: 32 bytes in 1 blocks
==783108==   total heap usage: 1 allocs, 0 frees, 32 bytes allocated
==783108== 
==783108== 32 bytes in 1 blocks are still reachable in loss record 1 of 1
==783108==    at 0x4C30B06: calloc (vg_replace_malloc.c:711)
==783108==    by 0x4E3C7D4: _dlerror_run (dlerror.c:140)
==783108==    by 0x4E3C095: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==783108==    by 0x400585: test (test.c:13)
==783108==    by 0x4005A8: main (test.c:44)
==783108== 
==783108== LEAK SUMMARY:
==783108==    definitely lost: 0 bytes in 0 blocks
==783108==    indirectly lost: 0 bytes in 0 blocks
==783108==      possibly lost: 0 bytes in 0 blocks
==783108==    still reachable: 32 bytes in 1 blocks
==783108==         suppressed: 0 bytes in 0 blocks

If dlopen() and/or dlsym() is called from another thread, valgrind reports no "leaked"* memory.

When not linking against -lpthread, calling dlopen() and/or dlsym() doesn't "leak"* memory.


* I don't like calling this a leak, as it's only a single 32-byte block that is still reachable, but some people disagree and want valgrind to report *0* bytes.

BTW, this issue is well known:
- https://bugzilla.mozilla.org/show_bug.cgi?id=793535
- https://bugs.kde.org/show_bug.cgi?id=358980
- https://stackoverflow.com/questions/6503661/dlopen-dlsym-dlclose-dlfcn-h-causes-memory-leak
- https://stackoverflow.com/questions/1542457/memory-leak-reported-by-valgrind-in-dlopen
- https://sourceware.org/bugzilla/show_bug.cgi?id=14015

For me, it's an issue because I had to explain that my code doesn't leak resources; glibc was "leaking" the memory for me.

It's a rather strange issue, because it happens only when linking with -lpthread, and not when actually calling dlopen() and/or dlsym() in a thread. If a subthread can release the memory, why can't the main thread do the same?

Also note that CentOS 7.5 exhibits the same issue.

Comment 1 Yann Droneaud 2018-08-31 12:34:11 UTC
Created attachment 1480087
Associated makefile

Comment 2 Carlos O'Donell 2018-08-31 16:38:39 UTC
(In reply to Yann Droneaud from comment #1)
> Created attachment 1480087
> Associated makefile

Thanks for the report!

As coincidence would have it, we ran into this issue earlier this year when working on some malloc reporting.

This is fixed by the fix for glibc bug 23329, which went into glibc 2.28 and is therefore in Fedora 29 and Rawhide (F30).

I have no plans to backport this to Fedora 28 because it's only a theoretical leak.

commit 2827ab990aefbb0e53374199b875d98f116d6390
Author: Carlos O'Donell <carlos>
Date:   Fri Jun 22 09:28:47 2018 -0400

    libc: Extend __libc_freeres framework (Bug 23329).
    
    The __libc_freeres framework does not extend to non-libc.so objects.
    This causes problems in general for valgrind and mtrace detecting
    unfreed objects in both libdl.so and libpthread.so.  This change is
    a pre-requisite to properly moving the malloc hooks out of malloc
    since such a move now requires precise accounting of all allocated
    data before destructors are run.
    
    This commit adds a proper hook in libc.so.6 for both libdl.so and
    for libpthread.so, this ensures that shm-directory.c which uses
    freeit () to free memory is called properly.  We also remove the
    nptl_freeres hook and fall back to using weak-ref-and-check idiom
    for a loaded libpthread.so, thus making this process similar for
    all DSOs.
    
    Lastly we follow best practice and use explicit free calls for
    both libdl.so and libpthread.so instead of the generic hook process
    which has undefined order.
    
    Tested on x86_64 with no regressions.
    
    Signed-off-by: DJ Delorie <dj>
    Signed-off-by: Carlos O'Donell <carlos>

Valgrind shows that __libc_freeres can free the resources:

valgrind --leak-check=full --show-leak-kinds=all ./test-pthread-nothread-dlopen
==7834== Memcheck, a memory error detector
==7834== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7834== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7834== Command: ./test-pthread-nothread-dlopen
==7834== 
==7834== 
==7834== HEAP SUMMARY:
==7834==     in use at exit: 0 bytes in 0 blocks
==7834==   total heap usage: 1 allocs, 1 frees, 32 bytes allocated
==7834== 
==7834== All heap blocks were freed -- no leaks are possible
==7834== 
==7834== For counts of detected and suppressed errors, rerun with: -v
==7834== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I'm marking this CLOSED/RAWHIDE for now.

Comment 3 Yann Droneaud 2018-08-31 16:51:25 UTC
(In reply to Carlos O'Donell from comment #2)
> (In reply to Yann Droneaud from comment #1)
> 
> As coincidence would have it we ran into this issue earlier this year when
> working on some malloc reporting.
> 

:)

> This is fixed by the fix for bug 23329 which was in glibc 2.28, in Fedora 29
> and Rawhide (F30).
> 

Great, thanks!

(It would be great if the next major RHEL/CentOS version had the fix too.)