Bug 694195 - GSSAPI/SASL request memory leak
Summary: GSSAPI/SASL request memory leak
Keywords:
Status: CLOSED DUPLICATE of bug 695440
Alias: None
Product: 389
Classification: Retired
Component: Directory Server
Version: 1.2.8
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Noriko Hosoi
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-06 17:31 UTC by Jeremy Mates
Modified: 2015-01-04 23:47 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-04-11 21:23:10 UTC


Attachments
valgrind info (14.35 KB, application/x-tar), 2011-04-07 00:12 UTC, Jeremy Mates
more valgrind results (12.60 KB, application/x-tar), 2011-04-08 22:19 UTC, Jeremy Mates

Description Jeremy Mates 2011-04-06 17:31:40 UTC
Description of problem: ns-slapd leaks memory in response to SASL/GSSAPI traffic, eventually crashing when the OS runs out of memory. The leak is slow, so it can be worked around by restarting ns-slapd periodically, or by an automatic restart script that watches for ns-slapd being absent or unresponsive.
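
A minimal sketch of such a watchdog, run from cron every few minutes, might look like the following (the instance name is a placeholder and the restart command may differ per setup; this is not part of the original report):

#!/bin/sh
# Hypothetical watchdog sketch: restart the instance when ns-slapd is not
# running, or when it no longer answers a trivial anonymous root-DSE search.
# INSTANCE is a placeholder for the local instance name.
INSTANCE=YOURID
if ! pgrep -x ns-slapd >/dev/null 2>&1; then
    service dirsrv restart $INSTANCE
elif ! ldapsearch -x -h localhost -b '' -s base 'objectclass=*' >/dev/null 2>&1; then
    service dirsrv restart $INSTANCE
fi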

Version-Release number of selected component (if applicable): 1.2.8 rc2 on RHEL 5.

How reproducible: 100% for GSSAPI traffic. Could not reproduce for non-GSSAPI traffic.

Steps to Reproduce:
1. Standalone 1.2.8 rc2 setup on RHEL 5.6. MIT Kerberos was already set up in the department.
2. Keytab installed per http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html/Administration_Guide/SASL.html#Configuring_Kerberos-Configuring_the_KDC_Server
3. `kinit` used to acquire a ticket for the principal; a simple `ldapsearch` was run repeatedly in five different terminals. This was a slow test, to see whether the rate of requests influenced the leak:

while sleep 5; do ldapsearch -Y GSSAPI -h server.example.edu 'uid=myuid' uid; done

Actual results: steady growth (*never* a drop) in ns-slapd RSS and VSZ, collected by `ps axo pid,rss,vsz` every 60 seconds. Over 14 hours, RSS grew by 9356 KB and VSZ by 10060 KB in response to 139270 connection attempts (2.76 requests/s), a leak of roughly 70 bytes per request.
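
(Worked out: 9356 KB is about 9,580,544 bytes; spread over 139,270 requests that is about 69 bytes each, and 139,270 requests in 14 hours (50,400 s) gives the 2.76 requests/s figure.)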

Expected results: no memory leak. Under non-GSSAPI traffic, the memory usage of ns-slapd swings up and down, and remains roughly level over time.

Additional info: large amounts of GSSAPI traffic used to segfault ns-slapd (see bug 683250), so I am not sure whether this memory leak is new to 1.2.8.rc2 or exists in older versions.

Comment 1 Noriko Hosoi 2011-04-06 17:51:07 UTC
[memory leak report from valgrind]

==16342== 8 bytes in 1 blocks are definitely lost in loss record 112 of 1,831
==16342==    at 0x4C2415D: malloc (vg_replace_malloc.c:195)
==16342==    by 0x95A6C03: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x95A6D79: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x95A6DE0: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x95B41DB: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x95A2D34: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x95A490C: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x959A52A: gss_accept_sec_context (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x938D8E3: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23)
==16342==    by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x42A59B: ids_sasl_check_bind (saslbind.c:889)
==16342==    by 0x40EB01: do_bind (bind.c:424)
==16342==    by 0x413A05: connection_dispatch_operation (connection.c:558)
==16342==    by 0x4152BF: connection_threadmain (connection.c:2328)
==16342==    by 0x5FCD632: _pt_root (ptthread.c:187)
==16342==    by 0x61E8760: start_thread (pthread_create.c:301)
==16342==    by 0x64DD98C: clone (clone.S:115)


==16342== 41 bytes in 1 blocks are definitely lost in loss record 648 of 1,831
==16342==    at 0x4C2415D: malloc (vg_replace_malloc.c:195)
==16342==    by 0x647C821: strdup (strdup.c:43)
==16342==    by 0x34E8213B8F: selinux_trans_to_raw_context (in /lib64/libselinux.so.1)
==16342==    by 0x34E8212444: ??? (in /lib64/libselinux.so.1)
==16342==    by 0x677E429: ??? (in /lib64/libkrb5support.so.0.1)
==16342==    by 0x677EC90: krb5int_labeled_fopen (in /lib64/libkrb5support.so.0.1)
==16342==    by 0x5180F2E: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x5182EA8: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x51808D5: krb5_kt_get_entry (in /lib64/libkrb5.so.3.3)
==16342==    by 0x95A538C: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x959AE3A: gss_add_cred (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x959B4A9: gss_acquire_cred (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x938D6CB: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23)
==16342==    by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x42A59B: ids_sasl_check_bind (saslbind.c:889)
==16342==    by 0x40EB01: do_bind (bind.c:424)
==16342==    by 0x413A05: connection_dispatch_operation (connection.c:558)
==16342==    by 0x4152BF: connection_threadmain (connection.c:2328)
==16342==    by 0x5FCD632: _pt_root (ptthread.c:187)
==16342==    by 0x61E8760: start_thread (pthread_create.c:301)
==16342==    by 0x64DD98C: clone (clone.S:115)


==16342== 41 bytes in 1 blocks are definitely lost in loss record 649 of 1,831
==16342==    at 0x4C2415D: malloc (vg_replace_malloc.c:195)
==16342==    by 0x647C821: strdup (strdup.c:43)
==16342==    by 0x34E8213BB6: selinux_trans_to_raw_context (in /lib64/libselinux.so.1)
==16342==    by 0x34E8212444: ??? (in /lib64/libselinux.so.1)
==16342==    by 0x677E429: ??? (in /lib64/libkrb5support.so.0.1)
==16342==    by 0x677EC90: krb5int_labeled_fopen (in /lib64/libkrb5support.so.0.1)
==16342==    by 0x5180F2E: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x5182EA8: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x51808D5: krb5_kt_get_entry (in /lib64/libkrb5.so.3.3)
==16342==    by 0x95A538C: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x959AE3A: gss_add_cred (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x959B4A9: gss_acquire_cred (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x938D6CB: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23)
==16342==    by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x42A59B: ids_sasl_check_bind (saslbind.c:889)
==16342==    by 0x40EB01: do_bind (bind.c:424)
==16342==    by 0x413A05: connection_dispatch_operation (connection.c:558)
==16342==    by 0x4152BF: connection_threadmain (connection.c:2328)
==16342==    by 0x5FCD632: _pt_root (ptthread.c:187)
==16342==    by 0x61E8760: start_thread (pthread_create.c:301)
==16342==    by 0x64DD98C: clone (clone.S:115)



==16342== 46 bytes in 1 blocks are definitely lost in loss record 666 of 1,831
==16342==    at 0x4C2415D: malloc (vg_replace_malloc.c:195)
==16342==    by 0x647C821: strdup (strdup.c:43)
==16342==    by 0x34E8213DCF: selinux_raw_to_trans_context (in /lib64/libselinux.so.1)
==16342==    by 0x34E82121E2: ??? (in /lib64/libselinux.so.1)
==16342==    by 0x677E3B9: ??? (in /lib64/libkrb5support.so.0.1)
==16342==    by 0x677EC90: krb5int_labeled_fopen (in /lib64/libkrb5support.so.0.1)
==16342==    by 0x5180F2E: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x5182EA8: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x51808D5: krb5_kt_get_entry (in /lib64/libkrb5.so.3.3)
==16342==    by 0x519D56E: ??? (in /lib64/libkrb5.so.3.3)
==16342==    by 0x519DFC9: krb5_rd_req_decoded (in /lib64/libkrb5.so.3.3)
==16342==    by 0x519D226: krb5_rd_req (in /lib64/libkrb5.so.3.3)
==16342==    by 0x95A3574: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x95A490C: ??? (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x959A52A: gss_accept_sec_context (in /lib64/libgssapi_krb5.so.2.2)
==16342==    by 0x938D8E3: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23)
==16342==    by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23)
==16342==    by 0x42A59B: ids_sasl_check_bind (saslbind.c:889)
==16342==    by 0x40EB01: do_bind (bind.c:424)
==16342==    by 0x413A05: connection_dispatch_operation (connection.c:558)
==16342==    by 0x4152BF: connection_threadmain (connection.c:2328)
==16342==    by 0x5FCD632: _pt_root (ptthread.c:187)
==16342==    by 0x61E8760: start_thread (pthread_create.c:301)
==16342==    by 0x64DD98C: clone (clone.S:115)

Comment 2 Noriko Hosoi 2011-04-06 18:55:07 UTC
Thanks for your report, Jeremy.

Could you tell us about the environment in which you are seeing the problem, such as the platform and SELinux mode? I'd like to verify whether what I'm seeing is the same problem as yours.

Also, it would be a great help if you could run the server under valgrind, run the following ldapsearch, and share the result with us:
  ldapsearch -Y GSSAPI -h server.example.edu 'uid=myuid' uid

To run valgrind, I usually create /usr/sbin/start-dirsrv.val with this modification:

# diff /usr/sbin/start-dirsrv /usr/sbin/start-dirsrv.val
63c63
<     cd $SERVERBIN_DIR; ./ns-slapd -D $CONFIG_DIR -i $PIDFILE -w $STARTPIDFILE "$@"
---
>     cd $SERVERBIN_DIR; valgrind --num-callers=32 --tool=memcheck --leak-check=full --leak-resolution=high ./ns-slapd -D $CONFIG_DIR -i $PIDFILE -w $STARTPIDFILE "$@" -d 0

And create /usr/lib[64]/dirsrv/slapd-YOURID/start-slapd.val:

# diff /usr/lib64/dirsrv/slapd-YOURID/start-slapd /usr/lib64/dirsrv/slapd-YOURID/start-slapd.val
10c10
< /usr/sbin/start-dirsrv -d /etc/sysconfig $INSTANCE "$@"
---
> /usr/sbin/start-dirsrv.val -d /etc/sysconfig $INSTANCE "$@"

Then, start the server as follows:
/usr/lib64/dirsrv/slapd-YOURID/start-slapd.val > /tmp/valgrind.out 2>&1 &
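
(For reference, the appended "-d 0" runs ns-slapd in the foreground at debug level 0 rather than letting it detach, so valgrind stays attached to the process and can print its leak report when the server is stopped.)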

Comment 4 Noriko Hosoi 2011-04-06 21:23:36 UTC
Note: This is not a leak. Judging from the source code, the memory allocated for an error message is cached and reused.
==3336== 8 bytes in 1 blocks are definitely lost in loss record 114 of 1,597
==3336==    at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==3336==    by 0x325F015C03: gss_krb5_save_error_string_nocopy (disp_status.c:81)
==3336==    by 0x325F015D79: krb5_gss_save_error_string (disp_status.c:111)
==3336==    by 0x325F015DE0: krb5_gss_save_error_info (disp_status.c:142)
==3336==    by 0x325F0231DB: krb5_gss_validate_cred (val_cred.c:93)
==3336==    by 0x325F011D34: kg_accept_krb5 (accept_sec_context.c:439)
==3336==    by 0x325F01390C: krb5_gss_accept_sec_context (accept_sec_context.c:1234)
==3336==    by 0x325F00952A: gss_accept_sec_context (g_accept_sec_context.c:196)
==3336==    by 0x63368E3: gssapi_server_mech_step (gssapi.c:724)
==3336==    by 0x3263C0F8A0: sasl_server_step (server.c:1430)
==3336==    by 0x3263C10A73: sasl_server_start (server.c:1350)
==3336==    by 0x42A59B: ids_sasl_check_bind (saslbind.c:889)
==3336==    by 0x40EB01: do_bind (bind.c:424)
==3336==    by 0x413A05: connection_dispatch_operation (connection.c:558)
==3336==    by 0x4152BF: connection_threadmain (connection.c:2328)
==3336==    by 0x3262429632: _pt_root (ptthread.c:187)
==3336==    by 0x3252807760: start_thread (pthread_create.c:301)
==3336==    by 0x32520E098C: clone (clone.S:115)

./lib/gssapi/krb5/disp_status.c
static int save_error_string_nocopy(OM_uint32 minor_code, char *msg)
{
    gsserrmap *p;
    int ret;

#ifdef DEBUG
    fprintf(stderr, "%s(%lu, %s)", __func__, (unsigned long) minor_code, msg);
#endif
    p = k5_getspecific(K5_KEY_GSS_KRB5_ERROR_MESSAGE);
    if (!p) {
        p = malloc(sizeof(*p));
        if (p == NULL) {
            ret = 1;
            goto fail;
        }
        if (gsserrmap_init(p) != 0) {
            free(p);
            p = NULL;
            ret = 1;
            goto fail;
        }
        if (k5_setspecific(K5_KEY_GSS_KRB5_ERROR_MESSAGE, p) != 0) {
            gsserrmap_destroy(p);
            free(p);
            p = NULL;
            ret = 1;
            goto fail;
        }
    }
    ret = gsserrmap_replace_or_insert(p, minor_code, msg);
fail:
#ifdef DEBUG
    fprintf(stderr, " p=%p %s\n", (void *)p, ret ? "FAIL" : "SUCCESS");
#endif
    return ret;
}
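
In other words, the pointer is retained in thread-specific data via k5_setspecific() and the entry is replaced or reused on the next error (gsserrmap_replace_or_insert), so the block stays reachable only through that per-thread map; valgrind flags it as "definitely lost" presumably because nothing frees the map at thread exit. It is a small, bounded cache rather than a per-request leak.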

Comment 6 Jeremy Mates 2011-04-07 00:12:20 UTC
Created attachment 490439 [details]
valgrind info

Valgrind output from ns-slapd: one run with replayed traffic (invalid GSSAPI) and one with ldapsearch requests (valid).

389-ds 1.2.8 rc4 installed from the epel-testing repository; RHEL 5.6 is up-to-date as of April 6th.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

# getenforce
Permissive

# rpm -qa | grep 389
389-ds-base-1.2.8-0.8.rc4.el5
389-dsgw-1.1.6-1.el5
389-adminutil-1.1.13-1.el5
389-console-1.1.4-1.el5
389-admin-1.1.16-1.el5
389-ds-console-1.2.5-1.el5
389-ds-console-doc-1.2.5-1.el5
389-ds-1.2.1-1.el5
389-ds-base-libs-1.2.8-0.8.rc4.el5
389-admin-console-1.1.7-1.el5
389-admin-console-doc-1.1.7-1.el5

Comment 7 Noriko Hosoi 2011-04-07 00:26:03 UTC
Jeremy: Thank you so much for gathering the information.

I have a question. When you took the valgrind output, was the directory server still running? It looks to me as if the output is a snapshot of memory that is still in use; that is, it shows allocations that are still referenced rather than leaked, and valgrind prints its full leak report only when the process exits. Would it be possible to shut down the server and then take the output?

# /usr/lib64/dirsrv/slapd-YOURID/start-slapd.val > /tmp/valgrind.out 2>&1 &

# ldapsearch -Y GSSAPI -h server.example.edu 'uid=myuid' uid

# service dirsrv stop
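
After the shutdown, the full leak report should appear at the end of /tmp/valgrind.out.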

Thanks, again.
--noriko

Comment 9 Jeremy Mates 2011-04-08 16:21:51 UTC
Valgrind output after ns-slapd eventually ran out of memory under the persistent GSSAPI requests (memory usage *never* declines, and holds level if the GSSAPI requests are stopped). Could this be due to an underlying glibc malloc problem, if nothing is reported as leaking in ns-slapd itself? Trying again with just a single request.

==17867== 63 bytes in 1 blocks are definitely lost in loss record 815 of 1,487
==17867==    at 0x4022903: malloc (vg_replace_malloc.c:195)
==17867==    by 0x424319F: strdup (in /lib/libc-2.5.so)
==17867==    by 0x405644A: slapi_ch_strdup (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x409B54E: slapi_pblock_set (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x40BC4D7: slapi_entry_schema_check (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x4060CB7: ??? (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x4062B16: ??? (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x4062DB9: dse_read_file (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x40B6022: ??? (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x40B63D9: init_schema_dse (in /usr/lib/dirsrv/libslapd.so.0.0.0)
==17867==    by 0x805EF86: ??? (in /usr/sbin/ns-slapd.orig)
==17867==    by 0x8062912: ??? (in /usr/sbin/ns-slapd.orig)
==17867==    by 0x41E8E9B: (below main) (in /lib/libc-2.5.so)
==17867== 
==17868== Warning: noted but unhandled ioctl 0x5422 with no size/direction hints
==17868==    This could cause spurious value errors to appear.
==17868==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==17868== 
==17868==     Valgrind's memory management: out of memory:
==17868==        newSuperblock's request for 64593920 bytes failed.
==17868==        3086872576 bytes have already been allocated.
==17868==     Valgrind cannot continue.  Sorry.
==17868== 
==17868==     There are several possible reasons for this.
==17868==     - You have some kind of memory limit in place.  Look at the
==17868==       output of 'ulimit -a'.  Is there a limit on the size of
==17868==       virtual memory or address space?
==17868==     - You have run out of swap space.
==17868==     - Valgrind has a bug.  If you think this is the case or you are
==17868==     not sure, please let us know and we'll try to fix it.
==17868==     Please note that programs can take substantially more memory than
==17868==     normal when running under Valgrind tools, eg. up to twice or
==17868==     more, depending on the tool.  On a 64-bit machine, Valgrind
==17868==     should be able to make use of up 32GB memory.  On a 32-bit
==17868==     machine, Valgrind should be able to use all the memory available
==17868==     to a single process, up to 4GB if that's how you have your
==17868==     kernel configured.  Most 32-bit Linux setups allow a maximum of
==17868==     3GB per process.
==17868== 
==17868==     Whatever the reason, Valgrind cannot continue.  Sorry.

Comment 10 Noriko Hosoi 2011-04-08 16:37:21 UTC
(In reply to comment #9)
> Valgrind output after ns-slapd eventually ran out of memory under the
> persistent GSSAPI requests (memory usage *never* declines, and holds level
> if the GSSAPI requests are stopped). Could this be due to an underlying
> glibc malloc problem, if nothing is reported as leaking in ns-slapd itself?
> Trying again with just a single request.
> 
Wow. Obviously, we are facing a serious memory leak... Thank you for trying the single-request case. I think we can gather useful info even before the server reaches the out-of-memory point.

For our information, how long does it take for the server to run out of memory and quit?

Comment 11 Jeremy Mates 2011-04-08 21:24:15 UTC
A very long time (4181269 requests over 127773 seconds to consume 3G of RAM), so the problem can be worked around by a periodic server restart. These are 32-bit systems. (The leak seems worse when the servers are hit with more requests per second, which might again point to a malloc inefficiency?)
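
(Worked out: 3G is roughly 3,221,225,472 bytes over 4,181,269 requests, or about 770 bytes per request at roughly 33 requests/s, versus the ~70 bytes per request seen at 2.76 requests/s in the original report; that would be consistent with the leak looking worse at higher request rates.)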

I had valgrind settings from richm@ on the #389 channel messing with things, so I am resetting the servers for a clean install.

Comment 12 Jeremy Mates 2011-04-08 22:19:35 UTC
Created attachment 490876 [details]
more valgrind results

Two valgrind results: one from an ldapsearch, another from a replayed GSSAPI packet.

Comment 13 Noriko Hosoi 2011-04-11 21:23:10 UTC
Jeremy, thanks for running valgrind. It turns out the leak is in libselinux; a new bug, 695440, has been opened against that module.

Let us make this bug a duplicate of 695440.

*** This bug has been marked as a duplicate of bug 695440 ***

