| Summary: | GSSAPI/SASL request memory leak | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Retired] 389 | Reporter: | Jeremy Mates <jmates> | ||||||
| Component: | Directory Server | Assignee: | Noriko Hosoi <nhosoi> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | Chandrasekar Kannan <ckannan> | ||||||
| Severity: | low | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 1.2.8 | CC: | benl, nhosoi | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2011-04-11 21:23:10 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
|
||||||||
|
Description
Jeremy Mates
2011-04-06 17:31:40 UTC
[memory leak report from valgrind] ==16342== 8 bytes in 1 blocks are definitely lost in loss record 112 of 1,831 ==16342== at 0x4C2415D: malloc (vg_replace_malloc.c:195) ==16342== by 0x95A6C03: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x95A6D79: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x95A6DE0: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x95B41DB: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x95A2D34: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x95A490C: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x959A52A: gss_accept_sec_context (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x938D8E3: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23) ==16342== by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x42A59B: ids_sasl_check_bind (saslbind.c:889) ==16342== by 0x40EB01: do_bind (bind.c:424) ==16342== by 0x413A05: connection_dispatch_operation (connection.c:558) ==16342== by 0x4152BF: connection_threadmain (connection.c:2328) ==16342== by 0x5FCD632: _pt_root (ptthread.c:187) ==16342== by 0x61E8760: start_thread (pthread_create.c:301) ==16342== by 0x64DD98C: clone (clone.S:115) ==16342== 41 bytes in 1 blocks are definitely lost in loss record 648 of 1,831 ==16342== at 0x4C2415D: malloc (vg_replace_malloc.c:195) ==16342== by 0x647C821: strdup (strdup.c:43) ==16342== by 0x34E8213B8F: selinux_trans_to_raw_context (in /lib64/libselinux .so.1) ==16342== by 0x34E8212444: ??? (in /lib64/libselinux.so.1) ==16342== by 0x677E429: ??? (in /lib64/libkrb5support.so.0.1) ==16342== by 0x677EC90: krb5int_labeled_fopen (in /lib64/libkrb5support.so.0. 1) ==16342== by 0x5180F2E: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x5182EA8: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x51808D5: krb5_kt_get_entry (in /lib64/libkrb5.so.3.3) ==16342== by 0x95A538C: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x959AE3A: gss_add_cred (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x959B4A9: gss_acquire_cred (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x938D6CB: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23) ==16342== by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x42A59B: ids_sasl_check_bind (saslbind.c:889) ==16342== by 0x40EB01: do_bind (bind.c:424) ==16342== by 0x413A05: connection_dispatch_operation (connection.c:558) ==16342== by 0x4152BF: connection_threadmain (connection.c:2328) ==16342== by 0x5FCD632: _pt_root (ptthread.c:187) ==16342== by 0x61E8760: start_thread (pthread_create.c:301) ==16342== by 0x64DD98C: clone (clone.S:115) ==16342== 41 bytes in 1 blocks are definitely lost in loss record 649 of 1,831 ==16342== at 0x4C2415D: malloc (vg_replace_malloc.c:195) ==16342== by 0x647C821: strdup (strdup.c:43) ==16342== by 0x34E8213BB6: selinux_trans_to_raw_context (in /lib64/libselinux.so.1) ==16342== by 0x34E8212444: ??? (in /lib64/libselinux.so.1) ==16342== by 0x677E429: ??? (in /lib64/libkrb5support.so.0.1) ==16342== by 0x677EC90: krb5int_labeled_fopen (in /lib64/libkrb5support.so.0.1) ==16342== by 0x5180F2E: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x5182EA8: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x51808D5: krb5_kt_get_entry (in /lib64/libkrb5.so.3.3) ==16342== by 0x95A538C: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x959AE3A: gss_add_cred (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x959B4A9: gss_acquire_cred (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x938D6CB: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23) ==16342== by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x42A59B: ids_sasl_check_bind (saslbind.c:889) ==16342== by 0x40EB01: do_bind (bind.c:424) ==16342== by 0x413A05: connection_dispatch_operation (connection.c:558) ==16342== by 0x4152BF: connection_threadmain (connection.c:2328) ==16342== by 0x5FCD632: _pt_root (ptthread.c:187) ==16342== by 0x61E8760: start_thread (pthread_create.c:301) ==16342== by 0x64DD98C: clone (clone.S:115) ==16342== 46 bytes in 1 blocks are definitely lost in loss record 666 of 1,831 ==16342== at 0x4C2415D: malloc (vg_replace_malloc.c:195) ==16342== by 0x647C821: strdup (strdup.c:43) ==16342== by 0x34E8213DCF: selinux_raw_to_trans_context (in /lib64/libselinux.so.1) ==16342== by 0x34E82121E2: ??? (in /lib64/libselinux.so.1) ==16342== by 0x677E3B9: ??? (in /lib64/libkrb5support.so.0.1) ==16342== by 0x677EC90: krb5int_labeled_fopen (in /lib64/libkrb5support.so.0.1) ==16342== by 0x5180F2E: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x5182EA8: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x51808D5: krb5_kt_get_entry (in /lib64/libkrb5.so.3.3) ==16342== by 0x519D56E: ??? (in /lib64/libkrb5.so.3.3) ==16342== by 0x519DFC9: krb5_rd_req_decoded (in /lib64/libkrb5.so.3.3) ==16342== by 0x519D226: krb5_rd_req (in /lib64/libkrb5.so.3.3) ==16342== by 0x95A3574: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x95A490C: ??? (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x959A52A: gss_accept_sec_context (in /lib64/libgssapi_krb5.so.2.2) ==16342== by 0x938D8E3: ??? (in /usr/lib64/sasl2/libgssapiv2.so.2.0.23) ==16342== by 0x34F820F8A0: sasl_server_step (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x34F8210A73: sasl_server_start (in /usr/lib64/libsasl2.so.2.0.23) ==16342== by 0x42A59B: ids_sasl_check_bind (saslbind.c:889) ==16342== by 0x40EB01: do_bind (bind.c:424) ==16342== by 0x413A05: connection_dispatch_operation (connection.c:558) ==16342== by 0x4152BF: connection_threadmain (connection.c:2328) ==16342== by 0x5FCD632: _pt_root (ptthread.c:187) ==16342== by 0x61E8760: start_thread (pthread_create.c:301) ==16342== by 0x64DD98C: clone (clone.S:115) Thanks for your report, Jeremy. Could you tell us the environment you are facing the problem? Such as platform and selinux mode? I'd like to verify if what I'm seeing is the same problem as yours or not... Also, if you could run the server with valgrind and run ldapsearch and share the result with us ldapsearch -Y GSSAPI -h server.example.edu 'uid=myuid' uid it'll be a great help for us. To run valgrind, I usually create /usr/sbin/start-dirsrv.val with this modify: # diff /usr/sbin/start-dirsrv /usr/sbin/start-dirsrv.val 63c63 < cd $SERVERBIN_DIR; ./ns-slapd -D $CONFIG_DIR -i $PIDFILE -w $STARTPIDFILE "$@" --- > cd $SERVERBIN_DIR; valgrind --num-callers=32 --tool=memcheck --leak-check=full --leak-resolution=high ./ns-slapd -D $CONFIG_DIR -i $PIDFILE -w $STARTPIDFILE "$@" -d 0 And /usr/lib[64]/dirsrv/slapd-YOURID/start-slapd.val # diff /usr/lib64/dirsrv/slapd-YOURID/start-slapd /usr/lib64/dirsrv/slapd-YOURID/start-slapd.val 10c10 < /usr/sbin/start-dirsrv -d /etc/sysconfig $INSTANCE "$@" --- > /usr/sbin/start-dirsrv.val -d /etc/sysconfig $INSTANCE "$@" Then, start the server as follows: /usr/lib64/dirsrv/slapd-YOURID/start-slapd.val > /tmp/valgrind.out 2>&1 & Note: This is not a leak. Regarding the source code, the allocated memory for an error is cached and reused.
==3336== 8 bytes in 1 blocks are definitely lost in loss record 114 of 1,597
==3336== at 0x4A0515D: malloc (vg_replace_malloc.c:195)
==3336== by 0x325F015C03: gss_krb5_save_error_string_nocopy (disp_status.c:81)
==3336== by 0x325F015D79: krb5_gss_save_error_string (disp_status.c:111)
==3336== by 0x325F015DE0: krb5_gss_save_error_info (disp_status.c:142)
==3336== by 0x325F0231DB: krb5_gss_validate_cred (val_cred.c:93)
==3336== by 0x325F011D34: kg_accept_krb5 (accept_sec_context.c:439)
==3336== by 0x325F01390C: krb5_gss_accept_sec_context (accept_sec_context.c:1234)
==3336== by 0x325F00952A: gss_accept_sec_context (g_accept_sec_context.c:196)
==3336== by 0x63368E3: gssapi_server_mech_step (gssapi.c:724)
==3336== by 0x3263C0F8A0: sasl_server_step (server.c:1430)
==3336== by 0x3263C10A73: sasl_server_start (server.c:1350)
==3336== by 0x42A59B: ids_sasl_check_bind (saslbind.c:889)
==3336== by 0x40EB01: do_bind (bind.c:424)
==3336== by 0x413A05: connection_dispatch_operation (connection.c:558)
==3336== by 0x4152BF: connection_threadmain (connection.c:2328)
==3336== by 0x3262429632: _pt_root (ptthread.c:187)
==3336== by 0x3252807760: start_thread (pthread_create.c:301)
==3336== by 0x32520E098C: clone (clone.S:115)
./lib/gssapi/krb5/disp_status.c
static int save_error_string_nocopy(OM_uint32 minor_code, char *msg)
{
gsserrmap *p;
int ret;
#ifdef DEBUG
fprintf(stderr, "%s(%lu, %s)", __func__, (unsigned long) minor_code, msg);
#endif
p = k5_getspecific(K5_KEY_GSS_KRB5_ERROR_MESSAGE);
if (!p) {
p = malloc(sizeof(*p));
if (p == NULL) {
ret = 1;
goto fail;
}
if (gsserrmap_init(p) != 0) {
free(p);
p = NULL;
ret = 1;
goto fail;
}
if (k5_setspecific(K5_KEY_GSS_KRB5_ERROR_MESSAGE, p) != 0) {
gsserrmap_destroy(p);
free(p);
p = NULL;
ret = 1;
goto fail;
}
}
ret = gsserrmap_replace_or_insert(p, minor_code, msg);
fail:
#ifdef DEBUG
fprintf(stderr, " p=%p %s\n", (void *)p, ret ? "FAIL" : "SUCCESS");
#endif
return ret;
}
Created attachment 490439 [details]
valgrind info
valgrind output from ns-slapd, one with replayed traffic (invalid GSSAPI) and one with ldapsearch requests (valid).
389-ds 1.2.8 rc4 installed from the epel-testing repository; RHEL 5.6 is
up-to-date as of April 6th.
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
# getenforce
Permissive
# rpm -qa | grep 389
389-ds-base-1.2.8-0.8.rc4.el5
389-dsgw-1.1.6-1.el5
389-adminutil-1.1.13-1.el5
389-console-1.1.4-1.el5
389-admin-1.1.16-1.el5
389-ds-console-1.2.5-1.el5
389-ds-console-doc-1.2.5-1.el5
389-ds-1.2.1-1.el5
389-ds-base-libs-1.2.8-0.8.rc4.el5
389-admin-console-1.1.7-1.el5
389-admin-console-doc-1.1.7-1.el5
Jeremy: Thank you so much for gathering the information. I have a question. When you took the valgrind output, was the directory server still running? It looks to me the output is showing the snapshot of the memory. That is, the output is not a leak but being used. Could it be possible to shutdown the server and take the output? # /usr/lib64/dirsrv/slapd-YOURID/start-slapd.val > /tmp/valgrind.out 2>&1 & # ldapsearch -Y GSSAPI -h server.example.edu 'uid=myuid' uid # service dirsrv stop Thanks, again. --noriko Valgrind output after ns-slapd eventually run out of memory by the persistent GSSAPI requests (memory usage *never* declines, and holds level if the GSSAPI requests are stopped). Could this be due to an underlying glibc malloc problem, if nothing is reported as leaking in ns-slapd? Trying again with just a single request. ==17867== 63 bytes in 1 blocks are definitely lost in loss record 815 of 1,487 ==17867== at 0x4022903: malloc (vg_replace_malloc.c:195) ==17867== by 0x424319F: strdup (in /lib/libc-2.5.so) ==17867== by 0x405644A: slapi_ch_strdup (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x409B54E: slapi_pblock_set (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x40BC4D7: slapi_entry_schema_check (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x4060CB7: ??? (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x4062B16: ??? (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x4062DB9: dse_read_file (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x40B6022: ??? (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x40B63D9: init_schema_dse (in /usr/lib/dirsrv/libslapd.so.0.0.0) ==17867== by 0x805EF86: ??? (in /usr/sbin/ns-slapd.orig) ==17867== by 0x8062912: ??? (in /usr/sbin/ns-slapd.orig) ==17867== by 0x41E8E9B: (below main) (in /lib/libc-2.5.so) ==17867== ==17868== Warning: noted but unhandled ioctl 0x5422 with no size/direction hints ==17868== This could cause spurious value errors to appear. ==17868== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper. ==17868== ==17868== Valgrind's memory management: out of memory: ==17868== newSuperblock's request for 64593920 bytes failed. ==17868== 3086872576 bytes have already been allocated. ==17868== Valgrind cannot continue. Sorry. ==17868== ==17868== There are several possible reasons for this. ==17868== - You have some kind of memory limit in place. Look at the ==17868== output of 'ulimit -a'. Is there a limit on the size of ==17868== virtual memory or address space? ==17868== - You have run out of swap space. ==17868== - Valgrind has a bug. If you think this is the case or you are ==17868== not sure, please let us know and we'll try to fix it. ==17868== Please note that programs can take substantially more memory than ==17868== normal when running under Valgrind tools, eg. up to twice or ==17868== more, depending on the tool. On a 64-bit machine, Valgrind ==17868== should be able to make use of up 32GB memory. On a 32-bit ==17868== machine, Valgrind should be able to use all the memory available ==17868== to a single process, up to 4GB if that's how you have your ==17868== kernel configured. Most 32-bit Linux setups allow a maximum of ==17868== 3GB per process. ==17868== ==17868== Whatever the reason, Valgrind cannot continue. Sorry. (In reply to comment #9) > Valgrind output after ns-slapd eventually run out of memory by the persistent > GSSAPI requests (memory usage *never* declines, and holds level if the GSSAPI > requests are stopped). Could this be due to an underlying glibc malloc problem, > if nothing is reported as leaking in ns-slapd? Trying again with just a single > request. > Wow. Obviously, we are facing a serious memory leak... Thank you for trying the single request case. I think we could gather useful info even if the server hasn't reached the out of memory size. For our info, how long does it take for the server to run out of the memory and quit? A very long time (4181269 requests over 127773 seconds to consume 3G of RAM), so the problem could be worked around by a periodic server restart. These are 32-bit systems. (The leak seems worse if one hits the servers with more requests/second, which might again point to a malloc inefficiency?) I had valgrind settings from richm@ on the #389 channel messing with things, resetting the servers for a clean install. Created attachment 490876 [details]
more valgrind results
Two valgrind results, one from a ldapsearch, another with a replayed GSSAPI packet.
Jeremy, thanks for running valgrind. It turned out the leak is in libselinux. A new bug 695440 was opened for the module. Let us make this bug a duplicate of 695440. *** This bug has been marked as a duplicate of bug 695440 *** |