Bug 717738 - memleak in tlsm_auth_cert_handler
Summary: memleak in tlsm_auth_cert_handler
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: openldap
Version: 6.1
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: rc
: ---
Assignee: Jan Vcelak
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Keywords: ZStream
Depends On: 717730
Blocks: 710372 723134 790916
TreeView+ depends on / blocked
 
Reported: 2011-06-29 18:05 UTC by Jan Vcelak
Modified: 2013-03-04 01:29 UTC (History)
10 users (show)

(edit)
- Any tool which uses both OpenLDAP and Mozilla NSS libraries. OpenLDAP validates TLS peer and the certificate is cached by Mozilla NSS library.
- The tool can fail on NSS_Shutdown function call, because the client certificate is not freed and the caches cannot be destroyed.
- Peer certificate is freed in OpenLDAP library after certificate validation is finished.
- All caches can be freed and NSS_Shutdown succeeds.
Clone Of: 717730
(edit)
Last Closed: 2011-12-06 12:12:51 UTC


Attachments (Terms of Use)
Valgrind --leak-check=full output (old version) (63 bytes, text/plain)
2011-07-20 14:25 UTC, Ondrej Moriš
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1514 normal SHIPPED_LIVE openldap bug fix and enhancement update 2011-12-06 00:51:20 UTC

Description Jan Vcelak 2011-06-29 18:05:07 UTC
+++ This bug was initially created as a clone of Bug #717730 +++

Description of problem:
tlsm_auth_cert_handler calls SSL_PeerCertificate to get the peer's cert from the socket.  This cert must be freed with CERT_DestroyCertificate.

You can see this problem using valgrind with ldapsearch -ZZ.  You will see memory leaks like this:

==23056== 48 bytes in 1 blocks are possibly lost in loss record 45 of 110
==23056==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==23056==    by 0x30AF675479: nss_ZAlloc (arena.c:892)
==23056==    by 0x30AE601A8E: PL_HashTableRawAdd (plhash.c:265)
==23056==    by 0x30AF676C53: nssHash_Add (hash.c:259)
==23056==    by 0x30AF66C7C7: nssCertificateStore_FindOrAdd (pkistore.c:192)
==23056==    by 0x30AF6691B1: NSSCryptoContext_FindOrImportCertificate (cryptocontext.c:146)
==23056==    by 0x30AF664A17: CERT_NewTempCertificate (stanpcertdb.c:456)
==23056==    by 0x30B0212ED3: ssl3_HandleHandshakeMessage (ssl3con.c:7850)
==23056==    by 0x30B0213E2F: ssl3_HandleRecord (ssl3con.c:8727)
==23056==    by 0x30B02148CB: ssl3_GatherCompleteHandshake (ssl3gthr.c:209)
==23056==    by 0x30B0217168: ssl_GatherRecord1stHandshake (sslcon.c:1258)
==23056==    by 0x30B021CF14: ssl_Do1stHandshake (sslsecur.c:151)
==23056==    by 0x30B021E67E: SSL_ForceHandshake (sslsecur.c:407)
==23056==    by 0x30B32349E4: tlsm_session_accept_or_connect (tls_m.c:2350)
==23056==    by 0x30B3233571: ldap_int_tls_connect (tls2.c:366)
==23056==    by 0x30B32337DC: ldap_int_tls_start (tls2.c:833)
==23056==    by 0x30B323394D: ldap_start_tls_s (tls2.c:939)
==23056==    by 0x40B798: tool_conn_setup (common.c:1290)
==23056==    by 0x4069A7: main (ldapsearch.c:900)

In other applications that use Mozilla NSS, you will see errors in NSS_Shutdown and NSS_Initialize - NSS_Shutdown will fail because the cert objects are cached, and the cache cannot be freed because there is still an outstanding reference.

--- Additional comment from rmeggins@redhat.com on 2011-06-29 19:39:24 CEST ---

Patch submitted upstream - http://www.openldap.org/its/index.cgi?findid=6980

Comment 3 Jan Vcelak 2011-07-18 15:53:52 UTC
Fix included in openldap-2.4.23-16.el6

Comment 6 Jan Vcelak 2011-07-20 11:03:21 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
- Any tool which uses both OpenLDAP and Mozilla NSS libraries. OpenLDAP validates TLS peer and the certificate is cached by Mozilla NSS library.
- The tool can fail on NSS_Shutdown function call, because the client certificate is not freed and the caches cannot be destroyed.
- Peer certificate is freed in OpenLDAP library after certificate validation is finished.
- All caches can be freed and NSS_Shutdown succeeds.

Comment 7 Ondrej Moriš 2011-07-20 13:16:47 UTC
What exactly I need to reproduce this bug? I set-up slapd on SSL/TLS and there are leaks reported while running valgrind on ldapsearch -ZZ in both new and old version of openldap:

New (15.el6_1.1)

==6603== Warning: invalid file descriptor -1 in syscall close()
==6603== 
==6603== HEAP SUMMARY:
==6603==     in use at exit: 16,829 bytes in 101 blocks
==6603==   total heap usage: 5,909 allocs, 5,808 frees, 1,934,615 bytes allocated
==6603== 
==6603== LEAK SUMMARY:
==6603==    definitely lost: 59 bytes in 2 blocks
==6603==    indirectly lost: 0 bytes in 0 blocks
==6603==      possibly lost: 52 bytes in 2 blocks
==6603==    still reachable: 16,718 bytes in 97 blocks
==6603==         suppressed: 0 bytes in 0 blocks
==6603== Rerun with --leak-check=full to see details of leaked memory
==6603== 
==6603== For counts of detected and suppressed errors, rerun with: -v
==6603== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 44 from 13)

Old (15.el6)

==7718== Warning: invalid file descriptor -1 in syscall close()
==7718== 
==7718== HEAP SUMMARY:
==7718==     in use at exit: 24,374 bytes in 125 blocks
==7718==   total heap usage: 5,910 allocs, 5,785 frees, 1,934,687 bytes allocated
==7718== 
==7718== LEAK SUMMARY:
==7718==    definitely lost: 131 bytes in 3 blocks
==7718==    indirectly lost: 0 bytes in 0 blocks
==7718==      possibly lost: 11,667 bytes in 27 blocks
==7718==    still reachable: 12,576 bytes in 95 blocks
==7718==         suppressed: 0 bytes in 0 blocks
==7718== Rerun with --leak-check=full to see details of leaked memory
==7718== 
==7718== For counts of detected and suppressed errors, rerun with: -v
==7718== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 44 from 13)

I can attach valgrind --leak-check=yes outputs if needed, anyway:

# grep tlsm old.valgrind 
==9994==    by 0x4051775: tlsm_init (tls_m.c:2906)
==9994==    by 0x4052D9E: tlsm_deferred_ctx_init (tls_m.c:1598)
==9994==    by 0x404FD7C: tlsm_session_new (tls_m.c:2306)
==9994==    by 0x4052D9E: tlsm_deferred_ctx_init (tls_m.c:1598)
==9994==    by 0x40516CE: tlsm_ctx_free (tls_m.c:1927)
==9994==    by 0x4052D9E: tlsm_deferred_ctx_init (tls_m.c:1598)
==9994==    by 0x4052D9E: tlsm_deferred_ctx_init (tls_m.c:1598)
==9994==    by 0x4052D9E: tlsm_deferred_ctx_init (tls_m.c:1598)

# grep tlsm new.valgrind 
==8855==    by 0x40516A5: tlsm_init (tls_m.c:2908)
==8855==    by 0x4052CCE: tlsm_deferred_ctx_init (tls_m.c:1600)
==8855==    by 0x404FCAC: tlsm_session_new (tls_m.c:2308)

Comment 8 Jan Vcelak 2011-07-20 13:25:26 UTC
Ondrej, I know there are some leaks. And will investigate what can be done with it. I'm sure that all cannot be removed at the moment - one is definitely intended.

Anyway, for the purpose of this bug, I was grepping for NSSCryptoContext_FindOrImportCertificate

Comment 9 Ondrej Moriš 2011-07-20 14:23:15 UTC
Thanks for you reply Jan. However, it seems to be hard to reproduce the leak mentioned in the description.

Rich, could you provide more specific reproducer? Is anything specific needed to hit it? For instance, client certificates, certificate / key pairs in NSS DB or in pem files, certificate parameters, anything else... 

I have set-up openldap server to listen on both 636 and 389, added some sample data into it, told it to use cacert, server certificate and key from pem files, told openldap clients not to use client certificates and to demand server certificate verification and then used ldapsearch -ZZ to require TLS. 

As I have mentioned before, it is leaking, but I do not see the leak you reported. I am attaching valgrind --leak-check=full output.

Comment 10 Ondrej Moriš 2011-07-20 14:25:17 UTC
Created attachment 514017 [details]
Valgrind --leak-check=full output (old version)

Comment 11 Rich Megginson 2011-07-25 18:32:03 UTC
(In reply to comment #10)
> Created attachment 514017 [details]
> Valgrind --leak-check=full output (old version)

What platform is this?  This openldap is using openssl, not moznss.

Comment 12 Ondrej Moriš 2011-07-25 23:11:08 UTC
Hi Rich, it is i686 and it is openldap-2.4.23-15.el6 which uses nss instead of openssl. It should use nss, shouldn't it?

Comment 13 Rich Megginson 2011-07-25 23:44:05 UTC
(In reply to comment #12)
> Hi Rich, it is i686 and it is openldap-2.4.23-15.el6 which uses nss instead of
> openssl. It should use nss, shouldn't it?

Yes.  How is this possible?  How was this built?

Comment 14 Ondrej Moriš 2011-07-26 09:04:37 UTC
Ah, sorry, my fault. I was playing with it probably too much, I am attaching valgrind output of ldapsearch -ZZ for openldap-2.4.23-15.el6 build _with_ NSS (and deprecating previous valgrind output). 

See http://nest.test.redhat.com/mnt/qa/scratch/omoris/old-leaks.txt

Comment 15 Rich Megginson 2011-07-26 15:23:09 UTC
(In reply to comment #14)
> Ah, sorry, my fault. I was playing with it probably too much, I am attaching
> valgrind output of ldapsearch -ZZ for openldap-2.4.23-15.el6 build _with_ NSS
> (and deprecating previous valgrind output). 
> 
> See http://nest.test.redhat.com/mnt/qa/scratch/omoris/old-leaks.txt

Yes, this is the leak.  Note that you should use the valgrind argument --num-callers=50 to get the full stack.

Comment 16 Ondrej Moriš 2011-07-27 08:47:11 UTC
Is this an example of a leak we are hunting for?

==3112== 24 bytes in 1 blocks are possibly lost in loss record 39 of 111
==3112==    at 0x40053B3: calloc (vg_replace_malloc.c:467)
==3112==    by 0xC1324B: PR_Calloc (prmem.c:475)
==3112==    by 0x5B50D7: nss_ZAlloc (arena.c:892)
==3112==    by 0x5B5806: nss_arena_hash_alloc_entry (hashops.c:98)
==3112==    by 0xC48785: PL_HashTableRawAdd (plhash.c:265)
==3112==    by 0xC4886D: PL_HashTableAdd (plhash.c:296)
==3112==    by 0x5B6D71: nssHash_Add (hash.c:259)
==3112==    by 0x5ABBB5: nssCertificateStore_FindOrAdd (pkistore.c:192)
==3112==    by 0x5A7C73: NSSCryptoContext_FindOrImportCertificate (cryptocontext.c:146)
==3112==    by 0x5A26EF: CERT_NewTempCertificate (stanpcertdb.c:471)
==3112==    by 0xC89610: ssl3_HandleHandshakeMessage (ssl3con.c:7848)
==3112==    by 0xC8A904: ssl3_HandleRecord (ssl3con.c:8725)
==3112==    by 0xC8B471: ssl3_GatherCompleteHandshake (ssl3gthr.c:209)
==3112==    by 0xC8E0EB: ssl_GatherRecord1stHandshake (sslcon.c:1258)
==3112==    by 0xC94C54: ssl_Do1stHandshake (sslsecur.c:151)
==3112==    by 0xC96576: SSL_ForceHandshake (sslsecur.c:407)
==3112==    by 0x404F79D: tlsm_session_accept_or_connect (tls_m.c:2350)
==3112==    by 0x404DEDF: ldap_int_tls_connect (tls2.c:366)
==3112==    by 0x404E1E1: ldap_int_tls_start (tls2.c:833)
==3112==    by 0x404E36B: ldap_start_tls_s (tls2.c:939)
==3112==    by 0x8052F62: tool_conn_setup (common.c:1290)
==3112==    by 0x804DB80: main (ldapsearch.c:900)

Is this bug about leaks in NSSCryptoContext*?

Comment 17 Rich Megginson 2011-07-27 14:48:08 UTC
(In reply to comment #16)
> Is this an example of a leak we are hunting for?

Yes.
 
> Is this bug about leaks in NSSCryptoContext*?

Not necessarily.

Comment 18 Ondrej Moriš 2011-07-27 22:11:09 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Is this an example of a leak we are hunting for?
> 
> Yes.
> 
> > Is this bug about leaks in NSSCryptoContext*?
> 
> Not necessarily.

Ok, let's reformulate it a bit - is this bug about leaks in *NSS*? Since there is still the following leak in the new version:

==5127== 32 bytes in 1 blocks are definitely lost in loss record 46 of 87
==5127==    at 0x400682F: malloc (vg_replace_malloc.c:236)
==5127==    by 0xC13008: PR_Malloc (prmem.c:467)
==5127==    by 0xC0FBC1: GrowStuff (prprf.c:1076)
==5127==    by 0xC0F5E6: dosprintf (prprf.c:137)
==5127==    by 0xC0FA81: PR_vsmprintf (prprf.c:1127)
==5127==    by 0xC0FC74: PR_smprintf (prprf.c:1105)
==5127==    by 0x551C85: nss_MkConfigString (nssinit.c:205)
==5127==    by 0x551D7B: nss_Init (nssinit.c:597)
==5127==    by 0x552AD7: NSS_InitContext (nssinit.c:800)
==5127==    by 0x4052CCE: tlsm_deferred_ctx_init (tls_m.c:1600)
==5127==    by 0xC18FD0: PR_CallOnceWithArg (prinit.c:832)
==5127==    by 0x404FCAC: tlsm_session_new (tls_m.c:2308)
==5127==    by 0x404DCFB: alloc_handle (tls2.c:296)
==5127==    by 0x404DE8B: ldap_int_tls_connect (tls2.c:341)
==5127==    by 0x404E111: ldap_int_tls_start (tls2.c:833)
==5127==    by 0x404E29B: ldap_start_tls_s (tls2.c:939)
==5127==    by 0x8052F52: tool_conn_setup (common.c:1290)
==5127==    by 0x804DB80: main (ldapsearch.c:900)

Comment 19 Rich Megginson 2011-07-27 22:25:23 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > (In reply to comment #16)
> > > Is this an example of a leak we are hunting for?
> > 
> > Yes.
> > 
> > > Is this bug about leaks in NSSCryptoContext*?
> > 
> > Not necessarily.
> 
> Ok, let's reformulate it a bit - is this bug about leaks in *NSS*?

no.  It is only about this specific leak, about freeing the cert returned by SSL_PeerCertificate.

> Since there
> is still the following leak in the new version:

This is a leak in NSS - please file a bug against nss for this.

> ==5127== 32 bytes in 1 blocks are definitely lost in loss record 46 of 87
> ==5127==    at 0x400682F: malloc (vg_replace_malloc.c:236)
> ==5127==    by 0xC13008: PR_Malloc (prmem.c:467)
> ==5127==    by 0xC0FBC1: GrowStuff (prprf.c:1076)
> ==5127==    by 0xC0F5E6: dosprintf (prprf.c:137)
> ==5127==    by 0xC0FA81: PR_vsmprintf (prprf.c:1127)
> ==5127==    by 0xC0FC74: PR_smprintf (prprf.c:1105)
> ==5127==    by 0x551C85: nss_MkConfigString (nssinit.c:205)
> ==5127==    by 0x551D7B: nss_Init (nssinit.c:597)
> ==5127==    by 0x552AD7: NSS_InitContext (nssinit.c:800)
> ==5127==    by 0x4052CCE: tlsm_deferred_ctx_init (tls_m.c:1600)
> ==5127==    by 0xC18FD0: PR_CallOnceWithArg (prinit.c:832)
> ==5127==    by 0x404FCAC: tlsm_session_new (tls_m.c:2308)
> ==5127==    by 0x404DCFB: alloc_handle (tls2.c:296)
> ==5127==    by 0x404DE8B: ldap_int_tls_connect (tls2.c:341)
> ==5127==    by 0x404E111: ldap_int_tls_start (tls2.c:833)
> ==5127==    by 0x404E29B: ldap_start_tls_s (tls2.c:939)
> ==5127==    by 0x8052F52: tool_conn_setup (common.c:1290)
> ==5127==    by 0x804DB80: main (ldapsearch.c:900)

Comment 20 Bob Relyea 2011-07-28 00:25:15 UTC
> This is a leak in NSS - please file a bug against nss for this.

Looks like Openldap is not calling the clear session cache function.

bob

Comment 21 Rich Megginson 2011-07-28 00:34:10 UTC
(In reply to comment #20)
> > This is a leak in NSS - please file a bug against nss for this.
> 
> Looks like Openldap is not calling the clear session cache function.
> 
> bob

Then please open a different bug against openldap.

Comment 23 errata-xmlrpc 2011-12-06 12:12:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1514.html


Note You need to log in before you can comment on or make changes to this bug.