Bug 660292 - memory leak in the attached script
Summary: memory leak in the attached script
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: m2crypto
Version: 5.5
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: rc
: ---
Assignee: Miloslav Trmač
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks: 626177 640580 661359 661869 662716
TreeView+ depends on / blocked
 
Reported: 2010-12-06 11:59 UTC by Dan Kenigsberg
Modified: 2018-11-14 16:56 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When an "M2Crypto.SSL.Connection" object was created without setting up a connection, an SSL object was allocated and never freed. Over time, this could result in a significant memory leak. This update ensures the memory is correctly deallocated.
Clone Of:
: 661359 (view as bug list)
Environment:
Last Closed: 2011-01-13 22:05:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
client/server reproducer (6.71 KB, text/x-python)
2010-12-06 11:59 UTC, Dan Kenigsberg
no flags Details
Patch to swig to add __callcount__ instrumentation dict to generated .c code for a module (3.01 KB, patch)
2010-12-06 21:54 UTC, Dave Malcolm
no flags Details | Diff
Proposed patch (7.57 KB, patch)
2010-12-06 22:21 UTC, Miloslav Trmač
no flags Details | Diff
Dump the GC-tracked object list on each request (7.16 KB, application/octet-stream)
2010-12-07 07:33 UTC, Miloslav Trmač
no flags Details
"heap" analysis as per comment #22 (12.01 KB, text/plain)
2010-12-08 00:19 UTC, Dave Malcolm
no flags Details
Patch to gdb-heap to get it work on RHEL5 (and add some SWIG support) (14.02 KB, patch)
2010-12-08 01:46 UTC, Dave Malcolm
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0019 0 normal SHIPPED_LIVE m2crypto bug fix and enhancement update 2011-01-12 16:04:47 UTC

Description Dan Kenigsberg 2010-12-06 11:59:26 UTC
Created attachment 464961 [details]
client/server reproducer

Description of problem:
RHEL5's m2crypto-0.16-6.el5.6 has a serious memory leak, manifested in the attached script. The problem is not visible in RHEL6's m2crypto-0.20.2-7.el6.

Please backport whatever needed to fix this.

reproduction:
1. put the attached script in /tmp
2. make -C /etc/pki/tls/certs /tmp/selfsign.pem
3. python SecureXMLRPCServer.py -s &

4. run client several times: python SecureXMLRPCServer.py
5. watch server's rsz grow.

Comment 5 Dave Malcolm 2010-12-06 21:54:00 UTC
Created attachment 465075 [details]
Patch to swig to add __callcount__ instrumentation dict to generated .c code for a module

The issue here appears to be that the SWIG-generated .c code for M2Crypto.__m2crypto wraps pointers to OpenSSL objects, directly exposing the allocation/deallocation functions.  This means that the .py code has to directly manage these objects by calling alloc/dealloc correctly.

I don't know why upstream is doing this - it seems highly prone to memory leaks.  A more sane design approach would be to have the python wrapper objects "own" the OpenSSL pointers, automatically calling deallocation functions when they themselves are deallocated.

To try to help debug things, I've created a patch for SWIG which adds instrumentation to the generated .c code.

This patch adds a __callcounts__ dict to the (generated) .c code for a SWIG-generated python module, containing a mapping from wrapped function name to number-of-times-called.  It probably ought to be wrapped in some of "debug mode" before it could go upstream, but hopefully is good enough for the purposes of tracking this issue down.

Upon rebuilding swig with the patch, and then rebuilding m2crypto with the new swig, then adding this code to the reproducer:

  import M2Crypto.__m2crypto
  from pprint import pprint
  def stats():
      pprint(M2Crypto.__m2crypto.__callcounts__)

inserting as call to stats() into SecureXMLRPCServer.finish_request 
and running the client in a loop with e.g.:
   while true; do { python SecureXMLRPCServer.py ; } done

leads to stats like this:
{'bio_f_ssl': 2331,
 'bio_free': 4662,
 'bio_init': 1,
 'bio_new': 2331,
 'bio_new_socket': 2331,
 'bio_set_ssl': 2331,
 'dh_init': 1,
 'dsa_init': 1,
 'lib_init': 2,
 'pkcs7_init': 1,
 'rsa_init': 1,
 'smime_init': 1,
 'ssl_accept': 2331,
 'ssl_ctx_check_privkey': 1,
 'ssl_ctx_load_verify_locations': 1,
 'ssl_ctx_new': 1,
 'ssl_ctx_passphrase_callback': 1,
 'ssl_ctx_set_cache_size': 1,
 'ssl_ctx_set_cipher_list': 1,
 'ssl_ctx_set_client_CA_list_from_file': 1,
 'ssl_ctx_set_options': 1,
 'ssl_ctx_set_session_id_context': 1,
 'ssl_ctx_set_verify_default': 1,
 'ssl_ctx_set_verify_depth': 1,
 'ssl_ctx_use_cert_chain': 1,
 'ssl_ctx_use_privkey': 1,
 'ssl_get_peer_cert': 2331,
 'ssl_init': 1,
 'ssl_new': 4664,
 'ssl_read': 2331,
 'ssl_set_accept_state': 2331,
 'ssl_set_bio': 2331,
 'ssl_shutdown': 4662,
 'sslv23_method': 1,
 'util_init': 1,
 'x509_init': 1}

Hopefully this can be used to highlight call to an allocator that don't have a matching call to a deallocator.

Comment 6 Dave Malcolm 2010-12-06 22:02:03 UTC
FWIW, the above stats were done with a mismatching "commonName" in the selfsigned cert, leading to tracebacks.  I fixed this, and reran the test, this time without tracebacks.  The stats now look like this:

{'bio_f_ssl': 765,
 'bio_free': 1530,
 'bio_init': 1,
 'bio_new': 765,
 'bio_new_socket': 765,
 'bio_set_ssl': 765,
 'dh_init': 1,
 'dsa_init': 1,
 'lib_init': 2,
 'pkcs7_init': 1,
 'rsa_init': 1,
 'smime_init': 1,
 'ssl_accept': 765,
 'ssl_ctx_check_privkey': 1,
 'ssl_ctx_load_verify_locations': 1,
 'ssl_ctx_new': 1,
 'ssl_ctx_passphrase_callback': 1,
 'ssl_ctx_set_cache_size': 1,
 'ssl_ctx_set_cipher_list': 1,
 'ssl_ctx_set_client_CA_list_from_file': 1,
 'ssl_ctx_set_options': 1,
 'ssl_ctx_set_session_id_context': 1,
 'ssl_ctx_set_verify_default': 1,
 'ssl_ctx_set_verify_depth': 1,
 'ssl_ctx_use_cert_chain': 1,
 'ssl_ctx_use_privkey': 1,
 'ssl_get_peer_cert': 765,
 'ssl_init': 1,
 'ssl_new': 1532,
 'ssl_read': 1530,
 'ssl_set_accept_state': 765,
 'ssl_set_bio': 765,
 'ssl_set_shutdown': 1530,
 'ssl_shutdown': 2295,
 'ssl_write': 765,
 'sslv23_method': 1,
 'util_init': 1,
 'x509_init': 1}

Comment 7 Miloslav Trmač 2010-12-06 22:21:06 UTC
Created attachment 465087 [details]
Proposed patch

Test build and comments to follow...

Comment 14 Miloslav Trmač 2010-12-07 07:33:49 UTC
Created attachment 465183 [details]
Dump the GC-tracked object list on each request

> Dumping GC-known objects inside add() in the test case quickly crashes
> valgrind, so this is really shouldn't go to customers.

This is the crashing version

Comment 19 Dave Malcolm 2010-12-07 19:58:27 UTC
I've rebuilt my copy of m2crypto using the patch from comment #7 (attachment 465087 [details]), using my patched SWIG that adds call count instrumentation (as per comment #5).

I also changed the client part of the reproducer (from the original posting) to:
        for i in range(1000):
            __Test().client()
to precisely check 1000 client connections.

With this, M2Crypto.__m2crypto.__callcounts__ is (with the print at the top of "finish_request"):
{'bio_f_ssl': 999,
 'bio_free': 1998,
 'bio_init': 1,
 'bio_new': 999,
 'bio_new_socket': 999,
 'bio_set_ssl': 999,
 'dh_init': 1,
 'dsa_init': 1,
 'lib_init': 2,
 'pkcs7_init': 1,
 'rsa_init': 1,
 'smime_init': 1,
 'ssl_accept': 999,
 'ssl_ctx_check_privkey': 1,
 'ssl_ctx_load_verify_locations': 1,
 'ssl_ctx_new': 1,
 'ssl_ctx_passphrase_callback': 1, 
 'ssl_ctx_set_cache_size': 1,
 'ssl_ctx_set_cipher_list': 1,
 'ssl_ctx_set_client_CA_list_from_file': 1,
 'ssl_ctx_set_options': 1,
 'ssl_ctx_set_session_id_context': 1,
 'ssl_ctx_set_verify_default': 1,
 'ssl_ctx_set_verify_depth': 1,
 'ssl_ctx_use_cert_chain': 1,
 'ssl_ctx_use_privkey': 1,
 'ssl_free': 1998,
 'ssl_get_peer_cert': 999,
 'ssl_init': 1,
 'ssl_new': 2000,
 'ssl_read': 1998,
 'ssl_set_accept_state': 999,
 'ssl_set_bio': 999,
 'ssl_set_shutdown': 1998,
 'ssl_shutdown': 2997,
 'ssl_write': 999,
 'sslv23_method': 1,
 'util_init': 1,
 'x509_init': 1}

Comment 22 Dave Malcolm 2010-12-08 00:17:33 UTC
I attempted to use my gdb-heap tool:
    https://fedorahosted.org/gdb-heap/
to further analyze memory usage within the server process, but unfortunately, the results don't seem to be particularly illuminating.

Procedure:
I rebuilt gdb-7.0.1-28.el5 (see bug 609157) against my python26 package from EPEL5, and built a custom version of python-ply ("python26-ply", by setting "%define __python python26" in the specfile).

I had to apply various fixes to gdb-heap to get it to work upon RHEL5's python 2.4

I attached gdb to a server process that had serviced around 28500 requests (with versions as per comment #19), with PYTHONPATH set to the gdb-heap code, and ran:
  (gdb) python import gdbheap
  (gdb) heap

Am about to attach the output.

Comment 23 Dave Malcolm 2010-12-08 00:19:40 UTC
Created attachment 467333 [details]
"heap" analysis as per comment #22

Comment 24 Dave Malcolm 2010-12-08 01:46:24 UTC
Created attachment 467342 [details]
Patch to gdb-heap to get it work on RHEL5 (and add some SWIG support)

For my own reference, here's the patch for gdb-heap I used to get it to work on RHEL5 upon python-2.4.3 (with a build of gdb linked against EPEL5's python26).

I added PySwigObject handling, but for some reason the SWIG objects aren't showing up in the analysis.  Not sure why at this stage.

Comment 27 Dan Kenigsberg 2010-12-08 09:55:34 UTC
Actually, I should be even sorrier. My statement of comment 0 regarding m2crypto-0.20.2-7.el6 is wrong and indefensible.

On RHEL5 with m2crypto-0.16-6.el5.6, I see 3960 bytes leaked per query.
On RHEL6 with m2crypto-0.20.2-7.el6, I see "only" 1499 bytes leaked per query.

Comment 36 Jaromir Hradilek 2010-12-14 13:39:00 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When an "M2Crypto.SSL.Connection" object was created without setting up a connection, an SSL object was allocated and never freed. Over time, this could result in a significant memory leak. This update ensures the memory is correctly deallocated.

Comment 38 errata-xmlrpc 2011-01-13 22:05:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0019.html


Note You need to log in before you can comment on or make changes to this bug.