Bug 1707227 - glusterfsd memory leak after enable tls/ssl
Summary: glusterfsd memory leak after enable tls/ssl
Keywords:
Status: CLOSED DUPLICATE of bug 1768339
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1708047
 
Reported: 2019-05-07 05:56 UTC by zhou lin
Modified: 2020-01-27 10:53 UTC (History)

Fixed In Version:
Clone Of:
: 1708047 (view as bug list)
Environment:
Last Closed: 2020-01-27 10:53:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description zhou lin 2019-05-07 05:56:57 UTC
Description of problem:

A memory leak was found in glusterfsd after enabling TLS/SSL.
Version-Release number of selected component (if applicable):

3.12.15
How reproducible:
while true; do gluster v heal <vol-name> info; done
In another session, watch the memory usage of the glusterfsd process serving <vol-name>. It keeps growing until it reaches about 370M, then stops increasing.

Steps to Reproduce:
1. while true; do gluster v heal <vol-name> info; done
2. Check the memory usage of the glusterfsd process serving <vol-name>.
3.
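
For step 2, a minimal sketch of one way to sample the process's resident memory while the heal loop runs in another session. It assumes a Linux /proc filesystem; the helper names rss_kb and watch_rss are hypothetical and are not part of gluster.

```shell
#!/bin/sh
# Hypothetical helpers for watching the memory of one process (Linux only).

# Print the resident set size (VmRSS, in kB) of the process with PID $1.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Sample the RSS of PID $1 every $2 seconds, $3 times, one line per sample.
watch_rss() {
    pid=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s kB\n' "$(date +%H:%M:%S)" "$(rss_kb "$pid")"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_rss <pid> 5 60` prints one sample every 5 seconds for about 5 minutes, where <pid> is the PID of the glusterfsd brick process for the volume (it can be read from the output of `gluster volume status <vol-name>`).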

Actual results:
Memory usage keeps increasing until it reaches around 370M, then stops increasing.

Expected results:
Memory usage stays stable.

Additional info:
With the memory checking tools valgrind and libleak attached to the glusterfsd process, ssl_accept looks suspicious. I am not sure whether the leak is caused by ssl_accept itself or by glusterfs misusing the SSL API:
==16673== 198,720 bytes in 12 blocks are definitely lost in loss record 1,114 of 1,123
==16673== at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673== by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673== by 0xA855E0C: ssl3_setup_write_buffer (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673== by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673== by 0xA618420: socket_complete_connection (socket.c:2554)
==16673== by 0xA618788: socket_event_handler (socket.c:2613)
==16673== by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673== by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673== by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
==16673==
==16673== 200,544 bytes in 12 blocks are definitely lost in loss record 1,115 of 1,123
==16673== at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673== by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673== by 0xA855D12: ssl3_setup_read_buffer (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA855E68: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673== by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673== by 0xA618420: socket_complete_connection (socket.c:2554)
==16673== by 0xA618788: socket_event_handler (socket.c:2613)
==16673== by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673== by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673== by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
==16673==
valgrind --leak-check=f

Output from another memory leak detection tool, libleak:
callstack[2419] expires. count=1 size=224/224 alloc=362 free=350
/home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
/lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
/lib64/libcrypto.so.10(EVP_DigestInit_ex+0x2a9) [0x7f145ed95749]
/lib64/libssl.so.10(ssl3_digest_cached_records+0x11d) [0x7f145abb6ced]
/lib64/libssl.so.10(ssl3_accept+0xc8f) [0x7f145abadc4f]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d) [0x7f145ae0816d]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a) [0x7f145ae0868a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2) [0x7f145ae089f2]
/lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
/lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
/lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
/lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]
callstack[2432] expires. count=1 size=104/104 alloc=362 free=0
/home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
/lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
/lib64/libcrypto.so.10(BN_MONT_CTX_new+0x17) [0x7f145ed48627]
/lib64/libcrypto.so.10(BN_MONT_CTX_set_locked+0x6d) [0x7f145ed489fd]
/lib64/libcrypto.so.10(+0xff4d9) [0x7f145ed6a4d9]
/lib64/libcrypto.so.10(int_rsa_verify+0x1cd) [0x7f145ed6d41d]
/lib64/libcrypto.so.10(RSA_verify+0x32) [0x7f145ed6d972]
/lib64/libcrypto.so.10(+0x107ff5) [0x7f145ed72ff5]
/lib64/libcrypto.so.10(EVP_VerifyFinal+0x211) [0x7f145ed9dd51]
/lib64/libssl.so.10(ssl3_get_cert_verify+0x5bb) [0x7f145abac06b]
/lib64/libssl.so.10(ssl3_accept+0x988) [0x7f145abad948]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d) [0x7f145ae0816d]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a) [0x7f145ae0868a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2) [0x7f145ae089f2]
/lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
/lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
/lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
/lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]

Comment 1 zhou lin 2019-05-08 07:49:03 UTC
Thanks for your response!
The glusterfsd process does call SSL_free. However, the SSL context is shared between many SSL objects. Do you think keeping the shared SSL context could cause this memory leak?

Comment 2 Worker Ant 2019-05-09 02:59:43 UTC
REVIEW: https://review.gluster.org/22687 (After enabling TLS, glusterfsd memory leak found) posted (#1) for review on master by None

Comment 3 Worker Ant 2019-05-09 04:18:50 UTC
REVISION POSTED: https://review.gluster.org/22687 (rpc/socket: After enabling TLS, glusterfsd memory leak found) posted (#2) for review on master by Raghavendra G

Comment 4 Amar Tumballi 2019-07-05 07:55:14 UTC
Zhou Lin, is the issue fixed? I see that you abandoned this patch. Feel free to close the bug if this is working for you in later releases/branches.

Comment 5 zhou lin 2019-07-25 05:25:31 UTC
Unfortunately, the version I use has already diverged from the master branch. In my version the ssl_ctx is shared across connections, while in master each connection has a separate ssl_ctx, so the fix I used for the memory leak does not apply to master.

My tests show that the leak also exists on master, however.
The following patch fixes the memory leak in my version:

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -367,6 +367,7 @@ static char *ssl_setup_connection_postfix(rpc_transport_t *this) {
   gf_log(this->name, GF_LOG_DEBUG,
          "SSL verification succeeded (client: %s) (server: %s)",
          this->peerinfo.identifier, this->myinfo.identifier);
+  X509_free(peer);
   return gf_strdup(peer_CN);
 
   /* Error paths. */
@@ -1019,7 +1020,16 @@ static void __socket_reset(rpc_transport_t *this) {
   memset(&priv->incoming, 0, sizeof(priv->incoming));
 
   event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+  if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_INFO,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    // SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+    priv->ssl_ssl = NULL;
+  }
   priv->sock = -1;
   priv->idx = -1;
   priv->connected = -1;
@@ -4238,6 +4248,16 @@ void fini(rpc_transport_t *this) {
     pthread_mutex_destroy(&priv->out_lock);
     pthread_mutex_destroy(&priv->cond_lock);
     pthread_cond_destroy(&priv->cond);
+    if(priv->use_ssl&& priv->ssl_ssl)
+    {
+      gf_log(this->name, GF_LOG_TRACE,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+      // SSL_shutdown(priv->ssl_ssl);
+      SSL_clear(priv->ssl_ssl);
+      SSL_free(priv->ssl_ssl);
+      priv->ssl_ssl = NULL;
+    }
     if (priv->ssl_private_key) {
       GF_FREE(priv->ssl_private_key);
     }

Comment 6 Xavi Hernandez 2020-01-27 10:53:52 UTC
This is a duplicate of 1768339

*** This bug has been marked as a duplicate of bug 1768339 ***

