Bug 2128412
Summary: | stunnel consumes high amount of memory when pestered with TCP connections without a TLS handshake | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Sven Hoexter <sven> |
Component: | openssl | Assignee: | Dmitry Belyavskiy <dbelyavs> |
Status: | CLOSED ERRATA | QA Contact: | Alicja Kario <hkario> |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | CentOS Stream | CC: | bstinson, cllang, dbelyavs, hkario, jwboyer, quentin, ruben, ssorce |
Target Milestone: | rc | Keywords: | Triaged, ZStream |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | openssl-3.0.7-1.el9 | Doc Type: | Bug Fix |
Doc Text: | Cause: A flag value conflict in the OpenSSL headers caused a memory leak in TLS services using the OpenSSL library if a TCP connection was opened and closed without a TLS handshake. Consequence: A small amount of memory leaked for every connection without a TLS handshake. Fix: Backported the fix for the flag value conflict. Result: No memory leaks when TCP connections are closed without a TLS handshake. | | |
Story Points: | --- | | |
Clone Of: | | | |
: | 2144008 2144009 | Environment: | |
Last Closed: | 2023-05-09 08:20:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 2144008, 2144009 | | |
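
To illustrate the kind of flag value conflict the Doc Text refers to, here is a minimal Python sketch. The flag names and values below are hypothetical and are not the actual OpenSSL BIO_FLAGS definitions; the sketch only shows that two independently defined flags sharing one bit cannot be told apart, so code that tests for one of them also fires when the other is set.

```python
# Hypothetical flags for illustration only; NOT the real OpenSSL definitions.
# Imagine one is defined in a public header and the other in an internal one.
PUBLIC_FLAG_EOF = 0x800        # assumed value
INTERNAL_FLAG_OFFLOAD = 0x800  # same bit by accident: this is the conflict


def has_flag(flags: int, flag: int) -> bool:
    """Return True if any bit of `flag` is set in `flags`."""
    return (flags & flag) != 0


def find_conflicts(named_flags: dict) -> list:
    """Return sorted pairs of flag names whose bit masks overlap."""
    names = sorted(named_flags)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if named_flags[a] & named_flags[b]
    ]


# Setting only the public flag makes the internal one look set as well.
flags = 0
flags |= PUBLIC_FLAG_EOF
print(has_flag(flags, INTERNAL_FLAG_OFFLOAD))  # True

print(find_conflicts({
    "PUBLIC_FLAG_EOF": PUBLIC_FLAG_EOF,
    "INTERNAL_FLAG_OFFLOAD": INTERNAL_FLAG_OFFLOAD,
}))
```

A check like `find_conflicts` over all flag definitions is one way such a collision could be caught early; whether upstream uses anything similar is not something this report states.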
Description
Sven Hoexter
2022-09-20 13:27:01 UTC
Thank you for the report. Just to make sure: Do you have PKCS#11 configured in OpenSSL?

I'm not 100% sure what you mean by configured in this context. Installed are:

    openssl-pkcs11-0.4.11-7.el9.x86_64
    openssl-libs-3.0.1-41.el9.x86_64
    openssl-3.0.1-41.el9.x86_64

The pkcs11 engine in openssl is available:

    $ openssl engine pkcs11 -t
    (pkcs11) pkcs11 engine
         [ available ]

That's true for the system we originally experienced the issue on, and the one I used to reproduce it.

When stunnel crashes I see the following in dmesg:

    stunnel[8093] general protection fault ip:7fd0ab7aa0e0 sp:7fd0ab2db6c0 error:0 in libcrypto.so.3.0.1[7fd0ab5af000+257000]

Right now I fail to record a core dump with systemd-coredump. According to coredumpctl list it's noticing it but does not write a core file:

    Wed 2022-09-21 07:58:49 UTC 8092 0 0 SIGSEGV none /usr/bin/stunnel n/a

By configured I mean changes to /etc/pki/tls/openssl.cnf that would cause any user of OpenSSL to load a pkcs11 implementation, e.g.:

> openssl_conf = pkcs11_conf
>
> [...]
>
> [pkcs11_conf]
> engines = engine_section
> ssl_conf = ssl_module
>
> [engine_section]
> pkcs11 = pkcs11_section
>
> [pkcs11_section]
> engine_id = pkcs11
> MODULE_PATH = /usr/lib64/pkcs11/libsofthsm2.so
> init = 0

The reason I'm asking is that we have previously seen memory leaks for each request handled by stunnel when pkcs11 modules are used with OpenSSL. I'd just like to check whether this is potentially the same issue.

Ok, none of that is configured.

    [vagrant@centos9s ~]$ grep pkcs11 /etc/pki/tls/openssl.cnf
    [vagrant@centos9s ~]$

Not sure if it helps, I got out of my comfort zone and installed gdb + debug headers. Removed along the way the openssl-pkcs11 package to be sure. I got gdb attached to stunnel and provoked a SIGSEGV and that is the backtrace.

    Thread 2 "stunnel" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fbe07487640 (LWP 184215)]
    0x00007fbe07956dc4 in bn_sqr8x_internal () at crypto/bn/x86_64-mont5.s:1746
    1746 leaq (%rdi,%r9,1),%rdi
    (gdb) bt
    #0 0x00007fbe07956dc4 in bn_sqr8x_internal () at crypto/bn/x86_64-mont5.s:1746
    #1 0x00007fbe079550c5 in bn_sqr8x_mont () at crypto/bn/x86_64-mont.s:795
    #2 0x00007fbe07793b7b in bn_mul_mont_fixed_top (r=r@entry=0x7fbe00003298, a=a@entry=0x7fbe00003298, b=b@entry=0x7fbe00003298, mont=mont@entry=0x7fbe00007ae0, ctx=ctx@entry=0x7fbe00001fb0) at crypto/bn/bn_mont.c:48
    #3 0x00007fbe07799f4b in BN_mod_exp_mont (rr=rr@entry=0x7fbe00003250, a=a@entry=0x7fbe00003268, p=p@entry=0x7fbe00003238, m=m@entry=0x7fbe00001dd0, ctx=ctx@entry=0x7fbe00001fb0, in_mont=in_mont@entry=0x7fbe00007ae0) at crypto/bn/bn_exp.c:427
    #4 0x00007fbe0779bf0e in ossl_bn_miller_rabin_is_prime (w=w@entry=0x7fbe00001dd0, iterations=<optimized out>, iterations@entry=1, ctx=ctx@entry=0x7fbe00001fb0, cb=cb@entry=0x7fbe00001340, enhanced=enhanced@entry=0, status=status@entry=0x7fbe07486c14) at crypto/bn/bn_prime.c:405
    #5 0x00007fbe0779c2d2 in ossl_bn_miller_rabin_is_prime (status=0x7fbe07486c14, enhanced=0, cb=0x7fbe00001340, ctx=0x7fbe00001fb0, iterations=1, w=0x7fbe00001dd0) at crypto/bn/bn_prime.c:345
    #6 bn_is_prime_int (cb=0x7fbe00001340, do_trial_division=0, ctx=0x7fbe00001fb0, checks=1, w=0x7fbe00001dd0) at crypto/bn/bn_prime.c:311
    #7 bn_is_prime_int (w=w@entry=0x7fbe00001dd0, checks=checks@entry=1, ctx=ctx@entry=0x7fbe00001fb0, do_trial_division=do_trial_division@entry=0, cb=cb@entry=0x7fbe00001340) at crypto/bn/bn_prime.c:266
    #8 0x00007fbe0779cac5 in BN_generate_prime_ex2 (ret=ret@entry=0x7fbe00001dd0, bits=bits@entry=2048, safe=safe@entry=1, add=add@entry=0x7fbe00001bd0, rem=rem@entry=0x7fbe00001be8, cb=cb@entry=0x7fbe00001340, ctx=<optimized out>) at crypto/bn/bn_prime.c:186
    #9 0x00007fbe0779ce87 in BN_generate_prime_ex (ret=0x7fbe00001dd0, bits=2048, safe=1, add=0x7fbe00001bd0, rem=0x7fbe00001be8, cb=0x7fbe00001340) at crypto/bn/bn_prime.c:222
    #10 0x00007fbe077d50e6 in dh_builtin_genparams (cb=0x7fbe00001340, generator=2, prime_len=2048, ret=0x7fbe00001860) at crypto/dh/dh_gen.c:216
    #11 DH_generate_parameters_ex (ret=0x7fbe00001860, prime_len=2048, generator=2, cb=0x7fbe00001340) at crypto/dh/dh_gen.c:124
    #12 0x000055b676dc8c89 in cron_dh_param (bn_gencb=0x7fbe00001340) at /usr/src/debug/stunnel-5.62-2.el9.x86_64/src/cron.c:205
    #13 cron_worker () at /usr/src/debug/stunnel-5.62-2.el9.x86_64/src/cron.c:158
    #14 cron_thread (arg=<optimized out>) at /usr/src/debug/stunnel-5.62-2.el9.x86_64/src/cron.c:100
    #15 0x00007fbe07543802 in start_thread (arg=<optimized out>) at pthread_create.c:443
    #16 0x00007fbe074e3450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Actually I've done it twice and it always happened in bn_sqr8x_internal().

I believe the memory leak and the crash are two separate issues. Generating DH parameters and attaching them to the certificate, thus preventing stunnel from generating them, seems to prevent the crashes from happening. The memory leak stays. I also only see the crashes in the local VirtualBox based VM, and could not reproduce those inside a google cloud compute engine instance.

I found the easiest way to identify where the memory leaks in stunnel is running valgrind with gdb stub support and periodically interrupting stunnel to trigger a valgrind leak scan:

1. Run valgrind with gdbserver support using
   valgrind --vgdb=yes --vgdb-error=0 --leak-check=full --num-callers=60 --track-origins=yes /usr/bin/stunnel stunnel.conf
2. Start gdb /usr/bin/stunnel
3. Attach gdb to the running valgrind instance using "target remote | vgdb"
4. Issue a gdb "continue" command until stunnel has successfully started up
5. Interrupt program execution using ^C, and issue a valgrind leak check using "monitor leak_check full possibleleak changed" (most of these leaks happen during startup and are not the ones we're after)
6. Issue a gdb "continue" command and send 100 netcat requests to stunnel in a separate shell
7. Repeat steps (5) and (6) to ignore leaks that happen when a particular stunnel server is first used
8. Repeat step (5); this now shows only leaks that happened in relation to the additional requests, i.e., this memory likely leaks for every request.

I had to use stunnel with "foreground = yes", not sure how far that alters the behaviour. Everything executed on a fresh google cloud compute instance.

    (gdb) monitor leak_check full possibleleak changed
    ==194994== 314 (+120) bytes in 1 (+0) blocks are possibly lost in loss record 893 of 1,215
    ==194994==    at 0x48496AF: realloc (vg_replace_malloc.c:1437)
    ==194994==    by 0x115501: str_realloc_internal_debug.lto_priv.0 (str.c:340)
    ==194994==    by 0x4AFDFC1: sk_reserve (stack.c:210)
    ==194994==    by 0x4AFE262: OPENSSL_sk_insert (stack.c:254)
    ==194994==    by 0x4AB2223: UnknownInlinedFun (initthread.c:45)
    ==194994==    by 0x4AB2223: UnknownInlinedFun (initthread.c:164)
    ==194994==    by 0x4AB2223: UnknownInlinedFun (initthread.c:109)
    ==194994==    by 0x4AB2223: UnknownInlinedFun (initthread.c:93)
    ==194994==    by 0x4AB2223: ossl_init_thread_start (initthread.c:378)
    ==194994==    by 0x4A6B5B4: ossl_err_get_state_int (err.c:667)
    ==194994==    by 0x4A6125C: ERR_clear_error (err.c:319)
    ==194994==    by 0x48BC26C: state_machine.part.0 (statem.c:326)
    ==194994==    by 0x116AA4: ssl_start (client.c:580)
    ==194994==    by 0x11A7B2: UnknownInlinedFun (client.c:404)
    ==194994==    by 0x11A7B2: client_run (client.c:301)
    ==194994==    by 0x1230B0: client_thread (client.c:130)
    ==194994==    by 0x4DCA801: start_thread (pthread_create.c:443)
    ==194994==    by 0x4D6A313: clone (clone.S:100)
    ==194994==
    ==194994== 5,440 (+2,720) bytes in 20 (+10) blocks are possibly lost in loss record 1,159 of 1,215
    ==194994==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
    ==194994==    by 0x4016732: UnknownInlinedFun (rtld-malloc.h:44)
    ==194994==    by 0x4016732: allocate_dtv (dl-tls.c:375)
    ==194994==    by 0x4017151: _dl_allocate_tls (dl-tls.c:634)
    ==194994==    by 0x4DCB4C4: allocate_stack (allocatestack.c:429)
    ==194994==    by 0x4DCB4C4: pthread_create@@GLIBC_2.34 (pthread_create.c:648)
    ==194994==    by 0x12E55A: create_client.constprop.0 (sthreads.c:599)
    ==194994==    by 0x113EBF: UnknownInlinedFun (stunnel.c:447)
    ==194994==    by 0x113EBF: UnknownInlinedFun (stunnel.c:382)
    ==194994==    by 0x113EBF: UnknownInlinedFun (stunnel.c:356)
    ==194994==    by 0x113EBF: UnknownInlinedFun (ui_unix.c:114)
    ==194994==    by 0x113EBF: main (ui_unix.c:58)
    ==194994==
    ==194994== 3,331,600 (+1,665,800) bytes in 200 (+100) blocks are definitely lost in loss record 1,215 of 1,215
    ==194994==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
    ==194994==    by 0x1152F4: str_alloc_detached_debug (str.c:295)
    ==194994==    by 0x48A79BF: ssl3_setup_write_buffer (ssl3_buffer.c:119)
    ==194994==    by 0x48BD1FE: UnknownInlinedFun (ssl3_buffer.c:148)
    ==194994==    by 0x48BD1FE: UnknownInlinedFun (ssl3_buffer.c:142)
    ==194994==    by 0x48BD1FE: state_machine.part.0 (statem.c:402)
    ==194994==    by 0x116AA4: ssl_start (client.c:580)
    ==194994==    by 0x11A7B2: UnknownInlinedFun (client.c:404)
    ==194994==    by 0x11A7B2: client_run (client.c:301)
    ==194994==    by 0x1230B0: client_thread (client.c:130)
    ==194994==    by 0x4DCA801: start_thread (pthread_create.c:443)
    ==194994==    by 0x4D6A313: clone (clone.S:100)
    ==194994==
    ==194994== LEAK SUMMARY:
    ==194994==    definitely lost: 3,331,600 (+1,665,800) bytes in 200 (+100) blocks
    ==194994==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
    ==194994==      possibly lost: 1,775,537 (+2,840) bytes in 7,647 (+10) blocks
    ==194994==    still reachable: 5,745 (+0) bytes in 23 (+0) blocks
    ==194994==         suppressed: 0 (+0) bytes in 0 (+0) blocks
    ==194994== Reachable blocks (those to which a pointer was found) are not shown.
    ==194994== To see them, add 'reachable any' args to leak_check
    ==194994==

This may actually be a problem in OpenSSL. See bug 2134754 and https://github.com/acassen/keepalived/issues/2199#issuecomment-1277175404.
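
As a rough cross-check of the LEAK SUMMARY above, the "definitely lost" delta can be turned into a per-connection figure. This is only a sketch: the summary line is copied from the valgrind output in this report, the helper name `leak_delta` is made up for illustration, and it assumes each of the 100 netcat requests between two scans accounts for exactly one leaked block.

```python
import re

# "definitely lost" line copied from the LEAK SUMMARY in this report.
SUMMARY_LINE = "definitely lost: 3,331,600 (+1,665,800) bytes in 200 (+100) blocks"

# Matches totals and the "(+N)" increase since the previous leak_check.
# Only positive deltas are handled; that is all this report contains.
PATTERN = re.compile(
    r"definitely lost:\s+([\d,]+) \(\+([\d,]+)\) bytes in ([\d,]+) \(\+([\d,]+)\) blocks"
)


def to_int(text: str) -> int:
    """Convert a valgrind number with thousands separators to int."""
    return int(text.replace(",", ""))


def leak_delta(line: str) -> tuple:
    """Return (delta_bytes, delta_blocks) from a 'definitely lost' line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'definitely lost' summary line")
    _total_bytes, delta_bytes, _total_blocks, delta_blocks = map(to_int, match.groups())
    return delta_bytes, delta_blocks


delta_bytes, delta_blocks = leak_delta(SUMMARY_LINE)
bytes_per_block = delta_bytes // delta_blocks
print(bytes_per_block)  # 16658
```

Under that assumption, each connection closed without a TLS handshake leaks 16,658 bytes, all from the ssl3_setup_write_buffer allocation in the third loss record.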
I can confirm that this problem does occur with openssl-libs-1:3.0.1-41.el9_0.x86_64, but is not reproducible with openssl-libs-1:3.0.5-5.fc38.x86_64. I'm moving this bug to openssl.

*** Bug 2134754 has been marked as a duplicate of this bug. ***

3d046c4d047a55123beeceffe9f8bae09159445e is the first fixed commit

    commit 3d046c4d047a55123beeceffe9f8bae09159445e
    Author: yangyangtiantianlonglong <yangtianlong1224>
    Date: Wed Jan 19 11:19:52 2022 +0800

        Fix the same BIO_FLAGS macro definition

        Also add comment to the public header to avoid
        making another conflict in future.

        Fixes #17545

        Reviewed-by: Paul Dale <pauli>
        Reviewed-by: Tomas Mraz <tomas>
        (Merged from https://github.com/openssl/openssl/pull/17546)

        (cherry picked from commit e278f18563dd3dd67c00200ee30402f48023c6ef)

     include/internal/bio.h   | 2 +-
     include/openssl/bio.h.in | 2 ++
     2 files changed, 3 insertions(+), 1 deletion(-)

See https://github.com/openssl/openssl/commit/3d046c4d047a55123beeceffe9f8bae09159445e and https://github.com/openssl/openssl/issues/17545. I guess might not actually be aware that this fixed a memory leak.

I confirmed in bug 2134754 comment 3 that backporting this change fixes the leak.

*** Bug 2134754 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: openssl security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2523
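
For re-testing after the update, the "TCP connections without a TLS handshake" from the summary can be generated with a few lines of Python instead of netcat. This is a sketch under assumptions: the exact netcat invocation used in this report is not shown, so the code assumes it is enough to connect and disconnect without sending data, and the function name, host, and port are placeholders.

```python
import socket


def open_and_close(host: str, port: int, count: int = 100, timeout: float = 2.0) -> int:
    """Open `count` TCP connections and close each without sending any data.

    No TLS ClientHello is ever written, so the server only sees a plain TCP
    connect followed by a close. Returns how many connections were established.
    """
    established = 0
    for _ in range(count):
        try:
            # The context manager closes the socket as soon as the block exits.
            with socket.create_connection((host, port), timeout=timeout):
                established += 1
        except OSError:
            # Refused or timed-out connections are skipped, not fatal.
            pass
    return established
```

Calling `open_and_close("127.0.0.1", 8443)` against a hypothetical stunnel listener on port 8443 between two valgrind leak checks should show whether the "definitely lost" count still grows with the number of connections.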