My OpenConnect builds are failing on rawhide. It turns out that when softhsm is used, it prevents the process from exiting cleanly. Reproduce by just rebuilding the openconnect RPM. Or checkout out from git://git.infradead.org/users/dwmw2/openconnect.git and running 'make check'... make check-TESTS make[2]: Entering directory '/home/dwmw2/git/openconnect/gtls/tests' make[3]: Entering directory '/home/dwmw2/git/openconnect/gtls/tests' PASS: autocompletion PASS: auth-username-pass PASS: auth-certificate PASS: auth-nonascii PASS: cert-fingerprint PASS: id-test The test it's stuck on, and hasn't printed yet, is auth-pkcs11. If I connect to the stuck openconnect process with gdb, I see this backtrace: 919 if ((atomic_load_acquire (&rwlock->__data.__readers) (gdb) bt #0 0x00007f3be4ed14b7 in __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x134f7b0) at pthread_rwlock_common.c:919 #1 __GI___pthread_rwlock_wrlock (rwlock=0x134f7b0) at pthread_rwlock_wrlock.c:27 #2 0x00007f3be4bfbe7d in CRYPTO_THREAD_write_lock (lock=<optimized out>) at crypto/threads_pthread.c:78 #3 0x00007f3be4b61745 in ENGINE_finish (e=0x134f7f0) at crypto/engine/eng_init.c:101 #4 0x00007f3bd4b55e8b in OSSLCryptoFactory::~OSSLCryptoFactory() () from /usr/lib64/pkcs11/libsofthsm2.so #5 0x00007f3bd4b55f1d in OSSLCryptoFactory::~OSSLCryptoFactory() () from /usr/lib64/pkcs11/libsofthsm2.so #6 0x00007f3be4d38680 in __run_exit_handlers (status=status@entry=0, listp=0x7f3be4ebe738 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108 #7 0x00007f3be4d387c0 in __GI_exit (status=status@entry=0) at exit.c:139 #8 0x0000000000403f4c in main (argc=<optimized out>, argv=<optimized out>) at ../main.c:1906
==228434== Invalid read of size 4 ==228434== at 0x54D9316: __pthread_rwlock_wrlock_full (pthread_rwlock_common.c:581) ==228434== by 0x54D9316: pthread_rwlock_wrlock (pthread_rwlock_wrlock.c:27) ==228434== by 0x58ADE7C: CRYPTO_THREAD_write_lock (threads_pthread.c:78) ==228434== by 0x5813744: ENGINE_finish (eng_init.c:101) ==228434== by 0x134794D6: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:240) ==228434== by 0x13479558: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:259) ==228434== by 0x552767F: __run_exit_handlers (exit.c:108) ==228434== by 0x55277BF: exit (exit.c:139) ==228434== by 0x404043: main (main.c:1553) ==228434== Address 0x14bb74c8 is 24 bytes inside a block of size 56 free'd ==228434== at 0x4839A0C: free (vg_replace_malloc.c:540) ==228434== by 0x583FBE1: OPENSSL_cleanup (init.c:601) ==228434== by 0x583FBE1: OPENSSL_cleanup (init.c:497) ==228434== by 0x552767F: __run_exit_handlers (exit.c:108) ==228434== by 0x55277BF: exit (exit.c:139) ==228434== by 0x404043: main (main.c:1553) ==228434== Block was alloc'd at ==228434== at 0x483880B: malloc (vg_replace_malloc.c:309) ==228434== by 0x584859D: CRYPTO_zalloc (mem.c:230) ==228434== by 0x58ADE0B: CRYPTO_THREAD_lock_new (threads_pthread.c:29) ==228434== by 0x58137B9: do_engine_lock_init (eng_lib.c:25) ==228434== by 0x58137B9: do_engine_lock_init_ossl_ (eng_lib.c:21) ==228434== by 0x54DBD7E: __pthread_once_slow (pthread_once.c:116) ==228434== by 0x58ADEFC: CRYPTO_THREAD_run_once (threads_pthread.c:118) ==228434== by 0x5813878: ENGINE_new (eng_lib.c:33) ==228434== by 0x58152FD: ENGINE_rdrand (eng_rdrand.c:70) ==228434== by 0x58152FD: engine_load_rdrand_int (eng_rdrand.c:85) ==228434== by 0x583F8BC: ossl_init_engine_rdrand (init.c:353) ==228434== by 0x583F8BC: ossl_init_engine_rdrand_ossl_ (init.c:347) ==228434== by 0x54DBD7E: __pthread_once_slow (pthread_once.c:116) ==228434== by 0x58ADEFC: CRYPTO_THREAD_run_once (threads_pthread.c:118) ==228434== by 0x584004C: OPENSSL_init_crypto (init.c:723) ==228434== by 0x584004C: OPENSSL_init_crypto (init.c:620)
This is because SoftHSM's atexit hander (well, the destructor for the OSSLCryptoFactory object) is trying to clean up the rdrand engine after OpenSSL's own atexit handler has run and cleaned up *everything*. Proof of concept fix at https://github.com/dwmw2/SoftHSMv2/commit/f9c24e04f21e241ee93dd862f5ebb56dd54461c5 I've worked around it for now in my OpenConnect builds by exporting OPENSSL_ia32cap=~0x4000000000000000 when running the tests, to disable RDRAND.
Fix merged: https://github.com/opendnssec/SoftHSMv2/commit/c2cc0652b4c4829fc6ba186469f4e324af77dfe8
FEDORA-2020-3735365352 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-3735365352
FEDORA-2020-3735365352 has been pushed to the Fedora 32 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-3735365352` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-3735365352 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-7dfd335e05 has been pushed to the Fedora 31 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-7dfd335e05` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-7dfd335e05 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
It is even worse with the latest version. sh-5.0$ rm -rf /tmp/softhsm2_none* sh-5.0$ mkdir /tmp/softhsm2_none sh-5.0$ cat > /tmp/softhsm2_none.conf <<EOF directories.tokendir = /tmp/softhsm2_none objectstore.backend = file slots.removable = true EOF sh-5.0$ SOFTHSM2_CONF=/tmp/softhsm2_none.conf /usr/bin/softhsm2-util --init-token --label "Test Token" --pin 123456 --so-pin 123456 --free Slot 0 has a free/uninitialized token. The token has been initialized and is reassigned to slot 1837621079 Segmentation fault (core dumped) sh-5.0$ echo $? 139 sh-5.0$ rpm -q softhsm softhsm-2.6.1-2.fc33.x86_64
sh-5.0$ SOFTHSM2_CONF=/tmp/softhsm2_none.conf valgrind /usr/bin/softhsm2-util --init-token --label "Test Token" --pin 123456 --so-pin 123456 --free ==196== Memcheck, a memory error detector ==196== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==196== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info ==196== Command: /usr/bin/softhsm2-util --init-token --label Test\ Token --pin 123456 --so-pin 123456 --free ==196== Slot 0 has a free/uninitialized token. The token has been initialized and is reassigned to slot 806032075 ==196== Jump to the invalid address stated on the next line ==196== at 0x5518E90: ??? ==196== by 0x49D9399: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.1.1.1g) ==196== by 0x4D91316: __run_exit_handlers (in /usr/lib64/libc-2.31.9000.so) ==196== by 0x4D914BF: exit (in /usr/lib64/libc-2.31.9000.so) ==196== by 0x4D79158: (below main) (in /usr/lib64/libc-2.31.9000.so) ==196== Address 0x5518e90 is not stack'd, malloc'd or (recently) free'd ==196== ==196== ==196== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==196== Access not within mapped region at address 0x5518E90 ==196== at 0x5518E90: ??? ==196== by 0x49D9399: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.1.1.1g) ==196== by 0x4D91316: __run_exit_handlers (in /usr/lib64/libc-2.31.9000.so) ==196== by 0x4D914BF: exit (in /usr/lib64/libc-2.31.9000.so) ==196== by 0x4D79158: (below main) (in /usr/lib64/libc-2.31.9000.so) ==196== If you believe this happened as a result of a stack ==196== overflow in your program's main thread (unlikely but ==196== possible), you can try to increase the size of the ==196== main thread stack using the --main-stacksize= flag. ==196== The main thread stack size used in this run was 8388608. ==196== ==196== HEAP SUMMARY: ==196== in use at exit: 91,224 bytes in 2,995 blocks ==196== total heap usage: 14,625 allocs, 11,630 frees, 1,680,726 bytes allocated ==196== ==196== LEAK SUMMARY: ==196== definitely lost: 0 bytes in 0 blocks ==196== indirectly lost: 0 bytes in 0 blocks ==196== possibly lost: 0 bytes in 0 blocks ==196== still reachable: 91,224 bytes in 2,995 blocks ==196== suppressed: 0 bytes in 0 blocks ==196== Rerun with --leak-check=full to see details of leaked memory ==196== ==196== For lists of detected and suppressed errors, rerun with: -s ==196== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) Segmentation fault (core dumped)
I cannot see crash with older version. sh-5.0$ rm -rf /tmp/softhsm2_none* sh-5.0$ mkdir /tmp/softhsm2_none sh-5.0$ cat > /tmp/softhsm2_none.conf <<EOF > directories.tokendir = /tmp/softhsm2_none > objectstore.backend = file > slots.removable = true > EOF sh-5.0$ rpm -q softhsm softhsm-2.6.1-1.fc33.x86_64 sh-5.0$ sh-5.0$ SOFTHSM2_CONF=/tmp/softhsm2_none.conf valgrind /usr/bin/softhsm2-util --init-token --label "Test Token" --pin 123456 --so-pin 123456 --free ==34== Memcheck, a memory error detector ==34== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==34== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info ==34== Command: /usr/bin/softhsm2-util --init-token --label Test\ Token --pin 123456 --so-pin 123456 --free ==34== Slot 0 has a free/uninitialized token. The token has been initialized and is reassigned to slot 2117858883 ==34== ==34== HEAP SUMMARY: ==34== in use at exit: 0 bytes in 0 blocks ==34== total heap usage: 14,224 allocs, 14,224 frees, 1,647,246 bytes allocated ==34== ==34== All heap blocks were freed -- no leaks are possible ==34== ==34== For lists of detected and suppressed errors, rerun with: -s ==34== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Thanks for the report. I think I see the problem. https://github.com/opendnssec/SoftHSMv2/issues/548#issuecomment-627919265
https://github.com/opendnssec/SoftHSMv2/pull/551 Please could you test the build at https://koji.fedoraproject.org/koji/taskinfo?taskID=44446665
(In reply to David Woodhouse from comment #11) > https://github.com/opendnssec/SoftHSMv2/pull/551 > > Please could you test the build at > https://koji.fedoraproject.org/koji/taskinfo?taskID=44446665 Thank you. it works for me
FreeIPA server deployment broke in today's Rawhide, it fails due to a crash in softhsm2-util . I don't have a backtrace ATM as it seems like Rawhide's default config prevents systemd capturing the core dump when the crash happens, but from timing, looks quite likely to be this bug. Proposing as an F33 Beta blocker, violates Basic criterion "It must be possible to configure a Fedora Server system installed according to the above criteria as a FreeIPA domain controller, using the official deployment tools provided in the distribution FreeIPA packages" - https://fedoraproject.org/wiki/Basic_Release_Criteria#FreeIPA_server_requirements
I've taken the liberty of submitting a 2.6.1-3 build for rawhide. I'll let Paul handle F31/F32 updates, perhaps waiting until the latest fix is actually merged into SoftHSM upstream (which hopefully won't take long as I broke them too). Sorry :)
I've made buildroot overrides for softhsm-2.6.1-3 in F31 and F32 so that I can build the new OpenConnect 8.10 release for CVE-2020-12823.
David, after you've done with the rebuild, feel free to add OpenConnect builds to the Bodhi requests of softhsm so they get pushed together: F32: https://bodhi.fedoraproject.org/updates/FEDORA-2020-77604f07ae F31: https://bodhi.fedoraproject.org/updates/FEDORA-2020-901be5b51e
Thanks. In fact I think I'll keep them separate though; the bugs are mostly orthogonal except that the self-tests for OpenConnect were tripping over the SoftHSM one.
*** Bug 1838487 has been marked as a duplicate of this bug. ***
IMHO it would be desirable if openssl did more robust cleanup to avoid the danger of use after free. At the very least setting "global_engine_lock = NULL" in engine_cleanup_int should have caused this to crash immediately rather proceed to access free'd memory and burn 100% CPU.
Agreed.
FreeIPA tests passed in Rawhide in the last few composes we got, so I'd say from that perspective this is fixed. Do we need to keep the bug open till F31 and F32 updates go stable?
Dropping F33 blocker tags, at least.
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
this was fixed in f32/f33 a while ago