Originally found in: openssl-1.0.1e-30.el6_6.4.src.rpm
Apparently introduced in: openssl-1.0.1e-25.el7.src.rpm, openssl-1.0.1e-34.fc20.src.rpm
Accidentally(?) fixed in: openssl-1.0.1i-1.fc21.src.rpm

Still present (I believe) in RH6/RH7/FC20. (Raising against Fedora because I currently use FC20 as my main desktop OS, though others will consider RH the more important packages.)

This patch from Tomáš Mráz in November 2013, amongst other things, refactored the PRNG locking into a single function, private_RAND_lock():

https://lists.fedoraproject.org/pipermail/scm-commits/Week-of-Mon-20131118/1146865.html

Neither the provenance of, nor the need for, this particular part of the patch is entirely clear. It doesn't appear to have ever been part of upstream openssl. AFAICT it originated in either RH or Fedora around then, and was duplicated into both distros (and therefore picked up automatically by downstream distros such as CentOS as well).

The particular problem is the change from this code in ssleay_rand_bytes():

    if (!do_not_lock)
        {
        CRYPTO_w_lock(CRYPTO_LOCK_RAND);

        /* prevent ssleay_rand_bytes() from trying to obtain the lock again */
        CRYPTO_w_lock(CRYPTO_LOCK_RAND2);
        CRYPTO_THREADID_cpy(&locking_threadid, &cur);
        CRYPTO_w_unlock(CRYPTO_LOCK_RAND2);
        crypto_lock_rand = 1;
        }

to this in private_RAND_lock():

    if (do_lock)
        {
        CRYPTO_w_lock(CRYPTO_LOCK_RAND);
        crypto_lock_rand = 1;
        CRYPTO_w_lock(CRYPTO_LOCK_RAND2);
        CRYPTO_THREADID_current(&locking_threadid);
        CRYPTO_w_unlock(CRYPTO_LOCK_RAND2);
        }

(RAND is the main lock that protects the real critical section. RAND2 doesn't guard a real critical section as such, but forces readers and writers of locking_threadid to access it atomically, since thread ids are now a multi-word struct. crypto_lock_rand and locking_threadid are used to build a recursive mutex on top of the external lock API, which provides only non-recursive rwlocks.)
Excitingly, this reintroduces a really old upstream bug from 2001:

https://www.mail-archive.com/openssl-dev@openssl.org/msg09018.html

In a multi-threaded program with multiple threads calling RAND_bytes(), the following can happen:

1) thread-A calls RAND_bytes() and completes normally. LOCK_RAND is left unlocked, crypto_lock_rand is 0, and locking_threadid remains set to thread-A's id.

2) thread-B calls RAND_bytes(), and gets as far as setting crypto_lock_rand=1 before being pre-empted, just before it takes the RAND2 lock.

3) thread-A calls into RAND_bytes() again, sees that crypto_lock_rand==1, takes RAND2 before thread-B can, sees its own thread id in locking_threadid, and enters the main critical section without bothering to take the RAND lock.

4) thread-B takes RAND2, sets its own thread id, and also enters the main critical section.

5) Within the critical section in ssleay_rand_bytes() both threads execute:

    st_idx=state_index;
    st_num=state_num;
    [...]
    state_index+=num_ceil;
    if (state_index > state_num)
        state_index %= state_num;

The += and %= are done as two separate memory writes, so one thread can read st_idx in between the other thread executing the two writes. It can therefore see st_idx > st_num.

6) If st_idx+(MD_DIGEST_LENGTH/2) <= st_num, we feed that part of state[] into the hash function as a single block. If however it is > st_num, we feed it in as two separate blocks: the part from st_idx to the end of state[], then the part from state[0] up to the remaining number of bytes. If st_idx itself is > st_num, the former calculation results in a "negative" length passed to the hash, which causes it to start reading memory from past the end of state[] and continue around the entire address space until it wraps round to the actual end of state[]. During this loop it eventually hits an unmapped page and segfaults.
(The actual final effects may depend on which optimised version of the hash function openssl chooses; in our case, with the default SHA1, we were running on SSSE3-capable x86_64 hardware.)

In a multi-threaded server handling TLS connections, for appropriate ciphersuites, we read one block's worth of random data for each TLS record transmitted, to form the IV for the block cipher. With multiple threads fielding TLS connections, under moderate load, this causes repeated crashes anywhere from minutes to hours apart. (This isn't specifically a security bug: completely non-malicious traffic at entirely reasonable load levels triggers it.)

This part of the patch was actually removed during FC21, as part of the upgrade from 1.0.1h to 1.0.1i. There appears to be no reason given for why it was deemed no longer necessary (other parts of the patch remain), which makes me think fixing this bug was not the motivation. Distros that are still on 1.0.1e (FC20/RH* and others) remain affected.

If the refactor is still required for those versions (and, apart from this issue, I think it's actually better code), the fix is simple enough:

--- openssl-1.0.1e/crypto/rand/rand_lib.c.randlock	2015-05-14 14:48:26.073062716 +0100
+++ openssl-1.0.1e/crypto/rand/rand_lib.c	2015-05-14 14:50:02.907762929 +0100
@@ -208,10 +208,10 @@
 	if (do_lock)
 		{
 		CRYPTO_w_lock(CRYPTO_LOCK_RAND);
-		crypto_lock_rand = 1;
 		CRYPTO_w_lock(CRYPTO_LOCK_RAND2);
 		CRYPTO_THREADID_current(&locking_threadid);
 		CRYPTO_w_unlock(CRYPTO_LOCK_RAND2);
+		crypto_lock_rand = 1;
 		}
 	return do_lock;
 	}
Thank you for the report and analysis. We will need to fix this everywhere it is present. The refactoring was done due to the need to fix RAND locking issues in FIPS mode (deadlocks, or missing locking where it was needed). Unfortunately I made this mistake in the process. The refactoring was later dropped because upstream fixed the FIPS RNG locking in a different way.
The only thing I am curious about is why nobody reported the regression earlier - especially on RHEL.
I can only imagine that using openssl to implement multi-threaded TLS servers is somewhat of a rare use case. (Most off-the-shelf uses prefer to run multi-process, single-threaded.)

(The other possibility is that, even in that case, as I hinted above, the exact optimised form of the hash may be important. In the SSSE3 case the inner hash block loop is formulated as a loop from a start pointer up to a specific end pointer value, so the negative length causes an unchecked wrap around the address space. Other forms may check the length value directly against an index, so under signed arithmetic we simply terminate the loop early. That might mean lower-quality randomness is produced, but it won't crash.)
This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 22 is not affected.
openssl-1.0.1e-43.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/openssl-1.0.1e-43.fc20
(In reply to John Sullivan from comment #0)
> Still present (I believe) in RH6/RH7/FC20. (Raising against Fedora because I
> currently use FC20 as my main desktop OS, though others will consider RH the
> more important packages.)

I'm trying to reproduce this with the stress tester below. I don't see the issue. Is there something else I need to configure to expose it? I do see data races with Helgrind, though, even on FC22, where this supposedly has been fixed.

#include <openssl/crypto.h>
#include <openssl/rand.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

static pthread_mutex_t *locks;

static void lock_callback(int mode, int type, const char *file, int line)
{
  if (mode & CRYPTO_LOCK) {
    if (pthread_mutex_lock(locks + type) != 0) {
      abort();
    }
  } else {
    if (pthread_mutex_unlock(locks + type) != 0) {
      abort();
    }
  }
}

static void init_locks(void)
{
  int count = CRYPTO_num_locks();
  locks = calloc(count, sizeof(*locks));
  for (int i = 0; i < count; ++i) {
    if (pthread_mutex_init(locks + i, NULL) != 0) {
      abort();
    }
  }
  CRYPTO_set_locking_callback(lock_callback);
}

static void *runner(void *closure)
{
  uintptr_t size = (uintptr_t) closure;
  unsigned char *buffer = malloc(size);
  if (buffer == NULL) {
    abort();
  }
  while (true) {
    RAND_bytes(buffer, size);
  }
}

int main(int argc, char **argv)
{
  int threads = atoi(argv[1]);
  int start_size = atoi(argv[2]);
  int size_increment = atoi(argv[3]);
  init_locks();
  for (int i = 1; i < threads; ++i) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, runner,
                       (void *)(uintptr_t) (start_size + i * size_increment)) != 0) {
      abort();
    }
  }
  runner((void *)(uintptr_t) start_size);
}
(In reply to Florian Weimer from comment #7)
> (In reply to John Sullivan from comment #0)
> > Still present (I believe) in RH6/RH7/FC20. (Raising against Fedora because I
> > currently use FC20 as my main desktop OS, though others will consider RH the
> > more important packages.)
>
> I'm trying to reproduce this with the stress tester below. I don't see the
> issue. Is there something else I need to configure to expose it?

I have been able to reproduce it with openssl-1.0.1e-30.el6.ppc64 and openssl-1.0.1e-30.el6.x86_64. Thanks.
This issue has been classified as a security flaw and has hence been assigned CVE-2015-3216.
Package openssl-1.0.1e-43.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing openssl-1.0.1e-43.fc20'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-9371/openssl-1.0.1e-43.fc20
then log in and leave karma (feedback).
Package openssl-1.0.1e-44.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing openssl-1.0.1e-44.fc20'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-9371/openssl-1.0.1e-44.fc20
then log in and leave karma (feedback).
Package openssl-1.0.1e-45.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing openssl-1.0.1e-45.fc20'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-9371/openssl-1.0.1e-45.fc20
then log in and leave karma (feedback).
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.