Now that some of our own multi-threading issues are fixed, we're starting to see some of OpenSSL's. In particular, during concurrent ssl_accept calls, we sometimes get crashes in X509_verify_cert and functions it calls. While researching this, I came across the following reference.

http://www.openssl.org/docs/crypto/threads.html

Apparently we're supposed to register a lock callback if we're going to use OpenSSL from multiple threads (as we do). We could just put our own lock around ssl_accept and any other cases we find, but those are just band-aids. A full solution would require adding code to allocate/initialize a lock array and register a lock function, as defined in the cited document. Until then, we'll never quite be sure if problems in SSL are being caused by this.
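For illustration, the "lock array plus lock function" approach described in the cited document might look roughly like the sketch below. It is not the code from any actual patch. It assumes the pre-1.1.0 OpenSSL interfaces (CRYPTO_num_locks, CRYPTO_set_locking_callback, CRYPTO_set_id_callback); in OpenSSL 1.1.0 and later those calls are no-op macros because the library does its own locking. The names init_openssl_mt, lock_array, lock_count and the SKETCH_WITH_OPENSSL build flag are hypothetical, as are the stand-in definitions used when the sketch is built without the OpenSSL headers.

```c
#include <pthread.h>
#include <stdlib.h>

#ifdef SKETCH_WITH_OPENSSL           /* hypothetical build flag */
#include <openssl/crypto.h>
#else
/* Stand-ins so the sketch compiles without OpenSSL headers (assumption). */
#define CRYPTO_LOCK 1
#define CRYPTO_num_locks() 4
#endif

static pthread_mutex_t *lock_array;  /* one mutex per OpenSSL lock slot */
static int lock_count;

/* Signature expected by CRYPTO_set_locking_callback(): OpenSSL passes
 * CRYPTO_LOCK in mode to acquire slot 'type', and clears it to release. */
static void locking_func(int mode, int type, const char *file, int line)
{
    (void)file;
    (void)line;
    if (mode & CRYPTO_LOCK)
        pthread_mutex_lock(&lock_array[type]);
    else
        pthread_mutex_unlock(&lock_array[type]);
}

/* Signature expected by CRYPTO_set_id_callback(). The cast assumes
 * pthread_t converts to an integer, which holds on Linux. */
static unsigned long thread_id_func(void)
{
    return (unsigned long)pthread_self();
}

/* Allocate and initialize the lock array, then register both callbacks.
 * Returns 0 on success, -1 on failure. Call once, before any thread
 * touches OpenSSL. */
static int init_openssl_mt(void)
{
    int i;
    int n = CRYPTO_num_locks();

    if (n <= 0)
        return -1;
    lock_array = calloc((size_t)n, sizeof(*lock_array));
    if (lock_array == NULL)
        return -1;
    for (i = 0; i < n; i++)
        pthread_mutex_init(&lock_array[i], NULL);
    lock_count = n;

#ifdef SKETCH_WITH_OPENSSL
    CRYPTO_set_id_callback(thread_id_func);
    CRYPTO_set_locking_callback(locking_func);
#endif
    return 0;
}
```

Compared with wrapping ssl_accept in a single lock of our own, this lets OpenSSL decide which internal structures need protecting, so concurrent handshakes only serialize where the library itself requires it.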
REVIEW: http://review.gluster.org/10075 (socket: use OpenSSL multi-threading interfaces) posted (#2) for review on master by Jeff Darcy (jdarcy)
REVIEW: http://review.gluster.org/10075 (socket: use OpenSSL multi-threading interfaces) posted (#3) for review on master by Jeff Darcy (jdarcy)
REVIEW: http://review.gluster.org/10075 (socket: use OpenSSL multi-threading interfaces) posted (#4) for review on master by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/10075 committed in master by Vijay Bellur (vbellur)
------
commit 8830e90fa1b131057e4ee1742cb83d78102714c0
Author: Jeff Darcy <jdarcy>
Date:   Tue Mar 31 14:34:22 2015 -0400

    socket: use OpenSSL multi-threading interfaces

    OpenSSL isn't thread-safe unless you register these locking and thread
    ID functions.  Most often the crashes would occur around
    X509_verify_cert, even though it's insane that the certificate parsing
    functions wouldn't be thread-safe.

    The bug for this was filed over two years ago, but it didn't seem like
    a high priority because the bug didn't bite anyone until it caused a
    spurious regression-test failure.  Ironically, that was on a test for
    a *different* spurious regression-test failure, which I guess is just
    deserts[1] for leaving this on the to-do list so long.

    [1] Yes, it really is "deserts" in that phrase - not as in very dry
    places, but from late Latin "deservire" meaning to serve well or
    zealously.  Aren't commit messages educational?

    Change-Id: I2a6c0e9b361abf54efa10ffbbbe071404f82b0d9
    BUG: 906763
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/10075
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Vijay Bellur <vbellur>
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user