+++ This bug was initially created as a clone of Bug #671484 +++

Description of problem:
The CHIL Engine, used to access Thales/nCipher hardware, requires that thread locking upcalls be set regardless of whether the calling program is multithreaded. This is discussed in the following OpenSSL ticket:
http://rt.openssl.org/Ticket/Display.html?id=1736

This issue was caused to go away in the following OpenSSL commit:
http://cvs.openssl.org/filediff?f=openssl/engines/e_chil.c&v1=1.1.2.5&v2=1.1.2.6
which was part of 0.9.8j.

Version-Release number of selected component (if applicable):
Any OpenSSL before 0.9.8j is affected. Any Red Hat Enterprise Linux 5 release is affected. Tests conducted with OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008, and openssl-0.9.8e-12.el5_5.7 on RHEL 5.5.

How reproducible:
Attempt to load the CHIL engine into the Red Hat supplied copy of OpenSSL.

Steps to Reproduce:
1. Install nCipher hardware.
2. Install nCipher software.
3. Set and export LD_LIBRARY_PATH=/opt/nfast/toolkits/hwcrhk
4. Call /usr/bin/openssl engine -vvvv -tt -c chil

Actual results:
(chil) CHIL hardware engine support
 [RSA, DH, RAND]
     [ unavailable ]
13205:error:80067072:CHIL engine:HWCRHK_INIT:locking missing:e_chil.c:594:You HAVE to add dynamic locking callbacks via CRYPTO_set_dynlock_{create,lock,destroy}_callback()
     SO_PATH: Specifies the path to the 'hwcrhk' shared library
          (input flags): STRING
     FORK_CHECK: Turns fork() checking on or off (boolean)
          (input flags): NUMERIC
     THREAD_LOCKING: Turns thread-safe locking on or off (boolean)
          (input flags): NUMERIC

Expected results:
(chil) CHIL hardware engine support
 [RSA, DH, RAND]
     [ available ]
     SO_PATH: Specifies the path to the 'hwcrhk' shared library
          (input flags): STRING
     FORK_CHECK: Turns fork() checking on or off (boolean)
          (input flags): NUMERIC
     THREAD_LOCKING: Turns thread-safe locking on or off (boolean)
          (input flags): NUMERIC

Additional info:
If a multithreaded program calls OpenSSL and loads the CHIL Engine without setting those callbacks, unexpected behavior may occur. It is my opinion that in this case, you get what you deserve. Apache 2.2 was updated to set the right upcalls in the same timeframe the OpenSSL RT issue was discussed, and this fix was already backported by Red Hat.

Description of problem:
The CHIL Engine (used to access Thales/nCipher cryptographic hardware) sets an ex_data table entry in OpenSSL, with a function pointer to the cleanup function located in the CHIL engine binary. When a calling program unloads the Engine, the cleanup function pointer is not cleared. When the calling program loads the Engine again, a second function pointer is added to the ex_data cleanup stack, leaving the first one in place but likely pointing to invalid memory. This crashes the calling program as soon as it attempts to clean up the ex_data entry.

Version-Release number of selected component (if applicable):
Any OpenSSL 0.9.8 RPM.
Any OpenSSL version from openssl.org

How reproducible:
Apache 2.2.x, as supplied by Red Hat or downloaded from apache.org, does this double library load and triggers this issue any time the Engine library text does not land in the same memory location second time around, which is most of the time.

Steps to Reproduce:
1. Install Thales nCipher HSM
2. Install nCSS Software
3. Add /opt/nfast/toolkits/hwcrhk to /etc/ld.so.conf.d/nfast, run ldconfig -l (if I recall)
4. Edit /etc/httpd/conf/httpd.conf (if I recall) to add SSLCryptoDevice chil anywhere
5. /etc/init.d/httpd start.

Actual results:
httpd server segfaults.

Expected results:
httpd server does not segfault.

Additional info:
This patch to the openssl upstream fixes the issue:
http://cvs.openssl.org/filediff?f=openssl/engines/e_chil.c&v1=1.1.2.9&v2=1.1.2.10
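
For readers who want to see the load/unload/reload failure mode in isolation, here is a toy Python model. It is not OpenSSL code: EngineImage, ExDataTable, load_unload_reload and the clear_on_unload flag are all hypothetical names, the addresses are made up, and a RuntimeError stands in for the segfault. The clear_on_unload path only illustrates one way a stale entry could be avoided; it is not a description of what the upstream patch does.

```python
# Toy model only; every name here is hypothetical, none of it is OpenSSL API.


class EngineImage:
    """Stands in for one mapping of the engine shared library."""

    def __init__(self, load_address):
        self.load_address = load_address
        self.mapped = True

    def cleanup(self):
        # In the real case, jumping to an unmapped address segfaults.
        if not self.mapped:
            raise RuntimeError(
                f"call into unmapped library at {self.load_address:#x}")
        return f"cleaned up image at {self.load_address:#x}"


class ExDataTable:
    """Stands in for a process-wide list of cleanup callbacks."""

    def __init__(self):
        self.cleanup_callbacks = []

    def register(self, callback):
        self.cleanup_callbacks.append(callback)

    def unregister(self, callback):
        self.cleanup_callbacks.remove(callback)

    def run_cleanups(self):
        return [callback() for callback in self.cleanup_callbacks]


def load_unload_reload(table, clear_on_unload):
    # First load registers a callback that lives inside the first image.
    first = EngineImage(0x7F0000001000)
    table.register(first.cleanup)

    # Unload: the image goes away; the table entry only goes away if asked.
    if clear_on_unload:
        table.unregister(first.cleanup)
    first.mapped = False

    # Second load lands at a different address and registers again.
    second = EngineImage(0x7F0000200000)
    table.register(second.cleanup)

    # Cleanup walks every entry, including any stale one.
    return table.run_cleanups()
```

Calling load_unload_reload(ExDataTable(), clear_on_unload=False) raises on the first, stale entry, which mirrors the crash described above; with clear_on_unload=True only the second image's callback runs.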
Great, I have a NetHSM connected to my RHEL6 machine and am able to verify this issue.
But I cannot reproduce the issue on the latest openssl (which should NOT contain the fix AFAIK). Tomas, is it possible that this bug is NOTABUG on RHEL6? It works even on openssl-1.0.0-1.el6.

# rpm -q openssl
openssl-1.0.0-10.el6_1.3.x86_64

# ldconfig -p | grep nfast
        libnfhwcrhk.so (libc6,x86-64) => /opt/nfast/toolkits/hwcrhk/libnfhwcrhk.so

# /usr/bin/openssl engine -vvvv -tt -c chil
(chil) CHIL hardware engine support
 [RSA, DH, RAND]
     [ available ]
     SO_PATH: Specifies the path to the 'hwcrhk' shared library
          (input flags): STRING
     FORK_CHECK: Turns fork() checking on (non-zero) or off (zero)
          (input flags): NUMERIC
     THREAD_LOCKING: Turns thread-safe locking on (zero) or off (non-zero)
          (input flags): NUMERIC
     SET_USER_INTERFACE: Set the global user interface (internal)
          (input flags): [Internal]
     SET_CALLBACK_DATA: Set the global user interface extra data (internal)
          (input flags): [Internal]
(In reply to comment #5)
> But I cannot reproduce the issue on the latest openssl (which should NOT
> contain the fix AFAIK). Tomas, is it possible that this bug is NOTABUG on RHEL6?
> It works even on openssl-1.0.0-1.el6.
>
> # rpm -q openssl
> openssl-1.0.0-10.el6_1.3.x86_64

The thread locking upcall fix was applied to OpenSSL before 1.0.0 was forked, so all versions have it. Could you please see if the second portion of the report, causing Apache to crash, is still an issue with this OpenSSL version? I don't have EL6 so I can't verify this.
Sure, it works for me without issue on RHEL6. To be precise, adding "SSLCryptoDevice chil" to /etc/httpd/conf.d/ssl.conf still lets httpd restart OK.

Sander, do you know how to test the performance of the chil engine? I tried openssl speed with -engine chil (with and w/o -evp) and I cannot see that the engine brings any performance gain.
Miroslav, please ensure that ssl.conf is in fact included in the main config by making an HTTPS connection to the server.

Please note that the CHIL engine only registers for RSA exponentiation. Any other algorithms requested by the calling program will be handled by the OpenSSL software. When running openssl speed, add -elapsed to the options so that it uses the wall clock time instead of CPU time to calculate performance: since the host CPU is barely active when running against the module, the output is strongly skewed. Also, to get maximum benefit from the module, use -multi X to spawn multiple processes. Choosing 2 < X < 20 should max out the module.

You can check the load on the module by periodically running /opt/nfast/bin/stattree PerModule 1 ModuleJobStats. Look at the CPULoadPercent to see whether the module is operating at capacity (I get mine up to 94% with 16 processes).

Even with multiple processes, you'll find the module slower than your host computer. This type of device used to be known as crypto-accelerator, but computers are so much faster now that they will beat most of our modules in a head-to-head speed test. The discrepancy will be smaller for larger key sizes (2048, 4096). You'll also get more performance benefit when the host computer is supposed to be doing something else like, say, run a web app.
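
To illustrate why -elapsed matters for offloaded work, here is a small Python sketch. It does not talk to any module: fake_offloaded_sign is a hypothetical stand-in that simply sleeps, and the 10 ms latency and the count of 20 are arbitrary illustrative values. The point is only that time spent waiting barely shows up in CPU time, so a rate computed from CPU time would look far better than what a caller actually experiences.

```python
import time


def fake_offloaded_sign(latency_s=0.01):
    # Hypothetical stand-in for a blocking round trip to the module.
    # Sleeping uses almost no host CPU, much like waiting for a reply.
    time.sleep(latency_s)


def measure(operation, count):
    """Return (wall_clock_seconds, cpu_seconds) for `count` calls."""
    wall_start = time.perf_counter()   # wall clock
    cpu_start = time.process_time()    # CPU time used by this process
    for _ in range(count):
        operation()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


count = 20
wall, cpu = measure(fake_offloaded_sign, count)
print(f"wall clock: {wall:.3f}s -> {count / wall:.1f} ops/s")
print(f"CPU time:   {cpu:.3f}s (most of the wait is invisible here)")
```

Dividing the operation count by the CPU figure instead of the wall clock figure would report a throughput the setup cannot deliver.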
(In reply to comment #8)
> Miroslav, please ensure that ssl.conf is in fact included in the main config by
> making an HTTPS connection to the server.

ssl.conf is included by default when the mod_ssl package is installed on RHEL systems. The HTTPS connection is successful; I tested it just now.

>
> Please note that the CHIL engine only registers for RSA exponentiation. Any
> other algorithms requested by the calling program will be handled by the
> OpenSSL software. When running openssl speed, add -elapsed to the options so
> that it uses the wall clock time instead of CPU time to calculate
> performance: since the host CPU is barely active when running against the
> module, the output is strongly skewed. Also, to get maximum benefit from the
> module, use -multi X to spawn multiple processes. Choosing 2 < X < 20 should
> max out the module.

I'm using a netHSM 6000 connected via the network. I do not have a PCI card available, so this may be the cause of the almost constant sign/verify times below.

With chil engine:
# openssl speed -engine chil -elapsed -multi 18 rsa
                   sign    verify   sign/s verify/s
rsa  512 bits 0.012221s 0.011017s     81.8     90.8
rsa 1024 bits 0.012137s 0.011310s     82.4     88.4
rsa 2048 bits 0.011060s 0.010911s     90.4     91.7
rsa 4096 bits 0.011680s 0.036412s     85.6     27.5

W/o chil engine:
# openssl speed -elapsed -multi 18 rsa
                   sign    verify   sign/s verify/s
rsa  512 bits 0.000091s 0.000008s  10973.7 129813.0
rsa 1024 bits 0.000450s 0.000023s   2222.6  43124.6
rsa 2048 bits 0.002741s 0.000080s    364.8  12423.1
rsa 4096 bits 0.019434s 0.000303s     51.5   3301.6

>
> You can check the load on the module by periodically running
> /opt/nfast/bin/stattree PerModule 1 ModuleJobStats. Look at the CPULoadPercent
> to see whether the module is operating at capacity (I get mine up to 94% with
> 16 processes).

With 18 processes I see only up to 3-4% CPULoadPercent, though this may be expected as I'm using a netHSM:

# /opt/nfast/bin/stattree PerModule 1 ModuleJobStats
+#PerModule:
   +#1:
      +#ModuleJobStats:
         -CmdCount             11980925
         -ReplyCount           11980923
         -CmdBytes             3275495060
         -ReplyBytes           2122423056
         -HostWriteCount       11336366
         -HostWriteErrors      0
         -HostReadCount        22358238
         -HostReadErrors       0
         -HostReadEmpty        0
         -HostReadDeferred     10760961
         -HostReadTerminated   0
         -PFNIssued            267636
         -PFNRejected          0
         -PFNCompleted         267635
         -ANIssued             11
         -CPULoadPercent       3

>
> Even with multiple processes, you'll find the module slower than your host
> computer. This type of device used to be known as crypto-accelerator, but
> computers are so much faster now that they will beat most of our modules in a
> head-to-head speed test. The discrepancy will be smaller for larger key sizes
> (2048, 4096). You'll also get more performance benefit when the host computer
> is supposed to be doing something else like, say, run a web app.

Thanks for your kind answers and explanation!
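
As a quick way to read the two openssl speed runs side by side, the following Python sketch computes the engine-to-software ratio of signing throughput per key size. The sign/s and verify/s figures are copied from the runs posted in this comment; the dictionary layout and the sign_ratio helper are just illustrative names.

```python
# (sign/s, verify/s) per RSA key size, copied from the two runs above.
with_chil = {
    512: (81.8, 90.8),
    1024: (82.4, 88.4),
    2048: (90.4, 91.7),
    4096: (85.6, 27.5),
}
software = {
    512: (10973.7, 129813.0),
    1024: (2222.6, 43124.6),
    2048: (364.8, 12423.1),
    4096: (51.5, 3301.6),
}


def sign_ratio(bits):
    """Engine sign/s divided by software sign/s; above 1 means the engine wins."""
    return with_chil[bits][0] / software[bits][0]


for bits in sorted(with_chil):
    print(f"rsa {bits:4d} bits: engine/software sign ratio = {sign_ratio(bits):.3f}")
```

The ratio rises with key size, and 4096 bits is the only size in these runs where the engine signs faster than software (85.6 vs 51.5 sign/s), which fits the earlier remark that the discrepancy is smaller for larger keys.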
Please disregard the first part of the bug description up to the second "Description of problem".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1730.html