Bug 1762741

Summary: FIPS mutex lock used from gcry_cipher_(en|de)crypt imposes severe performance penalty
Product: Red Hat Enterprise Linux 8 Reporter: Daniel Berrangé <berrange>
Component: libgcrypt Assignee: Tomas Mraz <tmraz>
Status: CLOSED ERRATA QA Contact: Ivan Nikolchev <inikolch>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.0 CC: inikolch, mmethot, moddi, omoris
Target Milestone: rc Keywords: Triaged
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libgcrypt-1.8.5-1.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-04 01:44:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1701948    
Attachments:
Description Flags
Simple benchmark for gcrypt cipher performance none

Description Daniel Berrangé 2019-10-17 11:14:45 UTC
Description of problem:
The gcry_cipher_encrypt and gcry_cipher_decrypt functions perform a FIPS check as the first thing they do:

  if (!fips_is_operational ())
     ...snip...


RHEL-8 has introduced a non-upstream patch related to FIPS:

   libgcrypt-1.8.3-fips-ctor.patch

This adds mutex lock + unlock calls to the fips_is_operational() function.

Even when these mutex locks are uncontended, they are imposing a severe performance penalty on the gcry_cipher_encrypt/decrypt API calls.


The performance penalty varies according to size of data blocks that are processed in each call.

With 16 byte data blocks, the AES-ECB perf is degraded by approx 53%. With 1 MB data blocks, the AES-ECB perf is degraded by approx 7%.
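
To illustrate why small chunks suffer most, here is a minimal sketch of a state check that takes a mutex on every call. All names in it (is_operational_locked, encrypt_chunked, lock_round_trips) are hypothetical stand-ins, not the code from libgcrypt-1.8.3-fips-ctor.patch, and the XOR loop is only a placeholder for real cipher work.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the libgcrypt internals. */
enum module_state { STATE_INIT, STATE_OPERATIONAL, STATE_ERROR };

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static enum module_state current_state = STATE_OPERATIONAL;
static unsigned long lock_round_trips; /* counts lock + unlock pairs */

/* Every call takes and releases the mutex, even when uncontended. */
static bool is_operational_locked(void)
{
    bool ok;

    pthread_mutex_lock(&state_lock);
    ok = (current_state == STATE_OPERATIONAL);
    lock_round_trips++;
    pthread_mutex_unlock(&state_lock);
    return ok;
}

/* Hypothetical cipher entry point: the check runs once per call, so a
 * buffer processed in small chunks pays for it len / chunk times. */
static int encrypt_chunked(unsigned char *buf, size_t len, size_t chunk)
{
    for (size_t off = 0; off + chunk <= len; off += chunk) {
        if (!is_operational_locked())
            return -1;
        for (size_t i = 0; i < chunk; i++)
            buf[off + i] ^= 0x5a; /* placeholder, not real encryption */
    }
    return 0;
}
```

With a 512 byte buffer, a 16 byte chunk size goes through the lock 32 times, while a single 512 byte call goes through it once. The fixed per-call cost is therefore amortised over far less cipher work when chunks are small.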

QEMU uses 512 byte data blocks for LUKS and is suffering approx 10% degradation in ECB mode.

With QEMU's XTS mode, data is encrypted 16 bytes at a time so we're suffering from the 53% performance penalty.

Fortunately, gcrypt 1.8.0 introduced its own support for XTS mode, which QEMU can switch to. This will avoid the worst 53% performance hit, but in my current tests of the QEMU code we would still suffer a 4-5% performance hit from this FIPS locking code. This is significant to QEMU when we're trying to maximize our I/O throughput.

Version-Release number of selected component (if applicable):
libgcrypt-1.8.3-4.el8

How reproducible:
Always

Steps to Reproduce:
1. Run a performance benchmark on gcry_cipher_encrypt using RHEL-8 gcrypt RPMs
2. Run a performance benchmark on gcry_cipher_encrypt using plain upstream 1.8.3 release

Actual results:
RHEL-8 build is between 7% and 53% slower than the unmodified 1.8.3 release

Expected results:
RHEL-8 performance matches unmodified 1.8.3 release performance

Additional info:

Comment 1 Daniel Berrangé 2019-10-17 11:17:42 UTC
Created attachment 1626771
Simple benchmark for gcrypt cipher performance

This benchmark tests gcrypt cipher performance using a variety of data chunk sizes.

Compile with:

  $ gcc -o gcry-bench gcry-bench.c $(pkg-config --cflags --libs glib-2.0) -lgcrypt

Run with no args. Inside a RHEL-8 KVM guest, running on a Fedora 30 host with an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, I get this result:

$ ~/gcry-bench 
GCRYPT ECB chunk size      16 bytes 233.74 MB/sec 
GCRYPT ECB chunk size      64 bytes 728.43 MB/sec 
GCRYPT ECB chunk size     128 bytes 1133.76 MB/sec 
GCRYPT ECB chunk size     512 bytes 1773.55 MB/sec 
GCRYPT ECB chunk size   65536 bytes 2160.46 MB/sec 
GCRYPT ECB chunk size 1048576 bytes 2278.38 MB/sec 

Stripping the patch for FIPS locking from the RPM, I get this improvement:

$ ~/gcry-bench 
GCRYPT ECB chunk size      16 bytes 537.54 MB/sec 
GCRYPT ECB chunk size      64 bytes 1203.67 MB/sec 
GCRYPT ECB chunk size     128 bytes 1643.59 MB/sec 
GCRYPT ECB chunk size     512 bytes 2052.22 MB/sec 
GCRYPT ECB chunk size   65536 bytes 2354.87 MB/sec 
GCRYPT ECB chunk size 1048576 bytes 2346.28 MB/sec
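
The relative slowdown for each chunk size can be worked out from the two runs above as 100 * (1 - with_lock / without_lock). The sketch below does that arithmetic. The struct and function names are made up for illustration, and the figures are simply copied from the two listings, so they reflect this one pair of runs only and need not match the approximate percentages quoted in the description.

```c
#include <stddef.h>
#include <stdio.h>

/* Throughput figures (MB/sec) copied from the two runs above. */
struct bench_row {
    size_t chunk_bytes;
    double with_lock;    /* RHEL-8 build with the FIPS locking patch */
    double without_lock; /* same RPM with the patch stripped */
};

static const struct bench_row rows[] = {
    {      16,  233.74,  537.54 },
    {      64,  728.43, 1203.67 },
    {     128, 1133.76, 1643.59 },
    {     512, 1773.55, 2052.22 },
    {   65536, 2160.46, 2354.87 },
    { 1048576, 2278.38, 2346.28 },
};

/* Percentage of throughput lost relative to the unpatched build. */
static double slowdown_pct(double with_lock, double without_lock)
{
    return 100.0 * (1.0 - with_lock / without_lock);
}

/* Print one line per chunk size. */
static void print_slowdowns(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("chunk %7zu bytes: %.1f%% slower\n",
               rows[i].chunk_bytes,
               slowdown_pct(rows[i].with_lock, rows[i].without_lock));
}
```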

Comment 2 Tomas Mraz 2019-10-17 11:35:47 UTC
The lock was introduced as part of the patch that postponed self-testing until the first usage of libgcrypt in an application, which was the fix for the deadlock on boot caused by low entropy.

I am afraid that until the issue is resolved by introducing the jitter RNG in the kernel (as in upstream kernel-5.4), this lock cannot be removed.

The use of AES-ECB to emulate AES-XTS is FIPS non-compliant, so please move to proper AES-XTS use in QEMU anyway.

In the long run it would be a good idea to move away from using libgcrypt to gnutls or openssl.

Comment 3 Daniel Berrangé 2019-10-17 12:09:46 UTC
IIUC, this self-test is a one-time initialization task, and as such it shouldn't require a mutex lock every time.

It could use pthread_once() to decide whether to run the one-time initialization and thus only cost a single atomic read after the first time.

I expect this would eliminate the worst of the performance penalty.
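
A minimal sketch of the pthread_once() idea, assuming the self-test really is a one-shot task. The names run_selftests, is_operational_once and selftest_runs are hypothetical and not taken from libgcrypt; the self-test body is a placeholder that always passes.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_once_t selftest_once = PTHREAD_ONCE_INIT;
static bool selftest_passed; /* written only inside run_selftests() */
static int selftest_runs;    /* illustration only: how often it ran */

/* Hypothetical one-time initializer standing in for the FIPS self-tests. */
static void run_selftests(void)
{
    selftest_runs++;
    selftest_passed = true; /* placeholder: real code would record the result */
}

/* Hypothetical replacement for a check that locks a mutex on every call.
 * pthread_once() runs run_selftests() exactly once; later calls return
 * after a cheap check that initialization has already completed. */
static bool is_operational_once(void)
{
    pthread_once(&selftest_once, run_selftests);
    return selftest_passed;
}
```

This only covers the one-time initialization. If the module state can change later (for example, moving to an error state), that transition would need its own synchronization, such as an atomic state variable.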

Comment 4 Tomas Mraz 2019-10-17 12:18:40 UTC
It would be possible to do; however, it would require a thorough redesign of the state machine checks. I am not inclined to take that path. I'd prefer to spend my time backporting the HW acceleration improvements of the XTS mode implementation (that you mentioned in the blocked bug) instead.

Comment 5 Daniel Berrangé 2019-10-17 12:34:55 UTC
FYI, I just filed https://bugzilla.redhat.com/show_bug.cgi?id=1762765 as an RFE for hardware accelerated AES-XTS mode.

If you're willing to consider that RFE, then I agree that this FIPS mutex issue is much less important.

Comment 14 errata-xmlrpc 2020-11-04 01:44:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libgcrypt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4482