Bug 1762741

Summary:

FIPS mutex lock used from gcry_cipher_(en|de)crypt imposes severe performance penalty

Product:

Red Hat Enterprise Linux 8

Reporter:

Daniel Berrangé <berrange>

Component:

libgcrypt

Assignee:

Tomas Mraz <tmraz>

Status:

CLOSED ERRATA

QA Contact:

Ivan Nikolchev <inikolch>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

8.0

CC:

inikolch, mmethot, moddi, omoris

Target Milestone:

Keywords:

Triaged

Target Release:

8.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

libgcrypt-1.8.5-1.el8

Doc Type:

No Doc Update

Last Closed:

2020-11-04 01:44:25 UTC

Type:

Bug

Bug Blocks:

1701948

Attachments:

Description	Flags
Simple benchmark for gcrypt cipher performance	none

Description Daniel Berrangé 2019-10-17 11:14:45 UTC

Description of problem:
The gcry_cipher_encrypt and gcry_cipher_decrypt functions perform a FIPS check as the first thing they do:

if (!fips_is_operational ())
...snip...

RHEL-8 has introduced a non-upstream patch related to FIPS:

libgcrypt-1.8.3-fips-ctor.patch

This adds mutex lock + unlock calls in the fips_is_operational() function.

Even when these mutex locks are uncontended, they are imposing a severe performance penalty on the gcry_cipher_encrypt/decrypt API calls.

The performance penalty varies according to size of data blocks that are processed in each call.

With 16 byte data blocks, the AES-ECB perf is degraded by approx 53%; with 1 MB data blocks, the AES-ECB perf is degraded by approx 7%.

QEMU uses 512 byte data blocks for LUKS and suffers approx 10% degradation in ECB mode.

With QEMU's XTS mode, data is encrypted 16 bytes at a time so we're suffering from the 53% performance penalty.

Fortunately gcrypt 1.8.0 introduced its own support for XTS mode which QEMU can switch to use. This will avoid the worst 53% performance hit, but we're still going to be suffering about 4-5% performance hit from this FIPS locking code in my current tests of QEMU code. This is significant to QEMU when we're trying to maximize our I/O throughput.

Version-Release number of selected component (if applicable):
libgcrypt-1.8.3-4.el8

How reproducible:
Always

Steps to Reproduce:
1. Run a performance benchmark on gcry_cipher_encrypt using RHEL-8 gcrypt RPMs
2. Run a performance benchmark on gcry_cipher_encrypt using plain upstream 1.8.3 release

Actual results:
RHEL-8 build is between 7% and 53% slower than the unmodified 1.8.3 release

Expected results:
RHEL-8 performance matches unmodified 1.8.3 release performance

Additional info:

Comment 1 Daniel Berrangé 2019-10-17 11:17:42 UTC

Created attachment 1626771
Simple benchmark for gcrypt cipher performance

This patch tests gcrypt cipher performance using a variety of data chunk sizes.

Compile with:

  $ gcc -o gcry-bench gcry-bench.c `pkg-config --cflags --libs glib-2.0` -lgcrypt

Run with no args. Inside a RHEL-8 KVM guest, running on Fedora 30 host with Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz CPUs I get this result:

$ ~/gcry-bench 
GCRYPT ECB chunk size      16 bytes 233.74 MB/sec 
GCRYPT ECB chunk size      64 bytes 728.43 MB/sec 
GCRYPT ECB chunk size     128 bytes 1133.76 MB/sec 
GCRYPT ECB chunk size     512 bytes 1773.55 MB/sec 
GCRYPT ECB chunk size   65536 bytes 2160.46 MB/sec 
GCRYPT ECB chunk size 1048576 bytes 2278.38 MB/sec 

After stripping the FIPS locking patch from the RPM, I get this improvement:

$ ~/gcry-bench 
GCRYPT ECB chunk size      16 bytes 537.54 MB/sec 
GCRYPT ECB chunk size      64 bytes 1203.67 MB/sec 
GCRYPT ECB chunk size     128 bytes 1643.59 MB/sec 
GCRYPT ECB chunk size     512 bytes 2052.22 MB/sec 
GCRYPT ECB chunk size   65536 bytes 2354.87 MB/sec 
GCRYPT ECB chunk size 1048576 bytes 2346.28 MB/sec

Comment 2 Tomas Mraz 2019-10-17 11:35:47 UTC

The lock was introduced as part of the patch that postpones self-testing until the first use of libgcrypt in an application, which was the fix for the deadlock on boot caused by low entropy.

I am afraid that, until the issue is resolved by introducing the jitter RNG in the kernel (as in upstream kernel-5.4), this lock cannot be removed.

The use of AES-ECB to emulate AES-XTS is FIPS non-compliant, so please move to proper AES-XTS use in qemu anyway.

In the long run it would be a good idea to move away from using libgcrypt to gnutls or openssl.

Comment 3 Daniel Berrangé 2019-10-17 12:09:46 UTC

IIUC, this self-test is a one-time initialization task, and as such it shouldn't require a mutex lock every time.

It could use pthread_once() to decide whether to run the one-time initialization and thus only cost a single atomic read after the first time.

I expect this would eliminate the worst of the performance penalty.

Comment 4 Tomas Mraz 2019-10-17 12:18:40 UTC

It would be possible, but it would require a thorough redesign of the state machine checks. I am not inclined to take that path. I'd prefer to spend my time backporting the HW acceleration improvements of the XTS mode implementation (that you mentioned in the blocked bug) instead.

Comment 5 Daniel Berrangé 2019-10-17 12:34:55 UTC

FYI, I just filed https://bugzilla.redhat.com/show_bug.cgi?id=1762765 as an RFE for hardware accelerated AES-XTS mode.

If you're willing to consider that RFE, then I agree that this FIPS mutex issue is much less important.

Comment 14 errata-xmlrpc 2020-11-04 01:44:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libgcrypt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4482