Bug 2012249 - gnutls_priority_set_direct occasionally fails with "The request is invalid"
Summary: gnutls_priority_set_direct occasionally fails with "The request is invalid"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: gnutls
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Daiki Ueno
QA Contact: Alexander Sosedkin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-08 16:22 UTC by Richard W.M. Jones
Modified: 2022-05-17 16:17 UTC
CC List: 7 users

Fixed In Version: gnutls-3.7.2-9.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 15:52:13 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
log file without gnutls debugging (19.21 KB, text/plain)
2021-10-08 16:23 UTC, Richard W.M. Jones
synch-parallel-tls.sh.log (15.83 KB, text/plain)
2021-10-25 08:03 UTC, Richard W.M. Jones
Log with GNUTLS_DEBUG_LEVEL=10 (30.59 KB, text/plain)
2021-10-25 17:31 UTC, Richard W.M. Jones
tlsthread.c (1.03 KB, text/plain)
2021-10-25 18:31 UTC, Richard W.M. Jones


Links
System                  ID                                                  Last Updated
Gitlab                  redhat/centos-stream/rpms gnutls merge_requests 13  2021-10-26 11:05:59 UTC
Red Hat Issue Tracker   CRYPTO-5240                                         2021-11-09 16:32:16 UTC
Red Hat Issue Tracker   RHELPLAN-99313                                      2021-10-08 16:22:43 UTC
Red Hat Product Errata  RHBA-2022:3937                                      2022-05-17 15:52:22 UTC

Description Richard W.M. Jones 2021-10-08 16:22:26 UTC
Description of problem:

In RHEL 9, possibly when using lots of threads, calling
gnutls_priority_set_direct can fail with the error

error: failed to set TLS session priority to @NBDKIT,SYSTEM:+ECDHE-PSK:+DHE-PSK:+PSK: The request is invalid.

Version-Release number of selected component (if applicable):

gnutls-3.7.2-4.el9.x86_64

How reproducible:

Rare

Steps to Reproduce:

$ git clone https://gitlab.com/nbdkit/libnbd
$ sudo dnf builddep libnbd
$ cd libnbd
$ ./configure
$ make
$ while make -C tests check TESTS=synch-parallel-tls.sh  >& /tmp/log; do echo -n . ; done

Eventually it should fail. For the log, see tests/synch-parallel-tls.sh.log.

Comment 1 Richard W.M. Jones 2021-10-08 16:23:55 UTC
Created attachment 1830898 [details]
log file without gnutls debugging

Comment 2 Richard W.M. Jones 2021-10-08 17:15:18 UTC
Only seems to be reproducible in RHEL 9.  I cannot reproduce it in Fedora.

Some think it might be connected to this non-upstream change
which is only in RHEL 9:
https://gitlab.com/gnutls/gnutls/-/merge_requests/1427

Comment 3 Daiki Ueno 2021-10-14 12:57:39 UTC
3.7.2-4 is the package that re-introduced LTO enablement after a long time. As it created several obscure issues in tests when running on aarch64 and ppc64le, we disabled LTO on those arches in 3.7.2-6. As far as I can tell from the original thread, the failure seems to happen only on aarch64 with 3.7.2-4, so I would suggest building with the latest gnutls package (3.7.2-7).

Comment 4 Richard W.M. Jones 2021-10-25 08:03:09 UTC
Created attachment 1836609 [details]
synch-parallel-tls.sh.log

My local testing is on x86-64.

The bug still happens (perhaps less often?) with gnutls-3.7.2-7.el9.x86_64

Attached latest log of the failure.

Comment 5 Daniel Berrangé 2021-10-25 08:21:41 UTC
(In reply to Daiki Ueno from comment #3)
> 3.7.2-4 is the package that re-introduced LTO enablement after a long time.
> As it created several obscure issues in tests when running on aarch64 and
> ppc64le, we disabled LTO on those arches in 3.7.2-6.

(In reply to Richard W.M. Jones from comment #4)
> My local testing is on x86-64.
>
> The bug still happens (perhaps less often?) with gnutls-3.7.2-7.el9.x86_64

It is perhaps worth doing a gnutls scratch build with LTO disabled on x86_64 too, to see whether that solves it, as LTO has been a source of many weird non-deterministic bugs.

Comment 6 Richard W.M. Jones 2021-10-25 08:39:15 UTC
Oh interesting, I thought LTO had been disabled on all architectures.
I did a scratch build with LTO disabled on x86-64 too which I will
test once it has finished:
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=746280

Comment 7 Richard W.M. Jones 2021-10-25 09:13:13 UTC
(In reply to Richard W.M. Jones from comment #6)
> https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=746280

This did *not* fix the problem, so it's not LTO.

Comment 8 Richard W.M. Jones 2021-10-25 10:08:03 UTC
(In reply to Richard W.M. Jones from comment #7)
> (In reply to Richard W.M. Jones from comment #6)
> > https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=746280
> 
> This did *not* fix the problem, so it's not LTO.

My apologies, I was reading the wrong log file.  In fact this package
does fix the problem, so it is a problem related to LTO on x86-64.

Comment 9 Richard W.M. Jones 2021-10-25 10:11:49 UTC
Oh I hate intermittent errors!  Just as I hit submit on that comment, the
test which had run successfully for 100+ cycles failed again with the
same problem.

This is with LTO disabled, so the problem still seems to be present and
NOT related to LTO after all.

I'm going to try this with upstream gnutls, and also see if I can get a
more reliable test case.

Comment 10 Richard W.M. Jones 2021-10-25 11:01:17 UTC
It would be really nice if gnutls got rid of the requirement for autogen.
autogen is not available on RHEL 9, and it is almost impossible to build
there because it depends on both itself and gnulib.

Comment 11 Richard W.M. Jones 2021-10-25 17:31:35 UTC
Created attachment 1836943 [details]
Log with GNUTLS_DEBUG_LEVEL=10

Comment 12 Daiki Ueno 2021-10-25 18:25:26 UTC
(In reply to Richard W.M. Jones from comment #11)
> Created attachment 1836943 [details]
> Log with GNUTLS_DEBUG_LEVEL=10

OK, thank you so much for looking into this. It does indeed seem to be a race condition: the resolved priority string is stored in the global variable system_wide_priority_string, while other threads may independently update that variable without holding a lock. A similar race seems to exist in _gnutls_unload_system_priorities() for system_wide_priority_strings, though that function is not called as often, because the caller first checks the mtime of the config file. I'll create a patch shortly.
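The race described here is an unsynchronized store-then-use of shared global state. The C sketch that follows is a hypothetical model, not the gnutls source and not the patch that shipped in gnutls-3.7.2-9.el9: resolved_priority stands in for system_wide_priority_string, expand_keyword() is an invented placeholder for the real config-file lookup, and resolve_priority(), worker() and run_stress() are illustrative names. It shows one way to close the window, by holding a mutex across both the store into the shared slot and the copy taken out of it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL model, not gnutls source: one process-wide slot holding the
 * most recently resolved priority string, standing in for
 * system_wide_priority_string. */
static pthread_mutex_t resolved_lock = PTHREAD_MUTEX_INITIALIZER;
static char *resolved_priority; /* guarded by resolved_lock */

/* Hypothetical stand-in for expanding an "@KEYWORD" priority string
 * against the system-wide config file. */
static char *expand_keyword(const char *keyword)
{
    size_t n = strlen("RESOLVED:") + strlen(keyword) + 1;
    char *s = malloc(n);
    if (s != NULL)
        snprintf(s, n, "RESOLVED:%s", keyword);
    return s;
}

/* Stores the expansion in the shared slot, then returns a caller-owned
 * copy of it (or NULL on allocation failure). Without the mutex, another
 * thread could replace and free the slot between the store and the copy,
 * so this thread would read either another thread's result or freed
 * memory. Holding the lock across both steps makes them atomic. */
static char *resolve_priority(const char *keyword)
{
    char *fresh = expand_keyword(keyword);
    if (fresh == NULL)
        return NULL;

    pthread_mutex_lock(&resolved_lock);
    free(resolved_priority);
    resolved_priority = fresh;
    size_t n = strlen(resolved_priority) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, resolved_priority, n);
    pthread_mutex_unlock(&resolved_lock);
    return copy;
}

struct worker_arg {
    char keyword[16];
    int iterations;
    int mismatches;
};

/* Resolves this worker's own keyword repeatedly and counts results that
 * differ from what it asked for. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    char expected[32];
    snprintf(expected, sizeof expected, "RESOLVED:%s", arg->keyword);

    for (int i = 0; i < arg->iterations; i++) {
        char *got = resolve_priority(arg->keyword);
        if (got == NULL || strcmp(got, expected) != 0)
            arg->mismatches++;
        free(got);
    }
    return NULL;
}

#define MAX_THREADS 16

/* Runs nthreads workers concurrently and returns the total mismatch count. */
static int run_stress(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int t = 0; t < nthreads; t++) {
        snprintf(args[t].keyword, sizeof args[t].keyword, "T%d", t);
        args[t].iterations = iterations;
        args[t].mismatches = 0;
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0)
            abort();
    }

    int total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].mismatches;
    }
    return total;
}
```

With the lock in place, run_stress(8, 20000) should return 0 (build with -pthread). Deleting the pthread_mutex_lock()/pthread_mutex_unlock() pair turns it back into the buggy pattern, where a thread may pick up another thread's string or touch freed memory, comparable to the "The request is invalid" and tcache failures reported in this bug; such failures are nondeterministic, so a clean run of the unlocked version proves nothing. A simpler design avoids the shared slot entirely and returns the expansion straight to the caller; the global is kept here only to mirror the structure described in the comment.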

Comment 13 Richard W.M. Jones 2021-10-25 18:31:36 UTC
Created attachment 1836965 [details]
tlsthread.c

This is a reproducer.  It fails for me reliably and in a couple
of interesting ways.  It does not require anything except RHEL 9
and gnutls-devel.

$ gcc -O2 -Wall -pthread tlsthread.c -o tlsthread -lgnutls
$ while ./tlsthread ; do echo -n . ; done
.............................................................................................................................................../tlsthread: gnutls_priority_set_direct: The request is invalid.

Sometimes it fails indicating memory corruption:

$ while ./tlsthread ; do echo -n .; done
tcache_thread_shutdown(): unaligned tcache chunk detected
Aborted (core dumped)

(The stack trace from this was not very interesting)

Comment 14 Daiki Ueno 2021-10-26 12:24:23 UTC
Thank you for the reproducer. I've created a scratch build with the proposed fix:
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=748121

Comment 21 errata-xmlrpc 2022-05-17 15:52:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: gnutls), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3937

