+++ This bug was initially created as a clone of Bug #2112116 +++
Description of problem:
When clnt_create() or one of its related functions is called concurrently from multiple threads, the call may occasionally deadlock, and the calling program will hang.
The bug may affect NFS (remote file systems), and hence Kubernetes infrastructure as well, along with any application that manages RPC clients in parallel.
Version-Release number of selected component (if applicable):
The bug is definitely present in libtirpc versions 1.1.4 to 3.2.1. However, it likely affected at least some earlier versions of the library also.
How reproducible:
Making clnt_create calls on a multi-core system in parallel threads will produce the deadlock sooner or later. In our case on a 4-core x86_64 VM, with 8 parallel threads calling clnt_create() nearly simultaneously to 8 different RPC hosts, the deadlock typically occurs after a few dozen attempts.
Steps to Reproduce:
1. Make clnt_create() calls in multiple threads on a multi-core Linux machine. Assume you have server nodes ('server1' through 'server8') running some RPC service (SOMEPROG, SOMEVERS). You want to talk to these servers asynchronously in parallel threads. Each thread makes its own RPC client connection. Here is an example C test program for that particular scenario:
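#include <stdio.h>
#include <stdlib.h>
#include <rpc/rpc.h>

/* SOMEPROG and SOMEVERS are placeholders for the RPC program and version numbers. */
int main(void) {
    int i;

    /* Each of the 8 threads creates its own RPC client for a different host. */
    #pragma omp parallel for num_threads(8)
    for (i = 1; i <= 8; i++) {
        char hostname[40];
        snprintf(hostname, sizeof(hostname), "server%d", i);
        clnt_create(hostname, SOMEPROG, SOMEVERS, "tcp");
    }

    fprintf(stderr, "Success!!!\n");
    return 0;
}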
2. Modify the above program for a particular RPC service that runs on some cluster of nodes as appropriate for their host names and RPC program info.
3. Compile with -fopenmp -lpthread -ltirpc -lrt
Actual results:
The program will usually run fine, printing "Success!!!" to stderr and returning to the shell prompt. However, after a few dozen runs, it will eventually hang without printing anything.
Expected results:
The program should ALWAYS print "Success!!!" and ALWAYS return to the prompt. Crucially, it should never hang.
Additional info:
The expected behavior (no hangs in MT environment) was in fact the old behavior of the original SunRPC library, such as the one we use on some very old LynxOS 3.1.0 PowerPCs from the 1990s... The hanging is a regression that was introduced in libtirpc sometime after cloning the original SunRPC...
--- Additional comment from Steve Dickson on 2022-07-28 21:41:42 UTC ---
commit 667ce638454d0995170dd8e6e0668ada733d72e7
Author: Attila Kovacs <attila.kovacs.edu>
Date: Thu Jul 28 09:14:24 2022 -0400
SUNRPC: mutexed access blacklist_read state variable.
commit 3f2a5459fb00c2f529d68a4a0fd7f367a77fa65a
Author: Attila Kovacs <attila.kovacs.edu>
Date: Tue Jul 26 15:24:01 2022 -0400
thread safe clnt destruction.
commit 7a6651a31038cb19807524d0422e09271c5ffec9
Author: Attila Kovacs <attila.kovacs.edu>
Date: Tue Jul 26 15:20:05 2022 -0400
clnt_dg_freeres() uncleared set active state may deadlock.
Author: Attila Kovacs <attila.kovacs.edu>
Date: Wed Jul 20 17:03:28 2022 -0400
Eliminate deadlocks in connects with an MT environment