RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. If that fails, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", will have a small "two-footprint" icon next to it, and will direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where each "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2112125 - Multithreaded clnt_create() may deadlock.
Summary: Multithreaded clnt_create() may deadlock.
Keywords:
Status: CLOSED DUPLICATE of bug 2118157
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libtirpc
Version: 9.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Zhi Li
URL:
Whiteboard:
Depends On: 2112116
Blocks:
Reported: 2022-07-28 21:43 UTC by Steve Dickson
Modified: 2022-08-16 14:02 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2112116
Environment:
Last Closed: 2022-08-16 14:02:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker RHELPLAN-129540 (Private: 0; Priority: None; Status: None; Summary: None; Last Updated: 2022-07-28 21:46:00 UTC)

Description Steve Dickson 2022-07-28 21:43:07 UTC
+++ This bug was initially created as a clone of Bug #2112116 +++

Description of problem:

When clnt_create() or one of its related functions is called concurrently from multiple threads, the call may occasionally deadlock, and the calling program hangs.

The bug may affect NFS (remote file systems), and therefore the Kubernetes infrastructure that depends on it, as well as any application that manages RPC clients in parallel.

Version-Release number of selected component (if applicable):

The bug is definitely present in libtirpc versions 1.1.4 through 1.3.2, and it likely affects at least some earlier versions of the library as well.

How reproducible:

Calling clnt_create() from parallel threads on a multi-core system produces the deadlock sooner or later. In our case, on a 4-core x86_64 VM with 8 parallel threads calling clnt_create() nearly simultaneously to 8 different RPC hosts, the deadlock typically occurs after a few dozen attempts.


Steps to Reproduce:

1. Make clnt_create() calls from multiple threads on a multi-core Linux PC. Assume you have server nodes ('server1' through 'server8') running some RPC service (SOMEPROG, SOMEVERS) that you want to talk to concurrently from parallel threads, with each thread making its own RPC client connection. Here is an example C test program for that scenario:

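 #include <stdio.h>
 #include <stdlib.h>
 #include <rpc/rpc.h>

 /* Placeholders: replace with your RPC service's program and version numbers. */
 #define SOMEPROG 0x20000001
 #define SOMEVERS 1

 int main(void) {
   int i;

   /* 8 threads, each creating its own client to a different host. */
   #pragma omp parallel for num_threads(8)
   for (i = 1; i <= 8; i++) {
     char hostname[40];
     CLIENT *clnt;

     snprintf(hostname, sizeof(hostname), "server%d", i);
     clnt = clnt_create(hostname, SOMEPROG, SOMEVERS, "tcp");
     if (clnt == NULL)
       clnt_pcreateerror(hostname);   /* report the failure, keep going */
     else
       clnt_destroy(clnt);
   }

   fprintf(stderr, "Success!!!\n");
   return 0;
 }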

2. Adapt the program to an RPC service running on your own cluster of nodes, substituting the appropriate host names and RPC program and version numbers.

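3. Compile with -fopenmp -lpthread -ltirpc -lrt. (On RHEL 9 the libtirpc headers are installed under /usr/include/tirpc, so -I/usr/include/tirpc may also be needed.)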


Actual results:

Most runs complete normally, printing "Success!!!" to stderr and returning to the shell prompt. However, after a few dozen runs, one will eventually hang without printing anything.

Expected results:

The program should ALWAYS print "Success!!!" and ALWAYS return to the prompt. Crucially, it should never hang. 

Additional info:

The expected behavior (no hangs in a multithreaded environment) was in fact the behavior of the original SunRPC library, such as the one we still use on some very old LynxOS 3.1.0 PowerPC systems from the 1990s. The hang is a regression introduced in libtirpc at some point after it was derived from the original SunRPC code.

--- Additional comment from Steve Dickson on 2022-07-28 21:41:42 UTC ---

commit 667ce638454d0995170dd8e6e0668ada733d72e7
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Thu Jul 28 09:14:24 2022 -0400

    SUNRPC: mutexed access blacklist_read state variable.

commit 3f2a5459fb00c2f529d68a4a0fd7f367a77fa65a
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Tue Jul 26 15:24:01 2022 -0400

    thread safe clnt destruction.

commit 7a6651a31038cb19807524d0422e09271c5ffec9
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Tue Jul 26 15:20:05 2022 -0400

    clnt_dg_freeres() uncleared set active state may deadlock.


Author: Attila Kovacs <attila.kovacs.edu>
Date:   Wed Jul 20 17:03:28 2022 -0400

    Eliminate deadlocks in connects with an MT environment
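The clnt_dg_freeres() commit title names a classic deadlock pattern in RPC client code: a per-descriptor "active" flag, guarded by a mutex and condition variable, that one code path sets but never clears, so every later caller waiting on it blocks forever. The following is a minimal, generic pthreads sketch of that pattern under stated assumptions; struct fd_lock, fd_operation() and run_fd_lock_demo() are hypothetical names, and this is not the libtirpc code or the actual patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-descriptor lock illustrating the "active flag +
 * condition variable" pattern; the names are not libtirpc's. */
struct fd_lock {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool active;   /* true while some thread owns the descriptor */
    int owners;    /* debug counter: must never exceed 1 */
};

#define FD_LOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Wait until no other thread owns the descriptor, then claim it. */
static void fd_lock_acquire(struct fd_lock *l)
{
    pthread_mutex_lock(&l->mu);
    while (l->active)
        pthread_cond_wait(&l->cv, &l->mu);
    l->active = true;
    pthread_mutex_unlock(&l->mu);
}

/* Clear the flag and wake one waiter. A path that skips this call leaves
 * `active` set, and every later acquirer waits forever. */
static void fd_lock_release(struct fd_lock *l)
{
    pthread_mutex_lock(&l->mu);
    l->active = false;
    pthread_cond_signal(&l->cv);
    pthread_mutex_unlock(&l->mu);
}

/* Hypothetical operation that releases the lock on every exit path,
 * including the early error return. */
static int fd_operation(struct fd_lock *l, bool fail_early)
{
    fd_lock_acquire(l);
    if (fail_early) {
        fd_lock_release(l);   /* omitting this is the deadlock bug class */
        return -1;
    }
    l->owners++;              /* exclusive section: only the owner runs here */
    assert(l->owners == 1);
    l->owners--;
    fd_lock_release(l);
    return 0;
}

struct worker_arg { struct fd_lock *lock; int iterations; int ok; };

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iterations; i++)
        if (fd_operation(a->lock, i % 2 != 0) == 0)
            a->ok++;
    return NULL;
}

/* 8 threads x 1000 calls, half of them taking the early-return path.
 * Returns the number of successful calls (4000 if nothing hangs). */
static int run_fd_lock_demo(void)
{
    static struct fd_lock lock = FD_LOCK_INIT;
    pthread_t tids[8];
    struct worker_arg args[8];
    int total = 0;

    for (int t = 0; t < 8; t++) {
        args[t] = (struct worker_arg){ &lock, 1000, 0 };
        pthread_create(&tids[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < 8; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].ok;
    }
    return total;
}
```

Calling run_fd_lock_demo() should return 4000. Deleting the fd_lock_release() call in the early-return branch makes the demo hang, which is the same symptom described in this report.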

Comment 4 Steve Dickson 2022-08-01 18:16:07 UTC
This patch is also needed:

commit fa153d634228216fc162e5d6583a7035af2c40ba (HEAD -> master, tag: libtirpc-1-3-3-rc5)
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Mon Aug 1 11:28:43 2022 -0400

    SUNRPC: MT-safe overhaul of address cache management in rpcb_clnt.c

(In reply to Steve Dickson from comment #0)
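The additional commit in comment 4 describes an MT-safe overhaul of the address cache in rpcb_clnt.c. One common way to make such a lookup cache safe for concurrent clients is a reader-writer lock, with lookups copying the cached value out while the lock is held rather than returning a pointer into the shared table. The following sketch illustrates only that general technique; the cache layout and the names cache_get() and cache_put() are hypothetical and do not reflect what the libtirpc patch actually does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 16
#define HOST_MAX 64
#define ADDR_MAX 64

/* Hypothetical cache entry; not libtirpc's actual address-cache type. */
struct cache_entry {
    bool used;
    char host[HOST_MAX];
    char addr[ADDR_MAX];
};

static struct cache_entry cache[CACHE_SLOTS];
static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;
static unsigned next_victim;   /* written only under the write lock */

/* Find `host` and copy its address into `out` while still holding the
 * read lock, so no caller keeps a pointer into shared, evictable storage. */
static bool cache_get(const char *host, char *out, size_t outlen)
{
    bool hit = false;

    assert(outlen > 0);
    pthread_rwlock_rdlock(&cache_lock);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].host, host) == 0) {
            snprintf(out, outlen, "%s", cache[i].addr);
            hit = true;
            break;
        }
    }
    pthread_rwlock_unlock(&cache_lock);
    return hit;
}

/* Insert or update under the write lock, so no reader can observe an
 * entry while it is being overwritten or evicted. */
static void cache_put(const char *host, const char *addr)
{
    int slot = -1;

    pthread_rwlock_wrlock(&cache_lock);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].host, host) == 0) {
            slot = i;
            break;
        }
    }
    if (slot < 0)
        slot = (int)(next_victim++ % CACHE_SLOTS);   /* round-robin eviction */
    cache[slot].used = true;
    snprintf(cache[slot].host, HOST_MAX, "%s", host);
    snprintf(cache[slot].addr, ADDR_MAX, "%s", addr);
    pthread_rwlock_unlock(&cache_lock);
}
```

For example, after cache_put("server1", "192.0.2.1"), a call to cache_get("server1", buf, sizeof buf) returns true and leaves a private copy of the address in buf, which remains valid even if another thread later evicts or updates the entry.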

Comment 5 Steve Dickson 2022-08-16 14:02:12 UTC

*** This bug has been marked as a duplicate of bug 2118157 ***

