Bug 2112116
Summary: | Multithreaded clnt_create() may deadlock. | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Attila Kovacs <attipaci>
Component: | libtirpc | Assignee: | Steve Dickson <steved>
Status: | CLOSED ERRATA | QA Contact: | Zhi Li <yieli>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 8.6 | CC: | xzhou, yoyang
Target Milestone: | rc | Keywords: | Patch, Triaged
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | libtirpc-1.1.4-8.el8 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 2112125 (view as bug list) | Environment: |
Last Closed: | 2022-11-08 10:51:52 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2112125 | |
Description
Attila Kovacs
2022-07-28 21:18:51 UTC
commit 667ce638454d0995170dd8e6e0668ada733d72e7
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Thu Jul 28 09:14:24 2022 -0400

    SUNRPC: mutexed access blacklist_read state variable.

commit 3f2a5459fb00c2f529d68a4a0fd7f367a77fa65a
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Tue Jul 26 15:24:01 2022 -0400

    thread safe clnt destruction.

commit 7a6651a31038cb19807524d0422e09271c5ffec9
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Tue Jul 26 15:20:05 2022 -0400

    clnt_dg_freeres() uncleared set active state may deadlock.

Author: Attila Kovacs <attila.kovacs.edu>
Date:   Wed Jul 20 17:03:28 2022 -0400

    Eliminate deadlocks in connects with an MT environment

(In reply to Attila Kovacs from comment #0)
> ...
>
> How reproducible:
>
> Making clnt_create calls on a multi-core system in parallel threads will
> produce the deadlock sooner or later. In our case on a 4-core x86_64 VM,
> with 8 parallel threads calling clnt_create() nearly simultaneously to 8
> different RPC hosts, the deadlock typically occurs after a few dozen
> attempts.

I have just tried this on a machine with 32 CPU cores (with 8x threads), 100
attempts, but didn't trigger the hang.

I.e.

    #define NFS_PROGRAM (100003)

    int main() {
        int i;
        for (i = 1; i <= 256; i++) {
            char hostname[40];
            sprintf(hostname, "server%d", i);
            clnt_create(hostname, NFS_PROGRAM, 4, "tcp");
        }
    }

However, I just added this scenario as our regression testcase.

(In reply to Yongcheng Yang from comment #2)
> (In reply to Attila Kovacs from comment #0)
> > ...
> >
> > How reproducible:
> >
> > Making clnt_create calls on a multi-core system in parallel threads will
> > produce the deadlock sooner or later. In our case on a 4-core x86_64 VM,
> > with 8 parallel threads calling clnt_create() nearly simultaneously to 8
> > different RPC hosts, the deadlock typically occurs after a few dozen
> > attempts.
>
> I have just tried this on a machine with 32 CPU cores (with 8x threads), 100
> attempts, but didn't trigger the hang.
>
> I.e.
>     #define NFS_PROGRAM (100003)
>     int main() {
>         int i;
>         for (i = 1; i <= 256; i++) {
>             char hostname[40];
>             sprintf(hostname, "server%d", i);
>             clnt_create(hostname, NFS_PROGRAM, 4, "tcp");
>         }
>     }
>
> However, I just added this scenario as our regression testcase.

I assume you had server1 through server256 running NFS for the test... How
did you then parallelize the above test case (the quoted test code does not
have an OMP pragma -- so it seems sequential by design)?

(In reply to Attila Kovacs from comment #3)
> (In reply to Yongcheng Yang from comment #2)
>
> I assume you had server1 through server256 running NFS for the test... How
> did you then parallelize the above test case (the quoted test code does not
> have an OMP pragma -- so it seems sequential by design)?

TBH I don't know the meaning of "OMP pragma". From my understanding, it
parallelizes by running clnt_create() from server1 through server256, and I
then execute this reproducer many times sequentially. Am I misunderstanding?

Well, without OpenMP converting the 'for' loop to a parallel execution via the

    #pragma omp parallel for

directive directly above the 'for' statement, the loop body is going to be
executed sequentially. In that case, the clnt_create() calls in the body have
no chance of racing against one another.

Perhaps it's better to make the threading more explicit with pthreads. (OMP is
great for quick and dirty multithreading, but it's harder to be sure that the
threading actually happens, since the program will compile fine even if the
#pragma is not actually being used.)
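For illustration only, a minimal OpenMP variant of the sequential loop quoted
above might look like the sketch below. This is not one of the posted
reproducers; it assumes the same server1..server256 host names and compilation
with something like gcc -fopenmp test.c -ltirpc:

    /* Hypothetical OpenMP sketch of the sequential reproducer above.
     * Without -fopenmp (or if the #pragma is mistyped) the loop silently
     * runs sequentially, which is the pitfall described in the comment. */
    #include <stdio.h>
    #include <rpc/rpc.h>

    #define NFS_PROGRAM (100003)
    #define N_SERVERS   (256)

    int main() {
        int i;

        #pragma omp parallel for
        for (i = 1; i <= N_SERVERS; i++) {
            char hostname[40];
            sprintf(hostname, "server%d", i);
            /* Concurrent clnt_create() calls are what exercise the race. */
            if (clnt_create(hostname, NFS_PROGRAM, 4, "tcp") == NULL)
                fprintf(stderr, "WARNING! Client creation to %s failed.\n",
                        hostname);
        }
        return 0;
    }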
So here is the explicitly threaded version of the test:

    #define NFS_PROGRAM (100003)
    #define N_SERVERS (256)

    static void *connect_thread(void *arg) {
        return (void *) clnt_create((char *) arg, NFS_PROGRAM, 4, "tcp");
    }

    int main() {
        char hostname[N_SERVERS][40];
        pthread_t tid[N_SERVERS];
        int i;

        // Set up the server node names first...
        for (i = 1; i <= N_SERVERS; i++)
            sprintf(hostname[i], "server%d", i);

        // Create RPC clients in parallel...
        for (i = 1; i <= N_SERVERS; i++)
            pthread_create(&tid[i], NULL, connect_thread, (void *) hostname[i]);

        // Wait for the clients to be created and check that they are valid...
        for (i = 1; i <= N_SERVERS; i++) {
            CLIENT *cl;
            pthread_join(tid[i], (void *) &cl);
            if (cl == NULL)
                fprintf(stderr, "WARNING! Client creation to %s failed.\n", hostname[i]);
        }
    }

Also, come to think of it, server response times and/or DNS lookup times
probably matter too, since they affect how long the steps of the client
creation take or block. In our case, round-trips to the server nodes are
longish, typically 2--3 ms. Perhaps that plays a role in the reproducibility.
There may be other factors also...

On our end the hangs are fairly commonly reproducible, but there might not be
an easy way to set up a universal test that works reliably in all
environments.

-- A.

Many thanks to Attila for the explanation and the pthread approach! I can
understand it now. But it looks like the response time does matter, as neither
reproducer can trigger the hangs in our environment. Anyway, this is a great
test scenario, and thank you for providing it.

I just noticed that in the pthread example, you'll need to size the arrays to
[N_SERVERS+1] instead of [N_SERVERS], given the way I set up the for loops
with indices starting at 1 (a corrected listing follows further below)...
Also, thank you for the quick turnaround!

-- A.

This patch is also needed:

commit fa153d634228216fc162e5d6583a7035af2c40ba (HEAD -> master, tag: libtirpc-1-3-3-rc5)
Author: Attila Kovacs <attila.kovacs.edu>
Date:   Mon Aug 1 11:28:43 2022 -0400

    SUNRPC: MT-safe overhaul of address cache management in rpcb_clnt.c

https://gitlab.com/steved92/nfs-utils/-/merge_requests/new?merge_request%5B

(In reply to Steve Dickson from comment #1)
> commit 667ce638454d0995170dd8e6e0668ada733d72e7
> Author: Attila Kovacs <attila.kovacs.edu>
> Date:   Thu Jul 28 09:14:24 2022 -0400
>
>     SUNRPC: mutexed access blacklist_read state variable.
>
> commit 3f2a5459fb00c2f529d68a4a0fd7f367a77fa65a
> Author: Attila Kovacs <attila.kovacs.edu>
> Date:   Tue Jul 26 15:24:01 2022 -0400
>
>     thread safe clnt destruction.
>
> commit 7a6651a31038cb19807524d0422e09271c5ffec9
> Author: Attila Kovacs <attila.kovacs.edu>
> Date:   Tue Jul 26 15:20:05 2022 -0400
>
>     clnt_dg_freeres() uncleared set active state may deadlock.
>
> Author: Attila Kovacs <attila.kovacs.edu>
> Date:   Wed Jul 20 17:03:28 2022 -0400
>
>     Eliminate deadlocks in connects with an MT environment

I just wanted to confirm that with the 5 patches above, the deadlocks have
stopped occurring on our systems. From my point of view, the patches
constitute a 'fix' that is good to go...

If I find the time, I may poke around looking for further MT flaws lurking in
libtirpc. (Given that I found a few in the source code used by clnt_create(),
there is probably room for further improvements, but that's beyond the scope
of this particular bug...)

Thanks to all for being very responsive. It's been a pleasure to work with
you, and I learnt a lot in the process too. To be continued...

-- A.
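For convenience, here is the pthread reproducer from this thread with the
[N_SERVERS + 1] array-size correction noted above folded in and the required
headers spelled out. It is a consolidated sketch of the posted test, assuming
the same server1..server256 host names and linking with -lpthread and -ltirpc:

    /* Consolidated pthread reproducer: same logic as the listing above,
     * with the [N_SERVERS + 1] correction applied so that indices 1..256
     * stay in bounds, and with explicit headers. */
    #include <stdio.h>
    #include <pthread.h>
    #include <rpc/rpc.h>

    #define NFS_PROGRAM (100003)
    #define N_SERVERS   (256)

    static void *connect_thread(void *arg) {
        /* Each thread races clnt_create() against the others. */
        return (void *) clnt_create((char *) arg, NFS_PROGRAM, 4, "tcp");
    }

    int main() {
        char hostname[N_SERVERS + 1][40];   /* index 0 is unused */
        pthread_t tid[N_SERVERS + 1];
        int i;

        // Set up the server node names first...
        for (i = 1; i <= N_SERVERS; i++)
            sprintf(hostname[i], "server%d", i);

        // Create RPC clients in parallel...
        for (i = 1; i <= N_SERVERS; i++)
            pthread_create(&tid[i], NULL, connect_thread, (void *) hostname[i]);

        // Wait for the clients to be created and check that they are valid...
        for (i = 1; i <= N_SERVERS; i++) {
            CLIENT *cl;
            pthread_join(tid[i], (void **) &cl);
            if (cl == NULL)
                fprintf(stderr, "WARNING! Client creation to %s failed.\n",
                        hostname[i]);
        }
        return 0;
    }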
Hello,

(In reply to Attila Kovacs from comment #9)
> I just wanted to confirm that with the 5 patches above, the deadlocks have
> stopped occurring on our systems. From my point of view, the patches
> constitute a 'fix' that is good to go...
>
> If I find the time, I may poke around looking for further MT flaws lurking
> in libtirpc. (Given that I found a few in the source code used by
> clnt_create(), there is probably room for further improvements, but that's
> beyond the scope of this particular bug...)
>
> Thanks to all for being very responsive. It's been a pleasure to work with
> you, and I learnt a lot in the process too. To be continued...

I'm wondering if you might want to take an additional step... This RHEL8 rpm
contains those five patches along with a few patches to make things apply
cleanly:

https://people.redhat.com/steved/.bz2112116/libtirpc-1.1.4-8.el8.x86_64.rpm

If you would not mind throwing this rpm into your environment to ensure we
have plugged all the holes...

Please note: this is a pre-release rpm, so it is not supported... yet! Just
looking to get some additional testing we are not able to do. tia!

Hi Steve!

Your RHEL8 test RPM is looking great from my perspective. I ran my test a few
thousand times on our 'over-sensitive' system. It seems to be working very
reliably. (And, just to be sure, I have also run the current release
1.1.4-6.el8 back-to-back -- and that one was still hanging reliably after a
few dozen tries every time, just as expected.) So, it all checks out.

Thanks again,

-- A.

Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA. For information on
the advisory (libtirpc bug fix and enhancement update), and where to find the
updated files, follow the link below. If the solution does not work for you,
open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7740