Bug 1299003 - Unhandled EINTR during connection establishment leads to EACCES failure
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.2
Hardware: Unspecified OS: Unspecified
Priority: medium Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Steve Dickson
QA Contact: Yongcheng Yang
Docs Contact:
Depends On:
Blocks: 1313485 1295577
Reported: 2016-01-15 11:23 EST by Olga Kornieskaia
Modified: 2016-11-04 01:02 EDT

See Also:
Fixed In Version: nfs-utils-1.3.0-0.24.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 01:02:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
testing scripts (19.50 KB, application/x-tar)
2016-01-15 11:23 EST, Olga Kornieskaia

Description Olga Kornieskaia 2016-01-15 11:23:55 EST
Created attachment 1115189
testing scripts

Description of problem:
When gssd tries to establish an RPC client with an NFS server, it can receive EINTR during connection establishment, which the current code does not handle. The NFS operation then fails with "permission denied" (EACCES).

Version-Release number of selected component (if applicable):
Problem exists in upstream nfs-utils. Asking to be fixed in RHEL7.x

How reproducible:
Create 9 mounts on the client, one for each security flavor under each NFS version: nfs3krb5, nfs3krb5i, nfs3krb5p, and the same for nfs4 and nfs4.1. On each mount, as a user, create a directory and write a file in it, then unmount, and repeat from the mount step in a loop.

Attached is a tar of my script(s) for reproducing the problem (I'm not a script person so it's just functional)

I run “allnfskrb2.sh”, which starts each of the 9 other files in the background. To kill them I ran “killall *hammer*”, then ran nfsumount.sh to unmount the 9 mounts.

The scripts depend on several directories already existing on the fileserver (I used to start hammer there and didn’t change the layout after switching to creating only a single file):
/t/hammer/nfs3krb5
… /nfs3krb5i
… /nfs3krb5p
… /nfs4krb5
… /nfs4krb5i
…. /nfs4krb5p
…. /nfs41krb5
… /nfs41krb5i
… /nfs41krb5p


Steps to Reproduce:
1.
2.
3.

Actual results:
Operation fails with "permission denied".

Expected results:
No failure should happen.

Additional info:
This problem exists in the upstream nfs-utils. Thus I submitted an upstream nfs-utils patch for review.

From d346c9516bedfda11bb47ab541e33a3d10f339bf Mon Sep 17 00:00:00 2001
From: Olga Kornievskaia <kolga@netapp.com>
Date: Thu, 14 Jan 2016 09:55:38 -0500
Subject: [PATCH 1/1] [nfs-utils] handle EINTR during connection establishment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

both connect() and select() can receive EINTR signals that we need to
recover from.

In Unix Network Programming, volume 1, section 5.9, W. Richard Stevens
states:

What we are doing […] is restarting the interrupted system call ourself.
This is fine for accept, along with the functions such as read, write,
select and open. But there is one function that we cannot restart ourself:
connect. If this function returns EINTR, we cannot call it again, as doing
so will return an immediate error. When connect is interrupted by a caught
signal and is not automatically restarted, we must call select to wait for
the connection to complete,

Thus for connect() treat both EINPROGRESS and EINTR the same -- call
select().

For select(), it should be re-tried again upon receiving EINTR.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 support/nfs/rpc_socket.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/support/nfs/rpc_socket.c b/support/nfs/rpc_socket.c
index c14efe8..edd43cc 100644
--- a/support/nfs/rpc_socket.c
+++ b/support/nfs/rpc_socket.c
@@ -215,7 +215,7 @@ static int nfs_connect_nb(const int fd, const struct sockaddr *sap,
 	 * use it later.
 	 */
 	ret = connect(fd, sap, salen);
-	if (ret < 0 && errno != EINPROGRESS) {
+	if (ret < 0 && errno != EINPROGRESS && errno != EINTR) {
 		ret = -1;
 		goto done;
 	}
@@ -227,10 +227,16 @@ static int nfs_connect_nb(const int fd, const struct sockaddr *sap,
 	FD_ZERO(&rset);
 	FD_SET(fd, &rset);
 
-	ret = select(fd + 1, NULL, &rset, NULL, timeout);
-	if (ret <= 0) {
-		if (ret == 0)
-			errno = ETIMEDOUT;
+	while ((ret = select(fd + 1, NULL, &rset, NULL, timeout)) < 0) {
+		if (errno != EINTR) {
+			ret = -1;
+			goto done;
+		} else {
+			continue;
+		}
+	}
+	if (ret == 0) {
+		errno = ETIMEDOUT;
 		ret = -1;
 		goto done;
 	}
-- 
1.7.1
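
The handling described in the patch can be sketched outside nfs-utils as follows. This is a minimal illustration under stated assumptions, not the nfs-utils code: `connect_nb_sketch` and `demo_loopback_connect` are hypothetical names, the socket is assumed to be non-blocking already, and the final `getsockopt(SO_ERROR)` check is an assumption about what a caller would want rather than something shown in the diff. The same timeout is handed to every select() retry, which is a simplification.

```c
#define _XOPEN_SOURCE 700  /* request POSIX/XSI declarations */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the nfs-utils function): connect a non-blocking
 * socket, treating EINTR like EINPROGRESS, then wait for writability.
 * Returns 0 on success, -1 with errno set on failure or timeout.
 */
static int connect_nb_sketch(int fd, const struct sockaddr *sap,
                             socklen_t salen, struct timeval *timeout)
{
    fd_set wset;
    int ret, err = 0;
    socklen_t len = sizeof(err);

    ret = connect(fd, sap, salen);
    if (ret == 0)
        return 0;                       /* connected immediately */
    if (errno != EINPROGRESS && errno != EINTR)
        return -1;                      /* genuine failure */

    /* The attempt continues in the kernel: do not call connect() again,
     * wait for the socket to become writable instead. */
    for (;;) {
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        ret = select(fd + 1, NULL, &wset, NULL, timeout);
        if (ret > 0)
            break;
        if (ret == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (errno != EINTR)
            return -1;
        /* EINTR: go round again and retry select() */
    }

    /* Writable does not imply connected: read the final status. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Usage sketch: connect to a throwaway listener on 127.0.0.1. */
int demo_loopback_connect(void)
{
    struct sockaddr_in sin;
    socklen_t slen = sizeof(sin);
    struct timeval tv = { 5, 0 };
    int lfd, cfd, ret = -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                   /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto out;
    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &slen) < 0)
        goto out;
    if (fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL, 0) | O_NONBLOCK) < 0)
        goto out;
    ret = connect_nb_sketch(cfd, (struct sockaddr *)&sin, sizeof(sin), &tv);
out:
    if (lfd >= 0)
        close(lfd);
    if (cfd >= 0)
        close(cfd);
    return ret;
}
```

The key design point is that an interrupted connect() is never reissued: the sketch falls through to the same wait used for EINPROGRESS, which matches the Stevens passage quoted in the patch description.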
Comment 3 Steve Dickson 2016-01-16 17:09:57 EST
The needed upstream patch:

Author: Olga Kornievskaia <kolga@netapp.com>
Date:   Sat Jan 16 12:25:46 2016 -0500

    nfs_connect_nb: handle EINTR during connection establishment
Comment 4 Yongcheng Yang 2016-01-22 02:10:28 EST
(In reply to Olga Kornieskaia from comment #0)

> How reproducible:
> Create 9mounts on the client with each security flavor for each nfs
> versions:nfs3krb5, nfs3krb5i, nfs3krb5p, same for nfs4 and nfs4.1. On each
> mount as user create a directory to write a file, then unmount and then
> repeat from mount in a loop. 
> 

Hi Olga,
Is there any specific configuration in server or client to reproduce this issue?
I have tested several times following your description above (also using the NFS scripts in the attachment), and the "permission denied" error never occurred.
Comment 5 Olga Kornieskaia 2016-01-22 08:05:47 EST
Hi, 

This problem is seen on RHEL6.6 against a NetApp server; however, it can potentially occur with upstream nfs-utils whenever conditions are such that select() is interrupted and returns EINTR. I don't really know how to trigger that.
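
One general way a select() call ends up with EINTR is a signal handler running while the call is blocked. The sketch below shows that mechanism in isolation; whether this is what happened inside gssd is not established in this report. `select_errno_when_interrupted` and `on_alarm` are hypothetical names, and SIGALRM with a 100 ms one-shot timer is just a convenient stand-in for any caught signal. On Linux, select() is not restarted after a handler returns even when SA_RESTART is set, so the flag is left out here only to make the intent explicit.

```c
#define _XOPEN_SOURCE 700  /* request POSIX/XSI declarations */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static void on_alarm(int sig)
{
    (void)sig;  /* nothing to do: the handler only needs to run */
}

/*
 * Returns the errno left by select() after a caught SIGALRM interrupts it,
 * 0 if select() simply timed out, or -1 if the setup itself failed.
 */
int select_errno_when_interrupted(void)
{
    struct sigaction sa, old;
    struct itimerval it, off;
    struct timeval tv = { 2, 0 };   /* select() would sleep for 2 s */
    int ret, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = 100000;   /* fire once after 100 ms */
    if (setitimer(ITIMER_REAL, &it, NULL) < 0) {
        sigaction(SIGALRM, &old, NULL);
        return -1;
    }

    /* No descriptors at all: select() just sleeps until the signal. */
    ret = select(0, NULL, NULL, NULL, &tv);
    saved = errno;

    /* Disarm the timer and restore the previous handler. */
    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old, NULL);

    return ret < 0 ? saved : 0;
}
```

A caller would compare the return value against EINTR to confirm that the sleep was cut short by the signal rather than by the timeout.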
Comment 8 Yongcheng Yang 2016-06-14 04:58:35 EDT
I have checked that the patch has already been merged into nfs-utils-1.3.0-0.24.el7.

Moving to VERIFIED and setting SanityOnly according to Comment 6.
Comment 11 Olga Kornieskaia 2016-07-29 17:27:56 EDT
Will this fix be available in RHEL7.2? I'm not sure what nfs-utils-1.3.0-0.24.el7 corresponds to. NetApp QA tells me the latest on RHEL7.2 they have is nfs-utils-1.3.0-0.21.el7_2.1.x86_64.

Thank you.
Comment 12 Yongcheng Yang 2016-08-01 00:17:37 EDT
(In reply to Olga Kornieskaia from comment #11)
> Will this fix be available in RHEL7.2? I'm not sure what
> nfs-utils-1.3.0-0.24.el7 corresponds to. NetApp QA tells me the latest on
> RHEL7.2 they have is nfs-utils-1.3.0-0.21.el7_2.1.x86_64.
> 
> Thank you.

Hi Olga, nfs-utils-1.3.0-0.24.el7 is the in-progress (possibly internal) version for RHEL7.3.

To backport this fix to RHEL7.2, the "7.2.z?" flag needs to be set on this bug. Sorry, I don't have the privilege to do that.
Comment 13 Dave Wysochanski 2016-08-02 10:42:09 EDT
(In reply to Olga Kornieskaia from comment #0)

What guarantees that this new 'select()' loop will terminate?  Did you make sure you're not introducing an unwanted infinite loop here under certain circumstances?  For example, if a mount hangs can it still be killed?
Comment 14 Olga Kornieskaia 2016-08-02 11:17:51 EDT
The patch follows not so much the spec as the common guidance on how to handle EINTR from select(). The kernel should allow select() to be retried after the call is interrupted. If the system is continuously receiving EINTR, then there is a problem elsewhere in the kernel.
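
For readers who want the retry loop to be bounded regardless of how often it is interrupted, one option is to recompute the remaining time against a fixed deadline before each select() call. The sketch below is not part of the merged patch; `wait_writable_deadline`, `demo_interrupted_wait`, and the millisecond timeout parameter are hypothetical, and the use of CLOCK_MONOTONIC is an assumption about the target platform.

```c
#define _XOPEN_SOURCE 700  /* request POSIX/XSI declarations */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is writable or timeout_ms has elapsed
 * in total, however many times select() is interrupted.
 * Returns >0 if writable, 0 on timeout, -1 on an error other than EINTR.
 */
int wait_writable_deadline(int fd, long timeout_ms)
{
    struct timespec start, now;
    struct timeval tv;
    fd_set wset;
    long left;
    int ret;

    if (clock_gettime(CLOCK_MONOTONIC, &start) < 0)
        return -1;
    for (;;) {
        if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
            return -1;
        left = timeout_ms - ((now.tv_sec - start.tv_sec) * 1000L +
                             (now.tv_nsec - start.tv_nsec) / 1000000L);
        if (left <= 0)
            return 0;               /* overall deadline reached */
        tv.tv_sec = left / 1000;
        tv.tv_usec = (left % 1000) * 1000;
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        ret = select(fd + 1, NULL, &wset, NULL, &tv);
        if (ret >= 0 || errno != EINTR)
            return ret;
        /* EINTR: loop and wait only for the time that is left */
    }
}

static void on_tick(int sig)
{
    (void)sig;
}

/* Usage sketch: a full pipe never becomes writable while a 20 ms interval
 * timer keeps interrupting select(); the wait must still time out. */
int demo_interrupted_wait(void)
{
    struct sigaction sa, old;
    struct itimerval it, off;
    char buf[4096];
    int p[2], ret;

    if (pipe(p) < 0)
        return -1;
    if (fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL, 0) | O_NONBLOCK) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    memset(buf, 0, sizeof(buf));
    while (write(p[1], buf, sizeof(buf)) > 0)
        ;                           /* fill the pipe until it would block */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = 20000;
    it.it_interval.tv_usec = 20000;
    setitimer(ITIMER_REAL, &it, NULL);

    ret = wait_writable_deadline(p[1], 300);

    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old, NULL);
    close(p[0]);
    close(p[1]);
    return ret;
}
```

Because the remaining time shrinks on every pass, a steady stream of signals can delay the return but cannot extend it past the deadline, which addresses the termination question raised in comment 13 without relying on select() updating its timeout argument.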
Comment 17 errata-xmlrpc 2016-11-04 01:02:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2383.html
