Bug 186604

Summary: NFS_ROOT reuses RPC XIDs
Product: Red Hat Enterprise Linux 4 Reporter: Chuck Lever <cel>
Component: kernel Assignee: Peter Staubach <staubach>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: low Docs Contact:
Priority: medium    
Version: 4.0 CC: jbaron, steved, xdl-redhat-bugzilla
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0304 Doc Type: Bug Fix
Last Closed: 2007-05-08 01:01:34 UTC Type: ---
Bug Blocks: 176344    
Attachments:
Description                                    Flags
patch to use current time as the initial XID   none
Proposed patch                                 none

Description Chuck Lever 2006-03-24 17:13:11 UTC
Description of problem:
The 2.6 version of the RPC client's XID generator depends on having the random
driver initialized before get_random_bytes() is invoked to generate the initial
XID for an NFS mount point.  If the random driver is not initialized, no error
is returned, and the initial XID is left with its previous value, which is zero.
This results in the client sending NFS_ROOT requests with the same XID across
client reboots.  If the server's DRC is large or persistent, it will return a
cached response to these reused XIDs that has nothing to do with the current
request.  Symptoms include short write replies, created files that don't exist,
and removed files that haven't been removed.
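
To make the failure mode concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy duplicate reply cache keyed only on the XID (real servers typically also match on client address, procedure and so on), and every name in it (drc_lookup, server_handle, client_boot, ...) is hypothetical. The only detail taken from the kernel is that the client hands out XIDs by post-incrementing a counter, as xprt_alloc_xid() does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DRC_SLOTS 64	/* hypothetical cache size */

/* One cached reply, keyed only by XID (a simplification). */
struct drc_entry {
	int      valid;
	uint32_t xid;
	char     reply[32];
};

struct drc {
	struct drc_entry slot[DRC_SLOTS];
};

/* Return the cached reply for this XID, or NULL on a miss. */
const char *drc_lookup(const struct drc *c, uint32_t xid)
{
	const struct drc_entry *e = &c->slot[xid % DRC_SLOTS];

	return (e->valid && e->xid == xid) ? e->reply : NULL;
}

/* Remember the reply that was sent for this XID. */
void drc_store(struct drc *c, uint32_t xid, const char *reply)
{
	struct drc_entry *e = &c->slot[xid % DRC_SLOTS];

	assert(strlen(reply) < sizeof(e->reply));
	e->valid = 1;
	e->xid = xid;
	strcpy(e->reply, reply);
}

/*
 * Server side: if the XID was seen before, replay the cached reply
 * without executing the request; otherwise "execute" it and cache
 * the fresh reply.
 */
const char *server_handle(struct drc *c, uint32_t xid, const char *fresh_reply)
{
	const char *hit = drc_lookup(c, xid);

	if (hit)
		return hit;
	drc_store(c, xid, fresh_reply);
	return drc_lookup(c, xid);
}

/* Client side: the XID counter starts at whatever the seed is. */
struct client {
	uint32_t xid;
};

void client_boot(struct client *cl, uint32_t seed)
{
	cl->xid = seed;
}

uint32_t client_next_xid(struct client *cl)
{
	return cl->xid++;
}
```

Booting the toy client twice with a seed of zero makes the second boot's first request collide with the first boot's cached entry, so the server replays an unrelated reply. Booting with a seed that differs between boots avoids the collision in this model.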

Although Red Hat does not officially support NFS_ROOT, some customers do use
this feature with RHEL 4, and should be aware of this problem.

Version-Release number of selected component (if applicable):
Will affect all 2.6-based versions of RHEL.

How reproducible:
Every boot that uses NFS_ROOT will reuse the same XIDs.  The probability that
the server will respond with a stale cached reply varies, so reproducing this
problem is sometimes difficult.

Steps to Reproduce:
1.  Set up a client system with NFS_ROOT.
2.  Set up a server to export the share that is the client's root fs.
3.  Reboot the client every few minutes.
  
Actual results:
The client boot will fail intermittently.

Expected results:
The client should always boot correctly.

Additional info:
This has been reported with a NetApp filer and SuSE SLES 9.  The bug comes from
the mainline kernel, so RHEL 4 is also susceptible.

Comment 1 Chuck Lever 2006-03-26 04:06:05 UTC
Created attachment 126761 [details]
patch to use current time as the initial XID

compile tested only -- attached patch gives a general idea about how this might
be fixed.
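
The attachment itself is not reproduced in this report. As a rough userspace illustration of the idea in its title only, and not the attached kernel patch, the sketch below derives an initial XID from the current time. The helper names xid_from_time and xid_init_from_clock are hypothetical, and mixing in the nanoseconds field is an assumption made here to spread out values generated within the same second.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Hypothetical helper: fold a timestamp into a 32-bit initial XID. */
uint32_t xid_from_time(const struct timespec *ts)
{
	/*
	 * The seconds field differs between boots as long as the clock
	 * advances; the nanoseconds field adds some spread for XIDs
	 * generated within the same second.
	 */
	return (uint32_t)ts->tv_sec ^ (uint32_t)ts->tv_nsec;
}

/*
 * Seed *xid from the wall clock.
 * Returns 0 on success, -1 if the clock could not be read.
 */
int xid_init_from_clock(uint32_t *xid)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return -1;
	*xid = xid_from_time(&ts);
	return 0;
}
```

An approach like this only helps if the clock actually reads differently on each boot; a client that starts with the same clock value every time would still repeat its XIDs.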

Comment 2 Chuck Lever 2006-03-29 23:33:15 UTC
here's a patch against 2.6.16 that i worked up with Trond's help.  our customer
has tried this in the SLES 9 kernel, and found it to eliminate their issue.

the customer also remarked that net_random_init(), the function used to seed
the net_random() random number generator, does not exist in 2.6.5.  i haven't
looked at 2.6.9, but if net_random_init() does not exist there, imo it should
be backported.

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 4dd5b3c..02060d0 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -41,7 +41,7 @@
 #include <linux/types.h>
 #include <linux/interrupt.h>
 #include <linux/workqueue.h>
-#include <linux/random.h>
+#include <linux/net.h>
 
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/metrics.h>
@@ -830,7 +830,7 @@ static inline u32 xprt_alloc_xid(struct 
 
 static inline void xprt_init_xid(struct rpc_xprt *xprt)
 {
-	get_random_bytes(&xprt->xid, sizeof(xprt->xid));
+	xprt->xid = net_random();
 }
 
 static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4b4e7df..21006b1 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -930,6 +930,13 @@ static void xs_udp_timer(struct rpc_task
 	xprt_adjust_cwnd(task, -ETIMEDOUT);
 }
 
+static unsigned short xs_get_random_port(void)
+{
+	unsigned short range = xprt_max_resvport - xprt_min_resvport;
+	unsigned short rand = (unsigned short) net_random() % range;
+	return rand + xprt_min_resvport;
+}
+
 /**
  * xs_set_port - reset the port number in the remote endpoint address
  * @xprt: generic transport
@@ -1275,7 +1282,7 @@ int xs_setup_udp(struct rpc_xprt *xprt, 
 	memset(xprt->slot, 0, slot_table_size);
 
 	xprt->prot = IPPROTO_UDP;
-	xprt->port = xprt_max_resvport;
+	xprt->port = xs_get_random_port();
 	xprt->tsh_size = 0;
 	xprt->resvport = capable(CAP_NET_BIND_SERVICE) ? 1 : 0;
 	/* XXX: header size can vary due to auth type, IPv6, etc. */
@@ -1317,7 +1324,7 @@ int xs_setup_tcp(struct rpc_xprt *xprt, 
 	memset(xprt->slot, 0, slot_table_size);
 
 	xprt->prot = IPPROTO_TCP;
-	xprt->port = xprt_max_resvport;
+	xprt->port = xs_get_random_port();
 	xprt->tsh_size = sizeof(rpc_fraghdr) / sizeof(u32);
 	xprt->resvport = capable(CAP_NET_BIND_SERVICE) ? 1 : 0;
 	xprt->max_payload = RPC_MAX_FRAGMENT_SIZE;
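
The port arithmetic in xs_get_random_port() above can be tried out in userspace. The sketch below mirrors the three lines from the diff but takes the raw random value and the port limits as parameters; the function name pick_resvport is hypothetical, and the assert is an addition that is not in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a raw 32-bit random value to a port number, following the
 * arithmetic of xs_get_random_port() in the diff.  min and max stand
 * in for xprt_min_resvport and xprt_max_resvport.
 */
unsigned short pick_resvport(uint32_t raw, unsigned short min, unsigned short max)
{
	/* Added guard: a zero range would make the modulo below divide by zero. */
	assert(max > min);

	unsigned short range = max - min;
	/* The cast binds tighter than %, so raw is truncated to 16 bits first. */
	unsigned short r = (unsigned short)raw % range;

	return r + min;
}
```

With limits of 650 and 1023 (illustrative values chosen for this example), the result always falls in 650..1022. Because the modulus is max - min, the upper limit itself is never returned, and the computation relies on the two limits being different.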


Comment 3 RHEL Program Management 2006-09-07 19:24:55 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Peter Staubach 2006-10-18 17:50:13 UTC
Created attachment 138810 [details]
Proposed patch

The patch described in Comment #2 seems to do the trick.

Comment 8 Jason Baron 2006-11-14 18:19:55 UTC
Committed in stream U5 build 42.25. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 11 Red Hat Bugzilla 2007-05-08 01:01:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html