Bug 124716

Summary: [PATCH] NFS Uncached IO logic error when mounting with noac and not specifying the rsize/wsize
Product: Red Hat Enterprise Linux 2.1
Reporter: Jody Haynes <jhaynes>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
Version: 2.1
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-08-18 14:26:00 UTC
Attachments:
Description: Uncached IO Logic patch
Flags: none

Description Jody Haynes 2004-05-28 19:22:50 UTC
An issue was found using uncached IO on RHEL 2.1AS.

When activating the uncached IO feature using the 'noac' mount option 
and not specifying the rsize/wsize, you will get the following XDR 
(read or write) messages under certain loads:

May 25 14:29:32 LinuxHost kernel: nfs: can't encode arguments: 22
May 25 14:29:32 LinuxHost kernel: NFS: Bad number of iov's in xdr_writeargs

If you specify a rsize/wsize, the xdr message will not appear in the 
kernel log.

National Australia Bank discovered these messages during an Oracle 
RMAN backup to a Network Appliance filer.

Here are the errors that RMAN reported in the Oracle alert log:

RMAN-00571: ==========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ==============
RMAN-00571: ==========================================================
RMAN-03002: failure of configure command at 05/20/2004 12:49:34
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of full resync command on default channel at 05/20/2004 12:49:34
ORA-01587: error during controlfile backup file copy
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 22: Invalid argument
Additional information: 3

In the kernel log on the Linux server:

May 25 14:29:32 LinuxHost kernel: nfs: can't encode arguments: 22
May 25 14:29:32 LinuxHost kernel: NFS: Bad number of iov's in xdr_writeargs
May 25 14:29:32 LinuxHost kernel: nfs: can't encode arguments: 22
May 25 14:29:32 LinuxHost kernel: NFS: Bad number of iov's in xdr_readargs
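
Both the RMAN stack ("Linux Error: 22: Invalid argument") and the kernel log ("can't encode arguments: 22") carry the same numeric code. As a small illustration, assuming a Linux C library, the code can be mapped back to its symbolic name and message as follows. The helper names are hypothetical and are not part of the report or the patch.

```c
#include <errno.h>
#include <string.h>

/* Map a positive error code, as printed in the logs, to the C library's
 * description. Hypothetical helper for illustration only. */
const char *describe_error_code(int code)
{
    return strerror(code);
}

/* True when the code printed in the logs is EINVAL on this platform. */
int is_invalid_argument(int code)
{
    return code == EINVAL;
}
```

On Linux, `is_invalid_argument(22)` is true, which matches the "Invalid argument" text in the RMAN output.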

How reproducible:


Steps to Reproduce:
1.  Mount the remote filesystem using noac
2.  Don't specify the value for rsize/wsize
3.  Produce an intense IO pattern against the mount point
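
The report does not say which tool generated the load apart from the RMAN backup. As a rough sketch of step 3, assuming that large write() and read() calls against a file on the noac mount are enough to exercise the uncached path, a load generator could look like this. The function name, chunk size, and path are hypothetical, and whether it triggers the messages on a given kernel is not confirmed by the report.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write n_chunks buffers of chunk_bytes each to path, then read them back
 * and compare. Returns the number of bytes verified, or -1 on any I/O
 * error, short transfer, or mismatch.
 * Hypothetical helper: not taken from the bug report.
 */
long long hammer_file(const char *path, size_t chunk_bytes, int n_chunks)
{
    long long verified = -1;
    long long total = 0;
    int fd = -1;
    int i;
    unsigned char *out = malloc(chunk_bytes);
    unsigned char *in = malloc(chunk_bytes);

    if (out == NULL || in == NULL || chunk_bytes == 0)
        goto done;
    memset(out, 0xA5, chunk_bytes);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        goto done;

    /* Each write() hands the kernel one large request. */
    for (i = 0; i < n_chunks; i++)
        if (write(fd, out, chunk_bytes) != (ssize_t)chunk_bytes)
            goto done;

    if (lseek(fd, 0, SEEK_SET) != 0)
        goto done;

    /* Read back in the same chunk size; a short read counts as a failure. */
    for (i = 0; i < n_chunks; i++) {
        if (read(fd, in, chunk_bytes) != (ssize_t)chunk_bytes)
            goto done;
        if (memcmp(in, out, chunk_bytes) != 0)
            goto done;
        total += (long long)chunk_bytes;
    }
    verified = total;

done:
    if (fd >= 0)
        close(fd);
    free(in);
    free(out);
    return verified;
}
```

A call such as `hammer_file("/mnt/nfs/testfile", 1024 * 1024, 256)` (path assumed) would issue 1 MiB requests; a return value of -1 with errno 22 would correspond to the failure RMAN reported.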
  
Additional info:

Patch for NFS direct.c and inode.c

---------------------------

diff -X /home/cel/src/linux/dont-diff -Naurp old/fs/nfs/direct.c new/fs/nfs/direct.c
--- old/fs/nfs/direct.c 2004-05-25 21:55:46.000000000 -0400
+++ new/fs/nfs/direct.c 2004-05-27 13:49:57.000000000 -0400
@@ -44,6 +44,10 @@
 
 #define NFS_DIRECT_DEBUG       0
 
+/* We can't exceed (MAX_IOVEC - 2) pages per RPC, so cap the number
+ * of pages at 4 for now */
+#define NFS_MAX_DIRECT_IO_SIZE (PAGE_SIZE * 4U)
+
 static inline int
 nfs_direct_read_rpc(struct file *file, struct nfs_readargs *arg)
 {
@@ -104,7 +108,7 @@ nfs_direct_write_rpc(struct file *file, 
        int i;
        printk(KERN_ERR "%s: count=%d, offset=%Lu\n", __FUNCTION__,
                        arg->count, arg->offset);
-       for (i = 0; i > arg->nriov; i++)
+       for (i = 0; i < arg->nriov; i++)
                printk(KERN_ERR "%s: arg->iov[%d]: base=%p, len=%u\n",
                                __FUNCTION__, i, arg->iov[i].iov_base,
                                        arg->iov[i].iov_len);
@@ -221,6 +225,8 @@ nfs_kiobuf_read(struct file *file, struc
                request = count;
                if (count > rsize)
                        request = rsize;
+               if (count > NFS_MAX_DIRECT_IO_SIZE)
+                       request = NFS_MAX_DIRECT_IO_SIZE;
                args.count = request;
                args.offset = offset;
                args.nriov = 0;
@@ -317,6 +323,8 @@ retry:
                request = count;
                if (count > wsize)
                        request = wsize;
+               if (count > NFS_MAX_DIRECT_IO_SIZE)
+                       request = NFS_MAX_DIRECT_IO_SIZE;
                args.count = request;
                args.offset = offset;
                args.nriov = 0;
diff -X /home/cel/src/linux/dont-diff -Naurp old/fs/nfs/inode.c new/fs/nfs/inode.c
--- old/fs/nfs/inode.c  2004-05-25 21:56:20.000000000 -0400
+++ new/fs/nfs/inode.c  2004-05-27 13:47:12.000000000 -0400
@@ -309,14 +309,6 @@ nfs_read_super(struct super_block *sb, v
                        printk(KERN_NOTICE "NFS: uncached I/O enabled for noac mount\n");
                        server->flags &= ~NFS_MOUNT_NOAC;
                        server->flags |= NFS_MOUNT_FORCE_DIRECT;
-                       /*
-                        * Do to ABI issues: non-buffered I/O r/wsizes can
-                        * only support 16kb buffers
-                        */
-                       if (server->rsize > 16384)
-                               server->rsize = 16384;
-                       if (server->wsize > 16384)
-                               server->wsize = 16384;
                } else
                        sb->s_flags |= MS_SYNCHRONOUS;
        }


---------------------------
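
The patch moves the 16 KB limit out of nfs_read_super() and into the per-request sizing in direct.c. The standalone sketch below is not the patch's code. It mirrors the two consecutive checks the diff adds, under the assumption of 4 KiB pages, and sets them beside a variant that takes the smallest of the three values. All names in it are hypothetical. One thing it makes visible is that the second check in the diff compares `count` rather than `request`, so when rsize/wsize is below the cap and count is above it, the result is the cap rather than rsize/wsize. Whether that case arises in practice depends on the mount's rsize/wsize.

```c
/* Assumption: 4 KiB pages, as on i386; the kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096U
/* Same shape as NFS_MAX_DIRECT_IO_SIZE in the patch: four pages. */
#define SKETCH_MAX_DIRECT_IO (SKETCH_PAGE_SIZE * 4U)

/* Mirrors the two checks as they appear in the diff's read and write loops:
 * both compare against count. xfer_size stands in for rsize or wsize. */
unsigned int request_as_in_diff(unsigned int count, unsigned int xfer_size)
{
    unsigned int request = count;

    if (count > xfer_size)
        request = xfer_size;
    if (count > SKETCH_MAX_DIRECT_IO)
        request = SKETCH_MAX_DIRECT_IO;
    return request;
}

/* Alternative sketch: the smallest of count, xfer_size, and the cap. */
unsigned int request_min_of_three(unsigned int count, unsigned int xfer_size)
{
    unsigned int request = count;

    if (request > xfer_size)
        request = xfer_size;
    if (request > SKETCH_MAX_DIRECT_IO)
        request = SKETCH_MAX_DIRECT_IO;
    return request;
}
```

With a 32768-byte transfer and a 32768-byte rsize, both functions return 16384. With an 8192-byte rsize, the first returns 16384 and the second returns 8192.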

Comment 1 Suzanne Hillman 2004-06-02 19:24:40 UTC
Please attach this patch, as copying and pasting patches can cause
problems.

Comment 2 Jody Haynes 2004-06-02 19:37:22 UTC
Created attachment 100802
Uncached IO Logic patch

Here's the Uncached IO Logic patch for RHEL 2.1AS.

Comment 3 Jason Baron 2004-07-09 19:37:31 UTC
This patch was accepted into RHEL 2.1 U5, e.46:

http://people.redhat.com/~jbaron/.private/u5/2.4.9-e.46/

Comment 4 John Flanagan 2004-08-18 14:26:00 UTC
An erratum has been issued which should help with the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-437.html