An issue was found using uncached IO on RHEL 2.1AS. When the uncached IO feature is activated with the 'noac' mount option and rsize/wsize are not specified, the following XDR (read or write) messages appear in the kernel log under certain loads:

May 25 14:29:32 LinuxHost kernel: nfs: can't encode arguments: 22
May 25 14:29:32 LinuxHost kernel: NFS: Bad number of iov's in xdr_writeargs

If you specify an rsize/wsize, the XDR messages do not appear in the kernel log.

National Australia Bank discovered these messages during an Oracle RMAN backup to a Network Appliance filer. Here are the errors that RMAN reported in the Oracle alert log:

RMAN-00571: ==========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ==============
RMAN-00571: ==========================================================
RMAN-03002: failure of configure command at 05/20/2004 12:49:34
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of full resync command on default channel at 05/20/2004 12:49:34
ORA-01587: error during controlfile backup file copy
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 22: Invalid argument
Additional information: 3

In the kernel log on the Linux server:

May 25 14:29:32 LinuxHost kernel: nfs: can't encode arguments: 22
May 25 14:29:32 LinuxHost kernel: NFS: Bad number of iov's in xdr_writeargs
May 25 14:29:32 LinuxHost kernel: nfs: can't encode arguments: 22
May 25 14:29:32 LinuxHost kernel: NFS: Bad number of iov's in xdr_readargs

How reproducible:

Steps to Reproduce:
1. Mount the remote filesystem using noac
2. Don't specify a value for rsize/wsize
3. Produce an intense IO pattern against the mount point

Additional info:

Patch for NFS direct.c and inode.c
---------------------------
diff -X /home/cel/src/linux/dont-diff -Naurp old/fs/nfs/direct.c new/fs/nfs/direct.c
--- old/fs/nfs/direct.c	2004-05-25 21:55:46.000000000 -0400
+++ new/fs/nfs/direct.c	2004-05-27 13:49:57.000000000 -0400
@@ -44,6 +44,10 @@
 
 #define NFS_DIRECT_DEBUG 0
 
+/* We can't exceed (MAX_IOVEC - 2) pages per RPC, so cap the number
+ * of pages at 4 for now */
+#define NFS_MAX_DIRECT_IO_SIZE (PAGE_SIZE * 4U)
+
 static inline int
 nfs_direct_read_rpc(struct file *file, struct nfs_readargs *arg)
 {
@@ -104,7 +108,7 @@ nfs_direct_write_rpc(struct file *file,
 		int i;
 		printk(KERN_ERR "%s: count=%d, offset=%Lu\n",
 			__FUNCTION__, arg->count, arg->offset);
-		for (i = 0; i > arg->nriov; i++)
+		for (i = 0; i < arg->nriov; i++)
 			printk(KERN_ERR "%s: arg->iov[%d]: base=%p, len=%u\n",
 				__FUNCTION__, i,
 				arg->iov[i].iov_base, arg->iov[i].iov_len);
@@ -221,6 +225,8 @@ nfs_kiobuf_read(struct file *file, struc
 		request = count;
 		if (count > rsize)
 			request = rsize;
+		if (count > NFS_MAX_DIRECT_IO_SIZE)
+			request = NFS_MAX_DIRECT_IO_SIZE;
 		args.count = request;
 		args.offset = offset;
 		args.nriov = 0;
@@ -317,6 +323,8 @@ retry:
 		request = count;
 		if (count > wsize)
 			request = wsize;
+		if (count > NFS_MAX_DIRECT_IO_SIZE)
+			request = NFS_MAX_DIRECT_IO_SIZE;
 		args.count = request;
 		args.offset = offset;
 		args.nriov = 0;
diff -X /home/cel/src/linux/dont-diff -Naurp old/fs/nfs/inode.c new/fs/nfs/inode.c
--- old/fs/nfs/inode.c	2004-05-25 21:56:20.000000000 -0400
+++ new/fs/nfs/inode.c	2004-05-27 13:47:12.000000000 -0400
@@ -309,14 +309,6 @@ nfs_read_super(struct super_block *sb, v
 		printk(KERN_NOTICE "NFS: uncached I/O enabled for noac mount\n");
 		server->flags &= ~NFS_MOUNT_NOAC;
 		server->flags |= NFS_MOUNT_FORCE_DIRECT;
-		/*
-		 * Do to ABI issues: non-buffered I/O r/wsizes can
-		 * only support 16kb buffers
-		 */
-		if (server->rsize > 16384)
-			server->rsize = 16384;
-		if (server->wsize > 16384)
-			server->wsize = 16384;
 	} else
 		sb->s_flags |= MS_SYNCHRONOUS;
 }
---------------------------
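For anyone reading the patch above, the per-RPC size cap can be illustrated with a small user-space sketch. It is not kernel code: SKETCH_PAGE_SIZE, next_request_size() and count_rpcs() are hypothetical names made up for illustration, and a 4096-byte page is assumed (with that assumption the cap works out to 16384 bytes, the same figure as the 16kb limit the patch removes from inode.c). Note also that the sketch takes the minimum of the remaining count, the rsize/wsize and the cap, whereas the patch tests count against the cap in a second, separate comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel values; a 4096-byte page is assumed. */
#define SKETCH_PAGE_SIZE          4096U
#define SKETCH_MAX_DIRECT_IO_SIZE (SKETCH_PAGE_SIZE * 4U)

/*
 * Size of the next RPC: the remaining byte count, limited by the
 * mount's rsize/wsize (xfer_size) and by the direct I/O cap.
 */
static size_t next_request_size(size_t count, size_t xfer_size)
{
	size_t request = count;

	if (request > xfer_size)
		request = xfer_size;
	if (request > SKETCH_MAX_DIRECT_IO_SIZE)
		request = SKETCH_MAX_DIRECT_IO_SIZE;
	return request;
}

/* Number of RPCs needed to move count bytes under the limits above. */
static unsigned int count_rpcs(size_t count, size_t xfer_size)
{
	unsigned int rpcs = 0;

	assert(xfer_size > 0);	/* a zero transfer size would never make progress */
	while (count > 0) {
		count -= next_request_size(count, xfer_size);
		rpcs++;
	}
	return rpcs;
}
```

Under these assumptions a 64 KB transfer on a mount with a 32 KB rsize is split into four 16 KB requests, so no single RPC carries more than four pages.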
Please attach this patch, as copying and pasting patches can cause problems.
Created attachment 100802
Uncached IO Logic patch

Here's the Uncached IO Logic patch for RHEL 2.1AS.
This patch was accepted into RHEL 2.1 U5, kernel 2.4.9-e.46:
http://people.redhat.com/~jbaron/.private/u5/2.4.9-e.46/
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-437.html