Bug 557423 - nfs: sys_read sometimes returns -EIO
Summary: nfs: sys_read sometimes returns -EIO
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 664814
Depends On:
Blocks: 594059 594060 594061 605694
Reported: 2010-01-21 12:37 UTC by Vitaliy Gusev
Modified: 2018-11-26 18:15 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
On NFS, the read(2) system call could return an unexpected EIO (input/output error) value.
Clone Of:
Environment:
Last Closed: 2011-01-13 21:01:47 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to revert retcode check in nfs_revalidate_mapping() (1.23 KB, patch)
2010-03-23 22:18 UTC, Guru Anbalagane


Links
System ID: Red Hat Product Errata RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Vitaliy Gusev 2010-01-21 12:37:26 UTC
Description of problem:

On NFS, read(2) can return an unexpected -EIO.
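
From user space the failure shows up as read(2) returning -1 with errno set to EIO. The following is a minimal sketch of how a test program might count such failures; the helper name count_eio_reads() and its parameters are hypothetical and are not part of LTP or of this report.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the whole file `passes` times and count the
 * read(2) calls that fail with errno == EIO.
 * Returns the EIO count, or -1 if the file cannot be opened, rewound,
 * or a read fails with some other error.
 */
long count_eio_reads(const char *path, int passes)
{
	char buf[4096];
	long eio = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	for (int i = 0; i < passes; i++) {
		if (lseek(fd, 0, SEEK_SET) < 0) {
			close(fd);
			return -1;
		}
		for (;;) {
			ssize_t n = read(fd, buf, sizeof(buf));

			if (n > 0)
				continue;	/* got data, keep reading */
			if (n == 0)
				break;		/* end of file */
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			if (errno == EIO) {
				eio++;		/* the failure described here */
				break;
			}
			close(fd);
			return -1;		/* unrelated error */
		}
	}
	close(fd);
	return eio;
}
```

On a healthy mount the function is expected to return 0; a non-zero count on an NFS file under concurrent read/write load would match the symptom reported here.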

Version-Release number of selected component (if applicable):
kernel: 2.6.18-164.6.1.el5 x86_64
ltp:  20081130

How reproducible:

Run the "rwtest" program from the LTP suite:

     ./rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$


Note: to reproduce this you may need to run several instances of "rwtest" in the background.


Actual results:


Expected results:


Additional info:
This issue was introduced by the patch:

   linux-2.6-nfs-fix-cache-invalidation-problems-in-nfs_readdir.patch:

 >         -		nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
 >         -		if (S_ISREG(inode->i_mode))
 >         -			nfs_sync_mapping(mapping);
 >         -		invalidate_inode_pages3(mapping);
 >         -
 >         +		if (mapping->nrpages != 0) {
 >         +			if (S_ISREG(inode->i_mode)) {
 >         +				ret = nfs_sync_mapping(mapping);
 >         +				if (ret < 0)
 >         +					goto out;
 >         +			}
 >         +			ret = invalidate_inode_pages3(mapping);
 >         +			if (ret < 0)
 >         +				goto out;
 >         +		}

The old code ignored the return value of invalidate_inode_pages3(), but the new
code checks it and bails out on error.
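
The behavioural difference can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct fake_mapping and both functions are hypothetical stand-ins, and the invalidation result is supplied as a plain field rather than computed by the kernel.

```c
#include <errno.h>

/* Hypothetical stand-in for the state nfs_revalidate_mapping() works on. */
struct fake_mapping {
	unsigned long nrpages;	/* cached pages in the mapping */
	int invalidate_result;	/* what the invalidation call would return */
};

/* Models the old code: the invalidation result is dropped. */
int revalidate_old(const struct fake_mapping *m)
{
	(void)m->invalidate_result;
	return 0;
}

/* Models the new code: a negative result is handed back to the caller. */
int revalidate_new(const struct fake_mapping *m)
{
	int ret = 0;

	if (m->nrpages != 0) {
		ret = m->invalidate_result;
		if (ret < 0)
			return ret;	/* corresponds to "goto out" */
	}
	return 0;
}
```

With invalidate_result set to -EIO and nrpages non-zero, revalidate_old() still returns 0 while revalidate_new() returns -EIO, which is the value that ends up in the read(2) caller.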

Stack dump showing where the EIO comes from:

   try_to_release_page
   invalidate_complete_page2
   invalidate_inode_pages3_range
   invalidate_inode_pages3
   :nfs:nfs_revalidate_mapping
   :nfs:nfs_file_read
   do_sync_read
   vfs_read
   sys_read

---
See try_to_release_page():

  int try_to_release_page(struct page *page, gfp_t gfp_mask)
   {
	struct address_space * const mapping = page->mapping;

	BUG_ON(!PageLocked(page));
	/*
	 * The page can suddenly be under writeback here, even though
	 * it is locked at this point.
	 */
	if (PageWriteback(page))
		return 0;
	
	if (mapping && mapping->a_ops->releasepage)
		return mapping->a_ops->releasepage(page, gfp_mask);
	return try_to_free_buffers(page);
  }
  EXPORT_SYMBOL(try_to_release_page);
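
As a rough user-space model of the path in the stack dump above: a page that is under writeback cannot be released, and the invalidation path is assumed here to turn that failure into -EIO. The struct and function names are hypothetical, and the real kernel functions do considerably more work.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct page. */
struct fake_page {
	bool locked;
	bool writeback;
};

/* Mirrors the early return quoted above. Returns 1 if releasable. */
int model_try_to_release_page(const struct fake_page *page)
{
	assert(page->locked);	/* stands in for BUG_ON(!PageLocked(page)) */
	if (page->writeback)
		return 0;	/* cannot release a page under writeback */
	return 1;
}

/*
 * Assumed simplification of the invalidation path: lock the page, try
 * to release it, and report -EIO when the release fails.
 */
int model_invalidate_page(struct fake_page *page)
{
	int ret = 0;

	page->locked = true;
	if (!model_try_to_release_page(page))
		ret = -EIO;
	page->locked = false;
	return ret;
}
```

In this model a page with writeback set yields -EIO even though the invalidator holds the page lock, which is the situation the annotated check describes.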

Comment 1 Vitaliy Gusev 2010-01-21 12:42:55 UTC
In the mainline kernel this issue was fixed in commit:

  commit 61822ab5e3ed09fcfc49e37227b655202adf6130
  Author: Trond Myklebust <Trond.Myklebust>
  Date:   Tue Dec 5 00:35:42 2006 -0500

    NFS: Ensure we only call set_page_writeback() under the page lock
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust>
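
Going only by the commit subject, the rule is that the writeback flag may be set only while the page lock is held. The sketch below models that rule in user space; every name in it is hypothetical, and it uses a trylock that fails with -EBUSY where the kernel's lock_page() would sleep instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced page state for illustration only. */
struct wb_page {
	bool locked;
	bool writeback;
};

/* Returns 0 when the lock was taken, -EBUSY when it is already held. */
int wb_trylock_page(struct wb_page *p)
{
	if (p->locked)
		return -EBUSY;
	p->locked = true;
	return 0;
}

void wb_unlock_page(struct wb_page *p)
{
	p->locked = false;
}

/* Writer side: set writeback only while holding the page lock. */
int start_writeback_locked(struct wb_page *p)
{
	int ret = wb_trylock_page(p);

	if (ret < 0)
		return ret;	/* someone else (e.g. an invalidator) owns the page */
	p->writeback = true;
	wb_unlock_page(p);
	return 0;
}
```

Under this rule a task that holds the page lock, such as the invalidation path, cannot see the writeback flag change underneath it.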

Comment 4 Guru Anbalagane 2010-03-17 20:23:28 UTC
We also hit this bug. We will test the fix and post the results.

Comment 5 Guru Anbalagane 2010-03-23 22:18:22 UTC
Created attachment 402159
patch to revert retcode check in nfs_revalidate_mapping()

Comment 6 Guru Anbalagane 2010-03-23 22:19:47 UTC
The above patch is needed to avoid the EIO error, since backporting commit 61822ab5e3ed09fcfc49e37227b655202adf6130 to el5 does not look simple.

Comment 7 Jeff Layton 2010-04-12 12:24:49 UTC
Agreed. Backporting that commit looks decidedly non-trivial. Thanks for the patch.

Comment 8 Jeff Layton 2010-04-13 14:48:48 UTC
Was able to get this to reproduce with the following command:

env LTPROOT=/usr/lib/ltp PATH=$PATH:. ./rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/mnt/dantu/rwtest/doio.f1.$$ 1000b:/mnt/dantu/rwtest/doio.f2.$$

...now to make sure the patch fixes it.

Comment 9 Jeff Layton 2010-04-14 12:39:09 UTC
Looks like it does. Interestingly the test doesn't like running under screen for some reason. That's unrelated to this bug however.

Comment 19 Jarod Wilson 2010-05-25 21:11:33 UTC
in kernel-2.6.18-200.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 22 Douglas Silas 2010-06-28 20:24:50 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
On NFS, the read(2) system call could have returned an unexpected EIO (input/output error) value.

Comment 24 yanfu,wang 2010-10-27 06:51:37 UTC
Can be reproduced on kernel 2.6.18-194.el5:
# uname -a
Linux hp-dl120g6-01.rhts.eng.bos.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

[root@hp-dl120g6-01 ltp-full-20100831]# env LTPROOT=/opt/20100831/ltp-full-20100831 PATH=$PATH:. ./testcases/kernel/fs/doio/rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/mnt/dantu/rwtest/doio.f1.$$ 1000b:/mnt/dantu/rwtest/doio.f2.$$
/opt/20100831/ltp-full-20100831/testcases/bin/iogen -N iogen01 -i 600s -s read,write 500b:/mnt/dantu/rwtest/doio.f1.20329 1000b:/mnt/dantu/rwtest/doio.f2.20329 | /opt/20100831/ltp-full-20100831/testcases/bin/doio -N iogen01 -a -v -n 10 -k

iogen(iogen01) starting up with the following:

Out-pipe:              stdout
Iterations:            600 seconds
Seed:                  24396
Offset-Mode:           sequential
Overlap Flag:          off
Mintrans:              1           (1 blocks)
Maxtrans:              131072      (256 blocks)
O_RAW/O_SSD Multiple:  (Determined by device)
Syscalls:              read write 
Aio completion types:  none 
Flags:                 buffered sync 

Test Files:  

Path                                          Length    iou   raw iou file
                                              (bytes) (bytes) (bytes) type
-----------------------------------------------------------------------------
/mnt/dantu/rwtest/doio.f1.20329                256000       1     512 regular
/mnt/dantu/rwtest/doio.f2.20329                512000       1     512 regular

doio(iogen01) (24401) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 234813, length 2038:  No locks available (37), open flags: 0100001

doio(iogen01) (24397) 05:36:38
---------------------
fcntl(4, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255677, length 110:  No locks available (37), open flags: 0110001

doio(iogen01) (24402) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 254169, length 1508:  No locks available (37), open flags: 0110001

doio(iogen01) (24403) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 236851, length 17318:  No locks available (37), open flags: 0110001

doio(iogen01) (24398) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f2.20329, lock type 1, offset 0, length 125104:  No locks available (37), open flags: 0100001

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24401 exited because of an internal error

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24397 exited because of an internal error

doio(iogen01) (24399) 05:36:38
---------------------
fcntl(5, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255831, length 24:  No locks available (37), open flags: 0100001

doio(iogen01) (24404) 05:36:38
---------------------
fcntl(5, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255979, length 19:  No locks available (37), open flags: 0100001

doio(iogen01) (24406) 05:36:38
---------------------
fcntl(4, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f2.20329, lock type 1, offset 415812, length 74404:  No locks available (37), open flags: 0110001

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24398 exited because of an internal error

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24399 exited because of an internal error

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24402 exited because of an internal error
...


Verified on RHEL5.6-Server-20101014.0 on i386 and x86_64:
[root@dell-per610-02 ltp-full-20100831]# env LTPROOT=/root/ltp-full-20100831 PATH=$PATH:. ./testcases/kernel/fs/doio/rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/nfs/dantu/rwtest/doio.f1.$$ 1000b:/nfs/dantu/rwtest/doio.f2.$$
/root/ltp-full-20100831/testcases/bin/iogen -N iogen01 -i 600s -s read,write 500b:/nfs/dantu/rwtest/doio.f1.4227 1000b:/nfs/dantu/rwtest/doio.f2.4227 | /root/ltp-full-20100831/testcases/bin/doio -N iogen01 -a -v -n 10 -k

iogen(iogen01) starting up with the following:

Out-pipe:              stdout
Iterations:            600 seconds
Seed:                  2987
Offset-Mode:           sequential
Overlap Flag:          off
Mintrans:              1           (1 blocks)
Maxtrans:              131072      (256 blocks)
O_RAW/O_SSD Multiple:  (Determined by device)
Syscalls:              read write 
Aio completion types:  none 
Flags:                 buffered sync 

Test Files:  

Path                                          Length    iou   raw iou file
                                              (bytes) (bytes) (bytes) type
-----------------------------------------------------------------------------
/nfs/dantu/rwtest/doio.f1.4227                 256000       1     512 regular
/nfs/dantu/rwtest/doio.f2.4227                 512000       1     512 regular

Test passed

Comment 26 Jeff Layton 2011-01-13 16:35:48 UTC
*** Bug 664814 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2011-01-13 21:01:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

