Description of problem:
On NFS, read(2) can return an unexpected -EIO.

Version-Release number of selected component (if applicable):
kernel: 2.6.18-164.6.1.el5 x86_64
ltp: 20081130

How reproducible:
Run the "rwtest" program from the LTP suite:

    ./rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$

Note that to reproduce the problem you may need to run several instances of "rwtest" (in the background).

Actual results:

Expected results:

Additional info:
This issue was introduced by the patch linux-2.6-nfs-fix-cache-invalidation-problems-in-nfs_readdir.patch:

> -	nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
> -	if (S_ISREG(inode->i_mode))
> -		nfs_sync_mapping(mapping);
> -	invalidate_inode_pages2(mapping);
> -
> +	if (mapping->nrpages != 0) {
> +		if (S_ISREG(inode->i_mode)) {
> +			ret = nfs_sync_mapping(mapping);
> +			if (ret < 0)
> +				goto out;
> +		}
> +		ret = invalidate_inode_pages2(mapping);
> +		if (ret < 0)
> +			goto out;
> +	}

The old code ignored the return value of invalidate_inode_pages2(), but the new code checks it.

Stack dump that shows why EIO occurs:

    try_to_release_page
    invalidate_complete_page2
    invalidate_inode_pages2_range
    invalidate_inode_pages2
    :nfs:nfs_revalidate_mapping
    :nfs:nfs_file_read
    do_sync_read
    vfs_read
    sys_read

---
Look at try_to_release_page:

int try_to_release_page(struct page *page, gfp_t gfp_mask)
{
	struct address_space * const mapping = page->mapping;

	BUG_ON(!PageLocked(page));
	if (PageWriteback(page))
	    ^^^^^^^^^^^^^^^^^^^^^^^^ Here page can suddenly become Writeback. Note page is locked here.
		return 0;

	if (mapping && mapping->a_ops->releasepage)
		return mapping->a_ops->releasepage(page, gfp_mask);
	return try_to_free_buffers(page);
}

EXPORT_SYMBOL(try_to_release_page);
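To make the behaviour change easier to see, here is a small userspace model of the two control flows. It is only a sketch and not kernel code: all `model_*` names are hypothetical, and the assumption that the invalidation step reports -EIO when a page is under writeback is taken from the stack dump and the try_to_release_page() listing in this report.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the page-cache invalidation step.
 * Assumption: it fails with -EIO when a page could not be released,
 * for example because the page is under writeback.
 */
static int model_invalidate_pages(int page_under_writeback)
{
    return page_under_writeback ? -EIO : 0;
}

/* Old flow: the result of the invalidation is ignored. */
static int model_revalidate_old(int page_under_writeback)
{
    (void)model_invalidate_pages(page_under_writeback);
    return 0;
}

/* New flow: a failed invalidation is propagated to the caller. */
static int model_revalidate_new(int page_under_writeback)
{
    int ret = model_invalidate_pages(page_under_writeback);

    if (ret < 0)
        return ret;
    return 0;
}
```

With a page under writeback, `model_revalidate_old(1)` returns 0 while `model_revalidate_new(1)` returns -EIO, which in the real code path would travel up through nfs_file_read to the read(2) caller.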
In the mainstream kernel this issue was fixed in commit:

commit 61822ab5e3ed09fcfc49e37227b655202adf6130
Author: Trond Myklebust <Trond.Myklebust>
Date:   Tue Dec 5 00:35:42 2006 -0500

    NFS: Ensure we only call set_page_writeback() under the page lock

    Signed-off-by: Trond Myklebust <Trond.Myklebust>
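The idea behind that commit title can be illustrated with a simplified, single-threaded model. This is a sketch under stated assumptions: `struct model_page` and the `model_*` helpers are hypothetical and are not the kernel's data structures or API. The point is only that if the writeback flag may be set solely by a path that takes the page lock itself, then a path that already holds the lock cannot see the page become Writeback underneath it.

```c
#include <stdbool.h>

/* Hypothetical, simplified page state; not the kernel's struct page. */
struct model_page {
    bool locked;
    bool writeback;
};

/* Take the page lock if it is free; return false if it is already held. */
static bool model_trylock(struct model_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void model_unlock(struct model_page *p)
{
    p->locked = false;
}

/*
 * Writer side, modelling the fixed behaviour: the writeback flag is
 * only set while this path holds the page lock. If the lock is held
 * elsewhere, the writer has to retry later.
 */
static bool model_start_writeback(struct model_page *p)
{
    if (!model_trylock(p))
        return false;
    p->writeback = true;
    model_unlock(p);
    return true;
}

/*
 * Invalidation side: the caller must hold the page lock, and the
 * release only succeeds when the page is not under writeback.
 */
static bool model_try_release(const struct model_page *p)
{
    return p->locked && !p->writeback;
}
```

In this model, once the invalidating path has locked the page, `model_start_writeback()` fails until the page is unlocked, so `model_try_release()` cannot observe a writeback flag that appeared while the lock was held.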
We have also hit this bug; we will test it and post the results.
Created attachment 402159
patch to revert retcode check in nfs_revalidate_mapping()
The above patch is needed to overcome the EIO error, as backporting commit 61822ab5e3ed09fcfc49e37227b655202adf6130 to el5 does not look simple.
Agreed. Backporting that commit looks decidedly non-trivial. Thanks for the patch.
Was able to get this to reproduce with the following command:

    env LTPROOT=/usr/lib/ltp PATH=$PATH:. ./rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/mnt/dantu/rwtest/doio.f1.$$ 1000b:/mnt/dantu/rwtest/doio.f2.$$

...now to make sure the patch fixes it.
Looks like it does. Interestingly the test doesn't like running under screen for some reason. That's unrelated to this bug however.
in kernel-2.6.18-200.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
On NFS, the read(2) system call could have returned an unexpected EIO (input/output error) value.
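For anyone checking for this symptom from userspace without the full LTP suite, the following is a minimal sketch of a helper that reads a file to the end and reports the errno of the first failing call. The name `read_all_errno` is hypothetical and is not part of LTP or of the patch discussed here. On an affected kernel, a file on an NFS mount under concurrent read/write load might make it return EIO.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Read the whole file at `path`.
 * Returns 0 on success, or the errno value of the first failing
 * open(2) or read(2) call (for example EIO).
 */
static int read_all_errno(const char *path)
{
    char buf[4096];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;

    while ((n = read(fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            int err = errno;

            if (err == EINTR)
                continue;   /* interrupted by a signal; retry the read */
            close(fd);
            return err;
        }
    }

    close(fd);
    return 0;
}
```

A caller would compare the result against EIO, for example `if (read_all_errno(path) == EIO)`, where `path` points at a file on the NFS mount being tested.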
Reproduced on kernel 2.6.18-194.el5:

# uname -a
Linux hp-dl120g6-01.rhts.eng.bos.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

[root@hp-dl120g6-01 ltp-full-20100831]# env LTPROOT=/opt/20100831/ltp-full-20100831 PATH=$PATH:. ./testcases/kernel/fs/doio/rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/mnt/dantu/rwtest/doio.f1.$$ 1000b:/mnt/dantu/rwtest/doio.f2.$$
/opt/20100831/ltp-full-20100831/testcases/bin/iogen -N iogen01 -i 600s -s read,write 500b:/mnt/dantu/rwtest/doio.f1.20329 1000b:/mnt/dantu/rwtest/doio.f2.20329 | /opt/20100831/ltp-full-20100831/testcases/bin/doio -N iogen01 -a -v -n 10 -k

iogen(iogen01) starting up with the following:

Out-pipe:              stdout
Iterations:            600 seconds
Seed:                  24396
Offset-Mode:           sequential
Overlap Flag:          off
Mintrans:              1       (1 blocks)
Maxtrans:              131072  (256 blocks)
O_RAW/O_SSD Multiple:  (Determined by device)
Syscalls:              read write
Aio completion types:  none
Flags:                 buffered sync

Test Files:

Path                                 Length    iou      raw iou  file
                                     (bytes)   (bytes)  (bytes)  type
-----------------------------------------------------------------------------
/mnt/dantu/rwtest/doio.f1.20329      256000    1        512      regular
/mnt/dantu/rwtest/doio.f2.20329      512000    1        512      regular

doio(iogen01) (24401) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 234813, length 2038: No locks available (37), open flags: 0100001

doio(iogen01) (24397) 05:36:38
---------------------
fcntl(4, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255677, length 110: No locks available (37), open flags: 0110001

doio(iogen01) (24402) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 254169, length 1508: No locks available (37), open flags: 0110001

doio(iogen01) (24403) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 236851, length 17318: No locks available (37), open flags: 0110001

doio(iogen01) (24398) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f2.20329, lock type 1, offset 0, length 125104: No locks available (37), open flags: 0100001

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24401 exited because of an internal error

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24397 exited because of an internal error

doio(iogen01) (24399) 05:36:38
---------------------
fcntl(5, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255831, length 24: No locks available (37), open flags: 0100001

doio(iogen01) (24404) 05:36:38
---------------------
fcntl(5, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255979, length 19: No locks available (37), open flags: 0100001

doio(iogen01) (24406) 05:36:38
---------------------
fcntl(4, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f2.20329, lock type 1, offset 415812, length 74404: No locks available (37), open flags: 0110001

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24398 exited because of an internal error

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24399 exited because of an internal error

doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24402 exited because of an internal error
...

Verified on RHEL5.6-Server-20101014.0 on i386 and x86_64:

[root@dell-per610-02 ltp-full-20100831]# env LTPROOT=/root/ltp-full-20100831 PATH=$PATH:. ./testcases/kernel/fs/doio/rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/nfs/dantu/rwtest/doio.f1.$$ 1000b:/nfs/dantu/rwtest/doio.f2.$$
/root/ltp-full-20100831/testcases/bin/iogen -N iogen01 -i 600s -s read,write 500b:/nfs/dantu/rwtest/doio.f1.4227 1000b:/nfs/dantu/rwtest/doio.f2.4227 | /root/ltp-full-20100831/testcases/bin/doio -N iogen01 -a -v -n 10 -k

iogen(iogen01) starting up with the following:

Out-pipe:              stdout
Iterations:            600 seconds
Seed:                  2987
Offset-Mode:           sequential
Overlap Flag:          off
Mintrans:              1       (1 blocks)
Maxtrans:              131072  (256 blocks)
O_RAW/O_SSD Multiple:  (Determined by device)
Syscalls:              read write
Aio completion types:  none
Flags:                 buffered sync

Test Files:

Path                                 Length    iou      raw iou  file
                                     (bytes)   (bytes)  (bytes)  type
-----------------------------------------------------------------------------
/nfs/dantu/rwtest/doio.f1.4227       256000    1        512      regular
/nfs/dantu/rwtest/doio.f2.4227       512000    1        512      regular

Test passed
*** Bug 664814 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html