Bug 557423
| Summary: | nfs: sys_read sometimes returns -EIO |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | kernel |
| Version: | 5.4 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Vitaliy Gusev <vgusev> |
| Assignee: | Jeff Layton <jlayton> |
| QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| CC: | chuck.lever, dhoward, guru.anbalagane, jeder, jlayton, jpirko, plyons, qcai, sardella, steved, tao, vgusev, xemul, yanwang |
| Target Milestone: | rc |
| Keywords: | Regression, ZStream |
| Doc Type: | Bug Fix |
| Doc Text: | On NFS, the read(2) system call could have returned an unexpected EIO (input/output error) value. |
| Last Closed: | 2011-01-13 21:01:47 UTC |
| Bug Blocks: | 594059, 594060, 594061, 605694 |

In the mainstream kernel this issue was fixed in commit:
commit 61822ab5e3ed09fcfc49e37227b655202adf6130
Author: Trond Myklebust <Trond.Myklebust>
Date: Tue Dec 5 00:35:42 2006 -0500
NFS: Ensure we only call set_page_writeback() under the page lock
Signed-off-by: Trond Myklebust <Trond.Myklebust>

We also hit this bug and will test it and post results.

Created attachment 402159
patch to revert retcode check in nfs_revalidate_mapping()

The above patch is needed to overcome the EIO error, as the backport of commit 61822ab5e3ed09fcfc49e37227b655202adf6130 to el5 does not look simple.

Agreed. Backporting that commit looks decidedly non-trivial. Thanks for the patch.

Was able to get this to reproduce with the following command:

env LTPROOT=/usr/lib/ltp PATH=$PATH:. ./rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/mnt/dantu/rwtest/doio.f1.$$ 1000b:/mnt/dantu/rwtest/doio.f2.$$

...now to make sure the patch fixes it.

Looks like it does. Interestingly, the test doesn't like running under screen for some reason. That's unrelated to this bug, however.

in kernel-2.6.18-200.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
On NFS, the read(2) system call could have returned an unexpected EIO (input/output error) value.

Can be reproduced on kernel 2.6.18-194.el5:
# uname -a
Linux hp-dl120g6-01.rhts.eng.bos.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl120g6-01 ltp-full-20100831]# env LTPROOT=/opt/20100831/ltp-full-20100831 PATH=$PATH:. ./testcases/kernel/fs/doio/rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/mnt/dantu/rwtest/doio.f1.$$ 1000b:/mnt/dantu/rwtest/doio.f2.$$
/opt/20100831/ltp-full-20100831/testcases/bin/iogen -N iogen01 -i 600s -s read,write 500b:/mnt/dantu/rwtest/doio.f1.20329 1000b:/mnt/dantu/rwtest/doio.f2.20329 | /opt/20100831/ltp-full-20100831/testcases/bin/doio -N iogen01 -a -v -n 10 -k
iogen(iogen01) starting up with the following:
Out-pipe: stdout
Iterations: 600 seconds
Seed: 24396
Offset-Mode: sequential
Overlap Flag: off
Mintrans: 1 (1 blocks)
Maxtrans: 131072 (256 blocks)
O_RAW/O_SSD Multiple: (Determined by device)
Syscalls: read write
Aio completion types: none
Flags: buffered sync
Test Files:
Path                             Length   iou      raw iou  file
                                 (bytes)  (bytes)  (bytes)  type
-----------------------------------------------------------------------------
/mnt/dantu/rwtest/doio.f1.20329  256000   1        512      regular
/mnt/dantu/rwtest/doio.f2.20329  512000   1        512      regular
doio(iogen01) (24401) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 234813, length 2038: No locks available (37), open flags: 0100001
doio(iogen01) (24397) 05:36:38
---------------------
fcntl(4, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255677, length 110: No locks available (37), open flags: 0110001
doio(iogen01) (24402) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 254169, length 1508: No locks available (37), open flags: 0110001
doio(iogen01) (24403) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 236851, length 17318: No locks available (37), open flags: 0110001
doio(iogen01) (24398) 05:36:38
---------------------
fcntl(3, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f2.20329, lock type 1, offset 0, length 125104: No locks available (37), open flags: 0100001
doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24401 exited because of an internal error
doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24397 exited because of an internal error
doio(iogen01) (24399) 05:36:38
---------------------
fcntl(5, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255831, length 24: No locks available (37), open flags: 0100001
doio(iogen01) (24404) 05:36:38
---------------------
fcntl(5, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f1.20329, lock type 1, offset 255979, length 19: No locks available (37), open flags: 0100001
doio(iogen01) (24406) 05:36:38
---------------------
fcntl(4, 7, 0775450520) failed for file /mnt/dantu/rwtest/doio.f2.20329, lock type 1, offset 415812, length 74404: No locks available (37), open flags: 0110001
doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24398 exited because of an internal error
doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24399 exited because of an internal error
doio(iogen01) (24395) 05:36:38
---------------------
(parent) pid 24402 exited because of an internal error
...
verified on RHEL5.6-Server-20101014.0 on i386 and x86_64:
[root@dell-per610-02 ltp-full-20100831]# env LTPROOT=/root/ltp-full-20100831 PATH=$PATH:. ./testcases/kernel/fs/doio/rwtest -N iogen01 -i 600s -s read,write -Da -Dv -n 10 500b:/nfs/dantu/rwtest/doio.f1.$$ 1000b:/nfs/dantu/rwtest/doio.f2.$$
/root/ltp-full-20100831/testcases/bin/iogen -N iogen01 -i 600s -s read,write 500b:/nfs/dantu/rwtest/doio.f1.4227 1000b:/nfs/dantu/rwtest/doio.f2.4227 | /root/ltp-full-20100831/testcases/bin/doio -N iogen01 -a -v -n 10 -k
iogen(iogen01) starting up with the following:
Out-pipe: stdout
Iterations: 600 seconds
Seed: 2987
Offset-Mode: sequential
Overlap Flag: off
Mintrans: 1 (1 blocks)
Maxtrans: 131072 (256 blocks)
O_RAW/O_SSD Multiple: (Determined by device)
Syscalls: read write
Aio completion types: none
Flags: buffered sync
Test Files:
Path                            Length   iou      raw iou  file
                                (bytes)  (bytes)  (bytes)  type
-----------------------------------------------------------------------------
/nfs/dantu/rwtest/doio.f1.4227  256000   1        512      regular
/nfs/dantu/rwtest/doio.f2.4227  512000   1        512      regular
Test passed

*** Bug 664814 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

Description of problem:
On NFS, read(2) can return an unexpected -EIO.

Version-Release number of selected component (if applicable):
kernel: 2.6.18-164.6.1.el5 x86_64
ltp: 20081130

How reproducible:
Run the "rwtest" program from the LTP suite:

./rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$

Note that to reproduce the problem you may need to run several instances of "rwtest" (in the background).

Actual results:

Expected results:

Additional info:
This issue was introduced by the patch linux-2.6-nfs-fix-cache-invalidation-problems-in-nfs_readdir.patch:

> - nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
> - if (S_ISREG(inode->i_mode))
> -         nfs_sync_mapping(mapping);
> - invalidate_inode_pages3(mapping);
> -
> + if (mapping->nrpages != 0) {
> +         if (S_ISREG(inode->i_mode)) {
> +                 ret = nfs_sync_mapping(mapping);
> +                 if (ret < 0)
> +                         goto out;
> +         }
> +         ret = invalidate_inode_pages3(mapping);
> +         if (ret < 0)
> +                 goto out;
> + }

The old code ignored the return value of invalidate_inode_pages3(), but the new code checks it.

Stack dump that shows why EIO occurs:

try_to_release_page
invalidate_complete_page2
invalidate_inode_pages3_range
invalidate_inode_pages3
:nfs:nfs_revalidate_mapping
:nfs:nfs_file_read
do_sync_read
vfs_read
sys_read

---

See try_to_release_page():

int try_to_release_page(struct page *page, gfp_t gfp_mask)
{
        struct address_space * const mapping = page->mapping;

        BUG_ON(!PageLocked(page));
        if (PageWriteback(page))
        ^^^^^^^^^^^^^^^^^^^^^^^^ Here the page can suddenly become Writeback. Note that the page is locked here.
                return 0;

        if (mapping && mapping->a_ops->releasepage)
                return mapping->a_ops->releasepage(page, gfp_mask);
        return try_to_free_buffers(page);
}
EXPORT_SYMBOL(try_to_release_page);
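
To make the failure path above easier to follow, here is a small userspace sketch of it in C. It is not kernel code: every `sim_*` name is hypothetical, and the model assumes that the invalidation helper reports -EIO when a page cannot be released, which is what the stack trace above suggests. It only contrasts the old behaviour (result ignored) with the new behaviour (result propagated to the reader).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the two flags that matter here. */
struct sim_page {
    bool locked;
    bool writeback;
};

/* Mirrors the quoted try_to_release_page(): a page under writeback is not released. */
int sim_try_to_release_page(const struct sim_page *page)
{
    assert(page->locked);   /* stands in for BUG_ON(!PageLocked(page)) */
    if (page->writeback)
        return 0;           /* release refused */
    return 1;               /* release succeeded */
}

/* Assumed model of invalidate_inode_pages3(): -EIO if any page cannot be released. */
int sim_invalidate_pages(struct sim_page *pages, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        pages[i].locked = true;
        if (!sim_try_to_release_page(&pages[i]))
            ret = -EIO;
        pages[i].locked = false;
    }
    return ret;
}

/* Old behaviour: the invalidation result is dropped, so the read carries on. */
int sim_revalidate_old(struct sim_page *pages, size_t n)
{
    (void)sim_invalidate_pages(pages, n);
    return 0;
}

/* New behaviour: the invalidation result is returned to the caller. */
int sim_revalidate_new(struct sim_page *pages, size_t n)
{
    int ret = 0;

    if (n != 0) {
        ret = sim_invalidate_pages(pages, n);
        if (ret < 0)
            goto out;
    }
out:
    return ret;
}
```

With one page marked as under writeback, `sim_revalidate_old()` returns 0 while `sim_revalidate_new()` returns -EIO, which corresponds to the error that read(2) surfaces in this report. Once the writeback flag is cleared, both variants return 0.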