+++ This bug was initially created as a clone of Bug #481371 +++

Description of problem:
The problem is that, once the PG_error bit is set on a page cache page, a regular read on the device file will always and forever see an error for that page, even after the device has recovered. I proposed this patch upstream:
http://lkml.org/lkml/2009/1/23/288

A simple way around the problem is to mmap the device and read from the locations that are giving I/O errors (but that's hardly acceptable!).

Version-Release number of selected component (if applicable):
2.4.9-78.30.EL

How reproducible:
100%

Steps to Reproduce:
1. Incur an I/O error by, for example, failing all paths to a device and trying to read from it.
2. Restore the device to working order.
3. Try to read the failed sectors.

Actual results:
EIO

Expected results:
Read succeeds

Additional info:
See also bug 454872 for one instance where this was seen.

--- Additional comment from jmoyer on 2009-01-23 16:33:18 EST ---

Created an attachment (id=329884)
Clear PG_error before issuing a readpage

This should fix the problem.

--- Additional comment from bmarzins on 2009-01-23 17:46:42 EST ---

I tested both RHEL5 and upstream. This bug also exists in both of them.

--- Additional comment from pm-rhel on 2009-06-05 09:47:00 EDT ---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
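For anyone who wants to compare the two read paths mentioned in the description (a regular read versus the mmap workaround), here is a minimal userspace sketch. The helper names read_via_pread() and read_via_mmap() are made up for illustration and are not part of the report or the patch; the use of pread() for the "regular read" and the 0 / -errno return convention are also assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes at offset off with a regular
 * pread(). Returns 0 on success or a negative errno value. A short
 * read is reported as -EIO to keep the sketch simple. */
int read_via_pread(const char *path, off_t off, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = pread(fd, buf, len, off);
    int err = 0;
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;

    close(fd);
    return err;
}

/* Hypothetical helper: read the same range through a shared read-only
 * mapping. len must be non-zero. Note that an I/O failure while
 * touching the mapping is delivered as SIGBUS, not as a return value. */
int read_via_mmap(const char *path, off_t off, void *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base = off - (off % page);     /* mmap offsets must be page aligned */
    size_t skew = (size_t)(off - base);  /* distance from base to the wanted byte */

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    int err = 0;
    void *map = mmap(NULL, len + skew, PROT_READ, MAP_SHARED, fd, base);
    if (map == MAP_FAILED) {
        err = -errno;
    } else {
        memcpy(buf, (const char *)map + skew, len);
        munmap(map, len + skew);
    }

    close(fd);
    return err;
}
```

After step 2 of the reproducer, calling both helpers on the affected device with the offset of a previously failed sector would be one way to see whether the regular read still returns EIO while the mapped read succeeds. On a healthy file or device both should return the same bytes.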
*** Bug 563343 has been marked as a duplicate of this bug. ***
Created attachment 417293
clear PG_error before resubmitting readpage
in kernel-2.6.18-203.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Input/output errors can occur due to temporary failures, such as multipath errors or losing network contact with an iSCSI server. In these cases, virtual memory attempts to retry the readpage() function on the memory page. However, the do_generic_file_read() function did not clear PG_error, which resulted in the system being unable to use the data in the page cache page, even if subsequent readpage() calls succeeded. With this update, the do_generic_file_read() function properly clears PG_error so that the page cache can be utilized in the case of input/output errors.
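The behaviour described in the technical note can be illustrated with a toy userspace model. This is not the kernel code and not the attached patch: struct toy_page, struct toy_device, toy_readpage() and toy_file_read() are invented for this sketch. The model also assumes that a successful read completion only marks the page up to date when the error flag is clear, which is one way a stale PG_error could keep the data unusable.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins for the page flags; NOT the kernel's struct page. */
struct toy_page {
    bool uptodate;  /* models PG_uptodate */
    bool error;     /* models PG_error */
};

/* Toy device: online is false while all paths are failed. */
struct toy_device {
    bool online;
};

/* Models a readpage() call plus its I/O completion.
 * Assumption of this model: a successful completion marks the page
 * up to date only if the error flag is not set. */
void toy_readpage(const struct toy_device *dev, struct toy_page *page)
{
    if (!dev->online) {
        page->uptodate = false;
        page->error = true;      /* the failure leaves the error flag behind */
    } else if (!page->error) {
        page->uptodate = true;
    }
}

/* Models the read path. clear_error = false mimics the old behaviour,
 * clear_error = true mimics clearing the flag before retrying readpage.
 * Returns 0 on success or -EIO. */
int toy_file_read(const struct toy_device *dev, struct toy_page *page,
                  bool clear_error)
{
    if (page->uptodate)
        return 0;                /* cached data is usable */

    if (clear_error)
        page->error = false;     /* drop the stale error before the retry */

    toy_readpage(dev, page);
    return page->uptodate ? 0 : -EIO;
}
```

In this model, a read issued while the device is offline returns -EIO in both modes. Once the device is back online, the mode without the clearing step keeps returning -EIO because the stale flag blocks the page from becoming up to date, while the mode with the clearing step returns 0.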
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html