+++ This bug was initially created as a clone of Bug #481371 +++
Description of problem:
The problem is that, once the PG_error bit is set on a page cache page, every subsequent regular read of the device file sees an
error, even after the device has recovered. I proposed this patch upstream:
A simple way around the problem is to mmap the device and read from the
locations that are giving I/O errors (but that's hardly acceptable!).
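The mmap workaround can be sketched in C as follows. This is a minimal sketch, not the reporter's code. The helper name read_via_mmap() and its signature are hypothetical, and the caller supplies the device path. As far as we understand it, the workaround succeeds because the page-fault path (filemap_fault() in kernels of this era) clears PG_error before it retries readpage(), which the read(2) path did not do. If the page still cannot be read, touching the mapping raises SIGBUS instead of returning an error code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes starting at byte offset off of path
 * through a read-only shared mapping instead of read(2).
 * Returns 0 on success, -1 if open() or mmap() fails. */
int read_via_mmap(const char *path, off_t off, void *dst, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base = off - (off % page);          /* mmap() offsets must be page aligned */
    size_t delta = (size_t)(off - base);      /* position of off inside the mapping */

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, delta + len, PROT_READ, MAP_SHARED, fd, base);
    close(fd);                                /* the mapping outlives the descriptor */
    if (map == MAP_FAILED)
        return -1;

    /* Touching the pages faults them in; an unreadable page raises SIGBUS here. */
    memcpy(dst, (const char *)map + delta, len);
    munmap(map, delta + len);
    return 0;
}
```

This only recovers the data. The stale PG_error remains a problem for any program that uses ordinary read(2) calls.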
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Incur an I/O error by, for example, failing all paths to a device and trying to read from it.
2. Restore the device to working order.
3. Try to read the failed sectors again.
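Step 3 can be scripted with a small C helper that re-reads one sector through the page cache. This is a hypothetical sketch. The name reread_sector(), the fixed 512-byte sector size and the device path passed in (for example /dev/sdX) are assumptions for illustration. On an affected kernel it should keep returning -EIO after step 2. On a fixed kernel it should return 512.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512   /* assumption: 512-byte logical sectors */

/* Hypothetical helper for step 3: re-read one sector of path through the
 * page cache (no O_DIRECT).  Returns the number of bytes read (SECTOR_SIZE,
 * or less at the end of the device) or -errno on failure. */
int reread_sector(const char *path, long long sector, unsigned char *buf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = pread(fd, buf, SECTOR_SIZE, (off_t)(sector * SECTOR_SIZE));
    int saved = errno;                        /* close() may overwrite errno */
    close(fd);

    return (n < 0) ? -saved : (int)n;
}
```

The device is deliberately opened without O_DIRECT. O_DIRECT would bypass the page cache, and the bug lives in the buffered read path.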
See also bug 454872 for one instance where this was seen.
--- Additional comment from firstname.lastname@example.org on 2009-01-23 16:33:18 EST ---
Created an attachment (id=329884)
Clear PG_error before issuing a readpage
This should fix the problem.
--- Additional comment from email@example.com on 2009-01-23 17:46:42 EST ---
I tested both RHEL5 and upstream. The bug exists in both of them.
--- Additional comment from firstname.lastname@example.org on 2009-06-05 09:47:00 EDT ---
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
*** Bug 563343 has been marked as a duplicate of this bug. ***
Created attachment 417293
clear PG_error before resubmitting readpage
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Input/output errors can occur due to temporary failures, such as multipath errors or loss of network contact with an iSCSI server. In these cases, the virtual memory subsystem retries the readpage() operation on the affected page. However, the do_generic_file_read() function did not clear the PG_error flag before the retry, so the data in the page cache page could not be used even when a subsequent readpage() call succeeded. With this update, do_generic_file_read() clears PG_error before retrying readpage(), so the page cache can be used again once the temporary input/output error has cleared.
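To make the failure mode described in the technical note concrete, here is a small user-space model in C. It is a simplified sketch, not the attached patch and not real kernel code. The names model_page, model_readpage() and model_read() are hypothetical, and the flags are reduced to a plain bitmask. The rule that a successful read sets PG_uptodate only while PG_error is clear is our reading of how the block-device read completion handler behaves in kernels of this era.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of one page cache page; NOT kernel code.
 * The kernel keeps these as bit numbers manipulated by SetPageError() etc. */
enum { PG_uptodate = 1, PG_error = 2 };

struct model_page {
    unsigned flags;
};

/* Stand-in for a_ops->readpage() on a block device: a failed read sets
 * PG_error, and a successful read marks the page uptodate only if
 * PG_error is clear, so a stale error blocks PG_uptodate. */
void model_readpage(struct model_page *pg, bool device_ok)
{
    if (!device_ok)
        pg->flags |= PG_error;
    else if (!(pg->flags & PG_error))
        pg->flags |= PG_uptodate;
}

/* Stand-in for the readpage retry in do_generic_file_read(): returns 0 if
 * the cached data is usable, -EIO otherwise.  clear_error selects the fix. */
int model_read(struct model_page *pg, bool device_ok, bool clear_error)
{
    if (pg->flags & PG_uptodate)
        return 0;                          /* already cached, no I/O */
    if (clear_error)
        pg->flags &= ~(unsigned)PG_error;  /* the fix: drop the stale error */
    model_readpage(pg, device_ok);
    return (pg->flags & PG_uptodate) ? 0 : -EIO;
}
```

Without clear_error, one failed read leaves the model page returning -EIO forever, which matches step 3 of the reproduction. With clear_error, the first successful retry makes the page usable.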
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.