Bug 590763 - PG_error bit is never cleared, even when a fresh I/O to the page succeeds
Summary: PG_error bit is never cleared, even when a fresh I/O to the page succeeds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Rik van Riel
QA Contact: Barry Donahue
URL:
Whiteboard:
Duplicates: 563343
Depends On: 481371
Blocks: 591848 596334 599739
 
Reported: 2010-05-10 16:07 UTC by Jeremy West
Modified: 2018-10-27 14:17 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Input/output errors can occur due to temporary failures, such as multipath errors or loss of network contact with an iSCSI server. In these cases, the virtual memory subsystem retries the readpage() operation on the affected page. However, the do_generic_file_read() function did not clear the PG_error flag before the retry, so the system could not use the data in the page cache page even when a subsequent readpage() call succeeded. With this update, do_generic_file_read() clears PG_error before resubmitting readpage(), so the page cache page becomes usable again once a retry succeeds.
Clone Of: 481371
Environment:
Last Closed: 2011-01-13 21:31:15 UTC
Target Upstream Version:
Embargoed:


Attachments
clear PG_error before resubmitting readpage (1.30 KB, patch)
2010-05-27 15:49 UTC, Rik van Riel
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Jeremy West 2010-05-10 16:07:30 UTC
+++ This bug was initially created as a clone of Bug #481371 +++

Description of problem:

The problem is that once the PG_error bit is set on a page cache page, every subsequent regular read of that part of the device file fails with an error, even after the device itself has recovered. I proposed this patch upstream:
  http://lkml.org/lkml/2009/1/23/288

A simple way around the problem is to mmap the device and read from the
locations that are giving I/O errors (but that's hardly acceptable!).


Version-Release number of selected component (if applicable):
2.4.9-78.30.EL

How reproducible:
100%

Steps to Reproduce:
1. Incur an I/O error by, for example, failing all paths to a device and trying to read from it.
2. Restore the device to working order.
3. Try to read the failed sectors.
  
Actual results:
EIO

Expected results:
Read succeeds
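Step 3 needs a read that goes through the page cache, since that is where the sticky PG_error bit lives. A minimal sketch of such a read, assuming a POSIX system: `read_region()` is a hypothetical helper, and any device path passed to it (for example a placeholder like "/dev/sdX") is an assumption about the test setup. O_DIRECT is deliberately not used, because a direct read would bypass the page cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: buffered (page cache) read of len bytes at
 * offset off from path. Returns the number of bytes read, or -errno
 * on failure (e.g. -EIO on an affected kernel after step 2). */
static ssize_t read_region(const char *path, off_t off, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);  /* no O_DIRECT: use the page cache */
    if (fd < 0)
        return -errno;

    ssize_t n = pread(fd, buf, len, off);
    int saved_errno = errno;        /* close() may overwrite errno */
    close(fd);
    return n < 0 ? -(ssize_t)saved_errno : n;
}
```

On an affected kernel, calling this on the previously failed offset after restoring the device would be expected to keep returning -EIO, per the "Actual results" above; on a fixed kernel it should return the requested byte count.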

Additional info:
See also bug 454872 for one instance where this was seen.

--- Additional comment from jmoyer on 2009-01-23 16:33:18 EST ---

Created an attachment (id=329884)
Clear PG_error before issuing a readpage

This should fix the problem.

--- Additional comment from bmarzins on 2009-01-23 17:46:42 EST ---

I tested both RHEL5 and upstream; the bug exists in both.

--- Additional comment from pm-rhel on 2009-06-05 09:47:00 EDT ---

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 2 Jeremy West 2010-05-10 16:09:15 UTC
*** Bug 563343 has been marked as a duplicate of this bug. ***

Comment 5 Rik van Riel 2010-05-27 15:49:11 UTC
Created attachment 417293
clear PG_error before resubmitting readpage

Comment 8 Jarod Wilson 2010-06-14 18:23:30 UTC
in kernel-2.6.18-203.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
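When testing, it can help to check whether a given RHEL 5 kernel build is at or past the one named above. A rough heuristic sketch: `rhel5_build_has_fix()` is hypothetical and assumes release strings of the form "2.6.18-N.el5..." as printed by `uname -r`, treating N >= 203 as containing the fix. Kernels built on an older base that carry a backported fix are not detected, and a false result for a non-2.6.18 kernel means "unknown", not "unaffected".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical heuristic: true if release looks like "2.6.18-N..." with
 * N >= 203, the first build this report names as containing the fix. */
static bool rhel5_build_has_fix(const char *release)
{
    const char prefix[] = "2.6.18-";
    int build;

    if (strncmp(release, prefix, sizeof prefix - 1) != 0)
        return false;   /* not a RHEL 5 (2.6.18) release: unknown */
    if (sscanf(release + sizeof prefix - 1, "%d", &build) != 1)
        return false;   /* no numeric build after the prefix */
    return build >= 203;
}
```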

Comment 10 Douglas Silas 2010-06-28 20:47:40 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Input/output errors can occur due to temporary failures, such as multipath errors or loss of network contact with an iSCSI server. In these cases, the virtual memory subsystem retries the readpage() operation on the affected page. However, the do_generic_file_read() function did not clear the PG_error flag before the retry, so the system could not use the data in the page cache page even when a subsequent readpage() call succeeded. With this update, do_generic_file_read() clears PG_error before resubmitting readpage(), so the page cache page becomes usable again once a retry succeeds.

Comment 15 errata-xmlrpc 2011-01-13 21:31:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

