Bug 481371 - PG_error bit is never cleared, even when a fresh I/O to the page succeeds
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: All Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Rik van Riel
QA Contact: Gris Ge
Depends On:
Blocks: 589295 590763
Reported: 2009-01-23 14:41 EST by Jeff Moyer
Modified: 2011-02-16 11:06 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 590763
Last Closed: 2011-02-16 11:06:16 EST

Attachments
Clear PG_error before issuing a readpage (473 bytes, patch)
2009-01-23 16:33 EST, Jeff Moyer
Description Jeff Moyer 2009-01-23 14:41:22 EST
Description of problem:

The problem is that, once the PG_error bit is set for a page cache page, a regular read on the device file will keep seeing an
error for that page indefinitely, even after the device is working again.  I proposed a patch for this upstream.

A simple way around the problem is to mmap the device and read from the
locations that are giving I/O errors (but that's hardly acceptable!).
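
The mmap workaround mentioned above could look roughly like the following. This is only a sketch: the helper name `read_via_mmap` is made up for illustration, and whether a mapped read really sidesteps the stale error on a given kernel is taken from the report, not verified here. Note that a failed page fault on a mapping is delivered as SIGBUS rather than as an EIO return value, so this is not safe against a device that is still failing.

```python
import mmap
import os

def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory mapping.

    `path` may be a regular file or a block device node (hypothetical usage).
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap offsets must be aligned, so start the mapping at the boundary
    # at or below the requested offset.
    start = (offset // gran) * gran
    delta = offset - start
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + length,
                       access=mmap.ACCESS_READ, offset=start) as m:
            # Slicing copies the bytes out, which faults the pages in.
            return m[delta:delta + length]
    finally:
        os.close(fd)
```

For example, `read_via_mmap("/dev/mapper/mpath0", failed_offset, 512)` would touch the sector at `failed_offset` (a placeholder for wherever the earlier read failed).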

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Incur an I/O error by, for example, failing all paths to a device and trying to read from it.
2. restore the device to working order
3. try to read the failed sectors
Actual results:

Expected results:
Read succeeds
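
Step 3 can be checked with a small probe such as the one below. It is a sketch, not part of the original report: `probe_read` is a hypothetical helper, and the 512-byte default is just an assumed sector size. On an affected kernel the report implies it would keep returning an `EIO` result for the previously failed offset even after the device is restored.

```python
import errno
import os

def probe_read(path, offset, length=512):
    """Attempt one read at `offset`.

    Returns (data, None) on success or (None, errno_name) on failure.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset), None
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. "EIO".
        return None, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```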

Additional info:
See also bug 454872 for one instance where this was seen.
Comment 1 Jeff Moyer 2009-01-23 16:33:18 EST
Created attachment 329884 [details]
Clear PG_error before issuing a readpage

This should fix the problem.
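
To illustrate why clearing the bit before a fresh read helps, here is a toy model in Python. It is not kernel code: the `Page` class and both functions are invented for illustration, and it assumes that the I/O completion path will not mark a page up to date while its error flag is still set.

```python
class Page:
    def __init__(self):
        self.error = False     # stands in for PG_error
        self.uptodate = False  # stands in for PG_uptodate

def readpage(page, device_ok):
    """Toy I/O completion: update the flags according to the outcome."""
    if not device_ok:
        page.error = True
    elif not page.error:
        # Assumption: a page with a stale error flag is never marked
        # up to date, even when the new I/O itself succeeded.
        page.uptodate = True

def cached_read(page, device_ok, clear_error_first):
    """Return True if the read succeeds, False if the caller would see EIO."""
    if page.uptodate:
        return True
    if clear_error_first:
        page.error = False     # the idea named in the attachment title
    readpage(page, device_ok)
    return page.uptodate
```

In this model, a page that fails once and is then re-read with `clear_error_first=False` keeps failing after the device recovers, while the same sequence with `clear_error_first=True` succeeds on the second read.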
Comment 2 Ben Marzinski 2009-01-23 17:46:42 EST
I tested both RHEL5 and upstream. This bug also exists in both of them.
Comment 3 RHEL Product and Program Management 2009-06-05 09:47:00 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 4 Vivek Goyal 2010-09-23 09:01:51 EDT
Committed in 89.37.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 9 Gris Ge 2011-01-26 02:40:25 EST
Reproduced this problem on kernel-2.6.9-78.30.EL.
These are the steps for testing on an HBA multipath device:
mkfs.ext3 /dev/mapper/mpath0

mount /dev/mapper/mpath0 /mnt
echo "test" > /mnt/testfile
umount /mnt
echo 3 > /proc/sys/vm/drop_caches
mount /dev/mapper/mpath0 /mnt
ls /mnt
#bring FC link down from switch side.
strace dd if=/mnt/testfile
#got read(0, 0x50b000, 512)                  = -1 EIO (Input/output error)
#bring FC link up from switch side.
#multipath shows the link was revived.
strace dd if=/mnt/testfile
#still the same error.

kernel-2.6.9-96.EL fixed this issue.
Comment 10 errata-xmlrpc 2011-02-16 11:06:16 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

