481371 – PG_error bit is never cleared, even when a fresh I/O to the page succeeds

Summary: PG_error bit is never cleared, even when a fresh I/O to the page succeeds

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.8
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Rik van Riel
QA Contact:	Gris Ge
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	589295 590763

Reported:	2009-01-23 19:41 UTC by Jeff Moyer
Modified:	2011-02-16 16:06 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	590763
Environment:
Last Closed:	2011-02-16 16:06:16 UTC
Target Upstream Version:
Embargoed:

Attachments
Clear PG_error before issuing a readpage (473 bytes, patch) 2009-01-23 21:33 UTC, Jeff Moyer

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0263	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update	2011-02-16 15:14:55 UTC

Description Jeff Moyer 2009-01-23 19:41:22 UTC

Description of problem:

The problem is that, once the PG_error bit is set on a page cache page,
a regular read on the device file will keep returning an error for that
page, even after the underlying I/O problem has gone away.  I proposed
this patch upstream:
  http://lkml.org/lkml/2009/1/23/288

A simple way around the problem is to mmap the device and read from the
locations that are giving I/O errors (but that's hardly acceptable!).
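
For illustration only, the mmap workaround could be sketched roughly as
follows. The helper name read_via_mmap and its signature are hypothetical
and do not come from this report; the sketch assumes a Linux/POSIX system
and simply copies data out of a read-only mapping instead of calling
read(2). If the device is still failing, touching the mapping is expected
to raise SIGBUS rather than return an error code.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the bug report): copy `len` bytes
 * starting at byte `offset` of `path` into `buf`, going through a shared
 * read-only mapping instead of read(2).
 * Returns 0 on success, -1 if the file cannot be opened or mapped.
 */
int read_via_mmap(const char *path, off_t offset, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || offset < 0 || len == 0)
        return -1;

    /* mmap offsets must be page aligned: round down, remember the slack. */
    off_t aligned = offset - (offset % page_size);
    size_t slack = (size_t)(offset - aligned);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, slack + len, PROT_READ, MAP_SHARED, fd, aligned);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* The page faults that pull data from the device happen during this copy. */
    memcpy(buf, (const char *)map + slack, len);
    munmap(map, slack + len);
    return 0;
}
```

A caller would pass the device node and the byte offset of the sectors
that keep returning EIO, for example read_via_mmap("/dev/sdX", off, buf, 512),
where /dev/sdX is a placeholder.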


Version-Release number of selected component (if applicable):
2.6.9-78.30.EL

How reproducible:
100%

Steps to Reproduce:
1. Incur an I/O error by, for example, failing all paths to a device and trying to read from it.
2. Restore the device to working order.
3. Try to read the failed sectors.

Actual results:
EIO

Expected results:
Read succeeds

Additional info:
See also bug 454872 for one instance where this was seen.

Comment 1 Jeff Moyer 2009-01-23 21:33:18 UTC

Created attachment 329884
Clear PG_error before issuing a readpage

This should fix the problem.

Comment 2 Ben Marzinski 2009-01-23 22:46:42 UTC

I tested both RHEL5 and upstream. This bug also exists in both of them.

Comment 3 RHEL Program Management 2009-06-05 13:47:00 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Vivek Goyal 2010-09-23 13:01:51 UTC

Committed in 89.37.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 9 Gris Ge 2011-01-26 07:40:25 UTC

Reproduced this problem on kernel-2.6.9-78.30.EL.
These are the steps for testing on an HBA multipath device:
=================================
mkfs.ext3 /dev/mapper/mpath0

mount /dev/mapper/mpath0 /mnt
echo "test" > /mnt/testfile
umount /mnt
sync
echo 3 > /proc/sys/vm/drop_caches
mount /dev/mapper/mpath0 /mnt
ls /mnt
#bring FC link down from switch side.
strace dd if=/mnt/testfile
#got read(0, 0x50b000, 512)                  = -1 EIO (Input/output error)
#bring FC link up from switch side.
#multipath shows the link was revived.
strace dd if=/mnt/testfile
#still the same error.

kernel-2.6.9-96.EL fixed this issue.
Verified.

Comment 10 errata-xmlrpc 2011-02-16 16:06:16 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html
