Description of problem:
Once the PG_error bit is set for a page cache page, a regular read on the device file will always and forever see an error. I proposed this patch upstream:
http://lkml.org/lkml/2009/1/23/288

A simple way around the problem is to mmap the device and read from the locations that are giving I/O errors (but that's hardly acceptable!).

Version-Release number of selected component (if applicable):
2.6.9-78.30.EL

How reproducible:
100%

Steps to Reproduce:
1. Incur an I/O error by, for example, failing all paths to a device and trying to read from it.
2. Restore the device to working order.
3. Try to read the failed sectors.

Actual results:
EIO

Expected results:
Read succeeds

Additional info:
See also bug 454872 for one instance where this was seen.
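For reference, the mmap workaround mentioned in the description could look roughly like the sketch below. It only illustrates reading a byte range through a read-only mapping instead of read(); the helper name `read_via_mmap` and its parameters are hypothetical and do not come from this report, and the sketch has not been run against a device in the failed state.

```python
import mmap
import os


def read_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` of `path`, read through mmap.

    Hypothetical helper for illustration only. `path` may be a regular file
    or a device node the caller is allowed to open for reading.
    """
    # mmap offsets must be a multiple of the allocation granularity,
    # so map from the aligned offset at or below the requested one.
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran
    delta = offset - aligned

    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            # Slicing copies the bytes out of the mapping.
            return m[delta:delta + length]
    finally:
        os.close(fd)
```

Something like `read_via_mmap("/dev/mapper/mpath0", 0, 512)` would then stand in for a 512-byte read() of the first sector. As noted above, this is a stopgap rather than a fix.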
Created attachment 329884
Clear PG_error before issuing a readpage

This should fix the problem.
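To make the idea behind the attachment easier to follow, here is a toy model in Python. It is not the kernel code and not the attached patch: the `Page` class, the `readpage` and `read` helpers, and the rule that a page is only marked up to date when its error flag is clear are all assumptions made for illustration.

```python
class Page:
    """Toy stand-in for a page cache page with two flags (illustrative only)."""

    def __init__(self):
        self.error = False      # models PG_error
        self.uptodate = False   # models PG_uptodate


def readpage(page, device_ok, clear_error_first):
    """Model one read attempt against the backing device."""
    if clear_error_first:
        # The proposed behaviour: drop the stale error before retrying.
        page.error = False
    if not device_ok:
        page.error = True
        return
    # Assumption of this model: a successful I/O marks the page up to date
    # only if the error flag is not set.
    if not page.error:
        page.uptodate = True


def read(page, device_ok, clear_error_first=False):
    """Model a regular read: retry the page if it is not up to date."""
    if not page.uptodate:
        readpage(page, device_ok, clear_error_first)
    return "ok" if page.uptodate else "EIO"
```

In this model, a page that has failed once keeps returning "EIO" from `read(page, device_ok=True)` even though the device is healthy again, while `read(page, device_ok=True, clear_error_first=True)` succeeds.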
I tested both RHEL5 and upstream. This bug also exists in both of them.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 89.37.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Reproduced this problem on kernel-2.6.9-78.30.EL.

These are the steps for testing on an HBA multipath device:
=================================
mkfs.ext3 /dev/mapper/mpath0
mount /dev/mapper/mpath0 /mnt
echo "test" > /mnt/testfile
umount /mnt
sync
echo 3 > /proc/sys/vm/drop_caches
mount /dev/mapper/mpath0 /mnt
ls /mnt
#bring FC link down from switch side.
strace dd if=/mnt/testfile
#got read(0, 0x50b000, 512) = -1 EIO (Input/output error)
#bring FC link up from switch side.
#multipath shows the link was revived.
strace dd if=/mnt/testfile
#still the same error.

kernel-2.6.9-96.EL fixed this issue. Verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0263.html