Description of problem:
Temporary errors during recovery cause data corruption, because all errors during recovery are ignored.

Version-Release number of selected component:
kernel-2.6.9-34.EL

How reproducible:
Always

Steps to Reproduce:
1. Prepare several PVs (more than 2) and create a VG from them.
   Example:
   - /dev/sda, /dev/sdb, /dev/sdc as PVs
   - vg0 contains these 3 PVs
2. Create a mirror LV and activate it.
   # lvcreate -L 200M -n lv0 -m 1 vg0
3. Make a filesystem on the mirror LV.
   # mke2fs -j /dev/mapper/vg0-lv0
4. Disconnect one of the PVs used for the mirror LV.
   # echo offline > /sys/block/sdb/device/state
   This step must be completed before the recovery has finished.
5. Reconnect the PV.
   # echo running > /sys/block/sdb/device/state
6. Wait until the recovery has finished.
7. Check whether the filesystem is intact. Repeat the check many times, because with read balancing a single run may not detect the errors.
   # while true; do e2fsck -f /dev/mapper/vg0-lv0; done

Actual results:
e2fsck reports filesystem errors, although no error is recorded in the kernel log and 'dmsetup status' shows no failure on the mirror. This happens because the temporary PV failure from Step 4 through Step 5 is ignored in the kernel.

Expected results:
e2fsck should not detect any errors. On the kernel side, errors during recovery should be handled.
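The reproduction steps can be collected into a small script. This is only a sketch: it assumes the example names from the report (vg0, lv0, sdb) and that the VG already exists. DRY_RUN, WAIT_SECS, CHECKS and the helper functions are hypothetical names introduced here, the fixed sleep is a placeholder for "wait until the recovery has finished", and the -n flag on e2fsck is added so the loop runs without prompting. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps (assumes the VG from step 1 already exists).
# DRY_RUN=1 (default) only prints commands; set DRY_RUN=0 and run as root to execute.
DRY_RUN=${DRY_RUN:-1}
VG=${VG:-vg0}
LV=${LV:-lv0}
PV=${PV:-sdb}                  # PV to disconnect temporarily
WAIT_SECS=${WAIT_SECS:-60}     # placeholder for waiting on recovery (assumption)
CHECKS=${CHECKS:-3}            # number of e2fsck passes (assumption)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write a SCSI device state ("offline" or "running") through sysfs.
set_state() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ echo $1 > /sys/block/$PV/device/state"
    else
        echo "$1" > "/sys/block/$PV/device/state"
    fi
}

reproduce() {
    run lvcreate -L 200M -n "$LV" -m 1 "$VG"     # step 2
    run mke2fs -j "/dev/mapper/$VG-$LV"          # step 3
    set_state offline                            # step 4: before recovery finishes
    set_state running                            # step 5
    run sleep "$WAIT_SECS"                       # step 6
    i=0
    while [ "$i" -lt "$CHECKS" ]; do             # step 7: repeat the check
        run e2fsck -f -n "/dev/mapper/$VG-$LV"
        i=$((i + 1))
    done
}

reproduce
```

On an affected kernel, at least some of the e2fsck passes would be expected to report errors when the script is run with DRY_RUN=0.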
Additional info:
The error handler should mark the failed device as "failed". The regions that hit errors during recovery should remain out-of-sync until the failed device is either removed from the mirror map or restored and recovered correctly. If such regions are marked "in-sync" instead, data corruption can still occur as follows:
- Errors caused by a temporary device failure are detected during recovery.
- The errors are handled and the failed device is marked as "failed", but the corresponding regions are marked as "in-sync".
- The system goes down before dmeventd takes action.
- The temporarily failed device becomes healthy again while the system is down.
- The system boots and the mirror map is activated with "no error" and "no recovery".
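The sequence above can be illustrated with a toy model of a single mirror region. This is not kernel code; the function name simulate and the labels "current" and "stale" are made up for the illustration. It only shows why the state recorded for a region after a failed recovery decides whether the mirror legs match after a reboot.

```shell
#!/bin/sh
# Toy model of one mirror region; this is not kernel code.
# $1 is the state recorded for a region whose recovery hit an error:
# "in-sync" (the problematic case) or "out-of-sync" (the behaviour asked for).
simulate() {
    logged_state=$1
    primary="current"
    secondary="stale"        # the recovery write to the failed leg was lost

    # The system goes down before dmeventd acts. At boot the device is
    # healthy again, so only the recorded region state decides what happens.
    if [ "$logged_state" = "out-of-sync" ]; then
        secondary=$primary   # recovery runs again and copies the region
    fi

    if [ "$primary" = "$secondary" ]; then
        echo "consistent"
    else
        echo "corrupt"       # reads served from the stale leg return old data
    fi
}

echo "recorded in-sync:     $(simulate in-sync)"
echo "recorded out-of-sync: $(simulate out-of-sync)"
```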
Committed in stream U4 build 34.26. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html