Description of problem: [RHEL4 U3] dm-mirror: read stalls if all mirrors failed
Version-Release number of selected component: kernel-2.6.9-34.EL
How reproducible: Always
Steps to Reproduce:
  1. Create a dm-mirror device
  2. Fail all underlying devices
  3. Read from the device
Actual results: Read will stall.
Expected results: Read should fail.
Hardware info: No dependency on hardware.
Additional Info: This bug can be one of the causes of BZ#185751. It is caused by calling bio_endio with 0 size. Fixing this problem reveals another bug. The attached set of patches will fix the problem.
Created attachment 126995 [details]
Testcase: Try to read from failed mirror device
  # sh mirror-read-fail-test.sh
  0 256 mirror core 1 16 2 /dev/mapper/err1 0 /dev/mapper/err2 0
  Read from failed mirror. If you don't see 'PASS', the test fails.
  <If the bug is not fixed, we'll stop here>
  dd: reading `/dev/mapper/error-mirror': Input/output error
  0+0 records in
  0+0 records out
  PASS
Typical backtrace of the stalled process:
  crash> bt 22493
  PID: 22493  TASK: 101aeda57f0  CPU: 1  COMMAND: "dd"
   #0 [10164b65af8] schedule at ffffffff80304a85
   #1 [10164b65bd0] io_schedule at ffffffff803053ef
   #2 [10164b65bf0] __lock_page at ffffffff80159215
   #3 [10164b65c70] find_get_page at ffffffff8015929c
   #4 [10164b65c90] do_generic_mapping_read at ffffffff80159771
   #5 [10164b65d90] __generic_file_aio_read at ffffffff8015b53c
   #6 [10164b65e10] generic_file_read at ffffffff8015b6d7
   #7 [10164b65f10] vfs_read at ffffffff80177a83
   #8 [10164b65f40] sys_read at ffffffff80177cda
   #9 [10164b65f80] system_call at ffffffff801101c6
Created attachment 126996 [details]
pass correct size to bio_endio
Complete the failed bio with the correct size. Otherwise, reads from the all-failed mirror never complete.
Created attachment 127002 [details]
Fix error message
When the failed bio completes, we'll see the following kernel message:
  Out of memory causing inability to retry read.
The patch tries to fix the message.
Corey, the attachment in comment #1 should be added to the mirror test suite.
Committed in stream U4 build 34.26. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html