Bug 187249 - [RHEL4 U3] dm-mirror: read stalls if all mirrors failed
Summary: [RHEL4 U3] dm-mirror: read stalls if all mirrors failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Alasdair Kergon
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 181409 186476
Reported: 2006-03-29 15:28 UTC by Jun'ichi NOMURA
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 22:58:12 UTC
Target Upstream Version:
Embargoed:


Attachments
Testcase: Try to read from failed mirror device (578 bytes, application/x-shellscript)
2006-03-29 15:36 UTC, Jun'ichi NOMURA
pass correct size to bio_endio (746 bytes, patch)
2006-03-29 15:42 UTC, Jun'ichi NOMURA
Fix error message (729 bytes, patch)
2006-03-29 16:38 UTC, Jun'ichi NOMURA


Links
System: Red Hat Product Errata
ID: RHSA-2006:0575
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4
Last Updated: 2006-08-10 04:00:00 UTC

Description Jun'ichi NOMURA 2006-03-29 15:28:20 UTC
Description of problem:
  [RHEL4 U3] dm-mirror: read stalls if all mirrors failed

Version-Release number of selected component:
  kernel-2.6.9-34.EL

How reproducible:
  Always

Steps to Reproduce:
  1. Create a dm-mirror device
  2. Fail all underlying devices
  3. Read from the device

Actual results:
  Read will stall.

Expected results:
  Read should fail.

Hardware info:
  No hardware dependency.

Additional Info:
  This bug can be one of the causes of BZ#185751.

  The stall is caused by calling bio_endio with a size of 0.
  Fixing this problem reveals another bug.
  The attached set of patches fixes these problems.

Comment 1 Jun'ichi NOMURA 2006-03-29 15:36:54 UTC
Created attachment 126995
Testcase: Try to read from failed mirror device

# sh mirror-read-fail-test.sh
0 256 mirror core 1 16 2  /dev/mapper/err1 0 /dev/mapper/err2 0
Read from failed mirror.
If you don't see 'PASS', the test fails.

<If the bug is not fixed, we'll stop here>

dd: reading `/dev/mapper/error-mirror': Input/output error
0+0 records in
0+0 records out
PASS

Comment 2 Jun'ichi NOMURA 2006-03-29 15:39:49 UTC
Typical backtrace of the stalled process:

crash> bt 22493
PID: 22493  TASK: 101aeda57f0       CPU: 1   COMMAND: "dd"
 #0 [10164b65af8] schedule at ffffffff80304a85
 #1 [10164b65bd0] io_schedule at ffffffff803053ef
 #2 [10164b65bf0] __lock_page at ffffffff80159215
 #3 [10164b65c70] find_get_page at ffffffff8015929c
 #4 [10164b65c90] do_generic_mapping_read at ffffffff80159771
 #5 [10164b65d90] __generic_file_aio_read at ffffffff8015b53c
 #6 [10164b65e10] generic_file_read at ffffffff8015b6d7
 #7 [10164b65f10] vfs_read at ffffffff80177a83
 #8 [10164b65f40] sys_read at ffffffff80177cda
 #9 [10164b65f80] system_call at ffffffff801101c6


Comment 3 Jun'ichi NOMURA 2006-03-29 15:42:05 UTC
Created attachment 126996
pass correct size to bio_endio

Complete the failed bio with the correct size.
Otherwise, reads from the all-failed mirror never complete.

Comment 4 Jun'ichi NOMURA 2006-03-29 16:38:43 UTC
Created attachment 127002
Fix error message

Once the failed bio completes, the kernel logs the following
message:
  Out of memory causing inability to retry read.

The attached patch tries to correct this message.

Comment 5 Jonathan Earl Brassow 2006-04-01 17:36:52 UTC
Corey,

The attachment in comment #1 should be added to the mirror test suite.

Comment 6 Jason Baron 2006-04-28 17:29:09 UTC
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 10 Red Hat Bugzilla 2006-08-10 22:58:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


