Bug 187249

Summary: [RHEL4 U3] dm-mirror: read stalls if all mirrors failed
Product: Red Hat Enterprise Linux 4
Reporter: Jun'ichi NOMURA <junichi.nomura>
Component: kernel
Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 4.0
CC: agk, christophe.varoqui, cmarthal, egoggin, jbrassow, kueda, lmb, mbroz, tao, tranlan
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0575 Doc Type: Bug Fix
Last Closed: 2006-08-10 22:58:12 UTC Type: ---
Bug Blocks: 181409, 186476    
Attachments:
  Description                                      Flags
  Testcase: Try to read from failed mirror device  none
  pass correct size to bio_endio                   none
  Fix error message                                none

Description Jun'ichi NOMURA 2006-03-29 15:28:20 UTC
Description of problem:
  [RHEL4 U3] dm-mirror: read stalls if all mirrors failed

Version-Release number of selected component:
  kernel-2.6.9-34.EL

How reproducible:
  Always

Steps to Reproduce:
  1. Create a dm-mirror device
  2. Fail all underlying devices
  3. Read from the device

Actual results:
  Read will stall.

Expected results:
  Read should fail.

Hardware info:
  No dependency on hardware.

Additional Info:
  This bug can be one of the causes of BZ#185751.

  This is caused by calling bio_endio with a size of 0.
  Fixing this problem reveals another bug.
  The attached set of patches fixes the problem.

Comment 1 Jun'ichi NOMURA 2006-03-29 15:36:54 UTC
Created attachment 126995 [details]
Testcase: Try to read from failed mirror device

# sh mirror-read-fail-test.sh
0 256 mirror core 1 16 2  /dev/mapper/err1 0 /dev/mapper/err2 0
Read from failed mirror.
If you don't see 'PASS', the test fails.

<If the bug is not fixed, we'll stop here>

dd: reading `/dev/mapper/error-mirror': Input/output error
0+0 records in
0+0 records out
PASS

Comment 2 Jun'ichi NOMURA 2006-03-29 15:39:49 UTC
Typical backtrace of the stalled process:

crash> bt 22493
PID: 22493  TASK: 101aeda57f0       CPU: 1   COMMAND: "dd"
 #0 [10164b65af8] schedule at ffffffff80304a85
 #1 [10164b65bd0] io_schedule at ffffffff803053ef
 #2 [10164b65bf0] __lock_page at ffffffff80159215
 #3 [10164b65c70] find_get_page at ffffffff8015929c
 #4 [10164b65c90] do_generic_mapping_read at ffffffff80159771
 #5 [10164b65d90] __generic_file_aio_read at ffffffff8015b53c
 #6 [10164b65e10] generic_file_read at ffffffff8015b6d7
 #7 [10164b65f10] vfs_read at ffffffff80177a83
 #8 [10164b65f40] sys_read at ffffffff80177cda
 #9 [10164b65f80] system_call at ffffffff801101c6


Comment 3 Jun'ichi NOMURA 2006-03-29 15:42:05 UTC
Created attachment 126996 [details]
pass correct size to bio_endio

Complete the failed bio with the correct size.
Otherwise, reads from an all-failed mirror never complete.
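
A minimal userspace sketch of why a size of 0 can leave the reader stuck, assuming that bio_endio in this kernel subtracts the completed byte count from bi_size and that the read completion handler returns early while bi_size is still non-zero. This is a model for illustration, not the attached patch: every model_* name is hypothetical, and page_locked merely stands in for the locked page-cache page the reader waits on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: the few bio fields that matter for this bug. */
struct model_bio {
    unsigned int bi_size;   /* bytes still outstanding */
    bool uptodate;          /* cleared when an error is reported */
    bool page_locked;       /* stand-in for the locked page-cache page */
    int (*bi_end_io)(struct model_bio *bio, unsigned int bytes_done, int error);
};

/* Models a read completion handler: it unlocks only once nothing is left. */
static int model_end_io_read(struct model_bio *bio, unsigned int bytes_done,
                             int error)
{
    (void)bytes_done;
    (void)error;
    if (bio->bi_size)
        return 1;               /* treated as partial completion: keep waiting */
    bio->page_locked = false;   /* a reader blocked on the page may proceed */
    return 0;
}

/* Models bio_endio(bio, bytes_done, error) under the assumptions above. */
static void model_bio_endio(struct model_bio *bio, unsigned int bytes_done,
                            int error)
{
    if (error)
        bio->uptodate = false;
    if (bytes_done > bio->bi_size)
        bytes_done = bio->bi_size;  /* never complete more than is left */
    bio->bi_size -= bytes_done;
    if (bio->bi_end_io)
        bio->bi_end_io(bio, bytes_done, error);
}

/* Returns true if a reader waiting on the page would remain blocked. */
static bool model_read_stalls(unsigned int bio_size, unsigned int bytes_done)
{
    assert(bio_size > 0);  /* a zero-length bio would be trivially complete */
    struct model_bio bio = {
        .bi_size = bio_size,
        .uptodate = true,
        .page_locked = true,
        .bi_end_io = model_end_io_read,
    };
    model_bio_endio(&bio, bytes_done, -EIO);  /* fail the read with EIO */
    return bio.page_locked;
}
```

In this model, model_read_stalls(4096, 0) reports a stall because bi_size never reaches zero, while model_read_stalls(4096, 4096) does not. That is consistent with the dd process in comment 2 sleeping in __lock_page.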

Comment 4 Jun'ichi NOMURA 2006-03-29 16:38:43 UTC
Created attachment 127002 [details]
Fix error message

When the failed bio completes, the following kernel message
appears:
  Out of memory causing inability to retry read.

This patch tries to fix the message.

Comment 5 Jonathan Earl Brassow 2006-04-01 17:36:52 UTC
Corey,

The attachment in comment #1 should be added to the mirror test suite.

Comment 6 Jason Baron 2006-04-28 17:29:09 UTC
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 10 Red Hat Bugzilla 2006-08-10 22:58:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html