Bug 185754

Summary:	[RHEL4 U3] kernel dm mirror: unrelated mirror devices stall if any log device fails
Product:	Red Hat Enterprise Linux 4	Reporter:	Kiyoshi Ueda <kueda>
Component:	kernel	Assignee:	Alasdair Kergon <agk>
Status:	CLOSED ERRATA	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.0	CC:	agk, christophe.varoqui, coughlan, egoggin, jbrassow, jnomura, lmb, mbroz, tao, tranlan
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	RHSA-2006-0575	Doc Type:	Bug Fix
Last Closed:	2006-08-10 22:45:58 UTC	Type:	---
Bug Blocks:	181409, 186476

Description Kiyoshi Ueda 2006-03-17 16:52:36 UTC

Description of problem:
If a log device fails, *ALL* mirror devices stall.
("ALL" includes other mirror devices which do not use the failed
 log device.)


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare some PVs (more than 5) and create 2 VGs from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf as PVs
      - vg0 contains 3 PVs, /dev/sda, /dev/sdb, /dev/sdc
      - vg1 contains 3 PVs, /dev/sdd, /dev/sde, /dev/sdf
 2. Create a mirror LV on each VG and activate it.
      # lvcreate -L 12M -n lv0 -m 1 vg0
      # lvcreate -L 12M -n lv1 -m 1 vg1
 3. Issue I/Os to the mirror LVs continuously.
      # while true; do
      > dd if=/dev/zero of=/dev/mapper/vg0-lv0 bs=512 count=1 >& /dev/null
      > dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 >& /dev/null
      > done
 4. Disconnect one of the PVs used as the log device of one of the mirror LVs.
    Example) If /dev/sdc is used as the log device of vg0-lv0:
      # echo offline > /sys/block/sdc/device/state
 5. Check whether I/Os to vg1-lv1 are processed.
      # iostat 1
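
For step 4, one possible way to find out which PV holds the mirror log is to look at the hidden "_mlog" sub-LV in the lvs output. The following is only a sketch: the helper name mlog_pvs is made up for illustration, and it assumes the two-column layout produced by "lvs -a --noheadings -o lv_name,devices", which may differ between LVM2 versions.

```shell
# Print the PV(s) backing any mirror log sub-LV found in lvs output on stdin.
# Assumed input: two columns per line (LV name, devices), for example
#   [lv0_mlog] /dev/sdc(0)
# mlog_pvs is a hypothetical helper, not part of LVM2.
mlog_pvs() {
    awk '$1 ~ /_mlog\]?$/ {
        n = split($2, devs, ",")
        for (i = 1; i <= n; i++) {
            sub(/\([0-9]+\)$/, "", devs[i])   # drop the "(extent)" suffix
            print devs[i]
        }
    }'
}

# Intended use (needs root and the LVs created in step 2):
#   lvs -a --noheadings -o lv_name,devices vg0 | mlog_pvs
```

The device printed for vg0 would then be the one to set offline in step 4.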


Actual results:
I/Os to vg1-lv1 are not processed.


Expected results:
I/Os to vg1-lv1 are processed, because all PVs for vg1-lv1
are fine.
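
As an alternative to watching iostat by eye, the actual and expected results could be told apart by comparing two snapshots of a block device's sysfs stat file, where the fifth field is the number of completed writes. This is a rough sketch only: the helper names are made up, and it assumes that writes to vg1-lv1 reach /dev/sdd as a mirror leg (pick whichever vg1 PV actually holds a leg on your system).

```shell
# Field 5 of /sys/block/<dev>/stat is the number of completed writes.
# $1: one line in the /sys/block/<dev>/stat layout
writes_completed() {
    echo "$1" | awk '{ print $5 }'
}

# Exit status 0 if the write counter grew between two stat snapshots.
# $1: earlier snapshot, $2: later snapshot
writes_advanced() {
    before=$(writes_completed "$1")
    after=$(writes_completed "$2")
    [ "$after" -gt "$before" ]
}

# Intended use while the dd loop from step 3 is running
# (sdd is an assumption, see above):
#   a=$(cat /sys/block/sdd/stat); sleep 5; b=$(cat /sys/block/sdd/stat)
#   writes_advanced "$a" "$b" && echo "I/O progressing" || echo "I/O stalled"
```

With the bug present, the check would be expected to report a stall for the vg1 PVs after step 4, even though only a vg0 PV was taken offline.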


Additional info:
This problem seems to be in kmirrord.
kmirrord blocks in disk_flush() if an update of the log fails.
A backtrace of kmirrord is attached below.

-----------------------------------------------------------------------
crash> bt 2115
PID: 2115   TASK: 101aff8a030       CPU: 3   COMMAND: "kmirrord"
 #0 [101ac01bb58] schedule at ffffffff80304a85
 #1 [101ac01bc30] wait_for_completion at ffffffff80304cbd
 #2 [101ac01bc90] dm_table_event at ffffffffa00ea343
 #3 [101ac01bcb0] disk_flush at ffffffffa01019ce
 #4 [101ac01bcd0] do_work at ffffffffa0102ce5
 #5 [101ac01bd10] move_tasks at ffffffff8013257f
 #6 [101ac01bda0] thread_return at ffffffff80304add
 #7 [101ac01be70] worker_thread at ffffffff80146e1e
 #8 [101ac01bf20] kthread at ffffffff8014aa93
 #9 [101ac01bf50] kernel_thread at ffffffff80110e17
crash>
-----------------------------------------------------------------------
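
Without a crash session, a blocked kmirrord can probably also be spotted from ps, since a thread sleeping in wait_for_completion() is in uninterruptible sleep (state D). The filter below is a sketch under that assumption; blocked_tasks is a made-up helper name, and it expects the four columns pid, stat, wchan, comm in that order.

```shell
# Read "ps -eo pid,stat,wchan,comm" style lines on stdin and print the
# pid and wchan of tasks whose command name is $1 and whose state starts
# with D (uninterruptible sleep).
# blocked_tasks is a hypothetical helper for illustration.
blocked_tasks() {
    awk -v name="$1" '$4 == name && $2 ~ /^D/ { print $1, $3 }'
}

# Intended use on the affected machine:
#   ps -eo pid,stat,wchan:30,comm | blocked_tasks kmirrord
```

If kmirrord stays listed across repeated runs after step 4, that would be consistent with the backtrace above.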

Comment 1 Kiyoshi Ueda 2006-03-23 21:11:17 UTC

Additional info:
I'd like to point out that this is a kernel issue, not a dmeventd issue.
To reproduce the kernel issue, the following setting is needed
before Step 1 of the reproduction steps.

  0. Modify /etc/lvm/lvm.conf as below so that dmeventd is not launched.
        dmeventd {
            mirror_library = "none"
        }

If this step isn't done, dmeventd may handle the log device failure.

Comment 2 Jonathan Earl Brassow 2006-03-23 21:57:36 UTC

Without the changes I've been working on, log failures are not handled by the userspace code.

Comment 5 Jason Baron 2006-05-09 17:07:33 UTC

Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Comment 8 Red Hat Bugzilla 2006-08-10 22:45:58 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html