Bug 185754

Summary: [RHEL4 U3] kernel dm mirror: unrelated mirror devices stall if any log device fails
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Kiyoshi Ueda <kueda>
Assignee: Alasdair Kergon <agk>
CC: agk, christophe.varoqui, coughlan, egoggin, jbrassow, jnomura, lmb, mbroz, tao, tranlan
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 22:45:58 UTC
Bug Blocks: 181409, 186476

Description Kiyoshi Ueda 2006-03-17 16:52:36 UTC
Description of problem:
If a log device fails, *ALL* mirror devices stall.
("ALL" includes other mirror devices that do not use the failed
 log device.)


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare some PVs (more than 5) and create 2 VGs from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf as PVs
      - vg0 contains 3 PVs, /dev/sda, /dev/sdb, /dev/sdc
      - vg1 contains 3 PVs, /dev/sdd, /dev/sde, /dev/sdf
 2. Create a mirror LV on each VG and activate it.
      # lvcreate -L 12M -n lv0 -m 1 vg0
      # lvcreate -L 12M -n lv1 -m 1 vg1
 3. Issue I/O to both mirror LVs in a continuous loop.
      # while true; do
      > dd if=/dev/zero of=/dev/mapper/vg0-lv0 bs=512 count=1 >& /dev/null
      > dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 >& /dev/null
      > done
 4. Take offline the PV that holds the log device of one of the mirror LVs.
    Example) If /dev/sdc holds the log device of vg0-lv0:
      # echo offline > /sys/block/sdc/device/state
 5. Check whether I/O to vg1-lv1 is still being processed.
      # iostat 1
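
Instead of watching iostat by eye, step 5 can be checked by sampling the
write counter of one of the PVs behind vg1-lv1. The following is a minimal
sketch, not part of the original report. It assumes /proc/diskstats uses the
usual 2.6 whole-disk layout (major, minor, name, then the counters, with
"writes completed" as the 8th field). The function names are hypothetical,
and sdd is taken from the example layout in step 1.

```shell
# writes_completed DEVICE [STATS_FILE]
# Print the "writes completed" counter (8th field) for DEVICE.
# STATS_FILE defaults to /proc/diskstats; another file can be passed for testing.
writes_completed() {
    awk -v dev="$1" '$3 == dev { print $8 }' "${2:-/proc/diskstats}"
}

# io_progressed BEFORE AFTER
# Succeed only if both samples are present and the counter advanced.
io_progressed() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$2" -gt "$1" ]
}
```

With the dd loop from step 3 still running, a check might look like this:

      # before=$(writes_completed sdd); sleep 5; after=$(writes_completed sdd)
      # io_progressed "$before" "$after" && echo progressing || echo stalled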


Actual results:
I/O to vg1-lv1 is not processed.


Expected results:
I/O to vg1-lv1 is processed, because all PVs of vg1-lv1
are healthy.


Additional info:
The problem appears to be in kmirrord.
kmirrord blocks in disk_flush() when an update of the log fails.
A backtrace of kmirrord is attached below.

-----------------------------------------------------------------------
crash> bt 2115
PID: 2115   TASK: 101aff8a030       CPU: 3   COMMAND: "kmirrord"
 #0 [101ac01bb58] schedule at ffffffff80304a85
 #1 [101ac01bc30] wait_for_completion at ffffffff80304cbd
 #2 [101ac01bc90] dm_table_event at ffffffffa00ea343
 #3 [101ac01bcb0] disk_flush at ffffffffa01019ce
 #4 [101ac01bcd0] do_work at ffffffffa0102ce5
 #5 [101ac01bd10] move_tasks at ffffffff8013257f
 #6 [101ac01bda0] thread_return at ffffffff80304add
 #7 [101ac01be70] worker_thread at ffffffff80146e1e
 #8 [101ac01bf20] kthread at ffffffff8014aa93
 #9 [101ac01bf50] kernel_thread at ffffffff80110e17
crash>
-----------------------------------------------------------------------
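
The backtrace is consistent with one kmirrord worker servicing the mirror
sets one after another, so a worker that never returns from the log flush of
one mirror also starves the others. The toy model below only illustrates
that reasoning. It is an assumption drawn from the symptom, not a rendering
of the dm-raid1 code, and service_mirrors is a hypothetical name.

```shell
# Toy model only; it does not reflect the actual kernel implementation.
# service_mirrors FAILED_MIRROR MIRROR...
# Walk the mirrors in order the way a single worker would. Stop at the
# mirror whose log flush fails, standing in for a wait that never returns.
service_mirrors() {
    failed=$1
    shift
    for ms in "$@"; do
        if [ "$ms" = "$failed" ]; then
            echo "$ms: log flush failed, worker blocked"
            return 1
        fi
        echo "$ms: serviced"
    done
}
```

Running "service_mirrors vg0-lv0 vg0-lv0 vg1-lv1" reports the block on
vg0-lv0 and never reaches vg1-lv1, which matches the observed stall.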

Comment 1 Kiyoshi Ueda 2006-03-23 21:11:17 UTC
Additional info:
I'd like to point out that this is a kernel issue, not a dmeventd issue.
To reproduce the kernel issue, the following setting is needed
before Step 1 of the reproduction steps.

  0. Modify /etc/lvm/lvm.conf as shown below so that dmeventd is not launched.
        dmeventd {
            mirror_library = "none"
        }

If this step isn't done, dmeventd may handle the log device failure.
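
Before running the reproduction, it may be worth confirming that the setting
is actually in effect. The helper below is a sketch under the assumption
that mirror_library is written on one line as a quoted value, as in the
fragment above. The function name is hypothetical.

```shell
# mirror_library_setting [CONF_FILE]
# Print the quoted value assigned to mirror_library. Lines beginning
# with '#' do not match the pattern, so commented-out settings are skipped.
mirror_library_setting() {
    sed -n 's/^[[:space:]]*mirror_library[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
        "${1:-/etc/lvm/lvm.conf}"
}
```

For the configuration in step 0 the function should print "none".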

Comment 2 Jonathan Earl Brassow 2006-03-23 21:57:36 UTC
Without the changes I've been working on, log failures are not handled by the userspace code.

Comment 5 Jason Baron 2006-05-09 17:07:33 UTC
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Red Hat Bugzilla 2006-08-10 22:45:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html