Description of problem:
If a log device fails, *ALL* mirror devices stall. ("ALL" includes other mirror devices that do not use the failed log device.)

Version-Release number of selected component:
kernel-2.6.9-34.EL

How reproducible:
Always

Steps to Reproduce:
1. Prepare some PVs (more than 5) and create 2 VGs from them.
   Example)
   - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf as PVs
   - vg0 contains 3 PVs: /dev/sda, /dev/sdb, /dev/sdc
   - vg1 contains 3 PVs: /dev/sdd, /dev/sde, /dev/sdf
2. Create a mirror LV on each VG and activate it.
   # lvcreate -L 12M -n lv0 -m 1 vg0
   # lvcreate -L 12M -n lv1 -m 1 vg1
3. Issue I/Os to the mirror LVs continuously.
   # while true; do
   > dd if=/dev/zero of=/dev/mapper/vg0-lv0 bs=512 count=1 >& /dev/null
   > dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 >& /dev/null
   > done
4. Disconnect one of the PVs used for the log device of one of the mirror LVs.
   Example) If /dev/sdc is used for the log device of vg0-lv0:
   # echo offline > /sys/block/sdc/device/state
5. Check whether I/Os to vg1-lv1 are processed.
   # iostat 1

Actual results:
I/Os to vg1-lv1 are not processed.

Expected results:
I/Os to vg1-lv1 are processed, because all PVs for vg1-lv1 are fine.

Additional info:
This problem seems to be in kmirrord. kmirrord blocks in disk_flush() if the update of the log fails. A back trace of kmirrord is attached below.
-----------------------------------------------------------------------
crash> bt 2115
PID: 2115  TASK: 101aff8a030  CPU: 3  COMMAND: "kmirrord"
 #0 [101ac01bb58] schedule at ffffffff80304a85
 #1 [101ac01bc30] wait_for_completion at ffffffff80304cbd
 #2 [101ac01bc90] dm_table_event at ffffffffa00ea343
 #3 [101ac01bcb0] disk_flush at ffffffffa01019ce
 #4 [101ac01bcd0] do_work at ffffffffa0102ce5
 #5 [101ac01bd10] move_tasks at ffffffff8013257f
 #6 [101ac01bda0] thread_return at ffffffff80304add
 #7 [101ac01be70] worker_thread at ffffffff80146e1e
 #8 [101ac01bf20] kthread at ffffffff8014aa93
 #9 [101ac01bf50] kernel_thread at ffffffff80110e17
crash>
-----------------------------------------------------------------------
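For step 5, a scripted check may be easier to read than watching iostat output. The following is only a sketch: it assumes the 2.6 /proc/diskstats layout (device name in field 3, completed writes in field 8), and the function names writes_completed and io_progressing are made up for this example. The dm-N name of vg1-lv1 is not known in advance; it has to be looked up first, for instance from the minor number reported by dmsetup info.

```shell
#!/bin/sh
# Sketch: report whether a block device completed any writes during an interval.
# Assumption: 2.6 /proc/diskstats layout, field 3 = device name,
# field 8 = number of writes completed.

# writes_completed NAME [STATS_FILE]
# Prints the completed-write counter for NAME; fails if NAME is not listed.
writes_completed() {
    awk -v dev="$1" '
        $3 == dev { print $8; found = 1 }
        END { if (!found) exit 1 }
    ' "${2:-/proc/diskstats}"
}

# io_progressing NAME [SECONDS]
# Returns 0 if the counter grew during the interval, 1 if it did not,
# 2 if the device could not be found.
io_progressing() {
    before=$(writes_completed "$1") || return 2
    sleep "${2:-5}"
    after=$(writes_completed "$1") || return 2
    [ "$after" -gt "$before" ]
}
```

With the dd loop from step 3 still running, something like "io_progressing dm-1 && echo ok || echo stalled" (dm-1 being a placeholder for the real device name) should distinguish the actual result from the expected one.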
Additional info:
I'd like to point out that this is a kernel issue, not a dmeventd issue. To reproduce the kernel issue, the following setting is needed before Step 1 of the reproduction steps.

0. Modify /etc/lvm/lvm.conf so that dmeventd is not launched, as below.
   dmeventd {
       mirror_library = "none"
   }

If this step isn't done, dmeventd may handle the log device failure.
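Before starting the reproduction it may help to confirm that the setting from step 0 is really in effect. A minimal sketch, assuming lvm.conf uses # for comments and quotes the value as shown above; mirror_library_value is a hypothetical helper name, not part of LVM2.

```shell
#!/bin/sh
# Sketch: print the active mirror_library value from an lvm.conf-style file.
# Assumptions: comments start with '#', the value is double-quoted.

# mirror_library_value [CONF_FILE]
mirror_library_value() {
    # Strip comments first, then print the quoted value of any
    # mirror_library assignment; the last assignment wins.
    sed -n -e 's/#.*//' \
           -e 's/.*mirror_library[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
        "${1:-/etc/lvm/lvm.conf}" | tail -n 1
}
```

If the function prints "none", the configuration matches step 0. Whether a dmeventd process is already running still needs to be checked separately.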
Without the changes I've been working on, log failures are not handled by the userspace code.
Committed in stream U4 build 34.26. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html