Description of problem:
If a log device fails, *ALL* mirror devices stall.
(The "ALL" includes other mirror devices that do not use the
failed log device.)
Version-Release number of selected component:
kernel-2.6.9-34.EL
How reproducible:
Always
Steps to Reproduce:
1. Prepare some PVs (more than 5) and create 2 VGs from them.
Example)
- /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf as PVs
- vg0 contains 3 PVs, /dev/sda, /dev/sdb, /dev/sdc
- vg1 contains 3 PVs, /dev/sdd, /dev/sde, /dev/sdf
2. Create a mirror LV on each VG and activate it.
# lvcreate -L 12M -n lv0 -m 1 vg0
# lvcreate -L 12M -n lv1 -m 1 vg1
3. Issue I/O to both mirror LVs continuously.
# while true; do
> dd if=/dev/zero of=/dev/mapper/vg0-lv0 bs=512 count=1 >& /dev/null
> dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 >& /dev/null
> done
4. Disconnect the PV used for the log device of one of the mirror LVs.
Example) If /dev/sdc is used for the log device of vg0-lv0:
# echo offline > /sys/block/sdc/device/state
5. Check whether I/O to vg1-lv1 is still being processed.
# iostat 1
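Step 4 depends on knowing which PV holds the mirror log. The following is a minimal sketch of one way to find it, assuming the log sub-LV is reported by lvs as "[<lv>_mlog]" and that the devices column has the form "/dev/sdX(extent)". The helper name log_pvs is hypothetical and is not part of LVM2.

```shell
# Print the PV(s) backing the mirror log of a given LV.
# Input (stdin): output of `lvs --noheadings -a -o lv_name,devices <vg>`.
# Assumption: the log sub-LV appears as "[<lv>_mlog]" in the first column.
log_pvs() {
    lv="$1"
    awk -v want="[${lv}_mlog]" '
        $1 == want {
            dev = $2
            sub(/\(.*$/, "", dev)   # drop the extent suffix: /dev/sdc(0) -> /dev/sdc
            print dev
        }'
}

# Usage on a live system (needs root and LVM2):
#   lvs --noheadings -a -o lv_name,devices vg0 | log_pvs lv0
```

The device printed for lv0 is the one to take offline in Step 4.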
Actual results:
I/O to vg1-lv1 is not processed.
Expected results:
I/O to vg1-lv1 is processed, because all PVs used by vg1-lv1
are fine.
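As an alternative to watching iostat by eye, the check in Step 5 can be scripted. This is only a sketch: it assumes the kernel reports the device-mapper node for vg1-lv1 in /proc/diskstats with the usual whole-device layout (field 3 is the device name, field 8 is writes completed). The helper names and the dm-3 device name are placeholders.

```shell
# Print the "writes completed" counter for one block device.
# $1: kernel device name as listed in /proc/diskstats (e.g. dm-3, a placeholder)
# $2: stats file to read (defaults to /proc/diskstats)
writes_completed() {
    awk -v dev="$1" '$3 == dev { print $8 }' "${2:-/proc/diskstats}"
}

# Succeed only if the counter advanced between two samples.
# $1: earlier sample, $2: later sample
io_progressed() {
    [ "$2" -gt "$1" ]
}

# Usage on a live system while the dd loop from Step 3 is running:
#   before=$(writes_completed dm-3); sleep 5; after=$(writes_completed dm-3)
#   if io_progressed "$before" "$after"; then
#       echo "vg1-lv1 is making progress"
#   else
#       echo "vg1-lv1 is stalled"
#   fi
```

With the bug present, the counter for vg1-lv1 would be expected to stop advancing after Step 4.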
Additional info:
This problem seems to be in kmirrord.
kmirrord blocks in disk_flush() if updating the log fails.
A backtrace of kmirrord is attached below.
-----------------------------------------------------------------------
crash> bt 2115
PID: 2115 TASK: 101aff8a030 CPU: 3 COMMAND: "kmirrord"
#0 [101ac01bb58] schedule at ffffffff80304a85
#1 [101ac01bc30] wait_for_completion at ffffffff80304cbd
#2 [101ac01bc90] dm_table_event at ffffffffa00ea343
#3 [101ac01bcb0] disk_flush at ffffffffa01019ce
#4 [101ac01bcd0] do_work at ffffffffa0102ce5
#5 [101ac01bd10] move_tasks at ffffffff8013257f
#6 [101ac01bda0] thread_return at ffffffff80304add
#7 [101ac01be70] worker_thread at ffffffff80146e1e
#8 [101ac01bf20] kthread at ffffffff8014aa93
#9 [101ac01bf50] kernel_thread at ffffffff80110e17
crash>
-----------------------------------------------------------------------
Additional info:
I believe this is a kernel issue, not a dmeventd issue.
To reproduce the kernel issue, the following step is needed
before Step 1 of the reproduction steps.
0. Modify /etc/lvm/lvm.conf so that dmeventd is not launched, as shown below.
dmeventd {
mirror_library = "none"
}
If this step is skipped, dmeventd may handle the log device failure.
Comment 2 Jonathan Earl Brassow
2006-03-23 21:57:36 UTC
Without the changes I've been working on, log failures are not handled by the userspace code.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0575.html