Bug 185754 - [RHEL4 U3] kernel dm mirror: unrelated mirror devices stall if any log device fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 181409 186476
Reported: 2006-03-17 16:52 UTC by Kiyoshi Ueda
Modified: 2013-04-02 23:51 UTC (History)

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 22:45:58 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2006:0575
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4
Last Updated: 2006-08-10 04:00:00 UTC

Description Kiyoshi Ueda 2006-03-17 16:52:36 UTC
Description of problem:
If a log device fails, *ALL* mirror devices stall.
("ALL" includes other mirror devices that do not use the
 failed log device.)


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare some PVs (more than 5) and create 2 VGs from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf as PVs
      - vg0 contains 3 PVs, /dev/sda, /dev/sdb, /dev/sdc
      - vg1 contains 3 PVs, /dev/sdd, /dev/sde, /dev/sdf
 2. Create a mirror LV on each VG and activate it.
      # lvcreate -L 12M -n lv0 -m 1 vg0
      # lvcreate -L 12M -n lv1 -m 1 vg1
 3. Issue I/O to the mirror LVs in a continuous loop.
      # while true; do
      > dd if=/dev/zero of=/dev/mapper/vg0-lv0 bs=512 count=1 >& /dev/null
      > dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 >& /dev/null
      > done
 4. Disconnect one of the PVs used for the log device of one of the mirror LVs.
    Example) If /dev/sdc is used for the log device of vg0-lv0:
      # echo offline > /sys/block/sdc/device/state
 5. Check whether I/O to vg1-lv1 is still being processed.
      # iostat 1


Actual results:
I/O to vg1-lv1 is not processed.
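
As an alternative to watching `iostat` by eye, the stall can be confirmed by sampling the completed-write counter of the unaffected mirror. This is a sketch only: it assumes the usual `/proc/diskstats` layout (field 3 is the device name, field 8 is writes completed), and the kernel name of vg1-lv1 (for example `dm-3`) is a placeholder that has to be looked up, e.g. from the minor number shown by `dmsetup info -c`. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a device completed any writes between two samples.
# Assumption: /proc/diskstats has the device name in field 3 and
# "writes completed" in field 8.

STATS_FILE="${STATS_FILE:-/proc/diskstats}"   # overridable for testing

writes_completed() {
    # $1 = kernel device name, e.g. dm-3 (placeholder)
    awk -v dev="$1" '$3 == dev { print $8 }' "$STATS_FILE"
}

io_is_progressing() {
    # $1 = device name, $2 = seconds to wait between the two samples
    before=$(writes_completed "$1")
    sleep "$2"
    after=$(writes_completed "$1")
    # Succeeds only if both samples exist and the counter advanced.
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
}

# Usage while the dd loop from step 3 is running:
#   io_is_progressing dm-3 5 || echo "vg1-lv1 appears stalled"
```

Sampling a counter avoids issuing another write to the stalled device, which could itself block.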


Expected results:
I/O to vg1-lv1 is processed, because all PVs backing vg1-lv1
are healthy.


Additional info:
This problem seems to be in kmirrord.
kmirrord blocks in disk_flush() if an update of the log fails.
A backtrace of kmirrord is shown below.

-----------------------------------------------------------------------
crash> bt 2115
PID: 2115   TASK: 101aff8a030       CPU: 3   COMMAND: "kmirrord"
 #0 [101ac01bb58] schedule at ffffffff80304a85
 #1 [101ac01bc30] wait_for_completion at ffffffff80304cbd
 #2 [101ac01bc90] dm_table_event at ffffffffa00ea343
 #3 [101ac01bcb0] disk_flush at ffffffffa01019ce
 #4 [101ac01bcd0] do_work at ffffffffa0102ce5
 #5 [101ac01bd10] move_tasks at ffffffff8013257f
 #6 [101ac01bda0] thread_return at ffffffff80304add
 #7 [101ac01be70] worker_thread at ffffffff80146e1e
 #8 [101ac01bf20] kthread at ffffffff8014aa93
 #9 [101ac01bf50] kernel_thread at ffffffff80110e17
crash>
-----------------------------------------------------------------------

Comment 1 Kiyoshi Ueda 2006-03-23 21:11:17 UTC
Additional info:
I'd like to point out that this is a kernel issue, not a dmeventd issue.
To reproduce the kernel issue, the following setting is needed
before Step 1 of the reproduction steps.

  0. Modify /etc/lvm/lvm.conf so that dmeventd is not launched, as shown below.
        dmeventd {
            mirror_library = "none"
        }

If this step isn't done, dmeventd may handle the log device failure.
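
Before running the reproduction, it may help to confirm that the setting above is actually in place. The following is a rough sketch that reads the `mirror_library` value from the `dmeventd` section. It assumes the section is laid out as in the fragment above (opening brace on the `dmeventd` line, one setting per line); the function name `mirror_library_setting` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the mirror_library value from the dmeventd section of lvm.conf.
# Assumption: "dmeventd {" opens the section and each setting is on its own line.

LVM_CONF="${LVM_CONF:-/etc/lvm/lvm.conf}"   # overridable for testing

mirror_library_setting() {
    awk '
        /^[ \t]*#/                 { next }                 # skip comment lines
        /^[ \t]*dmeventd[ \t]*[{]/ { in_sec = 1; next }     # section starts
        in_sec && /[}]/            { in_sec = 0 }           # section ends
        in_sec && /mirror_library/ { split($0, q, "\""); print q[2] }
    ' "$LVM_CONF"
}

# Usage:
#   [ "$(mirror_library_setting)" = "none" ] || echo "dmeventd may still be launched"
```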

Comment 2 Jonathan Earl Brassow 2006-03-23 21:57:36 UTC
Without the changes I've been working on, log failures are not handled by the userspace code.

Comment 5 Jason Baron 2006-05-09 17:07:33 UTC
committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Red Hat Bugzilla 2006-08-10 22:45:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


