Bug 185754 - [RHEL4 U3] kernel dm mirror: unrelated mirror devices stall if any log device fails
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Alasdair Kergon
Depends On:
Blocks: 181409 186476
Reported: 2006-03-17 11:52 EST by Kiyoshi Ueda
Modified: 2013-04-02 19:51 EDT
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 18:45:58 EDT
Description Kiyoshi Ueda 2006-03-17 11:52:36 EST
Description of problem:
If a log device fails, *ALL* mirror devices stall.
(The "ALL" includes other mirror devices that do not use the
 failed log device.)

Version-Release number of selected component:

How reproducible:

Steps to Reproduce:
 1. Prepare some PVs (more than 5) and create 2 VGs from them.
      - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf as PVs
      - vg0 contains 3 PVs, /dev/sda, /dev/sdb, /dev/sdc
      - vg1 contains 3 PVs, /dev/sdd, /dev/sde, /dev/sdf
 2. Create a mirror LV on each VG and activate it.
      # lvcreate -L 12M -n lv0 -m 1 vg0
      # lvcreate -L 12M -n lv1 -m 1 vg1
 3. Issue I/O to both mirror LVs in a continuous loop.
      # while true; do
      > dd if=/dev/zero of=/dev/mapper/vg0-lv0 bs=512 count=1 >& /dev/null
      > dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 >& /dev/null
      > done
 4. Disconnect the PV used for the log device of one of the mirror LVs.
    Example: if /dev/sdc holds the log device of vg0-lv0:
      # echo offline > /sys/block/sdc/device/state
 5. Check whether I/O to vg1-lv1 is still being processed.
      # iostat 1
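
Step 4 depends on knowing which PV holds the mirror log. Below is a minimal sketch of one way to find it, assuming that `lvs -a` lists the log as a hidden LV named `<lv>_mlog` and that its devices column looks like `/dev/sdc(0)`. The helper name `mlog_pvs` is hypothetical and is not part of LVM.

```shell
# Hypothetical helper: print the PV backing the mirror log of LV $1.
# Reads `lvs` output on stdin; assumed input lines look like:
#   [lv0_mlog] /dev/sdc(0)
mlog_pvs() {
    awk -v want="${1}_mlog" '
        {
            name = $1
            sub(/^\[/, "", name)            # hidden LVs are shown in brackets
            sub(/\]$/, "", name)
            if (name == want) {
                dev = $2
                sub(/\([0-9]+\)$/, "", dev) # drop the "(extent)" suffix
                print dev
            }
        }'
}

# Possible usage on a test system (as root):
#   lvs -a --noheadings -o lv_name,devices vg0 | mlog_pvs lv0
```

The device printed for vg0 is the one to take offline in step 4. None of the PVs in vg1 should be touched.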

Actual results:
I/O to vg1-lv1 is not processed.
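
As an alternative to watching `iostat`, the stall can be detected from a script. This is only a sketch: the helper name `finishes_within` is hypothetical, and it assumes that a single small `dd` write normally completes within a few seconds. It runs a command in the background and reports whether it exits within a time limit. It does not try to kill a stalled command, since a process blocked in the kernel may not respond to signals.

```shell
# Hypothetical helper: run a command in the background and report whether
# it finishes within $1 seconds.
# Returns 0 if it finished, 1 if it is still running (possibly stalled).
finishes_within() {
    limit=$1; shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    i=0
    while [ "$i" -lt "$limit" ]; do
        # kill -0 sends no signal; it only tests whether the process exists
        if ! kill -0 "$pid" 2>/dev/null; then
            wait "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    # Final check after the time limit has elapsed
    ! kill -0 "$pid" 2>/dev/null
}

# Possible usage after step 4 (as root):
#   finishes_within 10 dd if=/dev/zero of=/dev/mapper/vg1-lv1 bs=512 count=1 \
#       || echo "I/O to vg1-lv1 appears stalled"
```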

Expected results:
I/O to vg1-lv1 continues to be processed, because all PVs used by vg1-lv1
are fine.

Additional info:
This problem seems to be in kmirrord.
kmirrord blocks in disk_flush() if updating the log fails.
A backtrace of kmirrord is shown below.

crash> bt 2115
PID: 2115   TASK: 101aff8a030       CPU: 3   COMMAND: "kmirrord"
 #0 [101ac01bb58] schedule at ffffffff80304a85
 #1 [101ac01bc30] wait_for_completion at ffffffff80304cbd
 #2 [101ac01bc90] dm_table_event at ffffffffa00ea343
 #3 [101ac01bcb0] disk_flush at ffffffffa01019ce
 #4 [101ac01bcd0] do_work at ffffffffa0102ce5
 #5 [101ac01bd10] move_tasks at ffffffff8013257f
 #6 [101ac01bda0] thread_return at ffffffff80304add
 #7 [101ac01be70] worker_thread at ffffffff80146e1e
 #8 [101ac01bf20] kthread at ffffffff8014aa93
 #9 [101ac01bf50] kernel_thread at ffffffff80110e17
Comment 1 Kiyoshi Ueda 2006-03-23 16:11:17 EST
Additional info:
I'd like to point out that this is a kernel issue, not a dmeventd issue.
To reproduce the kernel issue, the following setting is needed
before Step 1 of the reproduction steps.

  0. Modify /etc/lvm/lvm.conf as below so that dmeventd is not launched.
        dmeventd {
            mirror_library = "none"
        }

If this step isn't done, dmeventd may handle the log device failure.
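
Before running the reproduction, it may help to confirm that the setting from step 0 is actually in place. The check below is a rough sketch with a hypothetical helper name, `mirror_monitoring_disabled`. It only looks for an uncommented `mirror_library = "none"` line and does not verify that the line sits inside the `dmeventd` section.

```shell
# Hypothetical helper: succeed if the given lvm.conf-style file contains an
# uncommented line setting mirror_library = "none".
# Simplification: the enclosing section is not checked.
mirror_monitoring_disabled() {
    grep -Eq '^[[:space:]]*mirror_library[[:space:]]*=[[:space:]]*"none"' "$1"
}

# Possible usage:
#   mirror_monitoring_disabled /etc/lvm/lvm.conf \
#       || echo "dmeventd may still handle the log device failure"
```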
Comment 2 Jonathan Earl Brassow 2006-03-23 16:57:36 EST
Without the changes I've been working on, log failures are not handled by the userspace code.
Comment 5 Jason Baron 2006-05-09 13:07:33 EDT
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 8 Red Hat Bugzilla 2006-08-10 18:45:58 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

