Bug 568138 - lvconvert --repair does not deal with transient errors
Summary: lvconvert --repair does not deal with transient errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.5
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Petr Rockai
QA Contact: Cluster QE
URL:
Whiteboard:
: 568139 (view as bug list)
Depends On:
Blocks: 614758
TreeView+ depends on / blocked
 
Reported: 2010-02-24 21:10 UTC by Takahiro Yasui
Modified: 2014-07-25 05:08 UTC (History)
18 users (show)

Fixed In Version: lvm2-2.02.73-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 22:40:37 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0052 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2011-01-12 17:15:25 UTC

Description Takahiro Yasui 2010-02-24 21:10:10 UTC
Description of problem:
  lvm mirror: I/O hungup at medium error

Version-Release number of selected component (if applicable):
  2.6.18-187.el5

How reproducible:
  See the following steps.

Steps to Reproduce:
  This issue is related to an failure of phyical device, but we can
  reproduce a similar situation.

 1. create two way mirror
    # dmsetup ls
    vg00-lv00_mimage_1 (253, 2)
    vg00-lv00_mimage_0 (253, 1)
    vg00-lv00_mlog (253, 0)
    vg00-lv00 (253, 3)

  2. disable one of the mirror leg
    # echo offline > /sys/block/<dev>/device/state

  3. kill dmeventd
    # ps -ef | grep dmeventd
    root      3378     1  0 11:49 ?        00:00:00 [dmeventd]
    # kill -9 3378

  4. write I/O to the mirror device
    # dd if=/dev/zero of=/dev/vg00/lv00 bs=4096 count=1
    *** This command doesn't return. ***

  5. check a status of the mirror device
    # dmsetup status
    vg00-lv00_mimage_1: 0 24576 linear
    vg00-lv00_mimage_0: 0 24576 linear
    vg00-lv00_mlog: 0 8192 linear
    vg00-lv00: 0 24576 mirror 2 253:1 253:2 23/24 1 DA 3 disk 253:0 A

  6. enable the mirror leg which was disabled in step 2
    # echo online > /sys/block/<dev>/device/state

  7. Execute lvconvert command on behalf of dmeventd
    # lvconvert --config devices{ignore_suspended_devices=1} --repair \
       --use-policies vg00/lv00
    The mirror is consistent. Nothing to repair.

  8. Check a status of the mirror device again
    # dmsetup status
    vg00-lv00_mimage_1: 0 24576 linear
    vg00-lv00_mimage_0: 0 24576 linear
    vg00-lv00_mlog: 0 8192 linear
    vg00-lv00: 0 24576 mirror 2 253:1 253:2 23/24 1 DA 3 disk 253:0 A
    *** Error hasn't been recovered ***

Actual results:
  An error of mirror device is not recovered by dmeventd (lvconvert)
  and I/Os issued by application hungup and never finish.

Expected results:
  dmeventd (lvconvert) removes the failed mirror device and I/Os
  issued by applicaion completes.

Additional info:
  - I/O hungup doesn't happen on the 5.4 kernel (2.6.18-164.el5)
    (Introduced by the patch fixsing bug 502927)
  - This issue was reported on dm-devel in November, 2009
    https://www.redhat.com/archives/dm-devel/2009-November/msg00288.html

Comment 1 Takahiro Yasui 2010-02-24 21:13:09 UTC
This bug is related to bug 568139.

Comment 3 Milan Broz 2010-02-26 09:33:48 UTC
*** Bug 568139 has been marked as a duplicate of this bug. ***

Comment 5 Milan Broz 2010-02-26 12:12:30 UTC
ok, my understanding is that

1) medium/transient error happens (some sector on data area)

2) kernel dm-mirror marks device with 'D' status, sends event and waits for userspace to handle problem

3) device reappears _or_ the lvm metadata area is still readable without error

4) dmeventd parses status, identifies failed device and run lvconvert

5) but because lvconvert see all devices and metadata on them, no action happens and mirror is left in frozen state

Comment 6 Takahiro Yasui 2010-02-26 12:30:40 UTC
(In reply to comment #5)
Thank you for describing the sequence. Yes, it is correct. lvm command can't handle medium/transient error because it can check only lvm metadata area.

I'm thinking how to fix, but the only way I have is to remove/replace the failed device *forcibly* if lvconvert failed to recover and 'D' state remained.

Comment 7 Jonathan Earl Brassow 2010-02-26 15:33:11 UTC
what is really needed here is a suspend/resume to kick things off again.

Comment 8 Milan Broz 2010-02-26 15:40:13 UTC
I am able to reproduce this on -164 kernel too, removing regression keyword.
It is for sure bug, but not regression by its definition.

(And yes, suspend/resume helps.)

Comment 9 Jonathan Earl Brassow 2010-02-26 16:29:39 UTC
ok, so we could either recheck the 'dmsetup status' from dmeventd and if the 'lvconvert --repair' was successful and the status line still shows problems, we could issue another LVM command to "refresh" the LV.

It may be simpler to assume that 'lvconvert --repair' was called for a reason, and even though nothing has changed, a suspend/resume is in order.  Then, dmeventd would not need to perform the extra work.  The suspend/resume would have to affect the entire cluster.

Comment 10 Mikuláš Patočka 2010-02-26 16:41:06 UTC
Suspend/resume fixes the hung I/O but it still leaves the mirror device in a state when it thinks that the leg is failed. You need to reload completely.

Comment 11 Jonathan Earl Brassow 2010-02-26 18:16:06 UTC
do you think we should reset those when the device is resumed (would require a kernel patch)?

Comment 12 Mikuláš Patočka 2010-02-26 19:13:57 UTC
Resetting the error flags would break snapshots of mirrors. They suspend the origin too and we don't want to forget errors in this case.

Comment 13 Takahiro Yasui 2010-02-26 22:20:37 UTC
(In reply to comment #8)
> I am able to reproduce this on -164 kernel too, removing regression keyword.
> It is for sure bug, but not regression by its definition.

Could you really reproduced this issue? There is no mechanism holding bio in -164 and I/O successfully completes, so I/Os (step 4) aren't blocked in -164. This issue is "regression." Let me add the keyword, "regression," again.

Comment 14 Takahiro Yasui 2010-02-26 22:24:20 UTC
(In reply to comment #11)
> do you think we should reset those when the device is resumed (would require a
> kernel patch)?    

"reset" would solve "transient" error but won't solve medium error. In case of medium error, I/Os will be blocked again and again and the issue will never be 
solved.

we should remove/replace the device marked 'D' *forcibly* according to configuration policy, shouldn't we?

Comment 15 Milan Broz 2010-02-27 08:45:05 UTC
Well, my script simulates transient log device fail, and there is even no 'D' status in the end, just waiting proceses.

#!/bin/bash -x

FAILED=sdf

pvcreate /dev/sd[cdef]
vgcreate -c n vg_bar /dev/sd[cdef]

lvcreate -m 2 -n lv1 -L 600M vg_bar /dev/sd[cdef]

echo "offline">/sys/block/$FAILED/device/state

dmsetup status
{ sleep 1 ; dd if=/dev/zero of=/dev/vg_bar/lv1 bs=512 count=1; } &
dmsetup status
sleep 2
echo "running">/sys/block/$FAILED/device/state
dmsetup status

vgs

lvremove -f vg_bar/lv1
vgremove -ff vg_bar
pvremove /dev/sd[cdef]

Produces:
# /mirror_transient
+ FAILED=sdf
+ pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created
  Physical volume "/dev/sde" successfully created
  Physical volume "/dev/sdf" successfully created
+ vgcreate -c n vg_bar /dev/sdc /dev/sdd /dev/sde /dev/sdf
  Non-clustered volume group "vg_bar" successfully created
+ lvcreate -m 2 -n lv1 -L 600M vg_bar /dev/sdc /dev/sdd /dev/sde /dev/sdf
  Logical volume "lv1" created
+ echo offline
+ dmsetup status
vg_bar-lv1_mlog: 0 8192 linear
vg_bar-lv1_mimage_2: 0 1228800 linear
vg_bar-lv1_mimage_1: 0 1228800 linear
vg_bar-lv1: 0 1228800 mirror 3 253:3 253:4 253:5 176/1200 1 AAA 3 disk 253:2 A
vg_bar-lv1_mimage_0: 0 1228800 linear
+ sleep 1
+ dmsetup status
vg_bar-lv1_mlog: 0 8192 linear
vg_bar-lv1_mimage_2: 0 1228800 linear
vg_bar-lv1_mimage_1: 0 1228800 linear
vg_bar-lv1: 0 1228800 mirror 3 253:3 253:4 253:5 176/1200 1 AAA 3 disk 253:2 A
vg_bar-lv1_mimage_0: 0 1228800 linear
+ sleep 2
+ dd if=/dev/zero of=/dev/vg_bar/lv1 bs=512 count=1
+ echo running
+ dmsetup status
vg_bar-lv1_mlog: 0 8192 linear
vg_bar-lv1_mimage_2: 0 1228800 linear
vg_bar-lv1_mimage_1: 0 1228800 linear
vg_bar-lv1: 0 1228800 mirror 3 253:3 253:4 253:5 223/1200 1 AAA 3 disk 253:2 A
vg_bar-lv1_mimage_0: 0 1228800 linear
+ vgs
<HANG>

syslog:
Feb 27 09:35:07 bar-01 dmeventd[3622]: dmeventd ready for processing.
Feb 27 09:35:12 bar-01 lvm[3622]: Monitoring mirror device vg_bar-lv1 for events
Feb 27 09:35:12 bar-01 kernel: sd 1:0:0:3: rejecting I/O to offline device      
Feb 27 09:35:12 bar-01 lvm[3622]: Log device, 253:2, has failed.                
Feb 27 09:35:12 bar-01 lvm[3622]: Device failure in vg_bar-lv1                  
Feb 27 09:35:12 bar-01 kernel: sd 1:0:0:3: rejecting I/O to offline device      
Feb 27 09:35:15 bar-01 last message repeated 52 times                           
Feb 27 09:35:29 bar-01 lvm[3622]: Repair of mirrored LV vg_bar/lv1 finished successfully.

ps ax:
 3598 pts/0    S+     0:00 /bin/bash -x /mirror_transient
 3609 ?        S<     0:00 [kmirrord]
 3610 ?        S<     0:00 [kcopyd]
 3622 ?        S<Lsl   0:00 [dmeventd]
 3626 pts/0    S+     0:00 /bin/bash -x /mirror_transient
 3630 pts/0    D+     0:00 dd if /dev/zero of /dev/vg_bar/lv1 bs 512 count 1
 3633 pts/0    D+     0:00 vgs

# uname -a
Linux bar-01 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:31 EST 2010 i686 i686 i386 GNU/Linux

Everything continues after

# dmsetup suspend vg_bar-lv1
# dmsetup resume vg_bar-lv1

I see this still like old bug not regression, just now it is worse with mirror images probably?

Can anyone reproduce that too?

Comment 16 Takahiro Yasui 2010-02-27 14:46:55 UTC
(In reply to comment #15)
> I see this still like old bug not regression, just now it is worse with mirror
> images probably?

Thank you for the explanation. I/Os are blocked if a log device got broken, but it isn't the point of *regression* here. It might be easier to think core log.

In 5.4, transient/medium errors aren't recovered, but systems can keep running with a device marked failed if there is still valid mirror legs. It is an expected behavior for mirroring.

On the other hand, in 5.5 beta1, application which issued I/Os is stuck and system can't keep running. This happens because dm-raid1 was updated so that I/Os are blocked until a failure is recovered.

But yes, as you mentioned, I compared behaviors in the case that mirror images failed.

> Can anyone reproduce that too?

I'm trying but I couldn't reproduce it. I continue trying, but please tell me if there is some configuration required.

Comment 17 Takahiro Yasui 2010-02-27 22:31:29 UTC
> # uname -a
> Linux bar-01 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:31 EST 2010 i686 i686
> i386 GNU/Linux

I was very surprised to see the big difference between 2.6.18-164.11.1.el5 kernel and 2.6.18-187.el5 kernel. Is the 2.6.18-164.11.1.el5 kernel a regular kernel?

> + vgcreate -c n vg_bar /dev/sdc /dev/sdd /dev/sde /dev/sdf
>   Non-clustered volume group "vg_bar" successfully created

Your lvm2 is likely newer than one in RHEL5.4. lvm2-2.02.46-8.el5 doesn't show such a message. Is it a lvm2 in RHEL5.5 beta1?

Comment 18 Milan Broz 2010-02-28 09:30:51 UTC
(In reply to comment #17)
> I was very surprised to see the big difference between 2.6.18-164.11.1.el5
> kernel and 2.6.18-187.el5 kernel. Is the 2.6.18-164.11.1.el5 kernel a regular
> kernel?

This is latest EUS 5.4.z kernel from RHN.

> Your lvm2 is likely newer than one in RHEL5.4. lvm2-2.02.46-8.el5 doesn't show
> such a message. Is it a lvm2 in RHEL5.5 beta1?    
>   Non-clustered volume group "vg_bar" successfully created

Of course I am using beta1 packages. But this message appears only when you have cluster locking switched on.
(We added that to code because clustered VG bit is by default set when using cluster locking and many peple were confused when they created clustered VG later.)

Anyway, if you run your testcase on -164 kernel, what is the dmsetup status after that testcase finished? I guess there still must be 'D' for mirror image because lvconvert did not changed anything. (So the mirror was not repaired even on -164 kernel, just application did not hang in IO, am I correct?)

Comment 19 Takahiro Yasui 2010-02-28 15:24:56 UTC
(In reply to comment #18)
> Anyway, if you run your testcase on -164 kernel, what is the dmsetup status
> after that testcase finished? I guess there still must be 'D' for mirror image
> because lvconvert did not changed anything. (So the mirror was not repaired
> even on -164 kernel, just application did not hang in IO, am I correct?)    

Yes, you are correct. When I run my testcase in -164 kernel,

  - the mirror was not repaired
  - apllication did not hang in IO
  - the status of the mirror image was 'D'

The mirror of -164 kernel can process IO if there is some valid mirror legs.
(It can't process in case of log error and "mirrored" log is very important feature to enhance mirror's availability.)

Comment 20 Takahiro Yasui 2010-02-28 22:16:43 UTC
(In reply to comment #15)
> Well, my script simulates transient log device fail, and there is even no 'D'
> status in the end, just waiting proceses.

In RHEL5 kernel (-164, 176), log status can be restored if log device processed IO successfully. (see dm-log.c:disk_flush()) In your log, I guess that the status of log device was showed after it was restored. On the other hand, in upstream kernel (2.6.33), the log status is never restored and remains in the failed state 'D'.

The behavior you encounter in RHEL5 kernel is better to be fixed because it is very confusing. Users will also get confused.

Comment 23 Takahiro Yasui 2010-03-04 19:24:04 UTC
(In reply to comment #20)
I opened bug 570583 for mismatch between a log state and its behavior.

Comment 24 Petr Rockai 2010-03-24 16:27:39 UTC
I am working on the lvconvert --repair extension needed for this to work. Re-titling bug to reflect what is needed.

Comment 25 Alasdair Kergon 2010-04-16 17:15:52 UTC
How close is this to solved?

Comment 26 Petr Rockai 2010-05-19 14:35:42 UTC
We are now quite close to merging the required patch upstream. Hopefully today, a couple days at most.

Comment 27 Petr Rockai 2010-06-02 14:36:52 UTC
Has been merged.

Comment 30 Milan Broz 2010-08-30 10:39:41 UTC
Fix in lvm2-2.02.73-1.el5.

Comment 32 Corey Marthaler 2010-11-09 21:03:55 UTC
I ran the test mentioned in comment #15 and it passed, however there were issues with failing to lock the failed vg which may have caused one of the repair attempts to fail. I'll file another bug for that issue and mark this one verified.


2.6.18-227.el5
lvm2-2.02.74-2.el5    BUILT: Tue Nov  9 08:03:06 CST 2010
lvm2-cluster-2.02.74-3.el5    BUILT: Tue Nov  9 08:01:59 CST 2010
device-mapper-1.02.55-2.el5    BUILT: Tue Nov  9 06:41:00 CST 2010
cmirror-1.1.39-10.el5    BUILT: Wed Sep  8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009



+ FAILED=sdf
+ pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Physical volume "/dev/sdc1" successfully created
  Physical volume "/dev/sdd1" successfully created
  Physical volume "/dev/sde1" successfully created
  Physical volume "/dev/sdf1" successfully created
+ vgcreate -c n vg_bar /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Volume group "vg_bar" successfully created
+ lvcreate -m 2 -n lv1 -L 600M vg_bar /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Logical volume "lv1" created
+ lvs -a -o +devices
  LV             VG         Attr   LSize   Log      Copy%  Devices
  lv1            vg_bar     mwi-a- 600.00M lv1_mlog  1.33  lv1_mimage_0(0),lv1_mimage_1(0),lv1_mimage_2(0)
  [lv1_mimage_0] vg_bar     Iwi-ao 600.00M                 /dev/sdc1(0)
  [lv1_mimage_1] vg_bar     Iwi-ao 600.00M                 /dev/sdd1(0)
  [lv1_mimage_2] vg_bar     Iwi-ao 600.00M                 /dev/sde1(0)
  [lv1_mlog]     vg_bar     lwi-ao   4.00M                 /dev/sdf1(0)
+ echo offline
+ dmsetup status
vg_bar-lv1_mlog: 0 8192 linear
vg_bar-lv1_mimage_2: 0 1228800 linear
vg_bar-lv1_mimage_1: 0 1228800 linear
vg_bar-lv1: 0 1228800 mirror 3 253:3 253:4 253:5 20/1200 1 AAA 3 disk 253:2 A
vg_bar-lv1_mimage_0: 0 1228800 linear
+ sleep 1
+ dmsetup status
vg_bar-lv1_mlog: 0 8192 linear
vg_bar-lv1_mimage_2: 0 1228800 linear
vg_bar-lv1_mimage_1: 0 1228800 linear
vg_bar-lv1: 0 1228800 mirror 3 253:3 253:4 253:5 20/1200 1 AAA 3 disk 253:2 A
vg_bar-lv1_mimage_0: 0 1228800 linear
+ sleep 2
+ dd if=/dev/zero of=/dev/vg_bar/lv1 bs=512 count=1
+ echo running
+ dmsetup status
vg_bar-lv1_mlog: 0 8192 linear
vg_bar-lv1_mimage_2: 0 1228800 linear
vg_bar-lv1_mimage_1: 0 1228800 linear
vg_bar-lv1: 0 1228800 mirror 3 253:3 253:4 253:5 141/1200 1 AAA 3 disk 253:2 D
vg_bar-lv1_mimage_0: 0 1228800 linear
+ lvs -a -o +devices
1+0 records in
1+0 records out
512 bytes (512 B) copied, 17.6062 seconds, 0.0 kB/s
  LV             VG         Attr   LSize   Log      Copy%  Devices
  lv1            vg_bar     mwi-a- 600.00M lv1_mlog 100.00 lv1_mimage_0(0),lv1_mimage_1(0),lv1_mimage_2(0)
  [lv1_mimage_0] vg_bar     iwi-ao 600.00M                 /dev/sdc1(0)
  [lv1_mimage_1] vg_bar     iwi-ao 600.00M                 /dev/sdd1(0)
  [lv1_mimage_2] vg_bar     iwi-ao 600.00M                 /dev/sde1(0)
  [lv1_mlog]     vg_bar     lwi-ao   4.00M                 /dev/sdf1(0)
+ vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  VolGroup00   1   2   0 wz--n-  68.12G      0
  vg_bar       4   1   0 wz--n- 542.66G 540.89G
+ lvremove -f vg_bar/lv1
  Logical volume "lv1" successfully removed
+ vgremove -ff vg_bar
  Volume group "vg_bar" successfully removed
+ pvremove /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Labels on physical volume "/dev/sdc1" successfully wiped
  Can't open /dev/sdd1 exclusively - not removing. Mounted filesystem?
  Can't open /dev/sde1 exclusively - not removing. Mounted filesystem?
  Labels on physical volume "/dev/sdf1" successfully wiped


repair log:
lvm[8287]: Mirror log status: 1 of 1 images failed - switching to core
lvm[8287]: Inconsistent pre-commit metadata copies for volume group vg_bar
lvm[8287]: Inconsistent pre-commit metadata copies for volume group vg_bar
lvm[8287]: Volume group for uuid not found: lyvMWjSlDSHcStrES2dAYkYjdZzhQN70PLdinAeC7D1aBRMc7dvd4CCxnaw57eeo
lvm[8287]: Failed to lock lv1
lvm[8287]: Repair of mirrored LV vg_bar/lv1 failed.
lvm[8287]: Failed to remove faulty devices in vg_bar-lv1.
lvm[8287]: Log device 253:2 has failed (D).
lvm[8287]: Device failure in vg_bar-lv1.
lvm[8287]: Mirror log status: 1 of 1 images failed - switching to core
lvm[8287]: Monitoring mirror device vg_bar-lv1 for events.
lvm[8287]: Another thread is handling an event. Waiting...
lvm[8287]: Monitoring mirror device vg_bar-lv1 for events.
lvm[8287]: Another thread is handling an event. Waiting...
lvm[8287]: Trying to up-convert to 3 images, 1 logs.
lvm[8287]: Monitoring mirror device vg_bar-lv1 for events.
lvm[8287]: Another thread is handling an event. Waiting...
lvm[8287]: Repair of mirrored LV vg_bar/lv1 finished successfully.
lvm[8287]: vg_bar-lv1 is now in-sync.

Comment 33 Corey Marthaler 2010-11-09 21:13:53 UTC
bug 651590 was filed for the issue listed above.

Comment 35 errata-xmlrpc 2011-01-13 22:40:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0052.html


Note You need to log in before you can comment on or make changes to this bug.