Bug 732098 - '-missing' dm devices continue to exist after mirror leg and log leg failure
Summary: '-missing' dm devices continue to exist after mirror leg and log leg failure
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.2
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 756082
TreeView+ depends on / blocked
 
Reported: 2011-08-19 19:03 UTC by Corey Marthaler
Modified: 2012-07-13 19:02 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-13 19:02:48 UTC


Attachments (Terms of Use)

Description Corey Marthaler 2011-08-19 19:03:18 UTC
Description of problem:
This iteration of device failure included a 2 legged mirror with a redundant mirror log. A few issues appeared this iteration when the primary leg and primary log were failed.  
1. after the repair finished, there are remaining '-missing_' dm devices
2. there are remaining "leg devices" that are not mapped to any physical device
3. the expected mimage numbers after the failure should have remained mimage_0 and mimage_1 if everything had finish properly, instead they became mimage_2 and mimage_3.


Scenario: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdg1
* log devices:        /dev/sdc1 /dev/sdf1
* no MDA devices:     
* failpv(s):          /dev/sdd1 /dev/sdc1
* failnode(s):        taft-01
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 300M helter_skelter /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdc1:0-150 /dev/sdf1:0-150

PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
PV=/dev/sdc1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
PV=/dev/sdc1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3

Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdd on taft-01
Disabling device sdc on taft-01

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.460628 s, 91.1 MB/s
Verifying current sanity of lvm after the failure
Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdc1 is *NOT* in the volume(s)
olog: 1
Verifying LOG device(s) /dev/sdf1 *ARE* in the mirror(s)
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0 on:  taft-01
Checking EXISTENCE of syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1 on:  taft-01
Checking EXISTENCE of syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0 on:  taft-01

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Enabling device sdd on taft-01
Enabling device sdc on taft-01

Verify that each of the mirror repairs finished successfully

Checking for leftover '-missing_0_0' or 'unknown devices'
there should no longer be any 'missing' dm devices on taft-01




[root@taft-01 ~]# grep lvm\\[ /var/log/messages
Aug 19 12:20:41 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 12:20:41 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 12:20:41 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 12:20:41 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 12:20:41 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 for events.
Aug 19 12:20:42 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 12:20:50 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 12:21:28 taft-01 lvm[2188]: Primary mirror device 253:3 has failed (D).
Aug 19 12:21:28 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog.
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdc1: read failed after 0 of 512 at 145669554176: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdc1: read failed after 0 of 512 at 145669664768: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdc1: read failed after 0 of 512 at 0: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdc1: read failed after 0 of 512 at 4096: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdc1: read failed after 0 of 2048 at 0: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdd1: read failed after 0 of 512 at 145669554176: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdd1: read failed after 0 of 512 at 145669664768: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdd1: read failed after 0 of 512 at 0: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdd1: read failed after 0 of 512 at 4096: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error
Aug 19 12:21:28 taft-01 lvm[2188]: Couldn't find device with uuid A1ftXq-5gK6-0Adu-RibR-CycP-S6tP-KQDoN8.
Aug 19 12:21:28 taft-01 lvm[2188]: Couldn't find device with uuid Gr10TR-Zrvh-afk6-Jp8q-MdGi-0SJB-jQC28l.
Aug 19 12:21:29 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 12:21:31 taft-01 lvm[2188]: LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0 in use: not deactivating
Aug 19 12:21:31 taft-01 lvm[2188]: Mirror status: 1 of 2 images failed.
Aug 19 12:21:31 taft-01 lvm[2188]: Mirror log status: 1 of 2 images failed.
Aug 19 12:21:31 taft-01 lvm[2188]: Trying to up-convert to 2 images, 2 logs.
Aug 19 12:21:31 taft-01 lvm[2188]: Unable to create LV syncd_pri_leg_pri_log_2legs_2logs_1_mlog in Volume Group helter_skelter: name already in use.
Aug 19 12:21:31 taft-01 lvm[2188]: Failed to create mirror log.
Aug 19 12:21:31 taft-01 lvm[2188]: Trying to up-convert to 2 images, 1 logs.
Aug 19 12:21:31 taft-01 lvm[2188]: Unable to create LV syncd_pri_leg_pri_log_2legs_2logs_1_mlog in Volume Group helter_skelter: name already in use.
Aug 19 12:21:31 taft-01 lvm[2188]: Failed to create mirror log.
Aug 19 12:21:31 taft-01 lvm[2188]: Trying to up-convert to 2 images, 0 logs.
Aug 19 12:21:32 taft-01 lvm[2188]: WARNING: Failed to replace 2 of 2 logs in volume syncd_pri_leg_pri_log_2legs_2logs_1
Aug 19 12:21:32 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 2.7%
Aug 19 12:21:47 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 100.0%
Aug 19 12:22:02 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Aug 19 12:22:02 taft-01 lvm[2188]: Primary mirror device 253:6 read failed.
Aug 19 12:22:02 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 12:22:02 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 12:22:04 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.



[root@taft-01 ~]# dmsetup ls
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1      (253, 8)
vg_taft01-lv_home       (253, 2)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mimage_3     (253, 11)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1        (253, 4)
vg_taft01-lv_swap       (253, 1)
vg_taft01-lv_root       (253, 0)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mimage_2     (253, 10)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0        (253, 3)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1     (253, 7)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0     (253, 6)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog (253, 5)
helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0-missing_0_0    (253, 9)


[root@taft-01 ~]# lvs -a -o +devices
 LV                                                Attr   LSize   Log Copy%  Devices                                           
 syncd_pri_leg_pri_log_2legs_2logs_1               mwi-ao 300.00m     100.00 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_2(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_3(0)
 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0      -wi-a- 300.00m            /dev/sdd1(0)                                           
 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1      vwi-a- 300.00m                
 [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_2]    iwi-ao 300.00m            /dev/sdg1(0)                                           
 [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_3]    iwi-ao 300.00m            /dev/sdh1(0)                                           
 syncd_pri_leg_pri_log_2legs_2logs_1_mlog          vwi-a-   4.00m                
 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0 -wi-a-   4.00m            /dev/sdc1(0)                                           
 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1 -wi-a-   4.00m            /dev/sdf1(0)                                           



Version-Release number of selected component (if applicable):
2.6.32-188.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011

Comment 1 Jonathan Earl Brassow 2011-10-17 16:36:21 UTC
I have not hit this bug in a weekend's worth of testing (using the helter_skelter test).

I hit a different bug where some of the mirror sub-LVs did not come out of suspension.  This may have blocked me from seeing this bug, but I have tested for a cumulative duration of 72+ hours and not hit this bug.

I'll continue to look for it, but in the absence of hitting it myself, it will have to be verified by the reporter - after fixes for 746254/743112 are in place.  (I am currently testing with the proposed fixes for 746254/743112.)

Marking as NEEDINFO until either:
1) I hit it with my continued helter_skelter testing
or
2) The reporter is able to get new rpms with the aforementioned patches and is able to confirm this bug.

Comment 4 Corey Marthaler 2012-03-21 19:49:53 UTC
Attempting to reproduce this issue is blocked until redundant log mirrors work
again (bug 794904).

Comment 5 Jonathan Earl Brassow 2012-04-18 14:29:04 UTC
bug 794904 has been cleared.

Comment 8 Corey Marthaler 2012-07-13 19:02:48 UTC
I can no longer reproduce this with the latest rpms.

2.6.32-278.el6.x86_64
lvm2-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-libs-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-cluster-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
cmirror-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012


Note You need to log in before you can comment on or make changes to this bug.