Bug 574448 - mirror repair failed after leg device(s) were failed in three 4-way mirrors
Summary: mirror repair failed after leg device(s) were failed in three 4-way mirrors
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Petr Rockai
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-03-17 14:56 UTC by Corey Marthaler
Modified: 2010-11-09 12:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-16 22:19:38 UTC
Target Upstream Version:
Embargoed:


Attachments
log from taft-01 (23.92 KB, text/plain), 2010-03-17 16:25 UTC, Corey Marthaler
log from taft-02 (21.41 KB, text/plain), 2010-03-17 16:25 UTC, Corey Marthaler
log from taft-03 (21.11 KB, text/plain), 2010-03-17 16:28 UTC, Corey Marthaler
log from taft-04 (35.04 KB, text/plain), 2010-03-17 16:29 UTC, Corey Marthaler

Description Corey Marthaler 2010-03-17 14:56:22 UTC
Description of problem:
Scenario: Kill secondary leg of non synced 4 leg mirror(s)                      

********* Mirror hash info for this scenario *********
* names:              nonsyncd_secondary_4legs_1 nonsyncd_secondary_4legs_2 nonsyncd_secondary_4legs_3
* sync:               0
* disklog:            /dev/sde1
* failpv(s):          /dev/sdd1
* failnode(s):        taft-01 taft-02 taft-03 taft-04
* leg devices:        /dev/sdh1 /dev/sdd1 /dev/sdg1 /dev/sdb1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 3 -n nonsyncd_secondary_4legs_1 -L 600M helter_skelter /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150
taft-04: lvcreate -m 3 -n nonsyncd_secondary_4legs_2 -L 600M helter_skelter /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150
taft-04: lvcreate -m 3 -n nonsyncd_secondary_4legs_3 -L 600M helter_skelter /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150
Continuing on without fully syncd mirrors, currently at...
        ( 49.17% 40.08% 36.75% )

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----                              
        ---- taft-02 ----                              
        ---- taft-03 ----                              
        ---- taft-04 ----                              

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Disabling device sdd on taft-01
Disabling device sdd on taft-02
Disabling device sdd on taft-03
Disabling device sdd on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.209265 seconds, 200 MB/s
Verifying current sanity of lvm after the failure
  /dev/sdd1: open failed: No such device or address
Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
  /dev/sdd1: open failed: No such device or address
Verifying LOG device /dev/sde1 *IS* in the mirror(s)
  /dev/sdd1: open failed: No such device or address
Verifying LEG device /dev/sdh1 *IS* in the volume(s)
  /dev/sdd1: open failed: No such device or address
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
  /dev/sdd1: open failed: No such device or address
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
  /dev/sdd1: open failed: No such device or address
verify the dm devices associated with /dev/sdd1 have been removed as expected
Checking REMOVAL of nonsyncd_secondary_4legs_1_mimage_1 on:  taft-01 taft-02 taft-03 taft-04
Checking REMOVAL of nonsyncd_secondary_4legs_2_mimage_1 on:  taft-01 taft-02 taft-03 taft-04
Checking REMOVAL of nonsyncd_secondary_4legs_3_mimage_1 on:  taft-01 taft-02 taft-03 taft-04
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of nonsyncd_secondary_4legs_1_mimage_4 on:  taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of nonsyncd_secondary_4legs_2_mimage_4 on:  taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of nonsyncd_secondary_4legs_3_mimage_4 on:  taft-01 taft-02 taft-03 taft-04

Verify that the mirror image order remains the same after the down conversion
Verify that each of the mirror repairs finished successfully
repair of mirrored LV nonsyncd_secondary_4legs_1 failed on taft-01


Here are the mirror layouts before the second leg (/dev/sdd1) was failed:
nonsyncd_secondary_4legs_1            helter_skelter mwi-ao 600.00M   nonsyncd_secondary_4legs_1_mlog 100.00  nonsyncd_secondary_4legs_1_mimage_0(0),nonsyncd_secondary_4legs_1_mimage_1(0),nonsyncd_secondary_4legs_1_mimage_2(0),nonsyncd_secondary_4legs_1_mimage_3(0)
[nonsyncd_secondary_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                           /dev/sdh1(0)
[nonsyncd_secondary_4legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                           /dev/sdd1(0)
[nonsyncd_secondary_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M                                           /dev/sdg1(0)
[nonsyncd_secondary_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M                                           /dev/sdb1(0)
[nonsyncd_secondary_4legs_1_mlog]     helter_skelter lwi-ao   4.00M                                           /dev/sde1(0)

nonsyncd_secondary_4legs_2            helter_skelter mwi-ao 600.00M   nonsyncd_secondary_4legs_2_mlog 100.00  nonsyncd_secondary_4legs_2_mimage_0(0),nonsyncd_secondary_4legs_2_mimage_1(0),nonsyncd_secondary_4legs_2_mimage_2(0),nonsyncd_secondary_4legs_2_mimage_3(0)
[nonsyncd_secondary_4legs_2_mimage_0] helter_skelter iwi-ao 600.00M                                           /dev/sdh1(150)
[nonsyncd_secondary_4legs_2_mimage_1] helter_skelter iwi-ao 600.00M                                           /dev/sdd1(150)
[nonsyncd_secondary_4legs_2_mimage_2] helter_skelter iwi-ao 600.00M                                           /dev/sdg1(150)
[nonsyncd_secondary_4legs_2_mimage_3] helter_skelter iwi-ao 600.00M                                           /dev/sdb1(150)
[nonsyncd_secondary_4legs_2_mlog]     helter_skelter lwi-ao   4.00M                                           /dev/sde1(1)

nonsyncd_secondary_4legs_3            helter_skelter mwi-ao 600.00M   nonsyncd_secondary_4legs_3_mlog 100.00  nonsyncd_secondary_4legs_3_mimage_0(0),nonsyncd_secondary_4legs_3_mimage_1(0),nonsyncd_secondary_4legs_3_mimage_2(0),nonsyncd_secondary_4legs_3_mimage_3(0)
[nonsyncd_secondary_4legs_3_mimage_0] helter_skelter iwi-ao 600.00M                                           /dev/sdh1(300)
[nonsyncd_secondary_4legs_3_mimage_1] helter_skelter iwi-ao 600.00M                                           /dev/sdd1(300)
[nonsyncd_secondary_4legs_3_mimage_2] helter_skelter iwi-ao 600.00M                                           /dev/sdg1(300)
[nonsyncd_secondary_4legs_3_mimage_3] helter_skelter iwi-ao 600.00M                                           /dev/sdb1(300)
[nonsyncd_secondary_4legs_3_mlog]     helter_skelter lwi-ao   4.00M                                           /dev/sde1(2)
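
The extent offsets in the listing above, (0), (150) and (300), follow from the LV size. A minimal sketch of the arithmetic, assuming 4 MiB physical extents (the LVM2 default; this report does not state the extent size, but the 4.00M mirror logs each occupying a single extent on /dev/sde1 are consistent with it):

```python
# Assumption: 4 MiB extents, not stated in the report but consistent
# with the one-extent, 4.00M mirror logs on /dev/sde1.
EXTENT_MIB = 4
LV_SIZE_MIB = 600  # from "lvcreate ... -L 600M"

# Each mirror image needs this many extents on its leg PV.
extents_per_image = LV_SIZE_MIB // EXTENT_MIB


def start_extent(mirror_index: int) -> int:
    """First extent used on a leg PV by the Nth mirror (0-based),
    when the mirrors are allocated back to back."""
    return mirror_index * extents_per_image


offsets = [start_extent(i) for i in range(3)]
print(extents_per_image, offsets)  # 150 [0, 150, 300]
```

All three mirrors therefore fit inside the 0-1000 extent range given to lvcreate for each leg PV.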


Here are the mirror layouts after the failure:
nonsyncd_secondary_4legs_1            helter_skelter mwi-ao 600.00M  nonsyncd_secondary_4legs_1_mlog 100.00  nonsyncd_secondary_4legs_1_mimage_0(0),nonsyncd_secondary_4legs_1_mimage_2(0),nonsyncd_secondary_4legs_1_mimage_3(0),nonsyncd_secondary_4legs_1_mimage_4(0)
[nonsyncd_secondary_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M                                          /dev/sdh1(0)
[nonsyncd_secondary_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M                                          /dev/sdg1(0)
[nonsyncd_secondary_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M                                          /dev/sdb1(0)
[nonsyncd_secondary_4legs_1_mimage_4] helter_skelter iwi-ao 600.00M                                          /dev/sdf1(0)
[nonsyncd_secondary_4legs_1_mlog]     helter_skelter lwi-ao   4.00M                                          /dev/sde1(0)

nonsyncd_secondary_4legs_2            helter_skelter mwi-ao 600.00M  nonsyncd_secondary_4legs_2_mlog 100.00  nonsyncd_secondary_4legs_2_mimage_0(0),nonsyncd_secondary_4legs_2_mimage_2(0),nonsyncd_secondary_4legs_2_mimage_3(0),nonsyncd_secondary_4legs_2_mimage_4(0)
[nonsyncd_secondary_4legs_2_mimage_0] helter_skelter iwi-ao 600.00M                                          /dev/sdh1(150)
[nonsyncd_secondary_4legs_2_mimage_2] helter_skelter iwi-ao 600.00M                                          /dev/sdg1(150)
[nonsyncd_secondary_4legs_2_mimage_3] helter_skelter iwi-ao 600.00M                                          /dev/sdb1(150)
[nonsyncd_secondary_4legs_2_mimage_4] helter_skelter iwi-ao 600.00M                                          /dev/sdc1(0)
[nonsyncd_secondary_4legs_2_mlog]     helter_skelter lwi-ao   4.00M                                          /dev/sde1(1)

nonsyncd_secondary_4legs_3            helter_skelter mwi-ao 600.00M  nonsyncd_secondary_4legs_3_mlog 100.00  nonsyncd_secondary_4legs_3_mimage_0(0),nonsyncd_secondary_4legs_3_mimage_2(0),nonsyncd_secondary_4legs_3_mimage_3(0),nonsyncd_secondary_4legs_3_mimage_4(0)
[nonsyncd_secondary_4legs_3_mimage_0] helter_skelter iwi-ao 600.00M                                          /dev/sdh1(300)
[nonsyncd_secondary_4legs_3_mimage_2] helter_skelter iwi-ao 600.00M                                          /dev/sdg1(300)
[nonsyncd_secondary_4legs_3_mimage_3] helter_skelter iwi-ao 600.00M                                          /dev/sdb1(300)
[nonsyncd_secondary_4legs_3_mimage_4] helter_skelter iwi-ao 600.00M                                          /dev/sdf1(150)
[nonsyncd_secondary_4legs_3_mlog]     helter_skelter lwi-ao   4.00M                                          /dev/sde1(2)
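
The before/after comparison can be made mechanical. The following is only a sketch, not the test harness's own check: it takes the image lines for nonsyncd_secondary_4legs_1 from the two listings above (whitespace collapsed) and reports which image was dropped, which was added, and whether the surviving legs stayed on their devices. The helper name `images` is made up for this example.

```python
import re

# Image lines for nonsyncd_secondary_4legs_1, copied from the listings above.
BEFORE = """
[nonsyncd_secondary_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(0)
[nonsyncd_secondary_4legs_1_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdd1(0)
[nonsyncd_secondary_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(0)
[nonsyncd_secondary_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(0)
"""
AFTER = """
[nonsyncd_secondary_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(0)
[nonsyncd_secondary_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(0)
[nonsyncd_secondary_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(0)
[nonsyncd_secondary_4legs_1_mimage_4] helter_skelter iwi-ao 600.00M /dev/sdf1(0)
"""

LINE = re.compile(r"\[(?P<image>\S+_mimage_\d+)\].*\s(?P<dev>/dev/\w+)\(\d+\)")


def images(listing: str) -> dict:
    """Map each mirror image name to the device it sits on."""
    found = {}
    for line in listing.splitlines():
        m = LINE.search(line)
        if m:
            found[m.group("image")] = m.group("dev")
    return found


before, after = images(BEFORE), images(AFTER)
removed = {img: dev for img, dev in before.items() if img not in after}
added = {img: dev for img, dev in after.items() if img not in before}
unchanged = all(before[img] == after[img] for img in before.keys() & after.keys())

print("removed:", removed)
print("added:  ", added)
print("surviving legs unchanged:", unchanged)
```

For this mirror the layout matches what the allocate policy should produce: mimage_1 on /dev/sdd1 is gone and mimage_4 on /dev/sdf1 has taken its place, even though dmeventd logged the repair as failed.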


[root@taft-01 ~]# grep Repair /var/log/messages 
Mar 17 09:12:40 taft-01 lvm[7656]: Repair of mirrored LV helter_skelter/nonsyncd_secondary_4legs_1 failed.
Mar 17 09:12:52 taft-01 lvm[7656]: Repair of mirrored LV helter_skelter/nonsyncd_secondary_4legs_2 failed.
Mar 17 09:13:31 taft-01 lvm[7656]: Repair of mirrored LV helter_skelter/nonsyncd_secondary_4legs_3 failed.

[root@taft-01 ~]# grep dm_task_run /var/log/messages 
Mar 17 09:12:52 taft-01 lvm[7656]: dm_task_run failed, errno = 24, Too many open files
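
For reference, errno 24 is EMFILE, the per-process limit on open file descriptors. Whether descriptor exhaustion is the actual cause of the failed repairs is not established in this report. A small sketch for checking descriptor usage on Linux follows; the helper `open_fd_count` is made up for this example and relies on /proc being mounted.

```python
import errno
import os
import resource

# errno 24 from the dm_task_run message above.
print(errno.errorcode[24], "-", os.strerror(24))


def open_fd_count(pid="self") -> int:
    """Count the open descriptors of a process via /proc (Linux only).
    Reading another process's fd directory needs sufficient privileges."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# The limit below is that of *this* Python process; a daemon started
# elsewhere may run with a different RLIMIT_NOFILE.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"this process: {open_fd_count()} open descriptors, soft limit {soft}")
```

To inspect a running daemon instead, pass its PID (as an int or string) to `open_fd_count` and compare the result over time; a count that climbs with each repair attempt would point towards a descriptor leak.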


Version-Release number of selected component (if applicable):
2.6.18-190.el5

lvm2-2.02.56-8.el5    BUILT: Fri Feb 12 02:40:43 CST 2010
lvm2-cluster-2.02.56-7.el5    BUILT: Mon Feb  8 10:24:29 CST 2010
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-8.el5    BUILT: Wed Mar  3 09:31:58 CST 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009

Comment 1 Corey Marthaler 2010-03-17 16:25:09 UTC
Created attachment 400818 [details]
log from taft-01

Comment 2 Corey Marthaler 2010-03-17 16:25:41 UTC
Created attachment 400819 [details]
log from taft-02

Comment 3 Corey Marthaler 2010-03-17 16:28:43 UTC
Created attachment 400820 [details]
log from taft-03

Comment 4 Corey Marthaler 2010-03-17 16:29:16 UTC
Created attachment 400821 [details]
log from taft-04

Comment 5 Corey Marthaler 2010-03-18 18:54:17 UTC
I reproduced this without failing the secondary leg.

taft-01:
Mar 18 13:26:57 taft-01 lvm[8732]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 failed.
Mar 18 13:32:19 taft-01 lvm[8732]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.

taft-02:
Mar 18 13:29:14 taft-02 lvm[8725]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:30:17 taft-02 lvm[8725]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.

taft-03:
Mar 18 13:27:46 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:31:06 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 finished successfully.
Mar 18 13:31:07 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.

taft-04:
Mar 18 13:28:24 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:29:48 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 finished successfully.
Mar 18 13:31:36 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.
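
The per-node results above can be tallied with a short script. This is a sketch that works only on the lines quoted in this comment; a (node, LV) pair missing from the quoted excerpt means only that no "Repair of mirrored LV" line was quoted for it.

```python
import re
from collections import Counter

# Log lines quoted in this comment.
LOG = """\
Mar 18 13:26:57 taft-01 lvm[8732]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 failed.
Mar 18 13:32:19 taft-01 lvm[8732]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.
Mar 18 13:29:14 taft-02 lvm[8725]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:30:17 taft-02 lvm[8725]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.
Mar 18 13:27:46 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:31:06 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 finished successfully.
Mar 18 13:31:07 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.
Mar 18 13:28:24 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:29:48 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 finished successfully.
Mar 18 13:31:36 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.
"""

PATTERN = re.compile(
    r"^\w+\s+\d+ [\d:]+ (?P<node>\S+) lvm\[\d+\]: "
    r"Repair of mirrored LV (?P<lv>\S+) (?P<result>failed|finished successfully)\.$"
)

# (node, LV name without the VG prefix) -> result
results = {}
for line in LOG.splitlines():
    m = PATTERN.match(line)
    if m:
        results[(m["node"], m["lv"].split("/")[1])] = m["result"]

counts = Counter(results.values())
nodes = sorted({node for node, _ in results})
lvs = sorted({lv for _, lv in results})
missing = [(n, lv) for n in nodes for lv in lvs if (n, lv) not in results]

print(dict(counts))
print("no repair line quoted for:", missing)
```

On the lines above this gives one failure and nine successes, and shows that no repair message was quoted for syncd_multiple_legs_4legs_1 on taft-01 or for syncd_multiple_legs_4legs_2 on taft-02.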

Comment 6 Corey Marthaler 2010-03-18 20:35:24 UTC
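This issue can prevent the allocate fault policy from taking effect on some of the mirrors whose legs were failed. In the listing below, mimage_1 is left on an unknown device and no replacement leg has been allocated: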

  syncd_secondary_core_4legs_1            helter_skelter mwi-ao 600.00M  100.00  syncd_secondary_core_4legs_1_mimage_0(0),syncd_secondary_core_4legs_1_mimage_2(0),syncd_secondary_core_4legs_1_mimage_3(0)
  [syncd_secondary_core_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M          /dev/sdd1(0)
  syncd_secondary_core_4legs_1_mimage_1   helter_skelter -wi--- 600.00M          unknown device(0)
  [syncd_secondary_core_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M          /dev/sde1(0)
  [syncd_secondary_core_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M          /dev/sdf1(0)

Comment 7 Corey Marthaler 2010-03-30 15:34:57 UTC
Still hitting this:

Mar 27 03:18:44 taft-04 lvm[7403]: Repair of mirrored LV helter_skelter/syncd_secondary_core_4legs_2 failed.

Comment 8 Corey Marthaler 2010-05-04 19:33:28 UTC
Reproduced while attempting to hit bug 588441.

May  4 13:34:37 taft-01 lvm[7445]: Repair of mirrored LV helter_skelter/syncd_secondary_core_4legs_3 failed.

Comment 9 Petr Rockai 2010-05-05 07:54:10 UTC
Well, I can't say much about this without seeing what happens in dmeventd/lvconvert --repair. For that, I need dmeventd logging support, but this was rejected upstream. Until dmeventd logging is merged and we can reproduce with logging enabled, I can do little but guess...

Comment 10 Petr Rockai 2010-05-19 14:42:24 UTC
Corey, could you please check whether this is fixed by the same change as bug 588441? I.e., can you try updating the filter in lvm.conf and see whether the bug is still reproducible with that? If it is, debug logs from dmeventd would be quite helpful. (Presumably, you can obtain these the same way as in 588441.) Thanks!

Comment 11 Corey Marthaler 2010-07-16 22:19:38 UTC
I'm unable to reproduce this issue with the latest rpms. I'll close this bug and reopen it if the issue is seen again.

lvm2-2.02.56-12.el5    BUILT: Mon Jun  7 05:40:35 CDT 2010
lvm2-cluster-2.02.56-7.el5    BUILT: Mon Feb  8 10:24:29 CST 2010
device-mapper-1.02.39-2.el5    BUILT: Thu Apr 22 04:43:28 CDT 2010
cmirror-1.1.39-8.el5    BUILT: Wed Mar  3 09:31:58 CST 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009

