Description of problem:

Scenario: Kill secondary leg of non synced 4 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              nonsyncd_secondary_4legs_1 nonsyncd_secondary_4legs_2 nonsyncd_secondary_4legs_3
* sync:               0
* disklog:            /dev/sde1
* failpv(s):          /dev/sdd1
* failnode(s):        taft-01 taft-02 taft-03 taft-04
* leg devices:        /dev/sdh1 /dev/sdd1 /dev/sdg1 /dev/sdb1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 3 -n nonsyncd_secondary_4legs_1 -L 600M helter_skelter /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150
taft-04: lvcreate -m 3 -n nonsyncd_secondary_4legs_2 -L 600M helter_skelter /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150
taft-04: lvcreate -m 3 -n nonsyncd_secondary_4legs_3 -L 600M helter_skelter /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150

Continuing on without fully syncd mirrors, currently at...
        ( 49.17% 40.08% 36.75% )

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Disabling device sdd on taft-01
Disabling device sdd on taft-02
Disabling device sdd on taft-03
Disabling device sdd on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.209265 seconds, 200 MB/s
Verifying current sanity of lvm after the failure
  /dev/sdd1: open failed: No such device or address
Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
  /dev/sdd1: open failed: No such device or address
Verifying LOG device /dev/sde1 *IS* in the mirror(s)
  /dev/sdd1: open failed: No such device or address
Verifying LEG device /dev/sdh1 *IS* in the volume(s)
  /dev/sdd1: open failed: No such device or address
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
  /dev/sdd1: open failed: No such device or address
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
  /dev/sdd1: open failed: No such device or address
verify the dm devices associated with /dev/sdd1 have been removed as expected
Checking REMOVAL of nonsyncd_secondary_4legs_1_mimage_1 on: taft-01 taft-02 taft-03 taft-04
Checking REMOVAL of nonsyncd_secondary_4legs_2_mimage_1 on: taft-01 taft-02 taft-03 taft-04
Checking REMOVAL of nonsyncd_secondary_4legs_3_mimage_1 on: taft-01 taft-02 taft-03 taft-04
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of nonsyncd_secondary_4legs_1_mimage_4 on: taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of nonsyncd_secondary_4legs_2_mimage_4 on: taft-01 taft-02 taft-03 taft-04
Checking EXISTENCE of nonsyncd_secondary_4legs_3_mimage_4 on: taft-01 taft-02 taft-03 taft-04
Verify that the mirror image order remains the same after the down conversion
Verify that each of the mirror repairs finished successfully
repair of mirrored LV nonsyncd_secondary_4legs_1 failed on taft-01

Here's the mirror layouts before the 2nd leg (/dev/sdd1) was failed:

nonsyncd_secondary_4legs_1 helter_skelter mwi-ao 600.00M nonsyncd_secondary_4legs_1_mlog 100.00 nonsyncd_secondary_4legs_1_mimage_0(0),nonsyncd_secondary_4legs_1_mimage_1(0),nonsyncd_secondary_4legs_1_mimage_2(0),nonsyncd_secondary_4legs_1_mimage_3(0)
  [nonsyncd_secondary_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(0)
  [nonsyncd_secondary_4legs_1_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdd1(0)
  [nonsyncd_secondary_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(0)
  [nonsyncd_secondary_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(0)
  [nonsyncd_secondary_4legs_1_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(0)
nonsyncd_secondary_4legs_2 helter_skelter mwi-ao 600.00M nonsyncd_secondary_4legs_2_mlog 100.00 nonsyncd_secondary_4legs_2_mimage_0(0),nonsyncd_secondary_4legs_2_mimage_1(0),nonsyncd_secondary_4legs_2_mimage_2(0),nonsyncd_secondary_4legs_2_mimage_3(0)
  [nonsyncd_secondary_4legs_2_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(150)
  [nonsyncd_secondary_4legs_2_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdd1(150)
  [nonsyncd_secondary_4legs_2_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(150)
  [nonsyncd_secondary_4legs_2_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(150)
  [nonsyncd_secondary_4legs_2_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(1)
nonsyncd_secondary_4legs_3 helter_skelter mwi-ao 600.00M nonsyncd_secondary_4legs_3_mlog 100.00 nonsyncd_secondary_4legs_3_mimage_0(0),nonsyncd_secondary_4legs_3_mimage_1(0),nonsyncd_secondary_4legs_3_mimage_2(0),nonsyncd_secondary_4legs_3_mimage_3(0)
  [nonsyncd_secondary_4legs_3_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(300)
  [nonsyncd_secondary_4legs_3_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdd1(300)
  [nonsyncd_secondary_4legs_3_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(300)
  [nonsyncd_secondary_4legs_3_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(300)
  [nonsyncd_secondary_4legs_3_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(2)

Here's the mirror layouts after the failure:

nonsyncd_secondary_4legs_1 helter_skelter mwi-ao 600.00M nonsyncd_secondary_4legs_1_mlog 100.00 nonsyncd_secondary_4legs_1_mimage_0(0),nonsyncd_secondary_4legs_1_mimage_2(0),nonsyncd_secondary_4legs_1_mimage_3(0),nonsyncd_secondary_4legs_1_mimage_4(0)
  [nonsyncd_secondary_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(0)
  [nonsyncd_secondary_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(0)
  [nonsyncd_secondary_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(0)
  [nonsyncd_secondary_4legs_1_mimage_4] helter_skelter iwi-ao 600.00M /dev/sdf1(0)
  [nonsyncd_secondary_4legs_1_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(0)
nonsyncd_secondary_4legs_2 helter_skelter mwi-ao 600.00M nonsyncd_secondary_4legs_2_mlog 100.00 nonsyncd_secondary_4legs_2_mimage_0(0),nonsyncd_secondary_4legs_2_mimage_2(0),nonsyncd_secondary_4legs_2_mimage_3(0),nonsyncd_secondary_4legs_2_mimage_4(0)
  [nonsyncd_secondary_4legs_2_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(150)
  [nonsyncd_secondary_4legs_2_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(150)
  [nonsyncd_secondary_4legs_2_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(150)
  [nonsyncd_secondary_4legs_2_mimage_4] helter_skelter iwi-ao 600.00M /dev/sdc1(0)
  [nonsyncd_secondary_4legs_2_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(1)
nonsyncd_secondary_4legs_3 helter_skelter mwi-ao 600.00M nonsyncd_secondary_4legs_3_mlog 100.00 nonsyncd_secondary_4legs_3_mimage_0(0),nonsyncd_secondary_4legs_3_mimage_2(0),nonsyncd_secondary_4legs_3_mimage_3(0),nonsyncd_secondary_4legs_3_mimage_4(0)
  [nonsyncd_secondary_4legs_3_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdh1(300)
  [nonsyncd_secondary_4legs_3_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdg1(300)
  [nonsyncd_secondary_4legs_3_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdb1(300)
  [nonsyncd_secondary_4legs_3_mimage_4] helter_skelter iwi-ao 600.00M /dev/sdf1(150)
  [nonsyncd_secondary_4legs_3_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(2)

[root@taft-01 ~]# grep Repair /var/log/messages
Mar 17 09:12:40 taft-01 lvm[7656]: Repair of mirrored LV helter_skelter/nonsyncd_secondary_4legs_1 failed.
Mar 17 09:12:52 taft-01 lvm[7656]: Repair of mirrored LV helter_skelter/nonsyncd_secondary_4legs_2 failed.
Mar 17 09:13:31 taft-01 lvm[7656]: Repair of mirrored LV helter_skelter/nonsyncd_secondary_4legs_3 failed.

[root@taft-01 ~]# grep dm_task_run /var/log/messages
Mar 17 09:12:52 taft-01 lvm[7656]: dm_task_run failed, errno = 24, Too many open files

Version-Release number of selected component (if applicable):
2.6.18-190.el5

lvm2-2.02.56-8.el5           BUILT: Fri Feb 12 02:40:43 CST 2010
lvm2-cluster-2.02.56-7.el5   BUILT: Mon Feb  8 10:24:29 CST 2010
device-mapper-1.02.39-1.el5  BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-8.el5         BUILT: Wed Mar  3 09:31:58 CST 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009
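Since dm_task_run reported errno 24 (Too many open files), it may help to compare the number of file descriptors the logging process holds against its limit while the test runs. The following is only a sketch: fd_usage is a hypothetical helper name, it assumes a Linux /proc filesystem, it prints "unknown" for the limit on kernels that do not expose /proc/<pid>/limits, and reading another process's fd directory normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: report how many file descriptors a process has open
# and its soft "open files" limit, read from /proc (Linux only).
fd_usage() {
    pid=$1
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "no /proc entry for pid: $pid" >&2
        return 1
    fi
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # Field 4 of the "Max open files" row is the soft limit.
    limit=unknown
    if [ -r "/proc/$pid/limits" ]; then
        limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    fi
    echo "pid=$pid open_fds=$open soft_limit=$limit"
}
```

For example, fd_usage 7656 would inspect the PID shown in the lvm[7656] messages above, assuming that process is still running. Calling it repeatedly during the repairs would show whether the descriptor count climbs toward the limit.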
Created attachment 400818: log from taft-01
Created attachment 400819: log from taft-02
Created attachment 400820: log from taft-03
Created attachment 400821: log from taft-04
I reproduced this without failing the secondary leg.

taft-01:
Mar 18 13:26:57 taft-01 lvm[8732]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 failed.
Mar 18 13:32:19 taft-01 lvm[8732]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.

taft-02:
Mar 18 13:29:14 taft-02 lvm[8725]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:30:17 taft-02 lvm[8725]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.

taft-03:
Mar 18 13:27:46 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:31:06 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 finished successfully.
Mar 18 13:31:07 taft-03 lvm[8751]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.

taft-04:
Mar 18 13:28:24 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_1 finished successfully.
Mar 18 13:29:48 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_2 finished successfully.
Mar 18 13:31:36 taft-04 lvm[8721]: Repair of mirrored LV helter_skelter/syncd_multiple_legs_4legs_3 finished successfully.
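With four nodes each logging several repairs, a one-line-per-repair summary makes the failing node and LV easier to spot. This is a rough sketch: repair_summary is a hypothetical helper name, and it assumes the syslog line layout shown in this report (hostname in the fourth field, LV name right after the word "LV", and the line ending in "failed." on failure).

```shell
#!/bin/sh
# Hypothetical helper: condense "Repair of mirrored LV" syslog lines into
# "<host> <vg/lv> <FAILED|ok>", one line per repair attempt.
repair_summary() {
    awk '/Repair of mirrored LV/ {
        host = $4
        lv = ""
        # The LV name is the field that follows the literal word "LV".
        for (i = 1; i < NF; i++)
            if ($i == "LV") lv = $(i + 1)
        # Failed repairs end in "failed."; successful ones in "successfully."
        result = ($NF == "failed.") ? "FAILED" : "ok"
        print host, lv, result
    }' "$@"
}
```

It could be run as repair_summary /var/log/messages on each node, or over the attached logs once they have been collected in one place.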
This issue can prevent the allocate fault policy from working on some of the mirrors being failed. In the layout below, the mirror is left with three images, mimage_1 points at an unknown device, and no replacement image has been allocated:

syncd_secondary_core_4legs_1 helter_skelter mwi-ao 600.00M 100.00 syncd_secondary_core_4legs_1_mimage_0(0),syncd_secondary_core_4legs_1_mimage_2(0),syncd_secondary_core_4legs_1_mimage_3(0)
  [syncd_secondary_core_4legs_1_mimage_0] helter_skelter iwi-ao 600.00M /dev/sdd1(0)
  syncd_secondary_core_4legs_1_mimage_1 helter_skelter -wi--- 600.00M unknown device(0)
  [syncd_secondary_core_4legs_1_mimage_2] helter_skelter iwi-ao 600.00M /dev/sde1(0)
  [syncd_secondary_core_4legs_1_mimage_3] helter_skelter iwi-ao 600.00M /dev/sdf1(0)
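A leftover image on an unknown device is easy to miss in a long listing. As a sketch, the filter below reads lvs output on standard input and prints the name of every LV whose devices column mentions "unknown device". The name missing_pv_lvs is hypothetical, and the match relies on the literal text shown in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read lvs output on stdin and print the names of LVs
# that still reference a missing PV ("unknown device").
missing_pv_lvs() {
    awk '/unknown device/ {
        # Hidden sub-LVs are printed as [name]; strip the brackets.
        gsub(/\[|\]/, "", $1)
        print $1
    }'
}
```

A typical invocation would be lvs -a -o +devices helter_skelter | missing_pv_lvs, which prints nothing when every image sits on a real device.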
Still hitting this:

Mar 27 03:18:44 taft-04 lvm[7403]: Repair of mirrored LV helter_skelter/syncd_secondary_core_4legs_2 failed.
Reproduced while attempting to hit bug 588441.

May  4 13:34:37 taft-01 lvm[7445]: Repair of mirrored LV helter_skelter/syncd_secondary_core_4legs_3 failed.
Well, I can't say much about this without seeing what happens in dmeventd/lvconvert --repair. For that, I need dmeventd logging support, but this was rejected upstream. Until dmeventd logging is merged and we can reproduce with logging enabled, I can do little but guess...
Corey, could you please check whether this is fixed by the same change as bug 588441? That is, can you try updating the filter in lvm.conf and see whether the bug is still reproducible with that change? If it is, debug logs from dmeventd would be quite helpful. (Presumably, you can obtain these the same way as in 588441.) Thanks!
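Before re-running the test, it may be worth confirming which filter each node actually has in effect, since a stale lvm.conf on one node would muddy the result. The sketch below prints only the uncommented filter lines. The name show_lvm_filter is hypothetical, /etc/lvm/lvm.conf is assumed as the default location, and the filter value needed for bug 588441 is not given in this report, so none is shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) filter lines from an
# lvm.conf file. Defaults to /etc/lvm/lvm.conf when no path is given.
show_lvm_filter() {
    conf=${1:-/etc/lvm/lvm.conf}
    # Commented-out lines start with "#", so they never match this pattern.
    grep -E '^[[:space:]]*filter[[:space:]]*=' "$conf"
}
```

Running show_lvm_filter on all four taft nodes and comparing the output would confirm that they agree before the scenario is repeated.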
I'm unable to reproduce this issue with the latest rpms. I'll close this bug and reopen it if it is seen again.

lvm2-2.02.56-12.el5          BUILT: Mon Jun  7 05:40:35 CDT 2010
lvm2-cluster-2.02.56-7.el5   BUILT: Mon Feb  8 10:24:29 CST 2010
device-mapper-1.02.39-2.el5  BUILT: Thu Apr 22 04:43:28 CDT 2010
cmirror-1.1.39-8.el5         BUILT: Wed Mar  3 09:31:58 CST 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009