Description of problem:
This appears to be a RHEL5 version of BZ 381081/290821. Just like in those bugs, the stuck sync process is causing the helter_skelter test to hang. Unlike those bugs, however, one of the three mirrors didn't complete its resync after the device failure (syncd_primary_3legs_2 is still at 99.33% in the lvs output below).

Since this issue has only been reproduced with 3-way mirrors so far, it will be marked as a 5.4 bz rather than a 5.3 one.

Scenario: Kill primary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_primary_3legs
* sync:      1
* mirrors:   3
* disklog:   1
* failpv:    /dev/sdf1
* legs:      3
* pvs:       /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sde1
************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 2 -n syncd_primary_3legs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150
taft-02: lvcreate -m 2 -n syncd_primary_3legs_2 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150
taft-02: lvcreate -m 2 -n syncd_primary_3legs_3 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150

Waiting until all mirrors become fully syncd...
   0/3 mirror(s) are fully synced: ( 1=15.58% 2=6.58% 3=0.25% )
   0/3 mirror(s) are fully synced: ( 1=35.83% 2=27.92% 3=19.67% )
   0/3 mirror(s) are fully synced: ( 1=53.75% 2=50.25% 3=36.33% )
   0/3 mirror(s) are fully synced: ( 1=72.50% 2=71.25% 3=53.75% )
   0/3 mirror(s) are fully synced: ( 1=91.83% 2=92.33% 3=71.50% )
   2/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=98.92% )
   3/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=100.00% )

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Disabling device sdf on taft-01
Disabling device sdf on taft-02
Disabling device sdf on taft-03
Disabling device sdf on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.188472 seconds, 223 MB/s
[ HANG ]

[root@taft-02 ~]# ps -elf | grep sync
0 D root 30378 30377 0 78 0 - 14724 sync_b 00:36 ? 00:00:00 sync

[root@taft-03 ~]# lvs -a -o +devices
  LV VG Attr LSize Origin Snap% Move Log Copy% Convert Devices
  syncd_primary_3legs_1 helter_skelter mwi-ao 600.00M syncd_primary_3legs_1_mlog 100.00 syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdg1(0)
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdh1(0)
  [syncd_primary_3legs_1_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(0)
  syncd_primary_3legs_2 helter_skelter mwi-ao 600.00M syncd_primary_3legs_2_mlog 99.33 syncd_primary_3legs_2_mimage_2(0),syncd_primary_3legs_2_mimage_1(0)
  [syncd_primary_3legs_2_mimage_1] helter_skelter Iwi-ao 600.00M /dev/sdg1(150)
  [syncd_primary_3legs_2_mimage_2] helter_skelter Iwi-ao 600.00M /dev/sdh1(150)
  [syncd_primary_3legs_2_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(1)
  syncd_primary_3legs_3 helter_skelter mwi-ao 600.00M syncd_primary_3legs_3_mlog 100.00 syncd_primary_3legs_3_mimage_2(0),syncd_primary_3legs_3_mimage_1(0)
  [syncd_primary_3legs_3_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdg1(300)
  [syncd_primary_3legs_3_mimage_2] helter_skelter iwi-ao 600.00M /dev/sdh1(300)
  [syncd_primary_3legs_3_mlog] helter_skelter lwi-ao 4.00M /dev/sde1(2)

I'll attach the logs/kern dumps from the 4 nodes...
Version-Release number of selected component (if applicable):
2.6.18-98.el5

lvm2-2.02.39-2.el5    BUILT: Wed Jul 9 07:26:29 CDT 2008
lvm2-cluster-2.02.39-1.el5    BUILT: Thu Jul 3 09:31:57 CDT 2008
device-mapper-1.02.27-1.el5    BUILT: Thu Jul 3 03:22:29 CDT 2008
cmirror-1.1.22-1.el5    BUILT: Thu Jul 24 15:59:03 CDT 2008
kmod-cmirror-0.1.13-2.el5    BUILT: Thu Jul 24 16:00:48 CDT 2008
Created attachment 316241: log from taft-01
Created attachment 316242: log from taft-02
Created attachment 316244: log from taft-03
Created attachment 316245: log from taft-04
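The critical information in this case is the output of 'dmsetup info; dmsetup info | grep SUS'. The grep shows whether any device-mapper devices were left in the SUSPENDED state, which should give us a good indication of the reason for the deadlock.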
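Since 'dmsetup info | grep SUS' only prints the matching State lines, not the device names, something like the following could be used on each node to tie suspended states back to names. It is a minimal sketch, assuming the usual 'dmsetup info' layout where each device's block starts with a "Name:" line that precedes its "State:" line; list_suspended is a hypothetical helper, not part of lvm2 or device-mapper.

```shell
#!/bin/sh
# list_suspended (hypothetical helper): reads `dmsetup info` output on stdin
# and prints the name of every device whose State line reads SUSPENDED.
# Assumes each device block has a "Name:" line before its "State:" line.
list_suspended() {
    awk '
        /^Name:/ { name = $2 }                          # remember the current device name
        /^State:/ && $2 == "SUSPENDED" { print name }   # report it if it is suspended
    '
}

# Usage on a node:  dmsetup info | list_suspended
```

An empty result means no device was left suspended on that node.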
I ran helter_skelter all night and was unable to reproduce this issue. It also hasn't been seen in over 6 months, so closing...