Bug 461661 - RHEL5 cmirror tracker: cmirror write path appears deadlocked after recovery is successful
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-09-09 19:08 UTC by Corey Marthaler
Modified: 2010-01-12 02:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-01 14:22:44 UTC
Target Upstream Version:
Embargoed:


Attachments
log from taft-01 (357.02 KB, text/plain)
2008-09-09 19:19 UTC, Corey Marthaler
log from taft-02 (227.95 KB, text/plain)
2008-09-09 19:19 UTC, Corey Marthaler
log from taft-03 (205.44 KB, text/plain)
2008-09-09 19:19 UTC, Corey Marthaler
log from taft-04 (227.06 KB, text/plain)
2008-09-09 19:20 UTC, Corey Marthaler

Description Corey Marthaler 2008-09-09 19:08:49 UTC
Description of problem:
This appears to be a RHEL5 version of BZ 381081/290821. Just like in those bugs, the stuck sync process is causing the test helter_skelter to hang. Unlike those bugs, however, one of the three mirrors didn't complete its resyncing process after the device failure.

Since this issue has only been reproduced with 3-way mirrors so far, it will be marked as a 5.4 bz, and not 5.3.


Scenario: Kill primary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_primary_3legs
* sync:      1
* mirrors:   3
* disklog:   1
* failpv:    /dev/sdf1
* legs:      3
* pvs:       /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sde1
************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 2 -n syncd_primary_3legs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150
taft-02: lvcreate -m 2 -n syncd_primary_3legs_2 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150
taft-02: lvcreate -m 2 -n syncd_primary_3legs_3 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-150

Waiting until all mirrors become fully syncd...
        0/3 mirror(s) are fully synced: ( 1=15.58% 2=6.58% 3=0.25% )
        0/3 mirror(s) are fully synced: ( 1=35.83% 2=27.92% 3=19.67% )
        0/3 mirror(s) are fully synced: ( 1=53.75% 2=50.25% 3=36.33% )
        0/3 mirror(s) are fully synced: ( 1=72.50% 2=71.25% 3=53.75% )
        0/3 mirror(s) are fully synced: ( 1=91.83% 2=92.33% 3=71.50% )
        2/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=98.92% )
        3/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=100.00% )

Creating gfs on top of mirror(s) on taft-01...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Sleeping 10 seconds to get some outsanding GFS I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
        ---- taft-02 ----
        ---- taft-03 ----
        ---- taft-04 ----

Disabling device sdf on taft-01
Disabling device sdf on taft-02
Disabling device sdf on taft-03
Disabling device sdf on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.188472 seconds, 223 MB/s
[ HANG ]


[root@taft-02 ~]# ps -elf | grep sync
0 D root     30378 30377  0  78   0 - 14724 sync_b 00:36 ?        00:00:00 sync


[root@taft-03 ~]# lvs -a -o +devices
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices                                                            
  syncd_primary_3legs_1            helter_skelter mwi-ao 600.00M                    syncd_primary_3legs_1_mlog 100.00         syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 600.00M                                                              /dev/sdg1(0)                                                       
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 600.00M                                                              /dev/sdh1(0)                                                       
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sde1(0)                                                       
  syncd_primary_3legs_2            helter_skelter mwi-ao 600.00M                    syncd_primary_3legs_2_mlog  99.33         syncd_primary_3legs_2_mimage_2(0),syncd_primary_3legs_2_mimage_1(0)
  [syncd_primary_3legs_2_mimage_1] helter_skelter Iwi-ao 600.00M                                                              /dev/sdg1(150)                                                     
  [syncd_primary_3legs_2_mimage_2] helter_skelter Iwi-ao 600.00M                                                              /dev/sdh1(150)                                                     
  [syncd_primary_3legs_2_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sde1(1)                                                       
  syncd_primary_3legs_3            helter_skelter mwi-ao 600.00M                    syncd_primary_3legs_3_mlog 100.00         syncd_primary_3legs_3_mimage_2(0),syncd_primary_3legs_3_mimage_1(0)
  [syncd_primary_3legs_3_mimage_1] helter_skelter iwi-ao 600.00M                                                              /dev/sdg1(300)                                                     
  [syncd_primary_3legs_3_mimage_2] helter_skelter iwi-ao 600.00M                                                              /dev/sdh1(300)                                                     
  [syncd_primary_3legs_3_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sde1(2)
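
In the listing above, the two mimage sub-LVs of syncd_primary_3legs_2 carry a capital "I" as the first Attr character (image out-of-sync), matching the 99.33 Copy% on that mirror. A minimal sketch of picking those lines out of saved lvs output follows, assuming Python 3 and the default LV/VG/Attr leading columns. The function name out_of_sync_images and the trimmed SAMPLE text are illustrative only and are not part of the test suite or of LVM.

```python
# Hypothetical helper: flag LVs whose lvs "Attr" field starts with 'I'
# (mirror image that is out of sync). Not part of LVM or helter_skelter.

# A few lines copied from the lvs output in this report, with the
# column padding trimmed.
SAMPLE = """\
  LV                               VG             Attr   LSize
  syncd_primary_3legs_1            helter_skelter mwi-ao 600.00M syncd_primary_3legs_1_mlog 100.00
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 600.00M /dev/sdg1(0)
  syncd_primary_3legs_2            helter_skelter mwi-ao 600.00M syncd_primary_3legs_2_mlog  99.33
  [syncd_primary_3legs_2_mimage_1] helter_skelter Iwi-ao 600.00M /dev/sdg1(150)
  [syncd_primary_3legs_2_mimage_2] helter_skelter Iwi-ao 600.00M /dev/sdh1(150)
"""


def out_of_sync_images(lvs_text):
    """Return the names of LVs whose Attr field begins with 'I'."""
    flagged = []
    for line in lvs_text.splitlines():
        fields = line.split()
        # LV, VG and Attr are the first three columns, so splitting on
        # whitespace is safe for them even when later columns are empty.
        if len(fields) < 3:
            continue
        lv_name, attr = fields[0], fields[2]
        if attr.startswith("I"):
            # Hidden sub-LVs are shown in brackets by "lvs -a".
            flagged.append(lv_name.strip("[]"))
    return flagged


print(out_of_sync_images(SAMPLE))
```

Only the first three columns are parsed because the later ones (Origin, Snap%, Move, Log, Copy%) can be empty, which makes whitespace splitting unreliable for them.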


I'll attach the logs/kern dumps from the 4 nodes...


Version-Release number of selected component (if applicable):
2.6.18-98.el5

lvm2-2.02.39-2.el5    BUILT: Wed Jul  9 07:26:29 CDT 2008
lvm2-cluster-2.02.39-1.el5    BUILT: Thu Jul  3 09:31:57 CDT 2008
device-mapper-1.02.27-1.el5    BUILT: Thu Jul  3 03:22:29 CDT 2008
cmirror-1.1.22-1.el5    BUILT: Thu Jul 24 15:59:03 CDT 2008
kmod-cmirror-0.1.13-2.el5    BUILT: Thu Jul 24 16:00:48 CDT 2008

Comment 1 Corey Marthaler 2008-09-09 19:19:08 UTC
Created attachment 316241
log from taft-01

Comment 2 Corey Marthaler 2008-09-09 19:19:32 UTC
Created attachment 316242
log from taft-02

Comment 3 Corey Marthaler 2008-09-09 19:19:54 UTC
Created attachment 316244
log from taft-03

Comment 4 Corey Marthaler 2008-09-09 19:20:53 UTC
Created attachment 316245
log from taft-04

Comment 5 Jonathan Earl Brassow 2008-09-29 21:53:05 UTC
The critical info in this case is the output of 'dmsetup info; dmsetup info | grep SUS'. This will give us a good indication as to the reason for the deadlock.
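
The grep for SUS shows only the matching State lines, not which device they belong to. A rough sketch of pairing each SUSPENDED state with its device name is given below, assuming Python 3 and the usual "Name:" / "State:" record layout printed by dmsetup info. The function names and the SAMPLE text are made up for illustration and do not come from the attached logs.

```python
import subprocess

# Invented sample in the layout "dmsetup info" prints; these device names
# and states are NOT taken from the taft logs.
SAMPLE = """\
Name:              vg-example_mirror
State:             SUSPENDED
Open count:        1

Name:              vg-example_mirror_mlog
State:             ACTIVE
Open count:        1
"""


def suspended_devices(dmsetup_info_text):
    """Return names of device-mapper devices whose State is SUSPENDED."""
    suspended = []
    name = None
    for line in dmsetup_info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between device records
        key, value = key.strip(), value.strip()
        if key == "Name":
            name = value
        elif key == "State" and value.startswith("SUSPENDED") and name:
            suspended.append(name)
    return suspended


def read_dmsetup_info():
    """Run 'dmsetup info' (needs root) and return its text output."""
    result = subprocess.run(
        ["dmsetup", "info"], capture_output=True, text=True, check=True
    )
    return result.stdout


print(suspended_devices(SAMPLE))
```

On a hung node, suspended_devices(read_dmsetup_info()) would list any mirror, image, or log device left suspended.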

Comment 6 Corey Marthaler 2009-04-01 14:22:44 UTC
I ran helter_skelter all night and was unable to reproduce this issue, plus it hasn't been seen in over 6 months, so closing...

