Bug 431696

Summary:          RHEL5 cmirror tracker: any device failure on a 3-way mirror can leave write path deadlocked
Product:          Red Hat Enterprise Linux 5
Component:        cmirror
Version:          5.2
Status:           CLOSED ERRATA
Severity:         high
Priority:         high
Reporter:         Corey Marthaler <cmarthal>
Assignee:         Jonathan Earl Brassow <jbrassow>
QA Contact:       Cluster QE <mspqa-list>
CC:               agk, ccaulfie, dwysocha, edamato, heinzm, mbroz
Target Milestone: rc
Hardware:         All
OS:               Linux
Doc Type:         Bug Fix
Last Closed:      2009-01-20 21:25:36 UTC
Bug Blocks:       430797
Attachments:
  messages from taft-01 during the failure
  messages from taft-02 during the failure
  messages from taft-03 during the failure
  messages from taft-04 during the failure

Description Corey Marthaler 2008-02-06 16:02:12 UTC
Description of problem:
I created a 3-legged mirror, failed the primary leg, and then noticed that I/O
to that mirror was deadlocked (as well as a copy percent stuck at less than 100%).

Test output:
Scenario: Kill primary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_primary_3legs
* sync:      1
* mirrors:   1
* disklog:   1
* failpv:    /dev/sdg1
* legs:      3
* pvs:       /dev/sdg1 /dev/sdd1 /dev/sdb1 /dev/sde1
************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 2 -n syncd_primary_3legs_1 -L 800M helter_skelter
/dev/sdg1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150
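
For reference, an annotated version of that create command (annotations mine, not part of the harness output):

# -m 2      two extra copies of the data, i.e. a three-legged mirror
# -L 800M   logical volume size
# The PV[:PE-range] arguments pin the three legs to /dev/sdg1, /dev/sdd1,
# and /dev/sdb1, and the disk log to /dev/sde1.
lvcreate -m 2 -n syncd_primary_3legs_1 -L 800M helter_skelter \
         /dev/sdg1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150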

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=0.00% )
        0/1 mirror(s) are fully synced: ( 1=28.50% )
        0/1 mirror(s) are fully synced: ( 1=56.50% )
        0/1 mirror(s) are fully synced: ( 1=85.00% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )
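
The wait loop itself isn't shown in the harness output; a minimal sketch of how the percentage can be polled (LV name from this test, loop details assumed):

# Poll the mirror's copy percentage until the mirror reports fully synced.
while :; do
        pct=$(lvs --noheadings -o copy_percent helter_skelter/syncd_primary_3legs_1 | tr -d ' ')
        echo "  mirror is ${pct}% synced"
        [ "$pct" = "100.00" ] && break
        sleep 5
done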

Creating gfs on top of mirror(s) on taft-02...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10068
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10652
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10449
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10384
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit


<start name="taft-01_1" pid="31737" time="Wed Feb  6 09:43:50 2008" type="cmd" />
<start name="taft-02_1" pid="31739" time="Wed Feb  6 09:43:50 2008" type="cmd" />
<start name="taft-03_1" pid="31741" time="Wed Feb  6 09:43:50 2008" type="cmd" />
<start name="taft-04_1" pid="31743" time="Wed Feb  6 09:43:50 2008" type="cmd" />

Disabling device sdg on taft-01
Disabling device sdg on taft-02
Disabling device sdg on taft-03
Disabling device sdg on taft-04
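
How the harness disables the device isn't shown; one common way to do it on RHEL5 is to offline the SCSI device through sysfs (an assumption, not taken from the test output):

# Run on each node: mark sdg offline so all further I/O to it fails.
echo offline > /sys/block/sdg/device/state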

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.079951 seconds, 525 MB/s
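
The dd command itself isn't captured in the output; given the byte count (41943040 bytes = 10 x 4 MiB records), it was presumably something along these lines (target path is a guess):

# Write through the mirror so the kernel notices the failed leg and
# device-mapper can kick off the down-conversion.  The 525 MB/s figure
# above suggests the write landed in the page cache.
dd if=/dev/zero of=/mnt/syncd_primary_3legs_1/ddfile bs=4M count=10
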
Verifying the down conversion of the failed mirror(s)
  /dev/sdg1: open failed: No such device or address
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LOG device /dev/sde1 *IS* in the mirror(s)
  /dev/sdg1: open failed: No such device or address

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
Got conflicting hbeat times (0 and 1198016520), discarding both
Got conflicting hbeat times (0 and 1198016520), discarding both
Got conflicting hbeat times (0 and 1198016520), discarding both
[ Here it's stuck ]


[root@taft-02 ~]# lvs
  /dev/sdg1: open failed: No such device or address
  LV                    VG             Attr   LSize   Origin Snap% Move Log                        Copy%  Convert
  LogVol00              VolGroup00     -wi-ao  66.19G
  LogVol01              VolGroup00     -wi-ao   1.94G
  syncd_primary_3legs_1 helter_skelter mwi-ao 800.00M                   syncd_primary_3legs_1_mlog  99.50
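
Decoding the mirror's attr string (standard lvs attribute columns; annotation mine):

# "mwi-ao" for syncd_primary_3legs_1:
#   m  volume type: mirrored
#   w  permissions: writable
#   i  allocation policy: inherited
#   -  fixed minor number: none
#   a  state: active
#   o  device open (the gfs filesystem is still mounted)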


[root@taft-02 ~]# lvs -a -o +devices
  /dev/sdg1: open failed: No such device or address
  LV                               VG             Attr   LSize   Origin Snap% Move Log                        Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  66.19G                                                             /dev/sda2(0)
  LogVol01                         VolGroup00     -wi-ao   1.94G                                                             /dev/sda2(2118)
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M             syncd_primary_3legs_1_mlog  99.50              syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter Iwi-ao 800.00M                                                             /dev/sdd1(0)
  [syncd_primary_3legs_1_mimage_2] helter_skelter Iwi-ao 800.00M                                                             /dev/sdb1(0)
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                             /dev/sde1(0)


[root@taft-02 ~]# dmsetup ls
helter_skelter-syncd_primary_3legs_1_mlog       (253, 2)
helter_skelter-syncd_primary_3legs_1    (253, 6)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
helter_skelter-syncd_primary_3legs_1_mimage_2   (253, 5)
helter_skelter-syncd_primary_3legs_1_mimage_1   (253, 4)


[root@taft-02 ~]# dmsetup status
helter_skelter-syncd_primary_3legs_1_mlog: 0 8192 linear
helter_skelter-syncd_primary_3legs_1: 0 1638400 mirror 2 253:5 253:4 1596/1600 1 AA 3 clustered_disk 253:2 A
VolGroup00-LogVol01: 0 4063232 linear
VolGroup00-LogVol00: 0 138805248 linear
helter_skelter-syncd_primary_3legs_1_mimage_2: 0 1638400 linear
helter_skelter-syncd_primary_3legs_1_mimage_1: 0 1638400 linear
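
For reference, the mirror's status line breaks down as follows (my annotation of the dm-mirror status output; hedged where inferred):

# helter_skelter-syncd_primary_3legs_1: 0 1638400 mirror \
#   2 253:5 253:4     two legs left: mimage_2 (253:5) and mimage_1 (253:4);
#                     the failed leg on /dev/sdg1 has already been dropped
#   1596/1600         1596 of 1600 regions in sync -- 4 regions never recovered,
#                     which matches the copy percent being stuck below 100%
#   1 AA              health field: one status char per remaining leg, both alive (A)
#   3 clustered_disk 253:2 A
#                     clustered disk log on 253:2 (the mlog LV), healthy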


Version-Release number of selected component (if applicable):
cmirror-1.1.11-1.el5
kmod-cmirror-0.1.5-2.el5
lvm2-2.02.32-1.el5
lvm2-cluster-2.02.32-1.el5

Comment 1 Corey Marthaler 2008-02-06 16:09:58 UTC
Created attachment 294120 [details]
messages from taft-01 during the failure

Comment 2 Corey Marthaler 2008-02-06 16:10:25 UTC
Created attachment 294121 [details]
messages from taft-02 during the failure

Comment 3 Corey Marthaler 2008-02-06 16:11:02 UTC
Created attachment 294123 [details]
messages from taft-03 during the failure

Comment 4 Corey Marthaler 2008-02-06 16:11:18 UTC
Created attachment 294124 [details]
messages from taft-04 during the failure

Comment 5 RHEL Program Management 2008-02-06 16:17:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Corey Marthaler 2008-02-06 20:15:27 UTC
This is reproducible.

Comment 7 Corey Marthaler 2008-02-08 17:38:50 UTC
This may be less of a 'primary leg failure' issue and more of a general 'device
failure on a 3-way mirror' issue. It's possible that this is related to, or the
same as, bz 432064. I need to reproduce this again and verify that the write
path is indeed deadlocked and not just really slow.

Comment 8 Corey Marthaler 2008-02-08 23:11:10 UTC
I reproduced this again, and it's definitely deadlocked, not just slow. So this
is not the same bz as 432064.

Comment 10 Corey Marthaler 2008-02-12 14:47:14 UTC
I'm fairly confident that this issue is due to the fact that I'm using 3-way mirrors.

Comment 11 Corey Marthaler 2008-02-14 15:48:56 UTC
I was able to reproduce this with just 2-way mirrors, so although it doesn't
happen as often, this isn't just a 3-way issue.

Comment 12 Jonathan Earl Brassow 2008-05-02 15:11:47 UTC
Two competing theories:

1) A bug in OpenAIS checkpointing could be causing an inability to reload a mirror
with the same UUID, making it impossible to down-convert from 3-way -> 2-way.

2) One region in the log may not be in-sync, and the machines (for whatever
reason) are not recovering it.  A write to that region would be blocked forever
under cluster mirroring rules, because writes are not allowed to out-of-sync
regions (see the sketch after this list).
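
Theory 2 can be watched from user space; a quick sketch (LV name from this report) that checks whether the out-of-sync region count ever drains to zero:

# Pull the in-sync/total region ratio out of the mirror's status line.
# If it parks short of 1600/1600 and never moves again, writes that land
# in the out-of-sync regions will block forever.
while :; do
        dmsetup status helter_skelter-syncd_primary_3legs_1 |
                awk '{ for (i = 1; i <= NF; i++)
                               if ($i ~ /^[0-9]+\/[0-9]+$/) print $i }'
        sleep 10
done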

Looking at the evidence:
1) The mirror never becomes fully synced again
2) There are no messages like:
  clogd[1814]: saCkptSectionIterationNext failure: 27
  clogd[1814]: import_checkpoint: 0 checkpoint sections found

It is a good bet that this is _not_ an OpenAIS issue, but an issue with mirror
recovery.  It is also a good bet that this was fixed on March 24th by this checkin:
commit 2200d92f9ebc30fca8f4107929fc4707b57bcebd
Author: Jonathan Brassow <jbrassow>
Date:   Mon Mar 24 16:09:52 2008 -0500

    clogd: do not process requests after calling cpg_leave


However, with the above fix, I can envision seeing messages like:
LOG_ERROR("[%s]  sync_count(%llu) does not match bitmap count(%llu)",
 SHORT_UUID(lc->uuid),
 (unsigned long long)lc->sync_count,
 reset);
LOG_ERROR("[%s]  Resetting sync_count = %llu",
 SHORT_UUID(lc->uuid),
 reset);

If you see those messages, file a new bug (because although the issue is
handled, the messages indicate the underlying problem is not gone).
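
clogd logs through syslog, so watching for those messages is a one-liner (assuming a standard syslog configuration):

# Look for the sync_count mismatch errors on each node.
grep -E 'sync_count.*does not match bitmap count|Resetting sync_count' /var/log/messages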

Comment 13 Corey Marthaler 2008-08-22 15:33:44 UTC
This issue is no longer being seen, marking verified in:

2.6.18-104.el5

lvm2-2.02.32-4.el5    BUILT: Fri Apr  4 06:15:19 CDT 2008
lvm2-cluster-2.02.32-4.el5    BUILT: Wed Apr  2 03:56:50 CDT 2008
cmirror-1.1.22-1.el5    BUILT: Thu Jul 24 15:59:03 CDT 2008
kmod-cmirror-0.1.13-2.el5    BUILT: Thu Jul 24 16:00:48 CDT 2008

Comment 15 errata-xmlrpc 2009-01-20 21:25:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html