Bug 431696 - RHEL5 cmirror tracker: any device failure on a 3-way mirror can leave write path deadlocked
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Blocks: 430797
Reported: 2008-02-06 11:02 EST by Corey Marthaler
Modified: 2010-01-11 21:06 EST
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 16:25:36 EST

Attachments
messages from taft-01 during the failure (80.11 KB, text/plain)
2008-02-06 11:09 EST, Corey Marthaler
messages from taft-02 during the failure (81.28 KB, text/plain)
2008-02-06 11:10 EST, Corey Marthaler
messages from taft-03 during the failure (80.36 KB, text/plain)
2008-02-06 11:11 EST, Corey Marthaler
messages from taft-04 during the failure (81.52 KB, text/plain)
2008-02-06 11:11 EST, Corey Marthaler

Description Corey Marthaler 2008-02-06 11:02:12 EST
Description of problem:
I created a 3-legged mirror, failed the primary leg, and then noticed that I/O to
that mirror was deadlocked (as well as the copy percent being stuck at less than 100%).

Test output:
Scenario: Kill primary leg of synced 3 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_primary_3legs
* sync:      1
* mirrors:   1
* disklog:   1
* failpv:    /dev/sdg1
* legs:      3
* pvs:       /dev/sdg1 /dev/sdd1 /dev/sdb1 /dev/sde1
************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 2 -n syncd_primary_3legs_1 -L 800M helter_skelter
/dev/sdg1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-1000 /dev/sde1:0-150

Waiting until all mirrors become fully syncd...
        0/1 mirror(s) are fully synced: ( 1=0.00% )
        0/1 mirror(s) are fully synced: ( 1=28.50% )
        0/1 mirror(s) are fully synced: ( 1=56.50% )
        0/1 mirror(s) are fully synced: ( 1=85.00% )
        1/1 mirror(s) are fully synced: ( 1=100.00% )

Creating gfs on top of mirror(s) on taft-02...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10068
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10652
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10449
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        10384
Verify XIOR Stream: /tmp/checkit_syncd_primary_3legs_1
Working dir:        /mnt/syncd_primary_3legs_1/checkit


<start name="taft-01_1" pid="31737" time="Wed Feb  6 09:43:50 2008" type="cmd" />
<start name="taft-02_1" pid="31739" time="Wed Feb  6 09:43:50 2008" type="cmd" />
<start name="taft-03_1" pid="31741" time="Wed Feb  6 09:43:50 2008" type="cmd" />
<start name="taft-04_1" pid="31743" time="Wed Feb  6 09:43:50 2008" type="cmd" />

Disabling device sdg on taft-01
Disabling device sdg on taft-02
Disabling device sdg on taft-03
Disabling device sdg on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.079951 seconds, 525 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/sdg1: open failed: No such device or address
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LOG device /dev/sde1 *IS* in the mirror(s)
  /dev/sdg1: open failed: No such device or address

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
Got conflicting hbeat times (0 and 1198016520), discarding both
Got conflicting hbeat times (0 and 1198016520), discarding both
Got conflicting hbeat times (0 and 1198016520), discarding both
[ Here it's stuck ]


[root@taft-02 ~]# lvs
  /dev/sdg1: open failed: No such device or address
  LV                    VG             Attr   LSize   Origin Snap%  Move Log                         Copy%  Convert
  LogVol00              VolGroup00     -wi-ao  66.19G
  LogVol01              VolGroup00     -wi-ao   1.94G
  syncd_primary_3legs_1 helter_skelter mwi-ao 800.00M                    syncd_primary_3legs_1_mlog  99.50


[root@taft-02 ~]# lvs -a -o +devices
  /dev/sdg1: open failed: No such device or address
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                         Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  66.19G                                                               /dev/sda2(0)
  LogVol01                         VolGroup00     -wi-ao   1.94G                                                               /dev/sda2(2118)
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M                    syncd_primary_3legs_1_mlog  99.50          syncd_primary_3legs_1_mimage_2(0),syncd_primary_3legs_1_mimage_1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter Iwi-ao 800.00M                                                               /dev/sdd1(0)
  [syncd_primary_3legs_1_mimage_2] helter_skelter Iwi-ao 800.00M                                                               /dev/sdb1(0)
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                               /dev/sde1(0)


[root@taft-02 ~]# dmsetup ls
helter_skelter-syncd_primary_3legs_1_mlog       (253, 2)
helter_skelter-syncd_primary_3legs_1    (253, 6)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
helter_skelter-syncd_primary_3legs_1_mimage_2   (253, 5)
helter_skelter-syncd_primary_3legs_1_mimage_1   (253, 4)


[root@taft-02 ~]# dmsetup status
helter_skelter-syncd_primary_3legs_1_mlog: 0 8192 linear
helter_skelter-syncd_primary_3legs_1: 0 1638400 mirror 2 253:5 253:4 1596/1600 1 AA 3 clustered_disk 253:2 A
VolGroup00-LogVol01: 0 4063232 linear
VolGroup00-LogVol00: 0 138805248 linear
helter_skelter-syncd_primary_3legs_1_mimage_2: 0 1638400 linear
helter_skelter-syncd_primary_3legs_1_mimage_1: 0 1638400 linear


Version-Release number of selected component (if applicable):
cmirror-1.1.11-1.el5
kmod-cmirror-0.1.5-2.el5
lvm2-2.02.32-1.el5
lvm2-cluster-2.02.32-1.el5
Comment 1 Corey Marthaler 2008-02-06 11:09:58 EST
Created attachment 294120 [details]
messages from taft-01 during the failure
Comment 2 Corey Marthaler 2008-02-06 11:10:25 EST
Created attachment 294121 [details]
messages from taft-02 during the failure
Comment 3 Corey Marthaler 2008-02-06 11:11:02 EST
Created attachment 294123 [details]
messages from taft-03 during the failure
Comment 4 Corey Marthaler 2008-02-06 11:11:18 EST
Created attachment 294124 [details]
messages from taft-04 during the failure
Comment 5 RHEL Product and Program Management 2008-02-06 11:17:44 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Corey Marthaler 2008-02-06 15:15:27 EST
This is reproducible.
Comment 7 Corey Marthaler 2008-02-08 12:38:50 EST
This may be less of a 'primary leg failure' issue and more of an 'any device failure
on a 3-way mirror' issue. It's possible that this is related to, or the same as, bz
432064. I need to reproduce this again and verify that the write path is indeed
deadlocked and not just really slow.
Comment 8 Corey Marthaler 2008-02-08 18:11:10 EST
I reproduced this again, and it's definitely deadlocked, not just slow. So this
is not the same bz as 432064.
Comment 10 Corey Marthaler 2008-02-12 09:47:14 EST
I'm fairly confident that this issue is due to the fact that I'm using 3-way mirrors.
Comment 11 Corey Marthaler 2008-02-14 10:48:56 EST
I was able to reproduce this with just 2-way mirrors, so although it doesn't
happen as often, this isn't just a 3-way issue.
Comment 12 Jonathan Earl Brassow 2008-05-02 11:11:47 EDT
Two competing theories:

1) A bug in OpenAIS checkpointing could be causing an inability to reload a mirror
with the same UUID, making it impossible to down-convert from 3-way to 2-way.

2) One region in the log may not be in-sync.  The machines (for whatever reason)
are not recovering it.  A write to that region would be blocked forever under
cluster mirroring rules, because writes are not allowed to out-of-sync regions.
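
For illustration, here is a minimal, self-contained C sketch of that rule (this is
not the dm-raid1/clogd source; REGION_SECTORS, the function names, and the stuck
region count are invented, loosely mirroring the 1596/1600 figure in the dmsetup
status output above):

/* Hypothetical model, NOT the actual dm-raid1/clogd code: shows why a write
 * stalls forever when its region never returns to the in-sync state.
 * REGIONS loosely follows the 1596/1600 figure from dmsetup status above;
 * REGION_SECTORS and all function names are invented for this sketch. */
#include <stdio.h>
#include <stdbool.h>

#define REGIONS        1600
#define REGION_SECTORS 1024

static bool in_sync[REGIONS];           /* one flag per region, like the log bitmap */

/* Cluster mirroring rule: a write may only be issued if its region is in
 * sync; otherwise it has to wait for recovery of that region to complete. */
static bool write_allowed(unsigned long long sector)
{
    return in_sync[sector / REGION_SECTORS];
}

int main(void)
{
    int r;
    unsigned long long sector;

    for (r = 0; r < REGIONS; r++)
        in_sync[r] = true;
    for (r = 1596; r < REGIONS; r++)    /* a few regions never get recovered */
        in_sync[r] = false;

    sector = 1596ULL * REGION_SECTORS;  /* a write that lands in a stuck region */
    printf("write to sector %llu: %s\n", sector,
           write_allowed(sector) ? "issued" : "queued until recovery completes");
    return 0;
}

If recovery of those regions never runs, any write that lands in them stays queued
forever, which matches the observed hang.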

Looking at the evidence:
1) The mirror never becomes fully synced again
2) There are no messages like:
  clogd[1814]: saCkptSectionIterationNext failure: 27
  clogd[1814]: import_checkpoint: 0 checkpoint sections found

It is a good bet that this is _not_ an openAIS issue, but an issue with mirror
recovery.  It is also a good bet that this was fixed on March 24th by this checkin:
commit 2200d92f9ebc30fca8f4107929fc4707b57bcebd
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Mon Mar 24 16:09:52 2008 -0500

    clogd: do not process requests after calling cpg_leave
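
As a rough illustration of the idea in that commit message (this is not the actual
clogd patch; the flag and function names below are invented), the fix amounts to
refusing to service log requests once the daemon has left its CPG group:

/* Hypothetical sketch, not the real clogd code: once cpg_leave() has been
 * issued, incoming requests are dropped instead of processed, so they cannot
 * touch state that is being torn down. */
#include <stdbool.h>
#include <stdio.h>

struct request { int id; };

static bool left_group = false;         /* set once cpg_leave() has been called */

static void process(struct request *rq)
{
    printf("processing request %d\n", rq->id);
}

static void leave_cluster(void)
{
    /* In the real daemon, cpg_leave() would be issued here; after that point,
     * no further log requests should be processed. */
    left_group = true;
}

static void handle_request(struct request *rq)
{
    if (left_group)
        return;                         /* silently ignore late requests */
    process(rq);
}

int main(void)
{
    struct request a = { 1 }, b = { 2 };

    handle_request(&a);                 /* processed normally */
    leave_cluster();
    handle_request(&b);                 /* dropped: arrived after leaving */
    return 0;
}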


However, with the above fix, I can envision seeing messages like:
LOG_ERROR("[%s]  sync_count(%llu) does not match bitmap count(%llu)",
 SHORT_UUID(lc->uuid),
 (unsigned long long)lc->sync_count,
 reset);
LOG_ERROR("[%s]  Resetting sync_count = %llu",
 SHORT_UUID(lc->uuid),
 reset);

If you see those messages, file a new bug (because although the issue is
handled, the messages indicate the underlying problem is not gone).
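For context, a rough model of the kind of check that would emit those messages (not
the clogd source; every name and value here is invented): recount the in-sync regions
from the bitmap and, if the cached sync_count disagrees, report the mismatch and
reset the counter to the recount.

/* Hypothetical sketch, not the clogd source: detect and repair a stale
 * sync_count by recounting the log bitmap.  All names/values are invented. */
#include <stdio.h>

#define REGIONS 1600

int main(void)
{
    unsigned char sync_bits[REGIONS];       /* stand-in for the real log bitmap */
    unsigned long long sync_count = 1600;   /* cached counter, possibly stale */
    unsigned long long reset = 0;
    int r;

    for (r = 0; r < REGIONS; r++)
        sync_bits[r] = (r < 1596);          /* four regions still out of sync */

    for (r = 0; r < REGIONS; r++)
        reset += sync_bits[r];

    if (sync_count != reset) {
        fprintf(stderr, "sync_count(%llu) does not match bitmap count(%llu)\n",
                sync_count, reset);
        fprintf(stderr, "Resetting sync_count = %llu\n", reset);
        sync_count = reset;
    }
    printf("final sync_count = %llu\n", sync_count);
    return 0;
}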
Comment 13 Corey Marthaler 2008-08-22 11:33:44 EDT
This issue is no longer being seen; marking it verified in:

2.6.18-104.el5

lvm2-2.02.32-4.el5    BUILT: Fri Apr  4 06:15:19 CDT 2008
lvm2-cluster-2.02.32-4.el5    BUILT: Wed Apr  2 03:56:50 CDT 2008
cmirror-1.1.22-1.el5    BUILT: Thu Jul 24 15:59:03 CDT 2008
kmod-cmirror-0.1.13-2.el5    BUILT: Thu Jul 24 16:00:48 CDT 2008
Comment 15 errata-xmlrpc 2009-01-20 16:25:36 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html
