Bug 446255 - RHEL5 cmirror tracker: data corruption after restoring failed log device on three mirrors
RHEL5 cmirror tracker: data corruption after restoring failed log device on t...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror (Show other bugs)
5.3
All Linux
high Severity high
: rc
: ---
Assigned To: Jonathan Earl Brassow
Cluster QE
: TestBlocker
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-05-13 15:04 EDT by Corey Marthaler
Modified: 2010-01-11 21:07 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 16:25:46 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
log from taft-01 (13.74 KB, text/plain)
2008-05-13 15:56 EDT, Corey Marthaler
no flags Details
log from taft-02 (9.41 KB, text/plain)
2008-05-13 15:56 EDT, Corey Marthaler
no flags Details
log from taft-03 (15.35 KB, text/plain)
2008-05-13 15:57 EDT, Corey Marthaler
no flags Details
log from taft-04 (8.20 KB, text/plain)
2008-05-13 15:57 EDT, Corey Marthaler
no flags Details

  None (edit)
Description Corey Marthaler 2008-05-13 15:04:10 EDT
Description of problem:
The data comparison check of this test passed before the log device failure, as
well as after the failure. However, it failed after the failed log device was
added back and the mirror was restored.

Senario: Kill disk log of synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_log_2legs
* sync:      1
* mirrors:   3
* disklog:   1
* failpv:    /dev/sdh1
* legs:      2
* pvs:       /dev/sde1 /dev/sdf1 /dev/sdh1
************************************************

Creating mirror(s) on taft-03...
taft-03: lvcreate -m 1 -n syncd_log_2legs_1 -L 800M helter_skelter
/dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150
taft-03: lvcreate -m 1 -n syncd_log_2legs_2 -L 800M helter_skelter
/dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150
taft-03: lvcreate -m 1 -n syncd_log_2legs_3 -L 800M helter_skelter
/dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150

Waiting until all mirrors become fully syncd...
        0/3 mirror(s) are fully synced: ( 1=15.50% 2=8.00% 3=1.50% )
        0/3 mirror(s) are fully synced: ( 1=36.50% 2=29.00% 3=22.50% )
        0/3 mirror(s) are fully synced: ( 1=55.00% 2=47.50% 3=40.50% )
        0/3 mirror(s) are fully synced: ( 1=75.50% 2=68.00% 3=60.50% )
        0/3 mirror(s) are fully synced: ( 1=94.50% 2=87.50% 3=80.00% )
        3/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=100.00% )

Creating gfs on top of mirror(s) on taft-03...
Mounting mirrored gfs filesystems on taft-01...
Mounting mirrored gfs filesystems on taft-02...
Mounting mirrored gfs filesystems on taft-03...
Mounting mirrored gfs filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        18077
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        18081
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        18085
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

        ---- taft-02 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        17612
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        17616
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        17620
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

        ---- taft-03 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        17379
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        17383
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        17387
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

        ---- taft-04 ----
checkit starting with:
CREATE
Num files:          100
Random Seed:        17784
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        17788
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
CREATE
Num files:          100
Random Seed:        17792
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit


<start name="taft-01_1" pid="17193" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-01_2" pid="17195" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-01_3" pid="17197" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-02_1" pid="17199" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-02_2" pid="17201" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-02_3" pid="17203" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-03_1" pid="17205" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-03_2" pid="17207" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-03_3" pid="17209" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-04_1" pid="17211" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-04_2" pid="17213" time="Tue May 13 12:31:08 2008" type="cmd" />
<start name="taft-04_3" pid="17215" time="Tue May 13 12:31:08 2008" type="cmd" />
Sleeping 12 seconds to get some outsanding I/O locks before the failure

Disabling device sdh on taft-01
Disabling device sdh on taft-02
Disabling device sdh on taft-03
Disabling device sdh on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-03
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.1558 seconds, 269 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/sdh1: open failed: No such device or address
Verifying LEG device /dev/sde1 *IS* in the volume(s)
  /dev/sdh1: open failed: No such device or address
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
  /dev/sdh1: open failed: No such device or address
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
  /dev/sdh1: open failed: No such device or address
Verify the dm devices associated with /dev/sdh1 are no longer present

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

        ---- taft-02 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

        ---- taft-03 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

        ---- taft-04 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2
Working dir:        /mnt/syncd_log_2legs_2/checkit
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3
Working dir:        /mnt/syncd_log_2legs_3/checkit

Enabling device sdh on taft-01
Enabling device sdh on taft-02
Enabling device sdh on taft-03
Enabling device sdh on taft-04

Recreating PV /dev/sdh1
  WARNING: Volume group helter_skelter is not consistent
  WARNING: Volume Group helter_skelter is not consistent
  WARNING: Volume group helter_skelter is not consistent
Extending the recreated PV back into VG helter_skelter
Up converting linear(s) back to mirror(s) on taft-03...
taft-03: lvconvert -m 1 -b helter_skelter/syncd_log_2legs_1 /dev/sde1:0-1000
/dev/sdf1:0-1000 /dev/sdh1:0-150
taft-03: lvconvert -m 1 -b helter_skelter/syncd_log_2legs_2 /dev/sde1:0-1000
/dev/sdf1:0-1000 /dev/sdh1:0-150
taft-03: lvconvert -m 1 -b helter_skelter/syncd_log_2legs_3 /dev/sde1:0-1000
/dev/sdf1:0-1000 /dev/sdh1:0-150
Verifying the up conversion from linear(s) to mirror(s)
Verifying device /dev/sde1 *IS* in the mirror(s)
Verifying device /dev/sdf1 *IS* in the mirror(s)
Verifying device /dev/sdh1 *IS* in the mirror(s)

Waiting until all mirrors become fully syncd...
        3/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=100.00% )

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1
Working dir:        /mnt/syncd_log_2legs_1/checkit
*** DATA COMPARISON ERROR [file:utakiwiqbtcivccwghyxdnbmwsbojxyetyoc] ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 0
    1st 32 expected bytes:  CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    1st 32 actual bytes:    00000000000000000000000000000000

checkit write verify failed


The volumes appear to be put back together properly and are fully sync'ed:
[root@taft-01 ~]# lvs
  LV                VG             Attr   LSize   Origin Snap%  Move Log       
            Copy%  Convert
  syncd_log_2legs_1 helter_skelter mwi-ao 800.00M                   
syncd_log_2legs_1_mlog 100.00
  syncd_log_2legs_2 helter_skelter mwi-ao 800.00M                   
syncd_log_2legs_2_mlog 100.00
  syncd_log_2legs_3 helter_skelter mwi-ao 800.00M                   
syncd_log_2legs_3_mlog 100.00

[root@taft-01 ~]# dmsetup status
helter_skelter-syncd_log_2legs_3_mimage_1: 0 1638400 linear
helter_skelter-syncd_log_2legs_1_mlog: 0 8192 linear
helter_skelter-syncd_log_2legs_3_mimage_0: 0 1638400 linear
helter_skelter-syncd_log_2legs_2_mimage_1: 0 1638400 linear
helter_skelter-syncd_log_2legs_2_mlog: 0 8192 linear
helter_skelter-syncd_log_2legs_2_mimage_0: 0 1638400 linear
helter_skelter-syncd_log_2legs_1_mimage_1: 0 1638400 linear
helter_skelter-syncd_log_2legs_3: 0 1638400 mirror 2 253:11 253:12 1600/1600 1
AA 3 clustered_disk 253:10 A
helter_skelter-syncd_log_2legs_1_mimage_0: 0 1638400 linear
helter_skelter-syncd_log_2legs_2: 0 1638400 mirror 2 253:7 253:8 1600/1600 1 AA
3 clustered_disk 253:6 A
helter_skelter-syncd_log_2legs_3_mlog: 0 8192 linear
VolGroup00-LogVol01: 0 4063232 linear
helter_skelter-syncd_log_2legs_1: 0 1638400 mirror 2 253:3 253:4 1600/1600 1 AA
3 clustered_disk 253:2 A
VolGroup00-LogVol00: 0 138805248 linear


I did see the following odd CGP messages after the device was enabled:
May 13 12:39:49 taft-01 clogd[6683]: [j9j2Tbjg] Attempt to send request to
cluster while CPG not valid: request = DM_CLOG_FLUSH
May 13 12:39:49 taft-01 clogd[6683]: [j9j2Tbjg] Unable to send DM_CLOG_FLUSH to
cluster: Invalid argument
May 13 12:39:49 taft-01 clogd[6683]: Bad callback on local/4
May 13 12:39:49 taft-01 clogd[6683]: [j9j2Tbjg] Unable to send
DM_CLOG_CLEAR_REGION to cluster: No such file or directory
May 13 12:39:49 taft-01 clogd[6683]: Bad callback on local/4
May 13 12:39:49 taft-01 kernel: device-mapper: dm-log-clustered: [j9j2Tbjg]
Server error while processing request [DM_CLOG_FLUSH]: -22


I'll attach the rest of the logs.
 

Version-Release number of selected component (if applicable):
[root@taft-03 ~]# rpm -qi lvm2
Name        : lvm2                         Relocations: (not relocatable)
Version     : 2.02.32                           Vendor: Red Hat, Inc.
Release     : 4.el5                         Build Date: Fri 04 Apr 2008 06:15:19
AM CDT

[root@taft-03 ~]# rpm -qi lvm2-cluster
Name        : lvm2-cluster                 Relocations: (not relocatable)
Version     : 2.02.32                           Vendor: Red Hat, Inc.
Release     : 4.el5                         Build Date: Wed 02 Apr 2008 03:56:50
AM CDT

[root@taft-03 ~]# rpm -qi cmirror
Name        : cmirror                      Relocations: (not relocatable)
Version     : 1.1.17                            Vendor: Red Hat, Inc.
Release     : 1.el5                         Build Date: Fri 09 May 2008 11:33:43
AM CDT

[root@taft-03 ~]# rpm -qi kmod-cmirror
Name        : kmod-cmirror                 Relocations: (not relocatable)
Version     : 0.1.9                             Vendor: Red Hat, Inc.
Release     : 1.el5                         Build Date: Thu 08 May 2008 02:28:27
PM CDT
Comment 1 Corey Marthaler 2008-05-13 15:56:03 EDT
Created attachment 305290 [details]
log from taft-01
Comment 2 Corey Marthaler 2008-05-13 15:56:35 EDT
Created attachment 305291 [details]
log from taft-02
Comment 3 Corey Marthaler 2008-05-13 15:57:04 EDT
Created attachment 305292 [details]
log from taft-03
Comment 4 Corey Marthaler 2008-05-13 15:57:40 EDT
Created attachment 305293 [details]
log from taft-04
Comment 5 Corey Marthaler 2008-05-14 12:40:59 EDT
After talking with Jon, this bug may just be the rhel5 version of bz 359341
(Mirror corruption after one of three legs fail simultaneously on more than 1
mirror).

Comment 6 RHEL Product and Program Management 2008-07-14 11:01:05 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 8 Jonathan Earl Brassow 2008-09-29 17:41:03 EDT
Modified by the following check-in:

commit 85d1423ec47e48ab844088ebaf4157327b928ae9
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Fri Sep 19 16:19:02 2008 -0500

    dm-log-clustered/clogd: Fix off-by-one error and compilation errors

    Needed to tweek included header files to make dm-log-clustered compile
    again.

    Found an off-by-one error that was causing mirror corruption in the
    case where the primary mirror device was killed in a mirror.


This off-by-one error will manifest itself anytime you are doing I/O while the mirror is sync'ing.  This could be during the initial sync or a resync after a failure.
Comment 11 errata-xmlrpc 2009-01-20 16:25:46 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html

Note You need to log in before you can comment on or make changes to this bug.