Description of problem: The data comparison check of this test passed before the log device failure, as well as after the failure. However, it failed after the failed log device was added back and the mirror was restored. Senario: Kill disk log of synced 2 leg mirror(s) ****** Mirror hash info for this scenario ****** * name: syncd_log_2legs * sync: 1 * mirrors: 3 * disklog: 1 * failpv: /dev/sdh1 * legs: 2 * pvs: /dev/sde1 /dev/sdf1 /dev/sdh1 ************************************************ Creating mirror(s) on taft-03... taft-03: lvcreate -m 1 -n syncd_log_2legs_1 -L 800M helter_skelter /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150 taft-03: lvcreate -m 1 -n syncd_log_2legs_2 -L 800M helter_skelter /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150 taft-03: lvcreate -m 1 -n syncd_log_2legs_3 -L 800M helter_skelter /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150 Waiting until all mirrors become fully syncd... 0/3 mirror(s) are fully synced: ( 1=15.50% 2=8.00% 3=1.50% ) 0/3 mirror(s) are fully synced: ( 1=36.50% 2=29.00% 3=22.50% ) 0/3 mirror(s) are fully synced: ( 1=55.00% 2=47.50% 3=40.50% ) 0/3 mirror(s) are fully synced: ( 1=75.50% 2=68.00% 3=60.50% ) 0/3 mirror(s) are fully synced: ( 1=94.50% 2=87.50% 3=80.00% ) 3/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=100.00% ) Creating gfs on top of mirror(s) on taft-03... Mounting mirrored gfs filesystems on taft-01... Mounting mirrored gfs filesystems on taft-02... Mounting mirrored gfs filesystems on taft-03... Mounting mirrored gfs filesystems on taft-04... Writing verification files (checkit) to mirror(s) on... ---- taft-01 ---- checkit starting with: CREATE Num files: 100 Random Seed: 18077 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: CREATE Num files: 100 Random Seed: 18081 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: CREATE Num files: 100 Random Seed: 18085 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit ---- taft-02 ---- checkit starting with: CREATE Num files: 100 Random Seed: 17612 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: CREATE Num files: 100 Random Seed: 17616 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: CREATE Num files: 100 Random Seed: 17620 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit ---- taft-03 ---- checkit starting with: CREATE Num files: 100 Random Seed: 17379 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: CREATE Num files: 100 Random Seed: 17383 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: CREATE Num files: 100 Random Seed: 17387 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit ---- taft-04 ---- checkit starting with: CREATE Num files: 100 Random Seed: 17784 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: CREATE Num files: 100 Random Seed: 17788 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: CREATE Num files: 100 Random Seed: 17792 Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit <start name="taft-01_1" pid="17193" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-01_2" pid="17195" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-01_3" pid="17197" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-02_1" pid="17199" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-02_2" pid="17201" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-02_3" pid="17203" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-03_1" pid="17205" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-03_2" pid="17207" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-03_3" pid="17209" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-04_1" pid="17211" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-04_2" pid="17213" time="Tue May 13 12:31:08 2008" type="cmd" /> <start name="taft-04_3" pid="17215" time="Tue May 13 12:31:08 2008" type="cmd" /> Sleeping 12 seconds to get some outsanding I/O locks before the failure Disabling device sdh on taft-01 Disabling device sdh on taft-02 Disabling device sdh on taft-03 Disabling device sdh on taft-04 Attempting I/O to cause mirror down conversion(s) on taft-03 10+0 records in 10+0 records out 41943040 bytes (42 MB) copied, 0.1558 seconds, 269 MB/s Verifying the down conversion of the failed mirror(s) /dev/sdh1: open failed: No such device or address Verifying LEG device /dev/sde1 *IS* in the volume(s) /dev/sdh1: open failed: No such device or address Verifying LEG device /dev/sdf1 *IS* in the volume(s) /dev/sdh1: open failed: No such device or address Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s) /dev/sdh1: open failed: No such device or address Verify the dm devices associated with /dev/sdh1 are no longer present Verifying files (checkit) on mirror(s) on... ---- taft-01 ---- checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit ---- taft-02 ---- checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit ---- taft-03 ---- checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit ---- taft-04 ---- checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_2 Working dir: /mnt/syncd_log_2legs_2/checkit checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_3 Working dir: /mnt/syncd_log_2legs_3/checkit Enabling device sdh on taft-01 Enabling device sdh on taft-02 Enabling device sdh on taft-03 Enabling device sdh on taft-04 Recreating PV /dev/sdh1 WARNING: Volume group helter_skelter is not consistent WARNING: Volume Group helter_skelter is not consistent WARNING: Volume group helter_skelter is not consistent Extending the recreated PV back into VG helter_skelter Up converting linear(s) back to mirror(s) on taft-03... taft-03: lvconvert -m 1 -b helter_skelter/syncd_log_2legs_1 /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150 taft-03: lvconvert -m 1 -b helter_skelter/syncd_log_2legs_2 /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150 taft-03: lvconvert -m 1 -b helter_skelter/syncd_log_2legs_3 /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-150 Verifying the up conversion from linear(s) to mirror(s) Verifying device /dev/sde1 *IS* in the mirror(s) Verifying device /dev/sdf1 *IS* in the mirror(s) Verifying device /dev/sdh1 *IS* in the mirror(s) Waiting until all mirrors become fully syncd... 3/3 mirror(s) are fully synced: ( 1=100.00% 2=100.00% 3=100.00% ) Verifying files (checkit) on mirror(s) on... ---- taft-01 ---- checkit starting with: VERIFY Verify XIOR Stream: /tmp/checkit_syncd_log_2legs_1 Working dir: /mnt/syncd_log_2legs_1/checkit *** DATA COMPARISON ERROR [file:utakiwiqbtcivccwghyxdnbmwsbojxyetyoc] *** Corrupt regions follow - unprintable chars are represented as '.' ----------------------------------------------------------------- corrupt bytes starting at file offset 0 1st 32 expected bytes: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 1st 32 actual bytes: 00000000000000000000000000000000 checkit write verify failed The volumes appear to be put back together properly and are fully sync'ed: [root@taft-01 ~]# lvs LV VG Attr LSize Origin Snap% Move Log Copy% Convert syncd_log_2legs_1 helter_skelter mwi-ao 800.00M syncd_log_2legs_1_mlog 100.00 syncd_log_2legs_2 helter_skelter mwi-ao 800.00M syncd_log_2legs_2_mlog 100.00 syncd_log_2legs_3 helter_skelter mwi-ao 800.00M syncd_log_2legs_3_mlog 100.00 [root@taft-01 ~]# dmsetup status helter_skelter-syncd_log_2legs_3_mimage_1: 0 1638400 linear helter_skelter-syncd_log_2legs_1_mlog: 0 8192 linear helter_skelter-syncd_log_2legs_3_mimage_0: 0 1638400 linear helter_skelter-syncd_log_2legs_2_mimage_1: 0 1638400 linear helter_skelter-syncd_log_2legs_2_mlog: 0 8192 linear helter_skelter-syncd_log_2legs_2_mimage_0: 0 1638400 linear helter_skelter-syncd_log_2legs_1_mimage_1: 0 1638400 linear helter_skelter-syncd_log_2legs_3: 0 1638400 mirror 2 253:11 253:12 1600/1600 1 AA 3 clustered_disk 253:10 A helter_skelter-syncd_log_2legs_1_mimage_0: 0 1638400 linear helter_skelter-syncd_log_2legs_2: 0 1638400 mirror 2 253:7 253:8 1600/1600 1 AA 3 clustered_disk 253:6 A helter_skelter-syncd_log_2legs_3_mlog: 0 8192 linear VolGroup00-LogVol01: 0 4063232 linear helter_skelter-syncd_log_2legs_1: 0 1638400 mirror 2 253:3 253:4 1600/1600 1 AA 3 clustered_disk 253:2 A VolGroup00-LogVol00: 0 138805248 linear I did see the following odd CGP messages after the device was enabled: May 13 12:39:49 taft-01 clogd[6683]: [j9j2Tbjg] Attempt to send request to cluster while CPG not valid: request = DM_CLOG_FLUSH May 13 12:39:49 taft-01 clogd[6683]: [j9j2Tbjg] Unable to send DM_CLOG_FLUSH to cluster: Invalid argument May 13 12:39:49 taft-01 clogd[6683]: Bad callback on local/4 May 13 12:39:49 taft-01 clogd[6683]: [j9j2Tbjg] Unable to send DM_CLOG_CLEAR_REGION to cluster: No such file or directory May 13 12:39:49 taft-01 clogd[6683]: Bad callback on local/4 May 13 12:39:49 taft-01 kernel: device-mapper: dm-log-clustered: [j9j2Tbjg] Server error while processing request [DM_CLOG_FLUSH]: -22 I'll attach the rest of the logs. Version-Release number of selected component (if applicable): [root@taft-03 ~]# rpm -qi lvm2 Name : lvm2 Relocations: (not relocatable) Version : 2.02.32 Vendor: Red Hat, Inc. Release : 4.el5 Build Date: Fri 04 Apr 2008 06:15:19 AM CDT [root@taft-03 ~]# rpm -qi lvm2-cluster Name : lvm2-cluster Relocations: (not relocatable) Version : 2.02.32 Vendor: Red Hat, Inc. Release : 4.el5 Build Date: Wed 02 Apr 2008 03:56:50 AM CDT [root@taft-03 ~]# rpm -qi cmirror Name : cmirror Relocations: (not relocatable) Version : 1.1.17 Vendor: Red Hat, Inc. Release : 1.el5 Build Date: Fri 09 May 2008 11:33:43 AM CDT [root@taft-03 ~]# rpm -qi kmod-cmirror Name : kmod-cmirror Relocations: (not relocatable) Version : 0.1.9 Vendor: Red Hat, Inc. Release : 1.el5 Build Date: Thu 08 May 2008 02:28:27 PM CDT
Created attachment 305290 [details] log from taft-01
Created attachment 305291 [details] log from taft-02
Created attachment 305292 [details] log from taft-03
Created attachment 305293 [details] log from taft-04
After talking with Jon, this bug may just be the rhel5 version of bz 359341 (Mirror corruption after one of three legs fail simultaneously on more than 1 mirror).
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Modified by the following check-in: commit 85d1423ec47e48ab844088ebaf4157327b928ae9 Author: Jonathan Brassow <jbrassow> Date: Fri Sep 19 16:19:02 2008 -0500 dm-log-clustered/clogd: Fix off-by-one error and compilation errors Needed to tweek included header files to make dm-log-clustered compile again. Found an off-by-one error that was causing mirror corruption in the case where the primary mirror device was killed in a mirror. This off-by-one error will manifest itself anytime you are doing I/O while the mirror is sync'ing. This could be during the initial sync or a resync after a failure.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0158.html