+++ This bug was initially created as a clone of Bug #732124 +++

Description of problem:
After failing the primary leg and primary log device, the mirrored filesystem was converted to read-only.

Scenario: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdg1
* log devices:        /dev/sdb1 /dev/sdf1
* no MDA devices:
* failpv(s):          /dev/sdd1 /dev/sdb1
* failnode(s):        taft-01
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 300M helter_skelter /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-150 /dev/sdf1:0-150

PV=/dev/sdb1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
PV=/dev/sdb1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1

Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

<start name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="10491" time="Fri Aug 19 14:24:52 2011" type="cmd" />
Sleeping 10 seconds to get some outstanding EXT I/O locks before the failure

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdd on taft-01
Disabling device sdb on taft-01

<fail name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="10491" time="Fri Aug 19 14:25:13 2011" type="cmd" duration="21" ec="1" />
ALL STOP!
Attempting I/O to cause mirror down conversion(s) on taft-01
dd: opening `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/ddfile': Read-only file system

[root@taft-01 ~]# grep lvm\\[ /var/log/messages
Aug 19 14:24:29 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:24:29 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:24:30 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:24:30 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:24:30 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 for events.
Aug 19 14:24:30 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:24:37 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 14:25:12 taft-01 lvm[2188]: Primary mirror device 253:3 has failed (D).
Aug 19 14:25:12 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog.
[...]
Aug 19 14:25:12 taft-01 lvm[2188]: Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F.
Aug 19 14:25:12 taft-01 lvm[2188]: Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo.
Aug 19 14:25:13 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 14:25:15 taft-01 lvm[2188]: Mirror status: 1 of 2 images failed.
Aug 19 14:25:15 taft-01 lvm[2188]: Mirror log status: 1 of 2 images failed.
Aug 19 14:25:15 taft-01 lvm[2188]: Trying to up-convert to 2 images, 2 logs.
Aug 19 14:25:15 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:25:15 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 14:25:16 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:25:16 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 14:25:16 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 1.3%
Aug 19 14:25:32 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 100.0%
Aug 19 14:25:47 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Aug 19 14:25:47 taft-01 lvm[2188]: Log device 253:5 has failed (D).
Aug 19 14:25:47 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1.
Aug 19 14:25:47 taft-01 lvm[2188]: device-mapper: waitevent ioctl failed: No such device or address
Aug 19 14:25:47 taft-01 lvm[2188]: dm_task_run failed, errno = 6, No such device or address
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog disappeared, detaching
Aug 19 14:25:47 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:25:47 taft-01 lvm[2188]: Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F.
Aug 19 14:25:47 taft-01 lvm[2188]: Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo.
Aug 19 14:25:47 taft-01 lvm[2188]: syncd_pri_leg_pri_log_2legs_2logs_1 is consistent. Nothing to repair.
Aug 19 14:25:47 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 14:25:48 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.

couldn't write to syncd_pri_leg_pri_log_2legs_2logs_1

[root@taft-01 ~]# mount
/dev/mapper/helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 on /mnt/syncd_pri_leg_pri_log_2legs_2logs_1 type ext3 (rw)

[root@taft-01 ~]# touch /mnt/syncd_pri_leg_pri_log_2legs_2logs_1/testfile
touch: cannot touch `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/testfile': Read-only file system

[root@taft-01 ~]# lvs -a -o +devices
  Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F.
  Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo.
  LV                                                  Attr   LSize   Log                                      Copy%  Devices
  syncd_pri_leg_pri_log_2legs_2logs_1                 mwi-ao 300.00m syncd_pri_leg_pri_log_2legs_2logs_1_mlog 100.00 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0]      iwi-ao 300.00m                                                /dev/sdg1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1]      iwi-ao 300.00m                                                /dev/sdh1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog]          mwi-ao   4.00m                                          100.00 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0] iwi-ao   4.00m                                                /dev/sde1(0)
  [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1] iwi-ao   4.00m                                                /dev/sdc1(0)

Version-Release number of selected component (if applicable):
2.6.32-188.el6.x86_64

lvm2-2.02.87-1.el6                      BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6                 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6              BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6                       BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6             BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6        BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6       BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6  BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6                   BUILT: Fri Aug 12 06:11:57 CDT 2011

--- Additional comment from jbrassow on 2011-10-17 12:40:12 EDT ---

Similar to bug 732098: I have not hit this bug in a weekend's worth of testing (using the helter_skelter test). I did hit a different bug where some of the mirror sub-LVs did not come out of suspension. That may have prevented me from seeing this bug, but I have tested for a cumulative duration of 72+ hours without hitting it. I'll continue to look for it, but in the absence of hitting it myself, it will have to be verified by the reporter - after fixes for 746254/743112 are in place. (I am currently testing with the proposed fixes for 746254/743112.)

Marking as NEEDINFO until either:
1) I hit it with my continued helter_skelter testing, or
2) the reporter is able to get new rpms with the aforementioned patches and is able to confirm this bug.
This exists in the latest 5.8 rpms as well.

Scenario: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdc1 /dev/sdd1
* log devices:        /dev/sdb1 /dev/sdf1
* no MDA devices:
* failpv(s):          /dev/sdc1 /dev/sdb1
* failnode(s):        taft-01
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 600M helter_skelter /dev/sdc1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-150 /dev/sdf1:0-150

PV=/dev/sdb1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.1
PV=/dev/sdc1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 6
PV=/dev/sdb1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.1
PV=/dev/sdc1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 6

Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.39 (29-May-2006)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdc on taft-01
Disabling device sdb on taft-01

<fail name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="2936" time="Fri Oct 28 11:43:52 2011" type="cmd" duration="30" ec="1" />
Attempting I/O to cause mirror down conversion(s) on taft-01
dd: opening `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/ddfile': Read-only file system
couldn't write to syncd_pri_leg_pri_log_2legs_2logs_1

[root@taft-01 ~]# lvs -a -o +devices
  Couldn't find device with uuid rg8epN-uz9b-mYUZ-6SmJ-Wjmo-ikmP-Gb5dp6.
  Couldn't find device with uuid 34ybGk-EjXg-Ivj5-3a4G-Zdiq-qE0s-XV86Bd.
  LV                                  Attr   LSize   Copy%  Devices
  syncd_pri_leg_pri_log_2legs_2logs_1 -wi-ao 600.00M        /dev/sdd1(0)

[root@taft-01 ~]# touch /mnt/syncd_pri_leg_pri_log_2legs_2logs_1/foo
touch: cannot touch `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/foo': Read-only file system

Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26626
Oct 28 11:43:57 taft-01 kernel: sd 1:0:0:2: rejecting I/O to offline device
Oct 28 11:43:57 taft-01 kernel: device-mapper: raid1: A read failure occurred on a mirror device.
Oct 28 11:43:57 taft-01 kernel: device-mapper: raid1: Trying different device.
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26627
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26628
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26629
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26630
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26631
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26632
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Buffer I/O error on device dm-7, logical block 26633
Oct 28 11:43:57 taft-01 kernel: lost page write due to I/O error on dm-7
Oct 28 11:43:57 taft-01 kernel: Aborting journal on device dm-7.
Oct 28 11:43:57 taft-01 kernel: device-mapper: raid1: log postsuspend failed
Oct 28 11:43:57 taft-01 kernel: ext3_abort called.
Oct 28 11:43:57 taft-01 kernel: EXT3-fs error (device dm-7): ext3_journal_start_sb: Detected aborted journal
Oct 28 11:43:57 taft-01 kernel: Remounting filesystem read-only
Oct 28 11:43:57 taft-01 xinetd[6465]: EXIT: qarsh status=0 pid=15273 duration=30(sec)
Oct 28 11:43:57 taft-01 kernel: sd 1:0:0:2: rejecting I/O to offline device
Oct 28 11:44:00 taft-01 last message repeated 76 times
Oct 28 11:44:13 taft-01 lvm[11842]: Mirror status: 1 of 2 images failed.
Oct 28 11:44:13 taft-01 lvm[11842]: Mirror log status: 1 of 2 images failed.
Oct 28 11:44:13 taft-01 lvm[11842]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Oct 28 11:44:13 taft-01 lvm[11842]: Log device 253:4 has failed (D).
Oct 28 11:44:13 taft-01 lvm[11842]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1.
Oct 28 11:44:13 taft-01 lvm[11842]: dm_task_run failed, errno = 6, No such device or address
Oct 28 11:44:13 taft-01 lvm[11842]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog disappeared, detaching
Oct 28 11:44:13 taft-01 lvm[11842]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Oct 28 11:44:13 taft-01 lvm[11842]: Couldn't find device with uuid rg8epN-uz9b-mYUZ-6SmJ-Wjmo-ikmP-Gb5dp6.
Oct 28 11:44:13 taft-01 lvm[11842]: Couldn't find device with uuid 34ybGk-EjXg-Ivj5-3a4G-Zdiq-qE0s-XV86Bd.
Oct 28 11:44:14 taft-01 lvm[11842]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Oct 28 11:44:14 taft-01 lvm[11842]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 has unmirrored portion.
2.6.18-274.el5

lvm2-2.02.88-2.el5              BUILT: Fri Oct 21 09:48:50 CDT 2011
lvm2-cluster-2.02.88-2.el5      BUILT: Fri Oct 21 09:49:24 CDT 2011
device-mapper-1.02.67-2.el5     BUILT: Mon Oct 17 08:31:56 CDT 2011
device-mapper-event-1.02.67-2.el5  BUILT: Mon Oct 17 08:31:56 CDT 2011
cmirror-1.1.39-10.el5           BUILT: Wed Sep  8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5       BUILT: Tue Dec 22 13:39:47 CST 2009
Created attachment 551694 [details]
Patch for 769731 fix (which this bug was cloned from)
Comment on attachment 551694 [details]
Patch for 769731 fix (which this bug was cloned from)

Patch was put in the wrong bug.
This bug is the same as (and is a clone of) bug 732124, which was fixed by the upstream commit listed below. This commit can be used to fix this bug as well. It applies cleanly (except for WHATS_NEW) to release 2.02.88.

commit 54c73b7723713f43413584d59ca0bdd42c1d8241
Author: Jonathan Brassow <jbrassow>
Date:   Wed Nov 14 14:58:47 2012 -0600

    mirror: Mirrored log should be fixed before mirror when double fault occurs

    This patch is intended to fix bug 825323 - FS turns read-only during a double
    fault of a mirror leg and mirrored log's leg at the same time. It only
    affects a 2-way mirror with a mirrored log. 3+-way mirrors and mirrors
    without a mirrored log are not affected.

    The problem resulted from the fact that the top-level mirror was not using
    'noflush' when suspending before its "down-convert". When a mirror image
    fails, the bios are queued until a suspend is received. If it is a 'noflush'
    suspend, the bios can be safely requeued in the DM core. If 'noflush' is not
    used, the bios must be pushed through the target, and if a device has failed
    in a mirror, that means issuing an error. When an error is received by a
    file system, it turns read-only (depending on the FS).

    Part of the problem is due to the nature of the stacking involved in using
    a mirror as a mirror's log. When an image in each fails, the top-level
    mirror stalls because it is waiting for a log flush, while the log mirror
    stalls waiting for corrective action. When the repair command is issued,
    the entire stacked arrangement is collapsed to a linear LV. The log flush
    then fails (somewhat uncleanly) and the top-level mirror is suspended
    without 'noflush' because it is now a linear device.

    This patch allows the log to be repaired first, which in turn allows the
    top-level mirror's log flush to complete cleanly. The top-level mirror is
    then secondarily reduced to a linear device - at which time it is suspended
    properly with 'noflush'.
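The flush vs. 'noflush' distinction the commit message describes can be illustrated at the dmsetup level. This is a hedged sketch, not part of the fix: it uses the DM device name from this report, assumes it exists on the test node, and only demonstrates the two suspend modes that lvconvert chooses between internally.

```shell
# Illustrative only - run on a node where this DM device exists.
DEV=helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1

# A plain suspend flushes queued bios through the target first. On a
# mirror with a failed leg (and a stalled log), those bios can only be
# completed with -EIO, which ext3 answers by aborting its journal and
# remounting read-only - the failure seen in this bug.
dmsetup suspend "$DEV"
dmsetup resume  "$DEV"

# A noflush suspend leaves the queued bios held in the DM core, to be
# requeued on resume. This is the safe path the patch ensures is taken:
# repairing the mirrored log first keeps the top-level device a mirror
# target at suspend time, so its suspend can use noflush.
dmsetup suspend --noflush "$DEV"
dmsetup resume  "$DEV"
```

Note that dmeventd/lvconvert perform these suspends internally during repair; the sketch only makes the ordering argument in the commit message concrete.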
Tested this with multiple iterations of the following scenario without problems:

Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1 syncd_pri_leg_pri_log_2legs_2logs_2 syncd_pri_leg_pri_log_2legs_2logs_3
* sync:               1
* striped:            0
* leg devices:        /dev/sdh1 /dev/sdf1
* log devices:        /dev/sdc1 /dev/sdd1
* no MDA devices:
* failpv(s):          /dev/sdh1 /dev/sdc1
* failnode(s):        r5-node02
* leg fault policy:   remove
* log fault policy:   remove
******************************************************

Tested with:
lvm2-2.02.88-11.el5
Tested again with 15 iterations of syncd_pri_leg_pri_log_2legs_2logs without issues (except the known dm_task_run failure).

Marking verified with lvm2-2.02.88-11.el5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1352.html
Why doesn't this problem happen with XFS?