Bug 732124
| Summary: | 2-leg mirrored filesystem turned read only after primary image and primary log leg device both fail | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> | |
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> | |
| Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 6.2 | CC: | agk, coughlan, dwysocha, heinzm, jbrassow, nperic, prajnoha, prockai, thornber, zkabelac | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | lvm2-2.02.98-4.el6 | Doc Type: | Bug Fix | |
| Doc Text: |
A mirror logical volume can itself have a mirrored log device. When a device in an image of the mirror and its log failed at the same time, it was possible for I/O errors to appear on the mirror LV when they should have been handled. That is, the kernel would not absorb the I/O errors from the failed device by relying on the remaining device. This would then cause file systems built on the device to respond to the I/O errors (turn read-only in the case of the ext3/4 file systems).
The cause was found to be that the mirror was not suspended for repair using the 'noflush' flag. This flag allows the kernel to requeue I/O requests that need to be retried. Because the kernel was not allowed to requeue the requests, it had no choice but to return the I/O as errored. This issue has been corrected and the mirror is now properly suspended with the 'noflush' flag.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 749883 (view as bug list) | Environment: | ||
| Last Closed: | 2013-02-21 08:03:19 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 825323 | |||
| Bug Blocks: | 749883, 756082, 886216 | |||
Similar to bug 732098: I have not hit this bug in a weekend's worth of testing (using the helter_skelter test). I hit a different bug where some of the mirror sub-LVs did not come out of suspension. This may have blocked me from seeing this bug, but I have tested for a cumulative duration of 72+ hours and not hit this bug. I'll continue to look for it, but in the absence of hitting it myself, it will have to be verified by the reporter - after fixes for 746254/743112 are in place. (I am currently testing with the proposed fixes for 746254/743112.) Marking as NEEDINFO until either: 1) I hit it with my continued helter_skelter testing or 2) The reporter is able to get new rpms with the aforementioned patches and is able to confirm this bug. Attempting to reproduce this issue is blocked until redundant log mirrors work again (bug 794904). bug 794904 has been cleared. Testing for this case is blocked behind bug 825323. Adding QA ack for 6.4. However, testing of this bug remains blocked by bug 825323. Devel will need to provide unit testing results however before this bug can be ultimately verified by QA. The same fix for bug 825323 appears to fix this bug also: ... ================================================================================ Iteration 92.1 started at Wed Nov 14 17:05:27 CST 2012 ================================================================================ Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_pri_leg_pri_log_2legs_2logs_1 syncd_pri_leg_pri_log_2legs_2logs_2 syncd_pri_leg_pri_log_2legs_2logs_3 * sync: 1 * striped: 0 * leg devices: /dev/sdc1 /dev/sdf1 * log devices: /dev/sdh1 /dev/sde1 * no MDA devices: * failpv(s): /dev/sdc1 /dev/sdh1 * failnode(s): bp-01 * leg fault policy: remove * log fault policy: remove ****************************************************** ... Ran 10 iterations of the tests without problems: ================================================================================ Iteration 9.1 started at Wed Dec 19 11:59:28 CST 2012 ================================================================================ Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_pri_leg_pri_log_2legs_2logs_1 * sync: 1 * striped: 0 * leg devices: /dev/sdh1 /dev/sdf1 * log devices: /dev/sdb1 /dev/sdd1 * no MDA devices: * failpv(s): /dev/sdh1 /dev/sdb1 * failnode(s): r6-node01 * additional snap: /dev/sdf1 * leg fault policy: remove * log fault policy: remove ****************************************************** Verified with: lvm2-2.02.98-6.el6.x86_64 device-mapper-1.02.77-6.el6.x86_64 kernel-2.6.32-347.el6.x86_64 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0501.html |
Description of problem: After failing the primary leg and primary log device, the mirrored filesystem was converted to readonly. Scenario: Kill primary leg and primary log of synced 2 leg redundant log mirror(s) ********* Mirror hash info for this scenario ********* * names: syncd_pri_leg_pri_log_2legs_2logs_1 * sync: 1 * striped: 0 * leg devices: /dev/sdd1 /dev/sdg1 * log devices: /dev/sdb1 /dev/sdf1 * no MDA devices: * failpv(s): /dev/sdd1 /dev/sdb1 * failnode(s): taft-01 * leg fault policy: allocate * log fault policy: allocate ****************************************************** Creating mirror(s) on taft-01... taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 300M helter_skelter /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-150 /dev/sdf1:0-150 PV=/dev/sdb1 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3 PV=/dev/sdd1 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1 PV=/dev/sdb1 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3 PV=/dev/sdd1 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1 Waiting until all mirrors become fully syncd... 1/1 mirror(s) are fully synced: ( 100.00% ) Creating ext on top of mirror(s) on taft-01... mke2fs 1.41.12 (17-May-2010) Mounting mirrored ext filesystems on taft-01... Writing verification files (checkit) to mirror(s) on... ---- taft-01 ---- <start name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="10491" time="Fri Aug 19 14:24:52 2011" type="cmd" /> Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure Verifying files (checkit) on mirror(s) on... ---- taft-01 ---- Disabling device sdd on taft-01 Disabling device sdb on taft-01 <fail name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="10491" time="Fri Aug 19 14:25:13 2011" type="cmd" duration="21" ec="1" /> ALL STOP! Attempting I/O to cause mirror down conversion(s) on taft-01 dd: opening `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/ddfile': Read-only file system [root@taft-01 ~]# grep lvm\\[ /var/log/messages Aug 19 14:24:29 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. Aug 19 14:24:29 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync. Aug 19 14:24:30 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. Aug 19 14:24:30 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. Aug 19 14:24:30 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 for events. Aug 19 14:24:30 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync. Aug 19 14:24:37 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync. Aug 19 14:25:12 taft-01 lvm[2188]: Primary mirror device 253:3 has failed (D). Aug 19 14:25:12 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog. [...] Aug 19 14:25:12 taft-01 lvm[2188]: Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F. Aug 19 14:25:12 taft-01 lvm[2188]: Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo. Aug 19 14:25:13 taft-01 lvm[2188]: Another thread is handling an event. Waiting... Aug 19 14:25:15 taft-01 lvm[2188]: Mirror status: 1 of 2 images failed. Aug 19 14:25:15 taft-01 lvm[2188]: Mirror log status: 1 of 2 images failed. Aug 19 14:25:15 taft-01 lvm[2188]: Trying to up-convert to 2 images, 2 logs. Aug 19 14:25:15 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. Aug 19 14:25:15 taft-01 lvm[2188]: Another thread is handling an event. Waiting... Aug 19 14:25:16 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. Aug 19 14:25:16 taft-01 lvm[2188]: Another thread is handling an event. Waiting... Aug 19 14:25:16 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 1.3% Aug 19 14:25:32 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 100.0% Aug 19 14:25:47 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully. Aug 19 14:25:47 taft-01 lvm[2188]: Log device 253:5 has failed (D). Aug 19 14:25:47 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1. Aug 19 14:25:47 taft-01 lvm[2188]: device-mapper: waitevent ioctl failed: No such device or address Aug 19 14:25:47 taft-01 lvm[2188]: dm_task_run failed, errno = 6, No such device or address Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog disappeared, detaching Aug 19 14:25:47 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. Aug 19 14:25:47 taft-01 lvm[2188]: Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F. Aug 19 14:25:47 taft-01 lvm[2188]: Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo. Aug 19 14:25:47 taft-01 lvm[2188]: syncd_pri_leg_pri_log_2legs_2logs_1 is consistent. Nothing to repair. Aug 19 14:25:47 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully. Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync. Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync. Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync. Aug 19 14:25:48 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events. couldn't write to syncd_pri_leg_pri_log_2legs_2logs_1 [root@taft-01 ~]# mount /dev/mapper/helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 on /mnt/syncd_pri_leg_pri_log_2legs_2logs_1 type ext3 (rw) [root@taft-01 ~]# touch /mnt/syncd_pri_leg_pri_log_2legs_2logs_1/testfile touch: cannot touch `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/testfile': Read-only file system [root@taft-01 ~]# lvs -a -o +devices Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F. Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo. LV Attr LSize Log Copy% Devices syncd_pri_leg_pri_log_2legs_2logs_1 mwi-ao 300.00m syncd_pri_leg_pri_log_2legs_2logs_1_mlog 100.00 syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1(0) [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0] iwi-ao 300.00m /dev/sdg1(0) [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1] iwi-ao 300.00m /dev/sdh1(0) [syncd_pri_leg_pri_log_2legs_2logs_1_mlog] mwi-ao 4.00m 100.00 syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1(0) [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0] iwi-ao 4.00m /dev/sde1(0) [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1] iwi-ao 4.00m /dev/sdc1(0) Version-Release number of selected component (if applicable): 2.6.32-188.el6.x86_64 lvm2-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 lvm2-libs-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 lvm2-cluster-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 udev-147-2.37.el6 BUILT: Wed Aug 10 07:48:15 CDT 2011 device-mapper-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 device-mapper-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 device-mapper-event-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 device-mapper-event-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011 cmirror-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011