Bug 732124 - 2-leg mirrored filesystem turned read only after primary image and primary log leg device both fail
2-leg mirrored filesystem turned read only after primary image and primary lo...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2 (Show other bugs)
6.2
x86_64 Linux
high Severity high
: rc
: ---
Assigned To: Jonathan Earl Brassow
Corey Marthaler
:
Depends On: 825323
Blocks: 749883 756082 886216
  Show dependency treegraph
 
Reported: 2011-08-19 17:32 EDT by Corey Marthaler
Modified: 2013-02-21 03:03 EST (History)
10 users (show)

See Also:
Fixed In Version: lvm2-2.02.98-4.el6
Doc Type: Bug Fix
Doc Text:
A mirror logical volume can itself have a mirrored log device. When a device in an image of the mirror and its log failed at the same time, it was possible for I/O errors to appear on the mirror LV when they should have been handled. That is, the kernel would not absorb the I/O errors from the failed device by relying on the remaining device. This would then cause file systems built on the device to respond to the I/O errors (turn read-only in the case of the ext3/4 file systems). The cause was found to be that the mirror was not suspended for repair using the 'noflush' flag. This flag allows the kernel to requeue I/O requests that need to be retried. Because the kernel was not allowed to requeue the requests, it had no choice but to return the I/O as errored. This issue has been corrected and the mirror is now properly suspended with the 'noflush' flag.
Story Points: ---
Clone Of:
: 749883 (view as bug list)
Environment:
Last Closed: 2013-02-21 03:03:19 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0501 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2013-02-20 16:30:45 EST

  None (edit)
Description Corey Marthaler 2011-08-19 17:32:14 EDT
Description of problem:
After failing the primary leg and primary log device, the mirrored filesystem was converted to readonly.

Scenario: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)
                                                                                                                                                
********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sdg1
* log devices:        /dev/sdb1 /dev/sdf1
* no MDA devices:     
* failpv(s):          /dev/sdd1 /dev/sdb1
* failnode(s):        taft-01
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_pri_leg_pri_log_2legs_2logs_1 -L 300M helter_skelter /dev/sdd1:0-1000 /dev/sdg1:0-1000 /dev/sdb1:0-150 /dev/sdf1:0-150

PV=/dev/sdb1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1
PV=/dev/sdb1
        syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdd1
        syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0: 5.1

Waiting until all mirrors become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

<start name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="10491" time="Fri Aug 19 14:24:52 2011" type="cmd" />
Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdd on taft-01
Disabling device sdb on taft-01

<fail name="taft-01_syncd_pri_leg_pri_log_2legs_2logs_1" pid="10491" time="Fri Aug 19 14:25:13 2011" type="cmd" duration="21" ec="1" />
ALL STOP!
Attempting I/O to cause mirror down conversion(s) on taft-01
dd: opening `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/ddfile': Read-only file system


[root@taft-01 ~]# grep lvm\\[ /var/log/messages
Aug 19 14:24:29 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:24:29 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:24:30 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:24:30 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:24:30 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 for events.
Aug 19 14:24:30 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:24:37 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 14:25:12 taft-01 lvm[2188]: Primary mirror device 253:3 has failed (D).
Aug 19 14:25:12 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog.
[...]
Aug 19 14:25:12 taft-01 lvm[2188]: Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F.
Aug 19 14:25:12 taft-01 lvm[2188]: Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo.
Aug 19 14:25:13 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 14:25:15 taft-01 lvm[2188]: Mirror status: 1 of 2 images failed.
Aug 19 14:25:15 taft-01 lvm[2188]: Mirror log status: 1 of 2 images failed.
Aug 19 14:25:15 taft-01 lvm[2188]: Trying to up-convert to 2 images, 2 logs.
Aug 19 14:25:15 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:25:15 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 14:25:16 taft-01 lvm[2188]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:25:16 taft-01 lvm[2188]: Another thread is handling an event. Waiting...
Aug 19 14:25:16 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 1.3%
Aug 19 14:25:32 taft-01 lvm[2188]: helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1: Converted: 100.0%
Aug 19 14:25:47 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Aug 19 14:25:47 taft-01 lvm[2188]: Log device 253:5 has failed (D).
Aug 19 14:25:47 taft-01 lvm[2188]: Device failure in helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1.
Aug 19 14:25:47 taft-01 lvm[2188]: device-mapper: waitevent ioctl failed: No such device or address
Aug 19 14:25:47 taft-01 lvm[2188]: dm_task_run failed, errno = 6, No such device or address
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog disappeared, detaching
Aug 19 14:25:47 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
Aug 19 14:25:47 taft-01 lvm[2188]: Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F.
Aug 19 14:25:47 taft-01 lvm[2188]: Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo.
Aug 19 14:25:47 taft-01 lvm[2188]: syncd_pri_leg_pri_log_2legs_2logs_1 is consistent. Nothing to repair.
Aug 19 14:25:47 taft-01 lvm[2188]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_2legs_2logs_1 finished successfully.
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog is now in-sync.
Aug 19 14:25:47 taft-01 lvm[2188]: helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 is now in-sync.
Aug 19 14:25:48 taft-01 lvm[2188]: No longer monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1_mlog for events.
couldn't write to syncd_pri_leg_pri_log_2legs_2logs_1


[root@taft-01 ~]# mount
/dev/mapper/helter_skelter-syncd_pri_leg_pri_log_2legs_2logs_1 on /mnt/syncd_pri_leg_pri_log_2legs_2logs_1 type ext3 (rw)


[root@taft-01 ~]# touch /mnt/syncd_pri_leg_pri_log_2legs_2logs_1/testfile
touch: cannot touch `/mnt/syncd_pri_leg_pri_log_2legs_2logs_1/testfile': Read-only file system


[root@taft-01 ~]# lvs -a -o +devices
 Couldn't find device with uuid fuxQDF-cc6B-VaOb-RPsx-vKs1-NFrt-t3GU9F.
 Couldn't find device with uuid zlrl4O-67IG-AUY3-p1mq-ULMW-IB4Y-PZEcvo.
 LV                                                  Attr   LSize   Log                                      Copy%   Devices                                                                                                  
 syncd_pri_leg_pri_log_2legs_2logs_1                 mwi-ao 300.00m syncd_pri_leg_pri_log_2legs_2logs_1_mlog 100.00  syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1(0)          
 [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_0]      iwi-ao 300.00m                                                  /dev/sdg1(0)                                                                                             
 [syncd_pri_leg_pri_log_2legs_2logs_1_mimage_1]      iwi-ao 300.00m                                                  /dev/sdh1(0)                                                                                             
 [syncd_pri_leg_pri_log_2legs_2logs_1_mlog]          mwi-ao   4.00m                                          100.00  syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1(0)
 [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_0] iwi-ao   4.00m                                                  /dev/sde1(0)                                                                                             
 [syncd_pri_leg_pri_log_2legs_2logs_1_mlog_mimage_1] iwi-ao   4.00m                                                  /dev/sdc1(0)                                                                                             


Version-Release number of selected component (if applicable):
2.6.32-188.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
Comment 1 Jonathan Earl Brassow 2011-10-17 12:40:12 EDT
Similar to bug 732098:

I have not hit this bug in a weekend's worth of testing (using the
helter_skelter test).

I hit a different bug where some of the mirror sub-LVs did not come out of
suspension.  This may have blocked me from seeing this bug, but I have tested
for a cumulative duration of 72+ hours and not hit this bug.

I'll continue to look for it, but in the absence of hitting it myself, it will
have to be verified by the reporter - after fixes for 746254/743112 are in
place.  (I am currently testing with the proposed fixes for 746254/743112.)

Marking as NEEDINFO until either:
1) I hit it with my continued helter_skelter testing
or
2) The reporter is able to get new rpms with the aforementioned patches and is
able to confirm this bug.
Comment 4 Corey Marthaler 2012-03-21 15:46:41 EDT
Attempting to reproduce this issue is blocked until redundant log mirrors work again (bug 794904).
Comment 6 Jonathan Earl Brassow 2012-04-27 14:55:16 EDT
bug 794904 has been cleared.
Comment 9 Corey Marthaler 2012-07-13 15:16:24 EDT
Testing for this case is blocked behind bug 825323.
Comment 10 Corey Marthaler 2012-07-26 14:47:53 EDT
Adding QA ack for 6.4.

However, testing of this bug remains blocked by bug 825323.

Devel will need to provide unit testing results however before this bug can be
ultimately verified by QA.
Comment 11 Jonathan Earl Brassow 2012-11-14 18:07:08 EST
The same fix for bug 825323 appears to fix this bug also:

...

================================================================================
Iteration 92.1 started at Wed Nov 14 17:05:27 CST 2012
================================================================================
Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1 syncd_pri_leg_pri_log_2legs_2logs_2 syncd_pri_leg_pri_log_2legs_2logs_3
* sync:               1
* striped:            0
* leg devices:        /dev/sdc1 /dev/sdf1
* log devices:        /dev/sdh1 /dev/sde1
* no MDA devices:     
* failpv(s):          /dev/sdc1 /dev/sdh1
* failnode(s):        bp-01
* leg fault policy:   remove
* log fault policy:   remove
******************************************************
...
Comment 14 Nenad Peric 2012-12-19 13:05:33 EST
Ran 10 iterations of the tests without problems:

================================================================================
Iteration 9.1 started at Wed Dec 19 11:59:28 CST 2012
================================================================================
Scenario kill_pri_log_and_pri_leg_2_legs_2_logs: Kill primary leg and primary log of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sdh1 /dev/sdf1
* log devices:        /dev/sdb1 /dev/sdd1
* no MDA devices:     
* failpv(s):          /dev/sdh1 /dev/sdb1
* failnode(s):        r6-node01
* additional snap:    /dev/sdf1
* leg fault policy:   remove
* log fault policy:   remove
******************************************************

Verified with:

lvm2-2.02.98-6.el6.x86_64
device-mapper-1.02.77-6.el6.x86_64
kernel-2.6.32-347.el6.x86_64
Comment 15 errata-xmlrpc 2013-02-21 03:03:19 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html

Note You need to log in before you can comment on or make changes to this bug.