Bug 619574

Summary: failing both redundant mirror logs can cause inconsistent log allocation results
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA QA Contact: Corey Marthaler <cmarthal>
Severity: high Docs Contact:
Priority: high    
Version: 6.0CC: agk, bmarzins, coughlan, dwysocha, heinzm, jbrassow, joe.thornber, prajnoha, prockai, slevine, syeghiay
Target Milestone: rcKeywords: TestBlocker
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.98-4.el6 Doc Type: Bug Fix
Doc Text:
An LVM mirror can be created with three different types of log devices: core (in-memory), disk, and mirrored. The 'mirrored' log is itself redundant and resides on two different physical volumes. Previously, if both devices composing the mirror log were lost, they were not always properly replace during repair - even if spare devices existed. This issue has been resolved and a 'mirrored' log will be properly replaced with a 'mirrored' log if there are sufficient replacement PVs.
Story Points: ---
Clone Of:
: 644071 (view as bug list) Environment:
Last Closed: 2013-02-21 08:07:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 844493    
Bug Blocks: 756082, 840699, 886216    
Attachments:
Description Flags
log from taft-03 none

Description Corey Marthaler 2010-07-29 20:33:01 UTC
Description of problem:
This issues appears to happen about 20-30% of the time.

Scenario: Kill both logs of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_both_logs_2legs_2logs_1
* sync:               1
* leg devices:        /dev/sdg1 /dev/sdc1
* log devices:        /dev/sdh1 /dev/sde1
* failpv(s):          /dev/sdh1 /dev/sde1
* failnode(s):        taft-03
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-03...
taft-03: lvcreate --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_1 -L 600M helter_skelter /dev/sdg1:0-1000 /dev/sdc1:0-1000 /dev/sdh1:0-150 /dev/sde1:0-150

PV=/dev/sde1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_1: 1.2
PV=/dev/sdh1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_0: 1.2
PV=/dev/sde1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_1: 1.2
PV=/dev/sdh1
        syncd_both_logs_2legs_2logs_1_mlog_mimage_0: 1.2

Waiting until all mirrors become fully syncd...
   0/1 mirror(s) are fully synced: ( 84.00% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-03...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-03...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-03 ----

<start name="taft-03_syncd_both_logs_2legs_2logs_1" pid="23572" time="Thu Jul 29 14:19:54 2010" type="cmd" />

Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-03 ----

Disabling device sdh on taft-03
Disabling device sde on taft-03

Attempting I/O to cause mirror down conversion(s) on taft-03
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.230363 s, 182 MB/s
Verifying current sanity of lvm after the failure
  /dev/sde1: open failed: No such device or address
Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying LOG device(s) /dev/sdh1 /dev/sde1 are *NOT* in the linear(s)
  /dev/sde1: open failed: No such device or address
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdg1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
  /dev/sde1: open failed: No such device or address
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of syncd_both_logs_2legs_2logs_1_mlog_mimage_1 on:  taft-03
syncd_both_logs_2legs_2logs_1_mlog_mimage_1 on taft-03 should now exist

Based on the logs, it appears that lvm attempted the mirrored log allocation but failed and then settled on the single device mirror:

lvm[2279]: Trying to up-convert to 2 images, 2 logs.
lvm[2279]: Monitoring mirror device helter_skelter-syncd_both_logs_2legs_2logs_1_mlog for events.
lvm[2279]: dm_task_run failed, errno = 6, No such device or address
lvm[2279]: dm_task_run failed, errno = 6, No such device or address
lvm[2279]: helter_skelter-syncd_both_logs_2legs_2logs_1_mlog disappeared, detaching
lvm[2279]: No longer monitoring mirror device helter_skelter-syncd_both_logs_2legs_2logs_1_mlog for events.
lvm[2279]: Failed to lock syncd_both_logs_2legs_2logs_1

lvm[2279]: Trying to up-convert to 2 images, 1 logs.
lvm[2279]: Monitoring mirror device helter_skelter-syncd_both_logs_2legs_2logs_1 for events.
lvm[2279]: Another thread is handling an event. Waiting...
lvm[2279]: WARNING: Failed to replace 1 of 2 logs in volume syncd_both_logs_2legs_2logs_1
lvm[2279]: 2 missing and now unallocated Physical Volumes removed from VG.
lvm[2279]: Repair of mirrored LV helter_skelter/syncd_both_logs_2legs_2logs_1 finished successfully.
lvm[2279]: Log device 253:5 has failed (D).
lvm[2279]: Device failure in helter_skelter-syncd_both_logs_2legs_2logs_1.
lvm[2279]: dm_task_run failed, errno = 6, No such device or address
lvm[2279]: helter_skelter-syncd_both_logs_2legs_2logs_1_mlog disappeared, detaching
lvm[2279]: No longer monitoring mirror device helter_skelter-syncd_both_logs_2legs_2logs_1_mlog for events.
lvm[2279]: Mirrored transient status: "2 253:6 253:7 1200/1200 1 AA 3 disk 253:3 A"
lvm[2279]: syncd_both_logs_2legs_2logs_1 is consistent. Nothing to repair.
lvm[2279]: Repair of mirrored LV helter_skelter/syncd_both_logs_2legs_2logs_1 finished successfully.

I'll attach the full log...

Version-Release number of selected component (if applicable):
2.6.32-52.el6.x86_64

lvm2-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
lvm2-libs-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
lvm2-cluster-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
udev-147-2.21.el6    BUILT: Mon Jul 12 04:55:00 CDT 2010
device-mapper-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
device-mapper-libs-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
device-mapper-event-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
device-mapper-event-libs-1.02.53-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010
cmirror-2.02.72-3.el6    BUILT: Wed Jul 28 15:39:43 CDT 2010

Comment 1 Corey Marthaler 2010-07-29 20:37:20 UTC
Created attachment 435395 [details]
log from taft-03

Comment 4 Jonathan Earl Brassow 2010-08-07 03:10:27 UTC
Seems to be boiling down to a udev issue, specifically:
activate/dev_manager.c:_tree_action->dm_udev_wait (line ~1679)

I can't see any messages from that function indicating failure, however...

Aug  6 21:46:49 bp-01 lvm[29239]: Checking kernel supports mirror segment type for syncd_both_logs_2legs_2logs_1
Aug  6 21:46:49 bp-01 lvm[29239]: Checking kernel supports mirror segment type for syncd_both_logs_2legs_2logs_1_mlog
Aug  6 21:46:49 bp-01 lvm[29239]: Checking kernel supports striped segment type for syncd_both_logs_2legs_2logs_1_mlog_mimage_0
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1499]: Calculated readahead of LV syncd_both_logs_2legs_2logs_1_mlog_mimage_0 is 256
Aug  6 21:46:49 bp-01 lvm[29239]: Checking kernel supports striped segment type for syncd_both_logs_2legs_2logs_1_mlog_mimage_1
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1499]: Calculated readahead of LV syncd_both_logs_2legs_2logs_1_mlog_mimage_1 is 256
Aug  6 21:46:49 bp-01 lvm[29239]: Setting activation/mirror_region_size to 512
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1499]: Calculated readahead of LV syncd_both_logs_2legs_2logs_1_mlog is 256
Aug  6 21:46:49 bp-01 lvm[29239]: Checking kernel supports striped segment type for syncd_both_logs_2legs_2logs_1_mimage_0
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1499]: Calculated readahead of LV syncd_both_logs_2legs_2logs_1_mimage_0 is 256
Aug  6 21:46:49 bp-01 lvm[29239]: Checking kernel supports striped segment type for syncd_both_logs_2legs_2logs_1_mimage_1
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1499]: Calculated readahead of LV syncd_both_logs_2legs_2logs_1_mimage_1 is 256
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1499]: Calculated readahead of LV syncd_both_logs_2legs_2logs_1 is 256
Aug  6 21:46:49 bp-01 lvm[29239]: activate/dev_manager.c [1679]: <backtrace>
Aug  6 21:46:49 bp-01 lvm[29239]: activate/activate.c [622]: <backtrace>
Aug  6 21:46:49 bp-01 lvm[29239]: activate/activate.c [915]: <backtrace>
Aug  6 21:46:49 bp-01 lvm[29239]: locking/locking.c [389]: <backtrace>
Aug  6 21:46:49 bp-01 lvm[29239]: Failed to lock syncd_both_logs_2legs_2logs_1
Aug  6 21:46:49 bp-01 lvm[29239]: Wiping pre-committed helter_skelter metadata from /dev/sdd1 header at 4096
Aug  6 21:46:49 bp-01 lvm[29239]: Wiping pre-committed helter_skelter metadata from /dev/sde1 header at 4096
Aug  6 21:46:49 bp-01 lvm[29239]: Wiping pre-committed helter_skelter metadata from /dev/sdf1 header at 4096
Aug  6 21:46:49 bp-01 lvm[29239]: Wiping pre-committed helter_skelter metadata from /dev/sdh1 header at 4096
Aug  6 21:46:49 bp-01 lvm[29239]: Wiping pre-committed helter_skelter metadata from /dev/sdi1 header at 4096
Aug  6 21:46:49 bp-01 lvm[29239]: XXX: _reload_lv [fail]

Comment 5 Jonathan Earl Brassow 2010-08-09 14:39:01 UTC
I think I'm wrong about that... I didn't notice the '_' in 'goto_out' - which is a macro for 'stack; goto out;'... so the problem is deeper in dm_tree_preload_children.

Comment 6 Jonathan Earl Brassow 2010-08-10 19:47:09 UTC
I don't seem to be getting log messages from the functions in libdevmapper.  I do get log messages from LVM code, but not from functions under the libdm dir.  I'm not sure where the problem is on this, but this is unique to passing the log messages through to dmeventd.

Comment 7 Jonathan Earl Brassow 2010-08-10 21:20:41 UTC
I've found where things are failing... what I don't know now is whether it is failing because of conditions that were caused earlier in processing.  The failure occurs in the same spot every time:

lib/activate/dev_manager.c: dev_manager_preload
lib/activate/dev_manager.c: _tree_action (PRELOAD)
libdm/libdm-deptree.c     : dm_tree_preload_children
libdm/libdm-deptree.c     : _create_node
libdm/ioctl/libdm-iface.c : dm_task_run
libdm/ioctl/libdm-iface.c : _do_dm_ioctl
--------------------------> ioctl: EBUSY

I'm not sure why there is an EBUSY, a 'dmsetup info' of the offending device gives:
Name:              helter_skelter-syncd_both_logs_2legs_2logs_1_mlog_mimage_0
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-WjxdPT3yg6cQMvoDLtMlR39r0O8u0Z4BIpLGjhMfl6MCI9GnwKncw7PrPtn89BEo

You can get EBUSY from the kernel when calling dm_hash_insert/dm_hash_rename.  This suggests that it's not the PRELOAD that is causing the problem, but some previous operation that failed to remove the hash entry...

Comment 8 Jonathan Earl Brassow 2010-08-11 13:52:19 UTC
tests ran all night with latest kernel... may not be reproducible with latest kernel.  Please verify.

Comment 10 Corey Marthaler 2010-08-16 22:48:08 UTC
I've run this scenario a few times and have found inconsistent results. 

Some of the times the double failed log device allocate works (hinting that this bug is now fixed), other times it is only able to allocate one log device (even though there are other free unused PVs in the VG (hinting that there is still some oddity here).

Either way the repair failures that used to exist in this bug no longer appear. I'll continue to test to make sure, but we need to figure out why the allocate acts differently during what appears to be the same situations so that the test can logically verify test results.

Comment 12 Jonathan Earl Brassow 2010-08-17 14:40:53 UTC
Please be aware, that there is currently no disk -> mirrored log conversion possible (this is a feature request bug for rhel6.1).  If a single device of the mirrored log is failed, there is currently no automated way for it to go back to 'mirrored'.  Counter-intuitively, if you fail both devices of a mirrored log, it can return to 'mirrored' because core->mirrored conversion is possible.

Take a look and see if this is what is happening, as this will determine if this bug can be closed/verified in favor of the feature request, or is still outstanding.

Comment 13 Corey Marthaler 2010-08-17 18:56:35 UTC
Jon this bug only refers to *both* log devices being failed. At no time is only 1 log device failed in this scenario.

Comment 21 Corey Marthaler 2011-10-04 17:53:33 UTC
Quick update on the latest 6.2 rpms before attempting with the new scratch build... like mentioned in comment #10, inconsistencies still exist when both legs of the mirrored log are failed. Sometimes the allocation works, and sometimes it fails all together, leaving only a linear without any log, and a corrupted filesystem.

[root@taft-01 syncd_both_logs_2legs_2logs_1]# touch  /mnt/syncd_both_logs_2legs_2logs_1/foobar
touch: setting times of `/mnt/syncd_both_logs_2legs_2logs_1/foobar': No such file or directory

2.6.32-203.el6.x86_64

lvm2-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-libs-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
lvm2-cluster-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
device-mapper-event-libs-1.02.66-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011
cmirror-2.02.87-3.el6    BUILT: Wed Sep 21 09:54:55 CDT 2011

Comment 22 Corey Marthaler 2011-10-04 18:37:01 UTC
After further pre-patch investigation, it appears that these issues only happen when dealing with multiple mirrors having both log legs fail. When only one mirror has both log legs fail, the allocation works 12/12 iterations attempted.

Comment 23 Corey Marthaler 2011-10-04 21:48:05 UTC
I didn't see any problems with the latest scratch build. 20/20 iterations passed where two different mirrors had both of their mirror log devices failed, and new log devices allocated. 

That said, 17/20 iterations also passed without the new scratch build, so it's hard to tell for sure.

2.6.32-203.el6.x86_64

lvm2-2.02.87-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
lvm2-libs-2.02.87-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
lvm2-cluster-2.02.87-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
device-mapper-libs-1.02.66-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
device-mapper-event-1.02.66-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
device-mapper-event-libs-1.02.66-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011
cmirror-2.02.87-3.1.el6    BUILT: Tue Oct  4 09:31:19 CDT 2011

Comment 30 Jonathan Earl Brassow 2012-07-30 22:27:03 UTC
Refusing any devel acks for this bug until bug 844493 is cleared.  This bug is likely a symptom of the 'dm_task_run' errors.

Comment 34 Jonathan Earl Brassow 2012-11-27 22:27:09 UTC
So far so good...


================================================================================
Iteration 27.1 started at Tue Nov 27 16:22:46 CST 2012
================================================================================
Scenario kill_both_logs_2_legs_2_logs: Kill both logs of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_both_logs_2legs_2logs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sde1 /dev/sdc1
* log devices:        /dev/sdd1 /dev/sdg1
* no MDA devices:     
* failpv(s):          /dev/sdd1 /dev/sdg1
* failnode(s):        bp-01
* additional snap:    /dev/sde1
* leg fault policy:   remove
* log fault policy:   remove
******************************************************

There have been some changes to mirror code this release, but I wouldn't be able to tell which has affected this bug.  I've had this issue before - trouble reproducing this bug.  However, I have new/different machines now which reduces the likelyhood that it is just my machines that can't reproduce the issue.

Comment 35 Jonathan Earl Brassow 2012-11-28 14:06:02 UTC
I'm going to mark this one as MODIFIED.  Best guess is that the fix is attributed to the following commit:
 commit 54c73b7723713f43413584d59ca0bdd42c1d8241
 Author: Jonathan Brassow <jbrassow>
 Date:   Wed Nov 14 14:58:47 2012 -0600
 Bug: 825323
That commit changes the way mirrored log failures are handled and would relate closely to this bug.  It is possible the fix is from another check-in though.

================================================================================
Iteration 328.1 started at Wed Nov 28 08:00:46 CST 2012
================================================================================
Scenario kill_both_logs_2_legs_2_logs: Kill both logs of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_both_logs_2legs_2logs_1 syncd_both_logs_2legs_2logs_2
* sync:               1
* striped:            0
* leg devices:        /dev/sdb1 /dev/sdh1
* log devices:        /dev/sdd1 /dev/sdf1
* no MDA devices:     
* failpv(s):          /dev/sdd1 /dev/sdf1
* failnode(s):        bp-01
* leg fault policy:   remove
* log fault policy:   remove
******************************************************

Comment 42 Corey Marthaler 2012-12-21 20:49:47 UTC
Saw no issues while running this test case on the latest rpms. Marking verified.

2.6.32-348.el6.x86_64
lvm2-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
lvm2-libs-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
lvm2-cluster-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
device-mapper-libs-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
device-mapper-event-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
device-mapper-event-libs-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
cmirror-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012

================================================================================
Iteration 10.1 started at Fri Dec 21 14:38:49 CST 2012
================================================================================
Scenario kill_both_logs_2_legs_2_logs: Kill both logs of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_both_logs_2legs_2logs_1 syncd_both_logs_2legs_2logs_2 syncd_both_logs_2legs_2logs_3
* sync:               1
* striped:            0
* leg devices:        /dev/sdd1 /dev/sde1
* log devices:        /dev/sdf1 /dev/sdb1
* no MDA devices:     
* failpv(s):          /dev/sdf1 /dev/sdb1
* failnode(s):        taft-01
* additional snap:    /dev/sdd1
* leg fault policy:   remove
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_1 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150 /dev/sdb1:0-150
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_2 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150 /dev/sdb1:0-150
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_3 -L 500M helter_skelter /dev/sdd1:0-1000 /dev/sde1:0-1000 /dev/sdf1:0-150 /dev/sdb1:0-150
[...]

Comment 43 errata-xmlrpc 2013-02-21 08:07:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html