Bug 640101 - mirror leg allocation fails when 1/3 legs is failed along with 1/2 logs
Summary: mirror leg allocation fails when 1/3 legs is failed along with 1/2 logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On: 625192
Blocks:
Reported: 2010-10-04 19:55 UTC by Corey Marthaler
Modified: 2011-07-21 12:28 UTC (History)

Fixed In Version: lvm2-2.02.84-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of: 625192
Environment:
Last Closed: 2011-07-21 10:50:33 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2011:1071
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2 bug fix and enhancement update
Last Updated: 2011-07-21 10:50:01 UTC

Description Corey Marthaler 2010-10-04 19:55:39 UTC
This bug exists in RHEL 5.5 as well.

Oct  4 14:39:04 taft-03 lvm[17855]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 for events.
Oct  4 14:39:04 taft-03 lvm[17855]: Another thread is handling an event. Waiting...
Oct  4 14:39:04 taft-03 lvm[17855]: Trying to up-convert to 3 images, 2 logs.
Oct  4 14:39:04 taft-03 lvm[17855]: Adding log redundancy not supported yet.
Oct  4 14:39:04 taft-03 lvm[17855]: Try converting the log to 'core' first.
Oct  4 14:39:04 taft-03 lvm[17855]: Trying to up-convert to 2 images, 2 logs.
Oct  4 14:39:04 taft-03 lvm[17855]: Adding log redundancy not supported yet.
Oct  4 14:39:04 taft-03 lvm[17855]: Try converting the log to 'core' first.
Oct  4 14:39:04 taft-03 lvm[17855]: Trying to up-convert to 2 images, 1 logs.
Oct  4 14:39:05 taft-03 lvm[17855]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 for events.
Oct  4 14:39:05 taft-03 lvm[17855]: Another thread is handling an event. Waiting...
Oct  4 14:39:05 taft-03 lvm[17855]: WARNING: Failed to replace 1 of 3 images in volume syncd_pri_leg_pri_log_3legs_2logs_1
Oct  4 14:39:05 taft-03 lvm[17855]: WARNING: Failed to replace 1 of 2 logs in volume syncd_pri_leg_pri_log_3legs_2logs_1
Oct  4 14:39:05 taft-03 lvm[17855]: 2 missing and now unallocated Physical Volumes removed from VG.
Oct  4 14:39:05 taft-03 lvm[17855]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_3legs_2logs_1 finished successfully.
Oct  4 14:39:05 taft-03 lvm[17855]: Primary mirror device 253:5 has failed (D).
Oct  4 14:39:05 taft-03 lvm[17855]: Device failure in helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1.
Oct  4 14:39:05 taft-03 lvm[17855]: syncd_pri_leg_pri_log_3legs_2logs_1 is consistent. Nothing to repair.
Oct  4 14:39:05 taft-03 lvm[17855]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_3legs_2logs_1 finished successfully.
Oct  4 14:39:05 taft-03 lvm[17855]: helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 is now in-sync.
Oct  4 14:39:05 taft-03 lvm[17855]: helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 is now in-sync.
Oct  4 14:39:05 taft-03 lvm[17855]: dm_task_run failed, errno = 9, Bad file descriptor

2.6.18-194.11.3.el5

lvm2-2.02.73-2.el5    BUILT: Mon Aug 30 06:36:20 CDT 2010
lvm2-cluster-2.02.73-2.el5    BUILT: Mon Aug 30 06:38:05 CDT 2010
device-mapper-1.02.54-2.el5    BUILT: Fri Sep 10 12:00:05 CDT 2010
cmirror-1.1.39-10.el5    BUILT: Wed Sep  8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009



+++ This bug was initially created as a clone of Bug #625192 +++

Description of problem:

./helter_skelter -l /home/msp/cmarthal/work/rhel6/sts-root -r /usr/tests/sts-rhel6.0/ -o taft-02 -e kill_pri_log_and_pri_leg_2_legs_2_logs -e kill_pri_log_and_pri_leg_3_legs_2_logs

Scenario: Kill primary leg and primary log of synced 3 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_pri_leg_pri_log_3legs_2logs_1
* sync:               1
* leg devices:        /dev/sdf1 /dev/sdc1 /dev/sdb1
* log devices:        /dev/sdd1 /dev/sdh1
* failpv(s):          /dev/sdf1 /dev/sdd1
* failnode(s):        taft-02
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate --mirrorlog mirrored -m 2 -n syncd_pri_leg_pri_log_3legs_2logs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdc1:0-1000 /dev/sdb1:0-1000 /dev/sdd1:0-150 /dev/sdh1:0-150

PV=/dev/sdd1
        syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdf1
        syncd_pri_leg_pri_log_3legs_2logs_1_mimage_0: 4
PV=/dev/sdd1
        syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_0: 1.3
PV=/dev/sdf1
        syncd_pri_leg_pri_log_3legs_2logs_1_mimage_0: 4

Waiting until all mirrors become fully syncd...
   0/1 mirror(s) are fully synced: ( 64.50% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-02...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-02...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-02 ----

<start name="taft-02_syncd_pri_leg_pri_log_3legs_2logs_1" pid="30228" time="Wed Aug 18 15:00:15 2010" type="cmd" />
Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-02 ----

Disabling device sdf on taft-02
Disabling device sdd on taft-02

Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.235638 s, 178 MB/s
Verifying current sanity of lvm after the failure
  /dev/sdf1: open failed: No such device or address
Verifying FAILED device /dev/sdf1 is *NOT* in the volume(s)
  /dev/sdf1: open failed: No such device or address
Verifying FAILED device /dev/sdd1 is *NOT* in the volume(s)
  /dev/sdf1: open failed: No such device or address
Verifying LOG device(s) /dev/sdh1 *ARE* in the mirror(s)
  /dev/sdf1: open failed: No such device or address
Verifying LEG device /dev/sdc1 *IS* in the volume(s)
  /dev/sdf1: open failed: No such device or address
Verifying LEG device /dev/sdb1 *IS* in the volume(s)
  /dev/sdf1: open failed: No such device or address
verify the dm devices associated with /dev/sdf1 /dev/sdd1 have been removed as expected
Checking REMOVAL of syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_0 on:  taft-02
Checking REMOVAL of syncd_pri_leg_pri_log_3legs_2logs_1_mimage_0 on:  taft-02
verify the newly allocated dm devices were added as a result of the failures
Checking EXISTENCE of syncd_pri_leg_pri_log_3legs_2logs_1_mimage_3 on:  taft-02
syncd_pri_leg_pri_log_3legs_2logs_1_mimage_3 on taft-02 should now exist


Mirror has 3 legs before the failure:
[root@taft-02 ~]# lvs -a -o +devices
  LV                                                  VG             Attr   LSize   Log                                      Copy%  Devices
  syncd_pri_leg_pri_log_3legs_2logs_1                 helter_skelter mwi-a- 600.00m syncd_pri_leg_pri_log_3legs_2logs_1_mlog 1.33   syncd_pri_leg_pri_log_3legs_2logs_1_mimage_0(0),syncd_pri_leg_pri_log_3legs_2logs_1_mimage_1(0),syncd_pri_leg_pri_log_3legs_2logs_1_mimage_2(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mimage_0]      helter_skelter Iwi-ao 600.00m                                                 /dev/sdf1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mimage_1]      helter_skelter Iwi-ao 600.00m                                                 /dev/sdc1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mimage_2]      helter_skelter Iwi-ao 600.00m                                                 /dev/sdb1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mlog]          helter_skelter mwi-ao   4.00m                                          100.00 syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_0(0),syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_0] helter_skelter iwi-ao   4.00m                                                 /dev/sdd1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mlog_mimage_1] helter_skelter iwi-ao   4.00m                                                 /dev/sdh1(0)

But it fails to allocate a replacement leg because of log up-conversion issues:
[root@taft-02 ~]# lvs -a -o +devices
  /dev/sdf1: open failed: No such device or address
  LV                                             VG             Attr   LSize   Log                                      Copy%  Devices
  syncd_pri_leg_pri_log_3legs_2logs_1            helter_skelter mwi-ao 600.00m syncd_pri_leg_pri_log_3legs_2logs_1_mlog 100.00 syncd_pri_leg_pri_log_3legs_2logs_1_mimage_1(0),syncd_pri_leg_pri_log_3legs_2logs_1_mimage_2(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mimage_1] helter_skelter iwi-ao 600.00m                                                 /dev/sdc1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mimage_2] helter_skelter iwi-ao 600.00m                                                 /dev/sdb1(0)
  [syncd_pri_leg_pri_log_3legs_2logs_1_mlog]     helter_skelter lwi-ao   4.00m                                                 /dev/sdh1(0)


The leg allocation appears to fail because the log cannot be up-converted at the same time:

Aug 18 14:59:28 taft-02 lvm[3671]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 for events.
Aug 18 14:59:28 taft-02 lvm[3671]: Another thread is handling an event. Waiting...
Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 3 images, 2 logs.
Aug 18 14:59:28 taft-02 lvm[3671]: Adding log redundancy not supported yet.
Aug 18 14:59:28 taft-02 lvm[3671]: Try converting the log to 'core' first.
Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 2 images, 2 logs.
Aug 18 14:59:28 taft-02 lvm[3671]: Adding log redundancy not supported yet.
Aug 18 14:59:28 taft-02 lvm[3671]: Try converting the log to 'core' first.
Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 2 images, 1 logs.
Aug 18 14:59:28 taft-02 lvm[3671]: Monitoring mirror device helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 for events.
Aug 18 14:59:28 taft-02 lvm[3671]: Another thread is handling an event. Waiting...
Aug 18 14:59:28 taft-02 lvm[3671]: WARNING: Failed to replace 1 of 3 images in volume syncd_pri_leg_pri_log_3legs_2logs_1
Aug 18 14:59:28 taft-02 lvm[3671]: WARNING: Failed to replace 1 of 2 logs in volume syncd_pri_leg_pri_log_3legs_2logs_1
Aug 18 14:59:28 taft-02 lvm[3671]: 2 missing and now unallocated Physical Volumes removed from VG.
Aug 18 14:59:28 taft-02 lvm[3671]: Repair of mirrored LV helter_skelter/syncd_pri_leg_pri_log_3legs_2logs_1 finished successfully.
Aug 18 14:59:28 taft-02 lvm[3671]: Primary mirror device 253:6 read failed.
Aug 18 14:59:28 taft-02 lvm[3671]: helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 is now in-sync.
Aug 18 14:59:28 taft-02 lvm[3671]: helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 is now in-sync.
Aug 18 14:59:28 taft-02 lvm[3671]: helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 is now in-sync.
Aug 18 14:59:28 taft-02 lvm[3671]: helter_skelter-syncd_pri_leg_pri_log_3legs_2logs_1 is now in-sync.
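
To compare the RHEL 5 and RHEL 6 logs quickly, the dmeventd messages above can be summarized with a short script. This is only a sketch: the regular expressions are derived from the message text quoted in this report, and `summarize` and `SAMPLE` are hypothetical names, not part of lvm2.

```python
import re

# Patterns derived from the dmeventd messages quoted in this report.
ATTEMPT = re.compile(r"Trying to up-convert to (\d+) images, (\d+) logs\.")
FAILED = re.compile(
    r"WARNING: Failed to replace (\d+) of (\d+) (images|logs) in volume (\S+)"
)


def summarize(lines):
    """Return (attempts, failures) found in an iterable of syslog lines."""
    attempts = []  # (images, logs) tuples, in the order they were logged
    failures = {}  # "images" or "logs" -> (failed, total)
    for line in lines:
        m = ATTEMPT.search(line)
        if m:
            attempts.append((int(m.group(1)), int(m.group(2))))
            continue
        m = FAILED.search(line)
        if m:
            failures[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return attempts, failures


# A few lines copied from the log above.
SAMPLE = """\
Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 3 images, 2 logs.
Aug 18 14:59:28 taft-02 lvm[3671]: Adding log redundancy not supported yet.
Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 2 images, 2 logs.
Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 2 images, 1 logs.
Aug 18 14:59:28 taft-02 lvm[3671]: WARNING: Failed to replace 1 of 3 images in volume syncd_pri_leg_pri_log_3legs_2logs_1
Aug 18 14:59:28 taft-02 lvm[3671]: WARNING: Failed to replace 1 of 2 logs in volume syncd_pri_leg_pri_log_3legs_2logs_1
""".splitlines()

attempts, failures = summarize(SAMPLE)
print(attempts)
print(failures)
```

The list of attempts shows the fallback order that was tried, and the failures dictionary shows how many legs and logs were left unreplaced.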


Version-Release number of selected component (if applicable):
2.6.32-59.1.el6.x86_64

lvm2-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
lvm2-libs-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
lvm2-cluster-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
udev-147-2.22.el6    BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-libs-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-event-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-event-libs-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
cmirror-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010


How reproducible:
Every time

--- Additional comment from tyasui on 2010-08-19 01:18:17 EDT ---

> Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 3 images, 2 logs.
> Aug 18 14:59:28 taft-02 lvm[3671]: Adding log redundancy not supported yet.
> Aug 18 14:59:28 taft-02 lvm[3671]: Try converting the log to 'core' first.
> Aug 18 14:59:28 taft-02 lvm[3671]: Trying to up-convert to 2 images, 2 logs.
> Aug 18 14:59:28 taft-02 lvm[3671]: Adding log redundancy not supported yet.
> Aug 18 14:59:28 taft-02 lvm[3671]: Try converting the log to 'core' first.

This error is reported because up-converting a mirror log is not supported yet.
When both need repair, the mirror log is up-converted first and the mirror legs second, so if the log up-conversion fails, the leg up-conversion is never attempted.

There appears to be a second, related bug. Take a mirror volume with 3 legs and 2 logs whose leg and log fault policies are both "allocate": when one of the mirror logs fails, the volume is unexpectedly down-converted to a mirror with 2 legs and 1 log.

We should fix the repair logic in _lvconvert_mirrors_repair().
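
The ordering problem described above can be modelled in a few lines. This is a hedged sketch, not the lvm2 code: `upconvert`, `repair_combined` and `repair_independent` are hypothetical names, the attempt sequence is inferred from the "Trying to up-convert" messages in the log, and the decoupled variant is just one possible approach, not necessarily what the upstream fix does.

```python
def upconvert(images, logs, current_logs, log_redundancy_supported=False):
    """Hypothetical stand-in for one combined up-convert request.

    The request is rejected whenever it would have to add a log copy and
    log redundancy cannot be added, mirroring the logged message
    "Adding log redundancy not supported yet."
    """
    if logs > current_logs and not log_redundancy_supported:
        return False
    return True


def repair_combined(orig_images, orig_logs, cur_images, cur_logs):
    """Coupled repair: legs are restored only if the log request succeeds."""
    candidates = (
        (orig_images, orig_logs),  # e.g. 3 images, 2 logs
        (cur_images, orig_logs),   # e.g. 2 images, 2 logs
        (cur_images, cur_logs),    # e.g. 2 images, 1 log
    )
    for images, logs in candidates:
        if upconvert(images, logs, cur_logs):
            return images, logs
    return cur_images, cur_logs


def repair_independent(orig_images, orig_logs, cur_images, cur_logs):
    """Decoupled repair: restore legs first, then try the log on its own.

    Assumes there is enough free PV space to allocate the replacement leg.
    """
    images = orig_images
    logs = orig_logs if upconvert(images, orig_logs, cur_logs) else cur_logs
    return images, logs


# Scenario from this report: 3 legs and 2 logs, one of each has failed.
print(repair_combined(3, 2, 2, 1))
print(repair_independent(3, 2, 2, 1))
```

In the coupled model the mirror ends up with 2 legs and 1 log, which matches the `lvs` output after the failure. In the decoupled model the third leg is still replaced even though the log stays at one copy.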

Comment 5 Jonathan Earl Brassow 2011-02-08 22:35:47 UTC
Fixed upstream in version 2.02.81.

Comment 6 Milan Broz 2011-03-04 17:53:05 UTC
Fixed in lvm2-2.02.84-1.el5

Comment 9 Corey Marthaler 2011-04-29 18:43:59 UTC
Fix verified in the latest rpms. The helter_skelter test cases that fail a mirror
leg together with a mirrored log leg have been re-enabled and executed.

2.6.18-256.el5

lvm2-2.02.84-3.el5    BUILT: Wed Apr 27 03:42:24 CDT 2011
lvm2-cluster-2.02.84-3.el5    BUILT: Wed Apr 27 03:42:43 CDT 2011
device-mapper-1.02.63-2.el5    BUILT: Fri Mar  4 10:23:17 CST 2011
device-mapper-event-1.02.63-2.el5    BUILT: Fri Mar  4 10:23:17 CST 2011
cmirror-1.1.39-10.el5    BUILT: Wed Sep  8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5    BUILT: Tue Dec 22 13:39:47 CST 2009

Comment 11 Florian Nadge 2011-05-26 14:55:48 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This field is the basis of the errata or release note for this bug. It can also be used for change logs.

The Technical Note template, known as CCFR, is as follows:

Cause
    What actions or circumstances cause this bug to present.
Consequence
    What happens when the bug presents.
Fix
    What was done to fix the bug.
Result
    What now happens when the actions or circumstances above occur.
    Note: this is not the same as the bug doesn’t present anymore.

Comment 12 errata-xmlrpc 2011-07-21 10:50:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

Comment 13 errata-xmlrpc 2011-07-21 12:28:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

