Bug 796906 - raid volumes with multiple device failures need a way to do partial allocation
Summary: raid volumes with multiple device failures need a way to do partial allocation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-23 20:45 UTC by Corey Marthaler
Modified: 2012-06-20 15:01 UTC
CC List: 9 users

Fixed In Version: lvm2-2.02.95-10.el6
Doc Type: Bug Fix
Doc Text:
New feature, no documentation needed.
Clone Of:
Environment:
Last Closed: 2012-06-20 15:01:53 UTC
Target Upstream Version:
Embargoed:


Attachments
  Partial fix for this bug (979 bytes, patch) - 2012-04-20 23:14 UTC, Jonathan Earl Brassow
  Verbose output from a failed run (includes attached patch) (266.65 KB, text/plain) - 2012-04-21 00:07 UTC, Jonathan Earl Brassow


Links
  Red Hat Product Errata RHBA-2012:0962 (normal, SHIPPED_LIVE): lvm2 bug fix and enhancement update, last updated 2012-06-19 21:12:11 UTC

Description Corey Marthaler 2012-02-23 20:45:42 UTC
Description of problem:
If a raid volume experiences multiple device failures (in this case three) but the VG has only two free devices, there's no mechanism to allocate just one or two replacement devices; it's either all three or nothing right now.

Scenario kill_multiple_synced_raid1_4legs: Kill multiple legs of synced 4 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_multiple_raid1_4legs_1
* sync:               1
* type:               raid1
* -m |-i value:       4
* leg devices:        /dev/sdf1 /dev/sdb1 /dev/sdh1 /dev/sdd1 /dev/sde1
* failpv(s):          /dev/sdh1 /dev/sdb1 /dev/sde1
* failnode(s):        taft-01
* raid fault policy:   allocate
******************************************************

Creating raids(s) on taft-01...
taft-01: lvcreate --type raid1 -m 4 -n synced_multiple_raid1_4legs_1 -L 500M black_bird /dev/sdf1:0-1000 /dev/sdb1:0-1000 /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sde1:0-1000

RAID Structure(s):
  LV                                       Attr     LSize   Copy%  Devices
  synced_multiple_raid1_4legs_1            rwi-a-m- 500.00m   0.00 synced_multiple_raid1_4legs_1_rimage_0(0),synced_multiple_raid1_4legs_1_rimage_1(0),synced_multiple_raid1_4legs_1_rimage_2(0),synced_multiple_raid1_4legs_1_rimage_3(0),synced_multiple_raid1_4legs_1_rimage_4(0)
  [synced_multiple_raid1_4legs_1_rimage_0] Iwi-aor- 500.00m        /dev/sdf1(1)
  [synced_multiple_raid1_4legs_1_rimage_1] Iwi-aor- 500.00m        /dev/sdb1(1)
  [synced_multiple_raid1_4legs_1_rimage_2] Iwi-aor- 500.00m        /dev/sdh1(1)
  [synced_multiple_raid1_4legs_1_rimage_3] Iwi-aor- 500.00m        /dev/sdd1(1)
  [synced_multiple_raid1_4legs_1_rimage_4] Iwi-aor- 500.00m        /dev/sde1(1)
  [synced_multiple_raid1_4legs_1_rmeta_0]  ewi-aor-   4.00m        /dev/sdf1(0)
  [synced_multiple_raid1_4legs_1_rmeta_1]  ewi-aor-   4.00m        /dev/sdb1(0)
  [synced_multiple_raid1_4legs_1_rmeta_2]  ewi-aor-   4.00m        /dev/sdh1(0)
  [synced_multiple_raid1_4legs_1_rmeta_3]  ewi-aor-   4.00m        /dev/sdd1(0)
  [synced_multiple_raid1_4legs_1_rmeta_4]  ewi-aor-   4.00m        /dev/sde1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 2, some allocation should work)

PV=/dev/sde1
        synced_multiple_raid1_4legs_1_rimage_4: 1
        synced_multiple_raid1_4legs_1_rmeta_4: 1
PV=/dev/sdh1
        synced_multiple_raid1_4legs_1_rimage_2: 1
        synced_multiple_raid1_4legs_1_rmeta_2: 1
PV=/dev/sdb1
        synced_multiple_raid1_4legs_1_rimage_1: 1
        synced_multiple_raid1_4legs_1_rmeta_1: 1

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 51.54% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdh on taft-01
Disabling device sdb on taft-01
Disabling device sde on taft-01

Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.504002 s, 83.2 MB/s

Verifying current sanity of lvm after the failure

RAID Structure(s):
  /dev/sdb1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sde1: read failed after 0 of 512 at 145669554176: Input/output error
  /dev/sdh1: read failed after 0 of 512 at 145669554176: Input/output error
  LV                                       Attr     LSize   Copy%  Devices
  synced_multiple_raid1_4legs_1            rwi-aom- 500.00m 100.00 synced_multiple_raid1_4legs_1_rimage_0(0),synced_multiple_raid1_4legs_1_rimage_1(0),synced_multiple_raid1_4legs_1_rimage_2(0),synced_multiple_raid1_4legs_1_rimage_3(0),synced_multiple_raid1_4legs_1_rimage_4(0)
  [synced_multiple_raid1_4legs_1_rimage_0] iwi-aor- 500.00m        /dev/sdf1(1)
  [synced_multiple_raid1_4legs_1_rimage_1] iwi-aor- 500.00m        unknown device(1)
  [synced_multiple_raid1_4legs_1_rimage_2] iwi-aor- 500.00m        unknown device(1)
  [synced_multiple_raid1_4legs_1_rimage_3] iwi-aor- 500.00m        /dev/sdd1(1)
  [synced_multiple_raid1_4legs_1_rimage_4] iwi-aor- 500.00m        unknown device(1)
  [synced_multiple_raid1_4legs_1_rmeta_0]  ewi-aor-   4.00m        /dev/sdf1(0)
  [synced_multiple_raid1_4legs_1_rmeta_1]  ewi-aor-   4.00m        unknown device(0)
  [synced_multiple_raid1_4legs_1_rmeta_2]  ewi-aor-   4.00m        unknown device(0)
  [synced_multiple_raid1_4legs_1_rmeta_3]  ewi-aor-   4.00m        /dev/sdd1(0)
  [synced_multiple_raid1_4legs_1_rmeta_4]  ewi-aor-   4.00m        unknown device(0)

Verifying FAILED device /dev/sdh1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sdb1 is *NOT* in the volume(s)
Verifying FAILED device /dev/sde1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdf1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdd1 *IS* in the volume(s)
verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of synced_multiple_raid1_4legs_1_rimage_4 on:  taft-01
there should not be an 'unknown' device associated with synced_multiple_raid1_4legs_1_rimage_4 on taft-01

# There are two free devices in this VG (sdg1 and sdc1)

Feb 23 14:02:53 taft-01 lvm[1256]: Insufficient suitable allocatable extents for logical volume : 378 more required
Feb 23 14:02:53 taft-01 lvm[1256]: Failed to allocate replacement images for black_bird/synced_multiple_raid1_4legs_1
Feb 23 14:02:53 taft-01 lvm[1256]: Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.
Feb 23 14:02:53 taft-01 lvm[1256]: Repair of RAID device black_bird-synced_multiple_raid1_4legs_1 failed.
Feb 23 14:02:53 taft-01 lvm[1256]: Failed to process event for black_bird-synced_multiple_raid1_4legs_1
Feb 23 14:02:53 taft-01 lvm[1256]: Device #1 of raid1 array, black_bird-synced_multiple_raid1_4legs_1, has failed.
Feb 23 14:02:54 taft-01 lvm[1256]: Insufficient suitable allocatable extents for logical volume : 378 more required
Feb 23 14:02:54 taft-01 lvm[1256]: Failed to allocate replacement images for black_bird/synced_multiple_raid1_4legs_1
Feb 23 14:02:54 taft-01 lvm[1256]: Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.
Feb 23 14:02:54 taft-01 lvm[1256]: Repair of RAID device black_bird-synced_multiple_raid1_4legs_1 failed.
Feb 23 14:02:54 taft-01 lvm[1256]: Failed to process event for black_bird-synced_multiple_raid1_4legs_1

# A manual attempt fails as well

[root@taft-01 ~]# lvconvert --repair black_bird/synced_multiple_raid1_4legs_1
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Insufficient suitable allocatable extents for logical volume : 378 more required
  Failed to allocate replacement images for black_bird/synced_multiple_raid1_4legs_1
  Failed to replace faulty devices in black_bird/synced_multiple_raid1_4legs_1.


Version-Release number of selected component (if applicable):
2.6.32-236.el6.x86_64

lvm2-2.02.93-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
lvm2-libs-2.02.93-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
lvm2-cluster-2.02.93-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.72-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
device-mapper-libs-1.02.72-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
device-mapper-event-1.02.72-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
device-mapper-event-libs-1.02.72-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012
cmirror-2.02.93-0.48.el6    BUILT: Thu Feb 23 07:04:40 CST 2012


How reproducible:
Every time

Comment 2 Jonathan Earl Brassow 2012-04-20 13:51:01 UTC
This bug hinges on the way suspend is behaving.

When an LV is changed, a suspend may need to "preload" additional sub-LV targets.  But when the actual suspend of the LV happens, it fails to suspend the newly preloaded sub-LVs.  So, when the resume comes along, it finds that these new sub-LVs are not suspended as they should be, and we get errors.

Comment 4 Jonathan Earl Brassow 2012-04-20 23:14:37 UTC
Created attachment 579124 [details]
Partial fix for this bug

This patch solves the first issue with this bug.  It detects that it cannot allocate the necessary number of devices and gradually backs off, trying fewer and fewer devices until it succeeds or there is simply no space from which to allocate a new device.
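
For illustration, a sketch of how that back-off might be exercised by hand; the VG/LV names, size, and single spare PV below are assumptions, not taken from this report:

# VG with four PVs; the raid1 LV uses three of them, leaving one spare.
vgcreate vg /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
lvcreate --type raid1 -m 2 -L 100M -n lv vg /dev/sdb1 /dev/sdc1 /dev/sdd1
# ... wait for sync, then fail two of the three legs ...
# With the patch, the repair should fall back from two replacements to one
# rather than failing outright:
lvconvert --repair vg/lv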

Comment 5 Jonathan Earl Brassow 2012-04-20 23:38:58 UTC
The second and most difficult problem with this bug is dealing with how *_missing_* devices are treated.

When device failures happen in a raid device, they do not need to be replaced in order for the device to continue to operate.  If there is no activation/deactivation, the array will continue to function, BUT the *_missing_* devices (as added by '_add_error_device') will not be present.

When action is taken on an array that has failed devices, there is no problem as long as all the failed devices are replaced.  However, if only some of the devices are replaced, or a simple suspend+resume is issued on the device from within LVM, the suspend will preload the *_missing_* devices and the resume will fail because of their presence.

Here are easy steps to reproduce the problem:
*) compile LVM with the attached patch
1) create vg with 4 devices
2) create raid1 LV with 3 devices
3) wait for sync
4) kill 2 of the 3 devices in the LV
5) run 'lvconvert --repair vg/lv' and select 'y' to replace devices

If you have activated the LV since the device failures (which implies you deactivated at some point between steps 4 and 5), then step 5 will work just fine.  This is because the *_missing_* devices were loaded during activation.  If you are running the repair directly after a failure - like dmeventd would do - then you will hit the failure.  The failure can leave your RAID array and sub-LVs in a suspended state.  ('dmsetup resume' from the bottom up to avoid frustration.)
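
For reference, a sketch of that "bottom up" resume, assuming an LV named vg/lv and the rimage/rmeta naming pattern seen elsewhere in this report; the exact device names will differ:

# See which device-mapper devices are stuck suspended.
dmsetup info -c
# Resume the sub-LV devices first, the top-level LV last.
dmsetup resume vg-lv_rmeta_1
dmsetup resume vg-lv_rimage_1
dmsetup resume vg-lv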

Comment 6 Jonathan Earl Brassow 2012-04-21 00:07:02 UTC
Created attachment 579127 [details]
Verbose output from a failed run (includes attached patch)

Search phrases that will bring you quickly to the problem areas:
 "missing_"
 "Suspending"
 "Resuming"
 "Device or resource busy"

Comment 7 Alasdair Kergon 2012-04-21 00:32:31 UTC
Might need you to talk this through to save my having to work out what you probably already know.

Is the line:

#ioctl/libdm-iface.c:1705   device-mapper: create ioctl on vg-lv_rimage_1-missing_0_0 failed: Device or resource busy

the *first* place where this went wrong?

Or should it have done some other operations prior to this?
- If so, what did it miss out and where is the *first* place the sequence of ioctls diverges from what would be correct?

Do you know yet whether or not all the entries in the deptree are as intended, or are some nodes missing from it or with incorrect state?

Do you know what the tables should look like after correct behavior?  And what they actually look like after the failure?  (dmsetup info -c; dmsetup table; dmsetup table --inactive - dumped before the command is run, after the failed run, and what they would be if it ran correctly)
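
For example, one way to capture that state before and after the failing command (the output file names are just placeholders):

dmsetup info -c          > dm-info.before
dmsetup table            > dm-table.before
dmsetup table --inactive > dm-table-inactive.before
# ... run the failing lvconvert/lvchange ...
dmsetup info -c          > dm-info.after
dmsetup table            > dm-table.after
dmsetup table --inactive > dm-table-inactive.after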

Comment 8 Alasdair Kergon 2012-04-21 01:23:16 UTC
_add_error_device uses dm_tree_add_new_dev()

which assumes the device does not already exist - but on the resume code path in the trace it does already exist => failure

My first guess is that this (_add_error_device()) needs to be changed to use parts of the _add_dev_to_dtree() code path instead.  [Call _info() on it.  If it doesn't exist, call dm_tree_add_new_dev() like now.  If it does exist, use dm_tree_add_dev() instead.]

Comment 9 Petr Rockai 2012-04-23 12:51:34 UTC
There might be something to that, but right now, RAID repair doesn't work for me at all, and neither does partial activation of RAID volumes.

With:

aux prepare_vg 5
lvcreate --type raid1 -m 2 -L 1 -n raid $vg "$dev1" "$dev2" "$dev3" # "$dev4"
lvchange -a n --partial $vg/raid
aux disable_dev $dev3 # $dev4
lvchange -a y --partial $vg/raid -vvvv

I see this:

#libdm/libdm-deptree.c:2273         Adding target to (254:20): 0 8192 raid raid1 3 0 region_size 1024 3 254:10 254:13 254:14 254:15 254:17 254:19
#libdm/ioctl/libdm-iface.c:1687         dm table   (254:20) OF   [16384] (*1)
#libdm/ioctl/libdm-iface.c:1687         dm reload   (254:20) NF   [16384] (*1)
#libdm/ioctl/libdm-iface.c:1705   device-mapper: reload ioctl on  failed: Invalid argument

All the devices mentioned on that table line do exist, two of them point to an underlying error target (missing). I am not exactly familiar with the raid table format, but it seems consistent with the table that's loaded by normal activation, which works. This is, however, on linux 3.1.10 -- is it possible that this is a kernel bug on my end? (I am upgrading to 3.2.15 in the meantime...)

Comment 10 Jonathan Earl Brassow 2012-04-23 21:24:32 UTC
Petr, perhaps you are hitting 0447568fc51e0268e201f7086d2450cf986e0411.

[j]$ git tag --contains 0447568fc51e0268e201f7086d2450cf986e0411
v3.4-rc1
v3.4-rc2

Comment 11 Alasdair Kergon 2012-04-24 00:00:59 UTC
A simpler reproducer:

  Put 2 PVs into a VG
  Create and activate $LV with 2 stripes.
  Disable one of the PVs.
  lvchange --refresh --partial $LV
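
Spelled out as commands (a sketch; the device names and LV size are assumptions):

vgcreate vg /dev/sdb1 /dev/sdc1
lvcreate -i 2 -L 100M -n lv vg        # striped LV across both PVs, activated on creation
# ... disable one of the two PVs, e.g. by offlining the underlying device ...
lvchange --refresh --partial vg/lv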

Comment 13 Jonathan Earl Brassow 2012-04-24 16:50:52 UTC
Yes, the patch works.

I'll review it a bit more closely and run it through the test suites before checking it in.

Comment 15 Jonathan Earl Brassow 2012-04-25 13:34:16 UTC
Sorry I didn't put this in sooner...


commit 50b2d511ecc5177895961e16b215c9fcb84ad80f
Author: Jonathan Earl Brassow <jbrassow>
Date:   Tue Apr 24 20:05:31 2012 +0000

    Allow a subset of failed devices to be replaced in RAID LVs.
    
    If two devices in an array failed, it was previously impossible to replace
    just one of them.  This patch allows for the replacement of some, but perhaps
    not all, failed devices.

commit 9ac67656ae54cde61184e0f5bda25022a1c1d3c1
Author: Jonathan Earl Brassow <jbrassow>
Date:   Tue Apr 24 20:00:03 2012 +0000

    Prevent resume from creating error devices that already exist from suspend.
    
    Thanks to agk for providing the patch that prevents resume from attempting
    (and then failing) to create error devices which already exist; having been
    created by a corresponding suspend operation.

Comment 16 Jonathan Earl Brassow 2012-04-25 13:34:41 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
New feature, no documentation needed.

Comment 17 Corey Marthaler 2012-05-17 19:24:45 UTC
This doesn't appear to work with the latest rpms.

2.6.32-270.el6.x86_64
lvm2-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
lvm2-libs-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
lvm2-cluster-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-libs-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-event-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
device-mapper-event-libs-1.02.74-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012
cmirror-2.02.95-9.el6    BUILT: Wed May 16 10:34:14 CDT 2012

./black_bird -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.3 -o taft-01 -i 2 -e kill_multiple_synced_raid6_4legs

Scenario kill_multiple_synced_raid6_4legs: Kill multiple legs of synced 4 leg raid6 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_multiple_raid6_4legs_1
* sync:               1
* type:               raid6
* -m |-i value:       4
* leg devices:        /dev/sdf1 /dev/sdc1 /dev/sdg1 /dev/sdh1 /dev/sde1 /dev/sdb1
* failpv(s):          /dev/sdc1 /dev/sde1
* failnode(s):        taft-01
* raid fault policy:   allocate
******************************************************

Creating raids(s) on taft-01...
taft-01: lvcreate --type raid6 -i 4 -n synced_multiple_raid6_4legs_1 -L 500M black_bird /dev/sdf1:0-1000 /dev/sdc1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sde1:0-1000 /dev/sdb1:0-1000

RAID Structure(s):
  LV                                       Attr     LSize   Copy%  Devices
  synced_multiple_raid6_4legs_1            rwi-a-r- 512.00m        synced_multiple_raid6_4legs_1_rimage_0(0),synced_multiple_raid6_4legs_1_rimage_1(0),synced_multiple_raid6_4legs_1_rimage_2(0),synced_multiple_raid6_4legs_1_rimage_3(0),synced_multiple_raid6_4legs_1_rimage_4(0),synced_multiple_raid6_4legs_1_rimage_5(0)
  [synced_multiple_raid6_4legs_1_rimage_0] Iwi-aor- 128.00m        /dev/sdf1(1)
  [synced_multiple_raid6_4legs_1_rimage_1] Iwi-aor- 128.00m        /dev/sdc1(1)
  [synced_multiple_raid6_4legs_1_rimage_2] Iwi-aor- 128.00m        /dev/sdg1(1)
  [synced_multiple_raid6_4legs_1_rimage_3] Iwi-aor- 128.00m        /dev/sdh1(1)
  [synced_multiple_raid6_4legs_1_rimage_4] Iwi-aor- 128.00m        /dev/sde1(1)
  [synced_multiple_raid6_4legs_1_rimage_5] Iwi-aor- 128.00m        /dev/sdb1(1)
  [synced_multiple_raid6_4legs_1_rmeta_0]  ewi-aor-   4.00m        /dev/sdf1(0)
  [synced_multiple_raid6_4legs_1_rmeta_1]  ewi-aor-   4.00m        /dev/sdc1(0)
  [synced_multiple_raid6_4legs_1_rmeta_2]  ewi-aor-   4.00m        /dev/sdg1(0)
  [synced_multiple_raid6_4legs_1_rmeta_3]  ewi-aor-   4.00m        /dev/sdh1(0)
  [synced_multiple_raid6_4legs_1_rmeta_4]  ewi-aor-   4.00m        /dev/sde1(0)
  [synced_multiple_raid6_4legs_1_rmeta_5]  ewi-aor-   4.00m        /dev/sdb1(0)

* NOTE: not enough available devices for allocation fault polices to fully work *
(well technically, since we have 1, some allocation should work)

PV=/dev/sde1
        synced_multiple_raid6_4legs_1_rimage_4: 1
        synced_multiple_raid6_4legs_1_rmeta_4: 1
PV=/dev/sdc1
        synced_multiple_raid6_4legs_1_rimage_1: 1
        synced_multiple_raid6_4legs_1_rmeta_1: 1

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-01 ----

Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- taft-01 ----

Disabling device sdc on taft-01
Disabling device sde on taft-01

[HANG]


lvm[3047]: Insufficient suitable allocatable extents for logical volume : 66 more required
lvm[3047]: Failed to allocate replacement images for black_bird/synced_multiple_raid6_4legs_1
lvm[3047]: Attempting replacement of 1 devices instead of 2
kernel: device-mapper: raid: Failed to read superblock of device at position 1
kernel: device-mapper: raid: Device 4 specified for rebuild: Clearing superblock
kernel: md/raid:mdX: device dm-14 operational as raid disk 5
kernel: md/raid:mdX: device dm-10 operational as raid disk 3
kernel: md/raid:mdX: device dm-8 operational as raid disk 2
kernel: md/raid:mdX: device dm-4 operational as raid disk 0
kernel: md/raid:mdX: allocated 6384kB
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: md/raid:mdX: raid level 6 active with 4 out of 6 devices, algorithm 8
kernel: created bitmap (1 pages) for device mdX
lvm[3047]: device-mapper: create ioctl on black_bird-synced_multiple_raid6_4legs_1_rimage_1-missing_0_0 failed: Device or resource busy
lvm[3047]: Failed to resume black_bird/synced_multiple_raid6_4legs_1 after committing changes
lvm[3047]: Failed to replace faulty devices in black_bird/synced_multiple_raid6_4legs_1.
lvm[3047]: Repair of RAID device black_bird-synced_multiple_raid6_4legs_1 failed.
lvm[3047]: Failed to process event for black_bird-synced_multiple_raid6_4legs_1
lvm[3047]: No longer monitoring RAID device black_bird-synced_multiple_raid6_4legs_1 for events.
kernel: INFO: task kjournald:6648 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kjournald     D 0000000000000001     0  6648      2 0x00000080
kernel: ffff880216bbdc50 0000000000000046 ffff880216bbdc10 ffffffffa000422e
kernel: ffff880216bbdbc0 ffffffff81012bd9 ffff880216bbdc00 ffffffff8109cd39
kernel: ffff880216a845f8 ffff880216bbdfd8 000000000000fb88 ffff880216a845f8
kernel: Call Trace:
kernel: [<ffffffffa000422e>] ? dm_table_unplug_all+0x8e/0x100 [dm_mod]
kernel: [<ffffffff81012bd9>] ? read_tsc+0x9/0x20
kernel: [<ffffffff8109cd39>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff8109cd39>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff811ae860>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff814fdd93>] io_schedule+0x73/0xc0
kernel: [<ffffffff811ae8a0>] sync_buffer+0x40/0x50
kernel: [<ffffffff814fe74f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffff811ae860>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff814fe7f8>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff81092110>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffff811ae856>] __wait_on_buffer+0x26/0x30
kernel: [<ffffffffa049afde>] journal_commit_transaction+0x9ee/0x1310 [jbd]
kernel: [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70
kernel: [<ffffffff8107eabb>] ? try_to_del_timer_sync+0x7b/0xe0
kernel: [<ffffffffa04a0bb8>] kjournald+0xe8/0x250 [jbd]
kernel: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffffa04a0ad0>] ? kjournald+0x0/0x250 [jbd]
kernel: [<ffffffff81091d66>] kthread+0x96/0xa0
kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20

Comment 19 Jonathan Earl Brassow 2012-05-17 21:27:06 UTC
I've reproduced this on Corey's machine by doing the following:

1> vgcreate vg /dev/sd[bcdef]1
2> lvcreate --type raid1 -m 3 -L 100M -n lv vg
#> Wait for sync
3> off.sh <2 devices>
4> echo y | lvconvert --repair vg/lv

The DM devices are left in a suspended state - which explains the hang.

These issues don't happen on my machine.  There must be something different between our userspace versions of LVM.

Comment 20 Jonathan Earl Brassow 2012-05-17 22:00:49 UTC
Looking at the source for the RPM, it looks like agk's patch from comment 12 has not been pulled in. That would cause precisely this problem.

Moving back to POST.

Comment 23 Corey Marthaler 2012-05-22 20:38:45 UTC
Partial allocation now works when multiple device failures occur. Marking this verified in the latest rpms.

That said, a few issues popped up that required test changes to make these scenarios pass.

1. With partial allocation, we'll now see "Repair of RAID device VG-LV failed" messages; an RFE to fix this should be created. The test will now need to ignore this error.

2. With partial allocation, the test will have to manually restore the failed VG; this is not required with other raid failure scenarios.

3. With partial allocation, the test will have to manually recreate one of the failed PVs (whichever one didn't get restored), as sketched below.
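
A sketch of what that manual cleanup might look like; the VG name and device come from this report, but the exact steps are an assumption, not the test code:

# Drop references to PVs that are still missing from the VG.
vgreduce --removemissing --force black_bird
# Recreate whichever failed PV was not restored (old labels may require pvcreate -ff)
# and return it to the VG.
pvcreate /dev/sde1
vgextend black_bird /dev/sde1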

Comment 24 Corey Marthaler 2012-05-24 20:17:35 UTC
4. Another issue with partial allocation is that the -missing devices still remain. (Filed RFE 825023 for this.)

Comment 26 errata-xmlrpc 2012-06-20 15:01:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html

