Bug 832596

Summary: --alloc anywhere only works with raid1, not raid4, raid5, or raid6
Product: Red Hat Enterprise Linux 6
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Keywords: ZStream
Reporter: Corey Marthaler <cmarthal>
Assignee: Alasdair Kergon <agk>
QA Contact: Cluster QE <mspqa-list>
CC: agk, dwysocha, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac
Fixed In Version: lvm2-2.02.97-2.el6
Doc Type: Bug Fix
Doc Text:
A bug in the space allocation logic meant that the '--alloc anywhere' option sometimes failed because it tried to allocate space on devices that were already full. This has been fixed. raid4/5/6 logical volumes were particularly affected.
Type: Bug
Bug Blocks: 852440
Last Closed: 2013-02-21 08:10:44 UTC

Description Corey Marthaler 2012-06-15 21:24:13 UTC
Description of problem:
[root@hayes-01 bin]# pvscan
  PV /dev/etherd/e1.1p9    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p8    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p7    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p6    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p5    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p4    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p3    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p2    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p10   VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p1    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]

[root@hayes-01 bin]# lvcreate --type raid4 -i 2 -n alloc_anywhere --alloc anywhere -L 50M raid_sanity /dev/etherd/e1.1p4:0-1500 /dev/etherd/e1.1p1:0-1500
  Using default stripesize 64.00 KiB
  Rounding up size to full physical extent 52.00 MiB
  Rounding size (13 extents) up to stripe boundary size (14 extents)
  Segment with extent 1 in PV /dev/etherd/e1.1p1 not found
  Failed to extend alloc_anywhere_rmeta_1 in alloc_anywhere.

[root@hayes-01 bin]# lvcreate --type raid4 -i 3 -n alloc_anywhere --alloc anywhere -L 50M raid_sanity /dev/etherd/e1.1p4:0-1500 /dev/etherd/e1.1p1:0-1500
  Using default stripesize 64.00 KiB
  Rounding up size to full physical extent 52.00 MiB
  Rounding size (13 extents) up to stripe boundary size (15 extents)
  Inconsistent length: 1 0
  PV segment pe_alloc_count mismatch: 12 != 4294734804
  Inconsistent length: 1 0
  PV segment pe_alloc_count mismatch: 12 != 4294734804
  PV segment VG free_count mismatch: 2325036 != 2790044
  Internal error: PV segments corrupted in raid_sanity.
  LV alloc_anywhere_rimage_0: segment 1 has inconsistent PV area 0
  LV alloc_anywhere_rimage_0: segment 2 has inconsistent PV area 0
  Internal error: LV segments corrupted in alloc_anywhere_rimage_0.
  LV alloc_anywhere_rimage_1: segment 1 has inconsistent PV area 0
  LV alloc_anywhere_rimage_1: segment 2 has inconsistent PV area 0
  Internal error: LV segments corrupted in alloc_anywhere_rimage_1.

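For reference, the rounding in the messages above is simple arithmetic: 52 MiB over 13 extents implies a 4 MiB extent size (inferred from the output, not stated in the report), so -L 50M rounds up to 52 MiB (13 extents), and 13 extents then rounds up to the stripe boundary: 14 for -i 2, 15 for -i 3. A minimal illustrative sketch of that calculation (not lvm2 source):

/* Illustrative C (not lvm2 source): the extent rounding shown above. */
#include <stdint.h>
#include <stdio.h>

/* Round n up to the next multiple of boundary. */
static uint32_t round_up(uint32_t n, uint32_t boundary)
{
	return ((n + boundary - 1) / boundary) * boundary;
}

int main(void)
{
	const uint32_t extent_mib = 4;	/* inferred: 52 MiB / 13 extents */
	const uint32_t size_mib = 50;	/* -L 50M */

	/* "Rounding up size to full physical extent 52.00 MiB" */
	uint32_t extents = round_up(size_mib, extent_mib) / extent_mib;

	/* "Rounding size (13 extents) up to stripe boundary size ..." */
	printf("%u extents; -i 2 -> %u, -i 3 -> %u\n",
	       extents, round_up(extents, 2), round_up(extents, 3));
	return 0;	/* prints: 13 extents; -i 2 -> 14, -i 3 -> 15 */
}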

Version-Release number of selected component (if applicable):
2.6.32-278.el6.x86_64

lvm2-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-libs-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-cluster-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
cmirror-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012


How reproducible:
Every time

Comment 1 Alasdair Kergon 2012-06-16 01:15:33 UTC
1) The allocation section of the -vvvv trace will show what the code is actually doing.  (The pasted messages are just some internal checks that detect something went wrong earlier.)

2) Let's see what the simplest failure case is.  These commands use --type raid4 *and* --stripes (-i) as well as the anywhere policy.  Are all of those parts required for this sort of failure?

Comment 2 Alasdair Kergon 2012-06-16 02:25:14 UTC
First idea: I induced a simple test case that adds empty areas to alloced_areas[s] - which is nonsensical.  (It found 1 extent for each metadata area, but nothing for any data areas.)

--- a/lib/metadata/lv_manip.c
+++ b/lib/metadata/lv_manip.c
@@ -1110,9 +1122,14 @@ static int _alloc_parallel_area(struct alloc_handle *ah, uint32_t max_to_allocat
 			dm_list_add(&ah->alloced_areas[s], &aa[s].list);
 			s -= ah->area_count + ah->parity_count;
 		}
+
+		aa[s].len = (ah->alloc_and_split_meta) ? len - ah->log_len : len;
+		/* Skip empty allocations */
+		if (!aa[s].len)
+			continue;
+
 		aa[s].pv = pva->map->pv;
 		aa[s].pe = pva->start;
-		aa[s].len = (ah->alloc_and_split_meta) ? len - ah->log_len : len;
 
 		log_debug("Allocating parallel area %" PRIu32
 			  " on %s start PE %" PRIu32 " length %" PRIu32 ".",

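In other words: when metadata is split off from the same allocation (alloc_and_split_meta), a pass that finds only the single metadata extent leaves len - ah->log_len == 0 for the data area, and before the patch that empty area was still recorded. A self-contained sketch of the fixed logic, using simplified stand-ins for the lvm2 structures:

/* Simplified stand-in (not lvm2 source) for the fix above: compute the
 * data-area length with the metadata split off, and skip empty areas. */
#include <stdint.h>
#include <stdio.h>

struct area {			/* stand-in for lvm2's struct alloced_area */
	uint32_t pe;		/* starting physical extent */
	uint32_t len;		/* length in extents */
};

/* Returns 1 if a data area was recorded, 0 if it was empty and skipped. */
static int record_data_area(struct area *aa, uint32_t start_pe, uint32_t len,
			    uint32_t log_len, int alloc_and_split_meta)
{
	aa->len = alloc_and_split_meta ? len - log_len : len;

	/* Skip empty allocations -- the core of the patch. */
	if (!aa->len)
		return 0;

	aa->pe = start_pe;
	return 1;
}

int main(void)
{
	struct area aa;

	/* A pass found 1 extent; the metadata area (log_len 1) consumed it,
	 * so the data area is empty and must be skipped. */
	printf("%d\n", record_data_area(&aa, 4, 1, 1, 1));	/* 0 */

	/* A pass found 14 extents: 1 of metadata, 13 of data. */
	printf("%d\n", record_data_area(&aa, 5, 14, 1, 1));	/* 1 */
	return 0;
}
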
Comment 3 Alasdair Kergon 2012-06-16 02:50:01 UTC
Output without the above patch: notice the incorrect allocations listed with 'length 0'.

Allocating parallel metadata area 0 on /dev/loop2 start PE 3 length 1.
Allocating parallel area 0 on /dev/loop2 start PE 4 length 0.
Allocating parallel metadata area 1 on /dev/loop2 start PE 0 length 1.
Allocating parallel area 1 on /dev/loop2 start PE 0 length 0.
Allocating parallel metadata area 2 on /dev/loop3 start PE 3 length 1.
Allocating parallel area 2 on /dev/loop3 start PE 4 length 0.

Comment 7 Nenad Peric 2012-10-30 13:51:59 UTC
Verified by running the raid_sanity tests, except those that needed a missing PV, due to Bug 867644, which crashes the kernel if an attempt is made to partially activate a RAID LV.


Verified with:

lvm2-2.02.98-2.el6.x86_64

Comment 8 errata-xmlrpc 2013-02-21 08:10:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html