Bug 983600

Summary: 'Contiguous' policy should partition the tags across the areas if cling tags are defined in lvm.conf
Product: Red Hat Enterprise Linux 6 Reporter: Jose Castillo <jcastillo>
Component: lvm2 Assignee: Alasdair Kergon <agk>
lvm2 sub component: Changing Logical Volumes (RHEL6) QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, nperic, pablo.iranzo, prajnoha, prockai, thornber, zkabelac
Version: 6.4   
Target Milestone: beta   
Target Release: 6.7   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.118-2.el6 Doc Type: Bug Fix
Last Closed: 2015-07-22 07:36:57 UTC Type: Bug
Bug Blocks: 1075802, 1159926    

Description Jose Castillo 2013-07-11 14:25:07 UTC
When the cling_by_tags allocation policy is specified on the lvcreate command line, the result depends on how the tags are distributed across the PVs. The same command is used in both test cases:

	 lvcreate -vvvvv -d --alloc cling_by_tags -m1 -n mirror -L 600M --nosync test_cbt

Case 1:

	Tag distribution:

	  PV         VG       Fmt  Attr PSize   PFree   PV Tags
	  /dev/sda1  test_cbt lvm2 a--  508.00m 508.00m A
	  /dev/sdb1  test_cbt lvm2 a--  508.00m 508.00m A
	  /dev/sdh1  test_cbt lvm2 a--  508.00m 508.00m A
	  /dev/sdk1  test_cbt lvm2 a--  508.00m 508.00m B
	  /dev/sdl1  test_cbt lvm2 a--  508.00m 508.00m B
	  /dev/sdm1  test_cbt lvm2 a--  508.00m 508.00m B

	Result:

	Allocation fails with "Insufficient suitable allocatable extents"

	An extract of the allocation:

	#metadata/lv_manip.c:1192         Allocating parallel area 0 on /dev/sda1 start PE 0 length 127.
	#metadata/lv_manip.c:1192         Allocating parallel area 1 on /dev/sdb1 start PE 0 length 127.
	#metadata/lv_manip.c:2110         Trying allocation using cling policy.
	#metadata/lv_manip.c:1760         Cling_to_allocated is set
	#metadata/lv_manip.c:1721         Still need 46 total extents:
	#metadata/lv_manip.c:1724           2 (2 data/0 parity) parallel areas of 23 extents each
	#metadata/lv_manip.c:1726           0 mirror logs of 0 extents each
	#metadata/lv_manip.c:2110         Trying allocation using cling_by_tags policy.
	#metadata/lv_manip.c:1760         Cling_to_allocated is set
	#metadata/lv_manip.c:1721         Still need 46 total extents:
	#metadata/lv_manip.c:1724           2 (2 data/0 parity) parallel areas of 23 extents each
	#metadata/lv_manip.c:1726           0 mirror logs of 0 extents each
	#metadata/lv_manip.c:1380         Matched allocation PV tag A on existing /dev/sdh1 with free space on /dev/sda1.
	#metadata/lv_manip.c:1414         Considering allocation area 0 as /dev/sdh1 start PE 0 length 127 leaving 0.
	#metadata/lv_manip.c:2133   Insufficient suitable allocatable extents for logical volume mirror: 46 more required

Case 2:

	Tag distribution:

	  PV         VG       Fmt  Attr PSize   PFree   PV Tags
	  /dev/sda1  test_cbt lvm2 a--  508.00m 508.00m A
	  /dev/sdb1  test_cbt lvm2 a--  508.00m 508.00m B
	  /dev/sdh1  test_cbt lvm2 a--  508.00m 508.00m A
	  /dev/sdk1  test_cbt lvm2 a--  508.00m 508.00m B
	  /dev/sdl1  test_cbt lvm2 a--  508.00m 508.00m A
	  /dev/sdm1  test_cbt lvm2 a--  508.00m 508.00m B

	Result:

	The LV is created correctly.

	An extract of the allocation:

	#metadata/lv_manip.c:1192         Allocating parallel area 0 on /dev/sda1 start PE 0 length 127.
	#metadata/lv_manip.c:1192         Allocating parallel area 1 on /dev/sdb1 start PE 0 length 127.
	#metadata/lv_manip.c:2110         Trying allocation using cling policy.
	#metadata/lv_manip.c:1760         Cling_to_allocated is set
	#metadata/lv_manip.c:1721         Still need 46 total extents:
	#metadata/lv_manip.c:1724           2 (2 data/0 parity) parallel areas of 23 extents each
	#metadata/lv_manip.c:1726           0 mirror logs of 0 extents each
	#metadata/lv_manip.c:2110         Trying allocation using cling_by_tags policy.
	#metadata/lv_manip.c:1760         Cling_to_allocated is set
	#metadata/lv_manip.c:1721         Still need 46 total extents:
	#metadata/lv_manip.c:1724           2 (2 data/0 parity) parallel areas of 23 extents each
	#metadata/lv_manip.c:1726           0 mirror logs of 0 extents each
	#metadata/lv_manip.c:1380         Matched allocation PV tag A on existing /dev/sdl1 with free space on /dev/sda1.
	#metadata/lv_manip.c:1414         Considering allocation area 0 as /dev/sdl1 start PE 0 length 127 leaving 0.
	#metadata/lv_manip.c:1380         Matched allocation PV tag B on existing /dev/sdm1 with free space on /dev/sdb1.
	#metadata/lv_manip.c:1414         Considering allocation area 1 as /dev/sdm1 start PE 0 length 127 leaving 0.
	#metadata/lv_manip.c:1192         Allocating parallel area 0 on /dev/sdl1 start PE 0 length 23.
	#metadata/lv_manip.c:1192         Allocating parallel area 1 on /dev/sdm1 start PE 0 length 23.
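
For reference, the "Still need 46 total extents" figure in both extracts follows from the sizes shown above. A small sketch of the arithmetic, assuming a 4 MiB extent size (inferred from the 508.00m PVs reporting 127 PEs); the function name is made up for illustration and is not part of LVM:

```python
# Assumption: 4 MiB extents, inferred from 508.00m PVs holding 127 PEs.
EXTENT_MIB = 4


def extents_still_needed(lv_size_mib, pv_extents, areas, extent_mib=EXTENT_MIB):
    """Return (extents missing per parallel area, total extents missing)
    after each area has consumed one whole PV of pv_extents extents."""
    per_area = lv_size_mib // extent_mib           # 600M -> 150 extents per image
    missing_per_area = max(per_area - pv_extents, 0)
    return missing_per_area, missing_per_area * areas


per_area, total = extents_still_needed(lv_size_mib=600, pv_extents=127, areas=2)
print(per_area, total)  # 23 46
```

So each mirror image has to spill 23 extents onto a second PV, and the second PV is where the tag matching comes into play.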

As Alasdair noted, allocation runs through the policies in sequence, not going beyond the one specified. Contiguous comes first: it tries to satisfy the requirement by using as much contiguous space as possible, and it takes no account of tags.

So this is a new requirement: if cling tags are defined in lvm.conf, then the contiguous policy needs to partition the tags across the areas. 

If the alloc policy requested *requires* tag clinging, then allocation would fail if there isn't enough space with the partitioning. If tag clinging is not a hard requirement, then it would proceed with a non-partitioned allocation like it does today if the partitioned/cling version is not possible.
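
To make the difference between the two cases concrete, here is a rough Python model of the behaviour described above. It is not the lv_manip.c code: the function names are hypothetical, every PV is assumed to be empty and too small to hold a whole mirror image on its own (as in the 600M test), and the choice among several equally suitable PVs is arbitrary, so the model may name a different PV than the logs do.

```python
CASE1 = [("sda1", "A"), ("sdb1", "A"), ("sdh1", "A"),
         ("sdk1", "B"), ("sdl1", "B"), ("sdm1", "B")]
CASE2 = [("sda1", "A"), ("sdb1", "B"), ("sdh1", "A"),
         ("sdk1", "B"), ("sdl1", "A"), ("sdm1", "B")]


def first_pass(pvs, areas, partition_by_tag):
    """Pick one PV per parallel area, in listing order.

    partition_by_tag=False models the tag-blind contiguous pass from the
    report; True models the requested behaviour, where every area has to
    start on a PV whose tag no other area is using.
    """
    chosen, used_tags = [], set()
    for name, tag in pvs:
        if len(chosen) == areas:
            break
        if partition_by_tag and tag in used_tags:
            continue
        chosen.append((name, tag))
        used_tags.add(tag)
    return chosen if len(chosen) == areas else None


def cling_by_tags_pass(pvs, chosen):
    """Extend every area onto an unused PV carrying the same tag, or fail."""
    taken = {name for name, _ in chosen}
    extension = []
    for _, area_tag in chosen:
        match = next((n for n, t in pvs if t == area_tag and n not in taken), None)
        if match is None:
            return None  # "Insufficient suitable allocatable extents"
        taken.add(match)
        extension.append(match)
    return extension


def allocate(pvs, areas=2, partition_by_tag=False):
    chosen = first_pass(pvs, areas, partition_by_tag)
    return None if chosen is None else cling_by_tags_pass(pvs, chosen)


print(allocate(CASE1))                         # None: both areas start on tag A
print(allocate(CASE2) is not None)             # True: areas start on A and B
print(allocate(CASE1, partition_by_tag=True))  # ['sdb1', 'sdl1']
```

In this model Case 1 fails only because the first pass puts both areas on tag A, leaving a single free A-tagged PV for two areas. With the first pass partitioned by tag, the same PV layout allocates.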

Comment 3 Corey Marthaler 2015-03-04 21:28:54 UTC
This appears to only affect raid1 and mirror types. The allocation works with all other raid volumes (4,5,6,10).

[root@host-112 ~]# vgs raid_sanity -o pv_name,pv_tags -O pv_name
  PV         PV Tags
  /dev/sda1  A      
  /dev/sda2  A      
  /dev/sdb1  A      
  /dev/sdb2  B      
  /dev/sdd1  B      
  /dev/sdd2  B      

[root@host-112 ~]# lvcreate  --alloc cling_by_tags --type raid1 -n cling_raid -L 600M raid_sanity
  Insufficient suitable allocatable extents for logical volume cling_raid: 28 more required

[root@host-112 ~]# lvcreate  --alloc cling_by_tags --type raid6 -n cling_raid -L 600M raid_sanity
  Rounding size (150 extents) up to stripe boundary size (152 extents).
  Logical volume "cling_raid" created.

[root@host-112 ~]# lvs -a -o +devices
  LV                    VG           Attr       LSize   Cpy%Sync Devices
  cling_raid            raid_sanity  rwl-a-r--- 608.00m 62.50    cling_raid_rimage_0(0),cling_raid_rimage_1(0),cling_raid_rimage_2(0),cling_raid_rimage_3(0),cling_raid_rimage_4(0),cling_raid_rimage_5(0)
  [cling_raid_rimage_0] raid_sanity  Iwl-aor--- 152.00m          /dev/sda1(1)
  [cling_raid_rimage_1] raid_sanity  Iwl-aor--- 152.00m          /dev/sda2(1)
  [cling_raid_rimage_2] raid_sanity  Iwl-aor--- 152.00m          /dev/sdb1(1)
  [cling_raid_rimage_3] raid_sanity  Iwl-aor--- 152.00m          /dev/sdb2(1)
  [cling_raid_rimage_4] raid_sanity  Iwl-aor--- 152.00m          /dev/sdd1(1)
  [cling_raid_rimage_5] raid_sanity  Iwl-aor--- 152.00m          /dev/sdd2(1)
  [cling_raid_rmeta_0]  raid_sanity  ewl-aor---   4.00m          /dev/sda1(0)
  [cling_raid_rmeta_1]  raid_sanity  ewl-aor---   4.00m          /dev/sda2(0)
  [cling_raid_rmeta_2]  raid_sanity  ewl-aor---   4.00m          /dev/sdb1(0)
  [cling_raid_rmeta_3]  raid_sanity  ewl-aor---   4.00m          /dev/sdb2(0)
  [cling_raid_rmeta_4]  raid_sanity  ewl-aor---   4.00m          /dev/sdd1(0)
  [cling_raid_rmeta_5]  raid_sanity  ewl-aor---   4.00m          /dev/sdd2(0)

Comment 4 Alasdair Kergon 2015-04-11 01:15:49 UTC
https://www.redhat.com/archives/lvm-devel/2015-April/msg00050.html

plus several earlier preparatory patches that enhanced the logging to expose the tag clinging behaviour:


  Considering allocation area 0 as /dev/loop10 start PE 0 length 2 leaving 22 with PV tags: tag1,tag3.
  Considering allocation area 1 as /dev/loop11 start PE 0 length 2 leaving 22 with PV tags: tag3.
  Eliminating allocation area 1 at PV /dev/loop11 start PE 0 from consideration: PV tag tag3 already used.


This code should work for straightforward partitioning using tags but it won't necessarily solve obscure cases where multiple tags are overlapping.  (Many of those cases could be solved without very much more work, but I've no reason to believe that anyone needs them so I've not done it.)
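
The elimination step visible in the log lines above can be pictured with a short sketch. This is an illustration of the rule as the log messages describe it, not the patch itself; the function name and data layout are hypothetical.

```python
def eliminate_shared_tags(candidates):
    """candidates: list of (pv_name, set_of_tags) in consideration order.

    Keeps a candidate only if none of its tags is already used by an
    earlier kept candidate. Returns (kept_names, eliminated), where each
    eliminated entry is (pv_name, clashing_tag).
    """
    used, kept, eliminated = set(), [], []
    for name, tags in candidates:
        clash = sorted(tags & used)
        if clash:
            eliminated.append((name, clash[0]))
            continue
        kept.append(name)
        used |= tags
    return kept, eliminated


kept, eliminated = eliminate_shared_tags([
    ("/dev/loop10", {"tag1", "tag3"}),
    ("/dev/loop11", {"tag3"}),
])
for name, tag in eliminated:
    print(f"Eliminating {name} from consideration: PV tag {tag} already used.")
```

A greedy rule of this kind is order-dependent once tags overlap. In the sketch, candidates tagged {t1,t2}, {t1} and {t2} in that order keep only the first one, although the second and third would form a valid pair. That may be the sort of overlapping case the comment has in mind, but this is a guess.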

Comment 6 Nenad Peric 2015-04-20 17:50:13 UTC
[root@tardis-02 ~]# pvs -o+pv_tags
  PV         VG          Fmt  Attr PSize   PFree  PV Tags
  /dev/sda2  vg_tardis02 lvm2 a--  278.88g     0         
  /dev/sdb1  test_cbt    lvm2 a--   93.12g 93.12g A      
  /dev/sdc1  test_cbt    lvm2 a--   93.12g 93.12g A      
  /dev/sdd1  test_cbt    lvm2 a--   93.12g 93.12g A      
  /dev/sde1  test_cbt    lvm2 a--   93.12g 93.12g B      
  /dev/sdf1  test_cbt    lvm2 a--   93.12g 93.12g B      
  /dev/sdg1  test_cbt    lvm2 a--   93.12g 93.12g B  

[root@tardis-02 ~]# lvcreate -d --alloc cling_by_tags --type mirror -m1 -n mirror -L 94G --nosync test_cbt
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  Logical volume "mirror" created.


[root@tardis-02 ~]# lvcreate -d --alloc cling_by_tags --type raid1 -m1 -n raid1 -L 98G --nosync test_cbt
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "raid1" created.


Marking this VERIFIED by:

lvm2-2.02.118-2.el6.x86_64

Comment 7 Corey Marthaler 2015-04-20 22:07:32 UTC
Types mirror and raid1 now work; however, the striped raid types (raid4, 5, 6, 10) appear to have regressed in the latest build. Moving back to ASSIGNED.


lvm2-2.02.118-1.el6

[root@host-075 ~]# pvs -a -o +pv_tags | grep raid_sanity
  /dev/sda1               raid_sanity lvm2 a--  548.00m 548.00m A      
  /dev/sda2               raid_sanity lvm2 a--  548.00m 548.00m A      
  /dev/sdc1               raid_sanity lvm2 a--  548.00m 548.00m A      
  /dev/sdc2               raid_sanity lvm2 a--  548.00m 548.00m B      
  /dev/sdd1               raid_sanity lvm2 a--  548.00m 548.00m B      
  /dev/sdd2               raid_sanity lvm2 a--  548.00m 548.00m B      

[root@host-075 ~]# lvcreate  --alloc cling_by_tags --type raid1 -m 1 -n cling_raid -L 600M raid_sanity
  Insufficient suitable allocatable extents for logical volume cling_raid: 28 more required

[root@host-075 ~]# lvcreate  --alloc cling_by_tags --type raid4 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Logical volume "cling_raid" created.

[root@host-075 ~]# lvcreate  --alloc cling_by_tags --type raid5 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Logical volume "cling_raid" created.

[root@host-075 ~]# lvcreate  --alloc cling_by_tags --type raid6 -i 3 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Logical volume "cling_raid" created.

[root@host-075 ~]# lvcreate  --alloc cling_by_tags --type raid10 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Logical volume "cling_raid" created.




lvm2-2.02.118-2.el6

[root@host-076 ~]# pvs -a -o +pv_tags | grep raid_sanity
  /dev/sda1               raid_sanity lvm2 a--  548.00m 548.00m A      
  /dev/sda2               raid_sanity lvm2 a--  548.00m 548.00m A      
  /dev/sdc1               raid_sanity lvm2 a--  548.00m 548.00m A      
  /dev/sdc2               raid_sanity lvm2 a--  548.00m 548.00m B      
  /dev/sdd1               raid_sanity lvm2 a--  548.00m 548.00m B      
  /dev/sdd2               raid_sanity lvm2 a--  548.00m 548.00m B      

[root@host-076 ~]# lvcreate  --alloc cling_by_tags --type raid1 -m 1 -n cling_raid -L 600M raid_sanity
  Logical volume "cling_raid" created.

[root@host-076 ~]# lvcreate  --alloc cling_by_tags --type raid4 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Insufficient suitable allocatable extents for logical volume cling_raid: 152 more required

# WITHOUT --ALLOC CLING_BY_TAGS
[root@host-076 ~]# lvcreate --type raid4 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Logical volume "cling_raid" created.
[root@host-076 ~]# lvs -a -o +devices
  LV                    Attr       LSize   Cpy%Sync Devices                                                             
  cling_raid            rwi-a-r--- 600.00m 43.33    cling_raid_rimage_0(0),cling_raid_rimage_1(0),cling_raid_rimage_2(0)
  [cling_raid_rimage_0] Iwi-aor--- 300.00m          /dev/sda1(1)
  [cling_raid_rimage_1] Iwi-aor--- 300.00m          /dev/sda2(1)
  [cling_raid_rimage_2] Iwi-aor--- 300.00m          /dev/sdc1(1)
  [cling_raid_rmeta_0]  ewi-aor---   4.00m          /dev/sda1(0)
  [cling_raid_rmeta_1]  ewi-aor---   4.00m          /dev/sda2(0)
  [cling_raid_rmeta_2]  ewi-aor---   4.00m          /dev/sdc1(0)

[root@host-076 ~]# lvcreate  --alloc cling_by_tags --type raid5 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Insufficient suitable allocatable extents for logical volume cling_raid: 152 more required

# WITHOUT --ALLOC CLING_BY_TAGS
[root@host-076 ~]# lvcreate --type raid5 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Logical volume "cling_raid" created.
[root@host-076 ~]# lvs -a -o +devices
  LV                    Attr       LSize   Cpy%Sync Devices                                                             
  cling_raid            rwi-a-r--- 600.00m 50.00    cling_raid_rimage_0(0),cling_raid_rimage_1(0),cling_raid_rimage_2(0)
  [cling_raid_rimage_0] Iwi-aor--- 300.00m          /dev/sda1(1)
  [cling_raid_rimage_1] Iwi-aor--- 300.00m          /dev/sda2(1)
  [cling_raid_rimage_2] Iwi-aor--- 300.00m          /dev/sdc1(1)
  [cling_raid_rmeta_0]  ewi-aor---   4.00m          /dev/sda1(0)
  [cling_raid_rmeta_1]  ewi-aor---   4.00m          /dev/sda2(0)
  [cling_raid_rmeta_2]  ewi-aor---   4.00m          /dev/sdc1(0)

[root@host-076 ~]# lvcreate  --alloc cling_by_tags --type raid6 -i 3 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Insufficient suitable allocatable extents for logical volume cling_raid: 153 more required

[root@host-076 ~]# lvcreate  --alloc cling_by_tags --type raid10 -i 2 -n cling_raid -L 600M raid_sanity
  Using default stripesize 64.00 KiB.
  Insufficient suitable allocatable extents for logical volume cling_raid: 304 more required
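
As a cross-check on the raid1 failure with lvm2-2.02.118-1.el6 above, the "28 more required" figure is consistent with each image filling one 548.00m PV and then finding nowhere to cling to. This is only one reading of the number: the sketch assumes 4 MiB extents and one metadata extent per image on the same PV (as the 4.00m rmeta lines in the lvs output suggest), and the function name is made up.

```python
def raid1_shortfall(lv_size_mib, pv_size_mib, images=2, extent_mib=4, meta_extents=1):
    """Extents still missing after each image has filled one whole PV.

    Assumptions (not confirmed by the report): 4 MiB extents and one
    metadata extent per image allocated on the same PV as the image.
    """
    image_extents = lv_size_mib // extent_mib            # 600M -> 150
    usable = pv_size_mib // extent_mib - meta_extents    # 548M -> 137 - 1 = 136
    return max(image_extents - usable, 0) * images


print(raid1_shortfall(600, 548))  # 28
```

The striped raid figures (152, 153, 304) do not fall out of the same formula, so this sketch should not be used to explain them.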

Comment 8 Alasdair Kergon 2015-04-27 10:47:09 UTC
The original report did not discuss the other raid modes and I left them out because it was not clear there was any customer demand and we have not yet developed sensible specifications of the desired layouts.

Does anyone know of some layouts people are actually using, splitting raid5 or raid6 across 2 or 3 data centres?

Comment 11 Corey Marthaler 2015-04-30 19:14:11 UTC
Marking this bug verified for types "mirror" and "raid1" volumes *only* and opened bug 1217605 for raid10 contiguous allocation.

Comment 12 errata-xmlrpc 2015-07-22 07:36:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1411.html