Bug 837927 - RAID: inability to create, extend or convert to a large (> 1TiB) RAID LV
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Docs Contact: ---
Depends On:
Blocks: 840699
Reported: 2012-07-05 16:31 EDT by Corey Marthaler
Modified: 2015-04-22 21:51 EDT (History)
10 users

See Also:
Fixed In Version: lvm2-2.02.98-1.el6
Doc Type: Bug Fix
Doc Text:
Previously, when a RAID LV was created without the user specifying a region_size, the default region_size did not allow for LVs larger than a certain size (2TB in general). Thus, creating or extending a RAID LV beyond that size caused errors. The region_size is now adjusted automatically upon creation or extension, so large LVs can now be created.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 03:11:32 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Corey Marthaler 2012-07-05 16:31:58 EDT
Description of problem:
This is different from bug 834703 because it produces a different error message, it also happens on raid1 volumes, and it appears to happen only on large PVs.

When I recreate the PVs to be just 1G in size, this extend works.  
[root@hayes-01 bin]# lvcreate --type raid1 -m 1 -n cling_raid -l 242 raid_sanity /dev/etherd/e1.1p1 /dev/etherd/e1.1p5
  Logical volume "cling_raid" created
[root@hayes-01 bin]# lvs -a -o +devices
  LV                    VG          Attr     LSize   Copy%  Devices
  cling_raid            raid_sanity rwi-a-m- 968.00m  39.67 cling_raid_rimage_0(0),cling_raid_rimage_1(0)
  [cling_raid_rimage_0] raid_sanity Iwi-aor- 968.00m        /dev/etherd/e1.1p1(1)
  [cling_raid_rimage_1] raid_sanity Iwi-aor- 968.00m        /dev/etherd/e1.1p5(1)
  [cling_raid_rmeta_0]  raid_sanity ewi-aor-   4.00m        /dev/etherd/e1.1p1(0)
  [cling_raid_rmeta_1]  raid_sanity ewi-aor-   4.00m        /dev/etherd/e1.1p5(0)
[root@hayes-01 bin]# lvextend -l 484 --alloc cling_by_tags raid_sanity/cling_raid
  Extending 2 mirror images.
  Extending logical volume cling_raid to 1.89 GiB
  Logical volume cling_raid successfully resized



However, when I leave them at their normal 900+G size, this fails.
[root@hayes-01 bin]# lvcreate --type raid1 -m 1 -n cling_raid -l 220880 raid_sanity /dev/etherd/e1.1p3 /dev/etherd/e1.1p2
  Logical volume "cling_raid" created

[root@hayes-01 bin]# lvs -a -o +devices
  LV                    Attr     LSize   Copy%  Devices
  cling_raid            rwi-a-m- 862.81g   0.37 cling_raid_rimage_0(0),cling_raid_rimage_1(0)
  [cling_raid_rimage_0] Iwi-aor- 862.81g        /dev/etherd/e1.1p3(1)
  [cling_raid_rimage_1] Iwi-aor- 862.81g        /dev/etherd/e1.1p2(1)
  [cling_raid_rmeta_0]  ewi-aor-   4.00m        /dev/etherd/e1.1p3(0)
  [cling_raid_rmeta_1]  ewi-aor-   4.00m        /dev/etherd/e1.1p2(0)

[root@hayes-01 bin]# lvextend -l 441760 --alloc cling_by_tags raid_sanity/cling_raid
  Extending 2 mirror images.
  Extending logical volume cling_raid to 1.69 TiB
  device-mapper: reload ioctl on  failed: Invalid argument
  Failed to suspend cling_raid

device-mapper: raid: Supplied region_size (1024 sectors) below minimum (1725)
device-mapper: table: 253:7: raid: Supplied region size is too small
device-mapper: ioctl: error adding target to table
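The kernel's "minimum (1725)" figure follows directly from the bitmap limit. The requested LV is 441760 extents of 4MiB (8192 sectors each), and the minimum region_size is roughly the LV's sector count divided by the 2^21-region maximum. A quick shell check of that arithmetic (not the kernel's exact rounding code):

```shell
# Reproduce the kernel's "minimum (1725)": LV sectors / 2^21 regions.
# 441760 extents, each 4MiB = 8192 sectors.
lv_sectors=$((441760 * 8192))
echo $((lv_sectors / (1 << 21)))   # 1725
```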





Version-Release number of selected component (if applicable):
2.6.32-278.el6.x86_64

lvm2-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-libs-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
lvm2-cluster-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
device-mapper-event-libs-1.02.74-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012
cmirror-2.02.95-10.el6    BUILT: Fri May 18 03:26:00 CDT 2012


How reproducible:
Every time
Comment 1 Jonathan Earl Brassow 2012-09-26 17:46:12 EDT
The MD bitmap can only track 2^21 regions.  If the region_size is 1024 sectors, the maximum allowable size for the LV would be 1024 sectors * 512 bytes * 2^21 regions = 2^19 * 2^21 = 2^40 bytes = 1TiB.  When the device was extended beyond 1TiB, the operation failed.
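The same arithmetic as a shell check (numbers taken from the comment above):

```shell
# Max LV size with the default region_size: 1024 sectors * 512 bytes/sector
# * 2^21 regions = 2^40 bytes = 1TiB.
max_bytes=$((1024 * 512 * (1 << 21)))
echo $((max_bytes / (1 << 40)))TiB   # 1TiB
```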

The region size is not adjusted either, which means that even creating a RAID LV greater than 1TiB will fail.

[root@hayes-01 ~]# lvcreate -m1 --type raid1 -L 2T -n lv vg
  device-mapper: reload ioctl on  failed: Invalid argument
  Failed to activate new LV.
Comment 2 Jonathan Earl Brassow 2012-09-26 17:49:35 EDT
Even though the region_size is not automatically adjusted, you can change it when creating an LV:

[root@hayes-01 ~]# lvcreate -m1 --type raid1  -L 1T -n lv vg
  Logical volume "lv" created
[root@hayes-01 ~]# dmsetup table vg-lv
0 2147483648 raid raid1 3 0 region_size 1024 2 254:3 254:4 254:5 254:6

[root@hayes-01 ~]# lvcreate -m1 --type raid1 -R 4M -L 2T -n lv vg
  Logical volume "lv" created
[root@hayes-01 ~]# dmsetup table vg-lv
0 4294967296 raid raid1 3 0 region_size 8192 2 254:3 254:4 254:5 254:6
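The two tables make the limit visible: dividing each LV's sector count by its region_size gives the region count, which must stay at or below 2^21 (2097152):

```shell
# Region counts for the two dmsetup tables above (LV sectors / region_size sectors).
echo $((2147483648 / 1024))   # 1T LV, region_size 1024 -> 2097152, exactly 2^21
echo $((4294967296 / 8192))   # 2T LV, region_size 8192 -> 524288, well under the limit
```

So a 1T LV just fits the default region_size, which is why anything larger fails without -R.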
Comment 3 Jonathan Earl Brassow 2012-09-26 17:50:56 EDT
The option to change the region size is not available to 'lvextend', however.

[root@hayes-01 ~]# lvextend -R 8M -L +1T vg/lv
lvextend: invalid option -- 'R'
  Error during parsing of command line.
Comment 4 Jonathan Earl Brassow 2012-09-26 18:18:31 EDT
Converting huge linear devices also appears to be a problem:

[root@hayes-01 ~]# lvcreate -L 2T -n lv vg
  Logical volume "lv" created
[root@hayes-01 ~]# lvconvert --type raid1 -m 1 -R 8M vg/lv
  device-mapper: reload ioctl on  failed: Invalid argument
  Failed to suspend vg/lv before committing changes
  Device '/dev/etherd/e1.1p5' has been left open.
  Device '/dev/etherd/e1.1p6' has been left open.
  Device '/dev/etherd/e1.1p7' has been left open.
  Device '/dev/etherd/e1.1p6' has been left open.
  Device '/dev/etherd/e1.1p1' has been left open.
  Device '/dev/etherd/e1.1p2' has been left open.
  Device '/dev/etherd/e1.1p5' has been left open.
  Device '/dev/etherd/e1.1p8' has been left open.
  Device '/dev/etherd/e1.1p3' has been left open.
  Device '/dev/etherd/e1.1p4' has been left open.
  Device '/dev/etherd/e1.1p7' has been left open.
  Device '/dev/etherd/e1.1p4' has been left open.
  Device '/dev/etherd/e1.1p2' has been left open.
  Device '/dev/etherd/e1.1p8' has been left open.
  Device '/dev/etherd/e1.1p3' has been left open.
  Device '/dev/etherd/e1.1p1' has been left open.

Sep 26 17:16:44 hayes-01 kernel: device-mapper: raid: Supplied region_size (1024 sectors) below minimum (2048)
Sep 26 17:16:44 hayes-01 kernel: device-mapper: table: 254:3: raid: Supplied region size is too small
Sep 26 17:16:44 hayes-01 kernel: device-mapper: ioctl: error adding target to table
Comment 5 Jonathan Earl Brassow 2012-09-27 17:06:52 EDT
Unit test showing correctness of solution:

[root@hayes-01 lvm2]# lvcreate --type raid1 -L 20T -n lv large; lvextend -L 200T large/lv; lvremove -ff large; lvcreate -L 200T -n lv large; lvconvert --type raid1 -m 1 large/lv; lvremove -ff large
  Logical volume "lv" created
  Extending 2 mirror images.
  Extending logical volume lv to 200.00 TiB
  Logical volume lv successfully resized
  Logical volume "lv" successfully removed
  Logical volume "lv" created
  Logical volume "lv" successfully removed
[root@hayes-01 lvm2]# 

Additionally, tests have been added to the LVM test suite (lvcreate-large.sh) that include:
- creating large (> 200T) RAID 1/4/5/6/10 LVs
- converting a large (> 200T) linear LV to RAID1
- extending a large RAID1 LV

Another important test is to ensure that a 50%-synced huge RAID LV that is doubled in size correctly becomes a 25%-synced RAID LV.  The MD bitmap is not resized, but the bitmap is reconfigured.  I have unit tested this as well and it works.
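The arithmetic behind that check is simple: the amount of synced data stays fixed while the total size doubles, so the reported percentage halves (illustrative numbers only):

```shell
# Sanity check: same synced amount, doubled LV size -> half the sync percentage.
size=100; synced=50                   # 50% synced before the extend
new_size=$((size * 2))                # LV doubled in size
echo "$((100 * synced / new_size))%"  # 25%
```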
Comment 6 Jonathan Earl Brassow 2012-09-27 17:57:12 EDT
commit 886656e4ac5e5c932d9fbe60d18c063136288a38
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Thu Sep 27 16:51:22 2012 -0500

    RAID: Fix problems with creating, extending and converting large RAID LVs
    
    MD's bitmaps can handle 2^21 regions at most.  The RAID code has always
    used a region_size of 1024 sectors.  That means the size of a RAID LV was
    limited to 1TiB.  (The user can adjust the region_size when creating a
    RAID LV, which can affect the maximum size.)  Thus, creating, extending or
    converting to a RAID LV greater than 1TiB would result in a failure to
    load the new device-mapper table.
    
    Again, the size of the RAID LV is not limited by how much space is allocated
    for the metadata area, but by the limitations of the MD bitmap.  Therefore,
    we must adjust the 'region_size' to ensure that the number of regions does
    not exceed the limit.  I've added code to do this when extending a RAID LV
    (which covers 'create' and 'extend' operations) and when up-converting -
    specifically from linear to RAID1.
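The adjustment described in the commit message can be sketched in shell: keep region_size a power of two and grow it until the region count fits in 2^21. This is a sketch of the idea with hypothetical numbers (a 200TiB LV), not the actual lvm2 code:

```shell
# Grow region_size (sectors, power of two) until the LV's region count
# fits the MD bitmap's 2^21-region limit.
lv_size_sectors=$((200 * (1 << 31)))   # 200 TiB in 512-byte sectors
region_size=1024                       # historical default
max_regions=$((1 << 21))
while [ $((lv_size_sectors / region_size)) -gt "$max_regions" ]; do
    region_size=$((region_size * 2))
done
echo "$region_size"                    # 262144 sectors = 128MiB regions
```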
Comment 8 Corey Marthaler 2013-01-02 16:16:55 EST
Fix verified in the latest rpms.

2.6.32-348.el6.x86_64
lvm2-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
lvm2-libs-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
lvm2-cluster-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
device-mapper-libs-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
device-mapper-event-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
device-mapper-event-libs-1.02.77-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012
cmirror-2.02.98-6.el6    BUILT: Thu Dec 20 07:00:04 CST 2012

Tested with raid[1,4,5,6,10]:

SCENARIO (raid1) - [cling_extend_avail_tagged_extents]
Verify that mirror extends honor the cling by tags allocation policy when
there are enough PVs with tags present for extension to work
Add tags to random PVs
A's /dev/etherd/e1.1p10 /dev/etherd/e1.1p9 /dev/etherd/e1.1p3
B's /dev/etherd/e1.1p7 /dev/etherd/e1.1p6 /dev/etherd/e1.1p1
C's /dev/etherd/e1.1p8 /dev/etherd/e1.1p2 /dev/etherd/e1.1p4
Create a raid using the tagged PVs
hayes-01: lvcreate --type raid1 -m 1 -n cling_raid -l 209255 raid_sanity /dev/etherd/e1.1p10 /dev/etherd/e1.1p7

  LV                     Attr      LSize   Cpy%Sync Devices
  cling_raid             rwi-a-r-- 817.40g     0.54 cling_raid_rimage_0(0),cling_raid_rimage_1(0)
  [cling_raid_rimage_0]  Iwi-aor-- 817.40g          /dev/etherd/e1.1p10(1)
  [cling_raid_rimage_1]  Iwi-aor-- 817.40g          /dev/etherd/e1.1p7(1)
  [cling_raid_rmeta_0]   ewi-aor--   4.00m          /dev/etherd/e1.1p10(0)
  [cling_raid_rmeta_1]   ewi-aor--   4.00m          /dev/etherd/e1.1p7(0)

Extend using the cling_by_tags policy:

[root@hayes-01 ~]# lvextend -l 418510 --alloc cling_by_tags raid_sanity/cling_raid
  Extending 2 mirror images.
  Extending logical volume cling_raid to 1.60 TiB
  Logical volume cling_raid successfully resized

  LV                     Attr      LSize  Cpy%Sync Devices
  cling_raid             rwi-a-r--  1.60t     0.49 cling_raid_rimage_0(0),cling_raid_rimage_1(0)
  [cling_raid_rimage_0]  Iwi-aor--  1.60t          /dev/etherd/e1.1p10(1)
  [cling_raid_rimage_0]  Iwi-aor--  1.60t          /dev/etherd/e1.1p9(0)
  [cling_raid_rimage_1]  Iwi-aor--  1.60t          /dev/etherd/e1.1p7(1)
  [cling_raid_rimage_1]  Iwi-aor--  1.60t          /dev/etherd/e1.1p6(0)
  [cling_raid_rmeta_0]   ewi-aor--  4.00m          /dev/etherd/e1.1p10(0)
  [cling_raid_rmeta_1]   ewi-aor--  4.00m          /dev/etherd/e1.1p7(0)
Comment 9 errata-xmlrpc 2013-02-21 03:11:32 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html
