Bug 1713820 - "Data alignment must not exceed device size." failure when attempting to stack PVs on small virt volumes backed by much larger storage
Summary: "Data alignment must not exceed device size." failure when attempting to stac...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: lvm2
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: rc
Target Release: 8.0
Assignee: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 2090828
 
Reported: 2019-05-25 00:12 UTC by Corey Marthaler
Modified: 2022-05-26 16:00 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2090828 (view as bug list)
Environment:
Last Closed: 2021-02-01 07:41:02 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
verbose pvcreate attempt (96.07 KB, text/plain)
2019-05-25 00:14 UTC, Corey Marthaler

Description Corey Marthaler 2019-05-25 00:12:10 UTC
Description of problem:
The backing storage that makes up this pool volume consists of 2T PVs. If I create virtual volumes of 200M or larger and stack PVs on top of them, it works fine; however, virtual volumes smaller than 200M fail with this error. This also happens on RHEL 7.7.

[root@hayes-01 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  POOL            snapper_thinp twi-aot---  <8.64t             0.01   12.21                            POOL_tdata(0)
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sde1(0)
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdi1(0)
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdf1(0)
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdj1(0)
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdg1(2)
  [POOL_tmeta]    snapper_thinp ewi-ao----   4.00m                                                     /dev/sdg1(1)
  PV1             snapper_thinp Vwi-a-t--- 220.00m POOL        50.00                                                
  PV2             snapper_thinp Vwi-a-t--- 456.00m POOL        25.00                                                
  PV3             snapper_thinp Vwi-a-t--- 248.00m POOL        50.00                                                
  PV4             snapper_thinp Vwi-a-t---  80.00m POOL        0.00                                                
  [lvol0_pmspare] snapper_thinp ewi-------   4.00m                                                     /dev/sdg1(0)
  origin          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                
  other1          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                
  other2          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                
  other3          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                
  other4          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                
  other5          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                
 
 
[root@hayes-01 ~]# pvscan  --config devices/scan_lvs=1
  PV /dev/sdg1                VG snapper_thinp   lvm2 [<1.82 TiB / 465.62 GiB free]
  PV /dev/sde1                VG snapper_thinp   lvm2 [<1.82 TiB / 0    free]
  PV /dev/sdi1                VG snapper_thinp   lvm2 [<1.82 TiB / 0    free]
  PV /dev/sdf1                VG snapper_thinp   lvm2 [<1.82 TiB / 0    free]
  PV /dev/sdj1                VG snapper_thinp   lvm2 [<1.82 TiB / 0    free]
  PV /dev/snapper_thinp/PV1                      lvm2 [220.00 MiB]
  PV /dev/snapper_thinp/PV2                      lvm2 [456.00 MiB]
  PV /dev/snapper_thinp/PV3                      lvm2 [248.00 MiB]
  Total: 8 [<9.10 TiB] / in use: 5 [9.09 TiB] / in no VG: 3 [924.00 MiB]
 
[root@hayes-01 ~]# pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV4
  /dev/snapper_thinp/PV4: Data alignment must not exceed device size.
  Format-specific initialisation of physical volume /dev/snapper_thinp/PV4 failed.
  Failed to setup physical volume "/dev/snapper_thinp/PV4".


Version-Release number of selected component (if applicable):
4.18.0-80.el8.x86_64

kernel-4.18.0-80.el8    BUILT: Wed Mar 13 07:47:44 CDT 2019
lvm2-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
lvm2-libs-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
lvm2-dbusd-2.03.02-6.el8    BUILT: Fri Feb 22 04:50:28 CST 2019
device-mapper-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-libs-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-event-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-event-libs-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-persistent-data-0.7.6-1.el8    BUILT: Sun Aug 12 04:21:55 CDT 2018


How reproducible:
Every time

Comment 1 Corey Marthaler 2019-05-25 00:14:12 UTC
Created attachment 1573098 [details]
verbose pvcreate attempt

Comment 2 David Teigland 2019-09-24 18:01:30 UTC
The kernel is reporting a strange value for optimal_io_size:

#device/dev-type.c:904         Device /dev/snapper_thinp/PV4: queue/minimum_io_size is 262144 bytes.
#device/dev-type.c:904         Device /dev/snapper_thinp/PV4: queue/optimal_io_size is 144965632 bytes.

When I run this I see a more ordinary looking value:

12:58:48.775710 pvcreate[26231] device/dev-type.c:979  Device /dev/foo/thin1: queue/minimum_io_size is 65536 bytes.
12:58:48.775765 pvcreate[26231] device/dev-type.c:979  Device /dev/foo/thin1: queue/optimal_io_size is 65536 bytes.

Mike, do you recall any recent dm-thin patches related to this?

Comment 3 Mike Snitzer 2019-09-25 14:23:51 UTC
(In reply to David Teigland from comment #2)
> The kernel is reporting a strange value for optimimal_io_size:
> 
> #device/dev-type.c:904         Device /dev/snapper_thinp/PV4:
> queue/minimum_io_size is 262144 bytes.
> #device/dev-type.c:904         Device /dev/snapper_thinp/PV4:
> queue/optimal_io_size is 144965632 bytes.
> 
> When I run this I see a more ordinary looking value:
> 
> 12:58:48.775710 pvcreate[26231] device/dev-type.c:979  Device
> /dev/foo/thin1: queue/minimum_io_size is 65536 bytes.
> 12:58:48.775765 pvcreate[26231] device/dev-type.c:979  Device
> /dev/foo/thin1: queue/optimal_io_size is 65536 bytes.
> 
> Mike, do you recall any recent dm-thin patches related to this?

No, but that doesn't mean there isn't something recent (or not).  Something has to explain this...

Certainly weird.

Comment 4 David Teigland 2019-09-25 14:31:00 UTC
Corey, could you cat /sys/block/dm-<minor>/queue/optimal_io_size which corresponds to /dev/snapper_thinp/PV4?  Also, could you cat the same value for each of the PVs in that VG (sdg1-sdj1)?  This should confirm if it's a kernel issue or userspace.
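For reference, one way to gather that (a sketch assuming the LV/VG names used in this report) is to resolve the kernel minor first and then read the sysfs attributes:

# map the thin LV to its dm minor (either command works)
dmsetup info -c -o name,minor snapper_thinp-PV4
lvs -o lv_name,lv_kernel_minor snapper_thinp/PV4
# then read the queue limits for that minor, substituting it for <minor>
cat /sys/block/dm-<minor>/queue/optimal_io_size
cat /sys/block/dm-<minor>/queue/minimum_io_size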

Comment 5 Corey Marthaler 2019-10-22 16:01:12 UTC
kernel-4.18.0-147.8.el8    BUILT: Thu Oct 17 19:20:05 CDT 2019
lvm2-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
lvm2-libs-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
lvm2-dbusd-2.03.05-5.el8    BUILT: Thu Sep 26 01:43:33 CDT 2019
device-mapper-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-libs-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-event-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-event-libs-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
device-mapper-persistent-data-0.8.5-2.el8    BUILT: Wed Jun  5 10:28:04 CDT 2019



[root@hayes-01 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices      
  POOL            snapper_thinp twi-aot---  <8.64t             0.01   12.30                            POOL_tdata(0)
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdg1(0) 
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdh1(0) 
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdi1(0) 
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdf1(0) 
  [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                                                     /dev/sdk1(2) 
  [POOL_tmeta]    snapper_thinp ewi-ao----   4.00m                                                     /dev/sdk1(1) 
  PV1             snapper_thinp Vwi-a-t--- 284.00m POOL        33.33                                                
  PV2             snapper_thinp Vwi-a-t--- 344.00m POOL        33.33                                                
  PV3             snapper_thinp Vwi-a-t--- 188.00m POOL        50.00                                                
  PV4             snapper_thinp Vwi-a-t--- 308.00m POOL        33.33                                                
  PV5             snapper_thinp Vwi-a-t--- 324.00m POOL        33.33                                                
  PV6             snapper_thinp Vwi-a-t---  76.00m POOL        0.00                                                 
  [lvol0_pmspare] snapper_thinp ewi-------   4.00m                                                     /dev/sdk1(0) 
  origin          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                 
  other1          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                 
  other2          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                 
  other3          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                 
  other4          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                                                                                                                                          
  other5          snapper_thinp Vwi-a-t---   1.00g POOL        0.00                                                                                                                                                                          
[root@hayes-01 ~]# pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV6                                                                                                                                                               
  /dev/snapper_thinp/PV6: Data alignment must not exceed device size.                                                                                                                                                                        
  Format-specific initialisation of physical volume /dev/snapper_thinp/PV6 failed.                                                                                                                                                           
  Failed to setup physical volume "/dev/snapper_thinp/PV6".                                                                                                                                                                                  
                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                             
[root@hayes-01 ~]# dmsetup ls                                                                                                                                                                                                                
snapper_thinp-PV4       (253:13)                                                                                                                                                                                                             
snapper_thinp-origin    (253:4)                                                                                                                                                                                                              
snapper_thinp-PV3       (253:12)                                                                                                                                                                                                             
snapper_thinp-PV2       (253:11)                                                                                                                                                                                                             
snapper_thinp-PV1       (253:10)                                                                                                                                                                                                             
snapper_thinp-POOL      (253:3)                                                                                                                                                                                                              
snapper_thinp-other5    (253:9)                                                                                                                                                                                                              
snapper_thinp-other4    (253:8)                                                                                                                                                                                                              
snapper_thinp-other3    (253:7)                                                                                                                                                                                                              
snapper_thinp-POOL-tpool        (253:2)                                                                                                                                                                                                      
snapper_thinp-POOL_tdata        (253:1)                                                                                                                                                                                                      
snapper_thinp-other2    (253:6)                                                                                                                                                                                                              
snapper_thinp-POOL_tmeta        (253:0)                                                                                                                                                                                                      
snapper_thinp-other1    (253:5)                                                                                                                                                                                                              
snapper_thinp-PV6       (253:15)                                                                                                                                                                                                             
snapper_thinp-PV5       (253:14)

# /dev/snapper_thinp/PV6 (Failed)
[root@hayes-01 ~]# cat /sys/block/dm-15/queue/optimal_io_size
145752064

# /dev/snapper_thinp/PV6 (Passed)
[root@hayes-01 ~]# cat /sys/block/dm-14/queue/optimal_io_size
145752064

# actual PVs in snapper_thinp
[root@hayes-01 ~]# cat /sys/block/sdg/queue/optimal_io_size
0
[root@hayes-01 ~]# cat /sys/block/sdh/queue/optimal_io_size
0
[root@hayes-01 ~]# cat /sys/block/sdi/queue/optimal_io_size
0
[root@hayes-01 ~]# cat /sys/block/sdf/queue/optimal_io_size
0
[root@hayes-01 ~]# cat /sys/block/sdk/queue/optimal_io_size
0

Comment 6 Mike Snitzer 2019-10-22 16:22:57 UTC
(In reply to Corey Marthaler from comment #5)
> kernel-4.18.0-147.8.el8    BUILT: Thu Oct 17 19:20:05 CDT 2019
> lvm2-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
> lvm2-libs-2.03.05-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
> lvm2-dbusd-2.03.05-5.el8    BUILT: Thu Sep 26 01:43:33 CDT 2019
> device-mapper-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
> device-mapper-libs-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
> device-mapper-event-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT 2019
> device-mapper-event-libs-1.02.163-5.el8    BUILT: Thu Sep 26 01:40:57 CDT
> 2019
> device-mapper-persistent-data-0.8.5-2.el8    BUILT: Wed Jun  5 10:28:04 CDT
> 2019
> 
> 
> 
> [root@hayes-01 ~]# lvs -a -o +devices
>   LV              VG            Attr       LSize   Pool Origin Data%  Meta% 
> Move Log Cpy%Sync Convert Devices      
>   POOL            snapper_thinp twi-aot---  <8.64t             0.01   12.30 
> POOL_tdata(0)
>   [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                          
> /dev/sdg1(0) 
>   [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                          
> /dev/sdh1(0) 
>   [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                          
> /dev/sdi1(0) 
>   [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                          
> /dev/sdf1(0) 
>   [POOL_tdata]    snapper_thinp Twi-ao----  <8.64t                          
> /dev/sdk1(2) 
>   [POOL_tmeta]    snapper_thinp ewi-ao----   4.00m                          
> /dev/sdk1(1) 
>   PV1             snapper_thinp Vwi-a-t--- 284.00m POOL        33.33        
> 
>   PV2             snapper_thinp Vwi-a-t--- 344.00m POOL        33.33        
> 
>   PV3             snapper_thinp Vwi-a-t--- 188.00m POOL        50.00        
> 
>   PV4             snapper_thinp Vwi-a-t--- 308.00m POOL        33.33        
> 
>   PV5             snapper_thinp Vwi-a-t--- 324.00m POOL        33.33        
> 
>   PV6             snapper_thinp Vwi-a-t---  76.00m POOL        0.00         
> 
>   [lvol0_pmspare] snapper_thinp ewi-------   4.00m                          
> /dev/sdk1(0) 
>   origin          snapper_thinp Vwi-a-t---   1.00g POOL        0.00         
> 
>   other1          snapper_thinp Vwi-a-t---   1.00g POOL        0.00         
> 
>   other2          snapper_thinp Vwi-a-t---   1.00g POOL        0.00         
> 
>   other3          snapper_thinp Vwi-a-t---   1.00g POOL        0.00         
> 
>   other4          snapper_thinp Vwi-a-t---   1.00g POOL        0.00         
> 
>   other5          snapper_thinp Vwi-a-t---   1.00g POOL        0.00         
> 
> [root@hayes-01 ~]# pvcreate --config devices/scan_lvs=1
> /dev/snapper_thinp/PV6                                                      
> 
>   /dev/snapper_thinp/PV6: Data alignment must not exceed device size.       
> 
>   Format-specific initialisation of physical volume /dev/snapper_thinp/PV6
> failed.                                                                     
> 
>   Failed to setup physical volume "/dev/snapper_thinp/PV6".                 
> 
>                                                                             
> 
>                                                                             
> 
> [root@hayes-01 ~]# dmsetup ls                                               
> 
> snapper_thinp-PV4       (253:13)                                            
> 
> snapper_thinp-origin    (253:4)                                             
> 
> snapper_thinp-PV3       (253:12)                                            
> 
> snapper_thinp-PV2       (253:11)                                            
> 
> snapper_thinp-PV1       (253:10)                                            
> 
> snapper_thinp-POOL      (253:3)                                             
> 
> snapper_thinp-other5    (253:9)                                             
> 
> snapper_thinp-other4    (253:8)                                             
> 
> snapper_thinp-other3    (253:7)                                             
> 
> snapper_thinp-POOL-tpool        (253:2)                                     
> 
> snapper_thinp-POOL_tdata        (253:1)                                     
> 
> snapper_thinp-other2    (253:6)                                             
> 
> snapper_thinp-POOL_tmeta        (253:0)                                     
> 
> snapper_thinp-other1    (253:5)                                             
> 
> snapper_thinp-PV6       (253:15)                                            
> 
> snapper_thinp-PV5       (253:14)
> 
> # /dev/snapper_thinp/PV6 (Failed)
> [root@hayes-01 ~]# cat /sys/block/dm-15/queue/optimal_io_size
> 145752064
> 
> # /dev/snapper_thinp/PV6 (Passed)
> [root@hayes-01 ~]# cat /sys/block/dm-14/queue/optimal_io_size
> 145752064

You meant PV5 (dm-14) Passed.

> # actual PVs in snapper_thinp
> [root@hayes-01 ~]# cat /sys/block/sdg/queue/optimal_io_size
> 0
> [root@hayes-01 ~]# cat /sys/block/sdh/queue/optimal_io_size
> 0
> [root@hayes-01 ~]# cat /sys/block/sdi/queue/optimal_io_size
> 0
> [root@hayes-01 ~]# cat /sys/block/sdf/queue/optimal_io_size
> 0
> [root@hayes-01 ~]# cat /sys/block/sdk/queue/optimal_io_size
> 0

Can you provide the DM table (dmsetup table) output for PV5 and PV6?  Just want to make sure we know all the devices they are layering upon.

DM thinp will establish an optimal_io_size that matches the thin-pool chunksize.  So really what 145752064 implies is you've used a really large thin-pool block size right?

Think the 145752064 is in bytes, so 139MB thin-pool chunksize?  Seems weird...

How did you create the thin-pool?

Comment 7 David Teigland 2019-10-22 16:31:06 UTC
PV6 is 76MB, and the optimal_io_size is 139MB, so the device is smaller than the device's optimal_io_size.  (Probably not a very realistic config outside of testing.)

By default, lvm aligns data according to optimal_io_size, which won't work with those values, so I think the pvcreate failure is reasonable.  You could disable pvcreate's alignment logic for this unusual config with devices/data_alignment_detection=0.  That will likely allow the pvcreate to work.
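For example, something along these lines (untested here, reusing the device name from this report) should either skip the optimal_io_size-based alignment or override it with an explicit value:

# sketch: turn off data alignment detection for this pvcreate only
pvcreate --config 'devices { scan_lvs=1 data_alignment_detection=0 }' /dev/snapper_thinp/PV6
# or force a specific alignment instead of the detected one (example value)
pvcreate --config devices/scan_lvs=1 --dataalignment 256K /dev/snapper_thinp/PV6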

Comment 8 Corey Marthaler 2019-10-22 16:44:20 UTC
Creation steps:

vgcreate   snapper_thinp /dev/sdk1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdf1
lvcreate  --thinpool POOL -l95%FREE  --zero n --poolmetadatasize 4M snapper_thinp

lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other1
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other2
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other3
lvcreate  -V 1G -T snapper_thinp/POOL -n other4
lvcreate  -V 1G -T snapper_thinp/POOL -n other5

lvcreate  -V 284M -T snapper_thinp/POOL -n PV1
pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV1
lvcreate  -V 344M -T snapper_thinp/POOL -n PV2
pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV2
lvcreate  -V 188M -T snapper_thinp/POOL -n PV3
pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV3
lvcreate  -V 305M -T snapper_thinp/POOL -n PV4
pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV4
lvcreate  -V 322M -T snapper_thinp/POOL -n PV5
pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV5
lvcreate  -V 74M -T snapper_thinp/POOL -n PV6
pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV6
  /dev/snapper_thinp/PV6: Data alignment must not exceed device size.
  Format-specific initialisation of physical volume /dev/snapper_thinp/PV6 failed.
  Failed to setup physical volume "/dev/snapper_thinp/PV6".


[root@hayes-01 ~]# dmsetup table
snapper_thinp-PV4: 0 630784 thin 253:2 10
snapper_thinp-origin: 0 2097152 thin 253:2 1
snapper_thinp-PV3: 0 385024 thin 253:2 9
snapper_thinp-PV2: 0 704512 thin 253:2 8
snapper_thinp-PV1: 0 581632 thin 253:2 7
snapper_thinp-POOL: 0 18553085952 linear 253:2 0
snapper_thinp-other5: 0 2097152 thin 253:2 6
snapper_thinp-other4: 0 2097152 thin 253:2 5
snapper_thinp-other3: 0 2097152 thin 253:2 4
snapper_thinp-POOL-tpool: 0 18553085952 thin-pool 253:0 253:1 284672 0 1 skip_block_zeroing 
snapper_thinp-POOL_tdata: 0 3905937408 linear 8:97 2048
snapper_thinp-POOL_tdata: 3905937408 3905937408 linear 8:113 2048
snapper_thinp-POOL_tdata: 7811874816 3905937408 linear 8:129 2048
snapper_thinp-POOL_tdata: 11717812224 3905937408 linear 8:81 2048
snapper_thinp-POOL_tdata: 15623749632 2929336320 linear 8:161 18432
snapper_thinp-other2: 0 2097152 thin 253:2 3
snapper_thinp-POOL_tmeta: 0 8192 linear 8:161 10240
snapper_thinp-other1: 0 2097152 thin 253:2 2
snapper_thinp-PV6: 0 155648 thin 253:2 12
snapper_thinp-PV5: 0 663552 thin 253:2 11

Comment 9 Mike Snitzer 2019-10-22 17:37:57 UTC
(In reply to Corey Marthaler from comment #8)
> Creation steps:
> 
> vgcreate   snapper_thinp /dev/sdk1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdf1
> lvcreate  --thinpool POOL -l95%FREE  --zero n --poolmetadatasize 4M
> snapper_thinp
> 
...
> lvcreate  -V 322M -T snapper_thinp/POOL -n PV5
> pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV5
> lvcreate  -V 74M -T snapper_thinp/POOL -n PV6
> pvcreate --config devices/scan_lvs=1 /dev/snapper_thinp/PV6
>   /dev/snapper_thinp/PV6: Data alignment must not exceed device size.
>   Format-specific initialisation of physical volume /dev/snapper_thinp/PV6
> failed.
>   Failed to setup physical volume "/dev/snapper_thinp/PV6".
> 
> 
> [root@hayes-01 ~]# dmsetup table
...
> snapper_thinp-POOL-tpool: 0 18553085952 thin-pool 253:0 253:1 284672 0 1
> skip_block_zeroing 


OK, so the underlying LV (PV6) is only 74M but the lvm2 chosen thin-pool blocksize of 139M (284672 * 512 = 145752064 = optimal_io_size) is larger.  Hence the error.

This command is really the one that should've failed: lvcreate -V 74M -T snapper_thinp/POOL -n PV6
Unless others can see why it makes sense for a logical address space to not even be able to allocate a single block from the underlying thin-pool?
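(For reference, the chunk size lvm2 picked can be read back directly - an illustrative command, assuming the VG name from this report:

lvs -o lv_name,chunk_size snapper_thinp/POOL

which should report a 139.00m chunk for POOL, i.e. the 284672 sectors shown in the thin-pool table line.)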

Comment 10 Zdenek Kabelac 2019-10-22 20:10:21 UTC
When the chunk size is not specified and huge thin-pool 'dataLV' sizes are used, lvm2 tries to conform with the requirement that metadata should not exceed 128MiB.

In this case --poolmetadatasize seems to be even further restricted to 4MiB during thin-pool creation.

To meet that size limitation, lvm2 needs to 'scale up' the chunk size - that's likely how the 139M chunk size gets created.

If the user wants any smaller chunk size, he can always set his preferred default in lvm.conf, or simply pass '-c' during thin-pool creation - the metadata size will grow accordingly.
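For instance (an illustrative sketch, not taken from this report), either of these pins the chunk size and lets lvm2 size the metadata to match:

# explicit chunk size at thin-pool creation time (example value)
lvcreate --thinpool POOL -l95%FREE --zero n --chunksize 512K snapper_thinp
# or set a preferred default in lvm.conf:
#   allocation { thin_pool_chunk_size = 512 }    # value in KiB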

The question is: how 'reasonable' is it to expose an optimal_io_size matching a chunk size of huge dimensions, when it doesn't seem to have any influence on performance once we are beyond roughly 512KiB anyway?

Thin-pool chunk size can go up to 2GiB - so reporting that as 'optimal' maybe doesn't look like the best approach?

Comment 11 Mike Snitzer 2019-10-23 13:46:42 UTC
(In reply to Zdenek Kabelac from comment #10)
> When the chunk size is not specified - and  huge sizes of thin-pool 'dataLV'
> are used - then lvm2 tries to be conforming with
> the requirement that metadata should not exceed 128MiB.
> 
> In this case --poolmetadatasize seems to be even futher restricted by 4MiB
> during thin-pool creation.
> 
> To meet the size limitation - lvm2 needs to 'scale-up'  chunk-size - that's
> likely how  139M chunk size gets created.
> 
> If user wants to any small chunk size - he can always 'set' his preferred
> default into lvm.conf 
> or simply set the  '-c'  during thin-pool creation -  metadata size will
> grow accordingly.
> 
> The question is - how 'reasonable' is to expose option_io_size matching
> chunk size of huge dimensions - when it doesn't seem to have any influence
> on performance once we are probably beyond  512KiB anyway ?
> 
> Thin-pool chunk size can go upto 2GiB - so reporting this as 'optimal' maybe
> doesn't look as best approach ?

Your concern about setting such a large optimal_io_size is valid.  I'll think about it.  We really only have the single optimal_io_size hint to convey anything useful in terms of block limits.  While it may seem foolish to say "2GB is optimal", it does convey that "this thinp volume's granularity of allocation is 2GB". *shrug*

BUT, that is really a tangential concern.  IMHO the larger problem is we're allowing the creation of a thin LV whose logical address space is smaller than a single thin-pool chunk.
I suppose thinp will accommodate it, by simply allocating a block from the pool and only partially using it, but in general I'm missing why we want to support this edge case.
Why not require the logical size to be at least as large as a single thin-pool block?  And preferably a multiple of the thin-pool blocksize.
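As a sketch of what that would mean with the numbers from this report (139M chunk, 4MiB extents), the smallest "sensible" thin LV would cover one full chunk, e.g.:

lvcreate -V 140M -T snapper_thinp/POOL -n PV6    # >= one 139M chunk, rounded up to the 4MiB extent size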

Comment 12 Zdenek Kabelac 2019-10-23 13:59:23 UTC
With regards to the size of a thinLV - we are ATM more or less limited to sizes expressible as a multiple of extent_size, which is stricter than the block/chunk_size of a thin-pool.

Personally I'd like to see some way to completely 'free' the relation between a virtually sized LV (like a thinLV) and a logically sized LV - ATM we have those two 32bit numbers where the number of extents times the size of an extent gives the size of the LV.

From the user's POV, the size of an LV should probably not 'force' users to create LVs that might not fit their need - i.e. if I create an LV bigger than I want, I may cause an automatic resize - so there can be cases where at least the existing 'extent-size' based allocation might be wanted.  But maybe a good enough service would be to prompt the user if he really wants to make an LV smaller than a single chunk of the thin-pool, losing the rest of that chunk.  I.e. a user may provision space by some metric - and even if he has a 'pool' underneath using bigger chunks, it might be important to provide e.g. per-MiB size granularity...



As for optimal-io size - there is also probably an impact from 'zeroing' - without zeroing I'd think the optimal-io size can likely be significantly smaller, more closely matching the _tdata geometry?

Comment 13 Mike Snitzer 2019-10-23 14:40:56 UTC
(In reply to Zdenek Kabelac from comment #12)
> With regards to the size of thinLV - we are ATM more or less limited to the
> sizes expressible as a multiple of extent_size - which is more strict then
> block/chunk_size of a thin-pool.
> 
> Personally I'd like to see some way how to actually completely 'free'
> relation between Virtually sized LV (like thinLV is)  and Logically sized LV
> - ATM we have those two 32bit numbers where number of extent times size of
> an extent gives the size of LV.

The thin-pool blocksize should be a factor of the extent size (or vice-versa).

> From users POV -  the size of LV should not be probably 'enforcing' users to
> create  LVs them might not 'fit' the need - i.e. if I create LV bigger then
> I want - i.e. I may cause automatic resize - so there can be cases where at
> least existing 'extent-size' based allocation might be wanted.  But maybe
> good enough service would be to give 'prompt' to user if he really wants to
> make an LV smaller then a single chunk of thin-pool and losing rest of
> chunk. i.e. User may provision space by some metric - and even if he has a
> 'pool' underneath using bigger chunks -  it might be important to provide 
> i.e. per MiB size increments granularity...    

Not seeing why we need to accommodate such an inefficient use of the underlying storage.  If you take this to the logical conclusion: you're wasting space because it is completely inaccessible to the user.

> As for optimal-io size - there is also probably impact from 'zeroing' -
> without zeroing I'd think the optimal-io size can likely by significantly
> smaller somehow more closely matching  _tdata geometry ?

I'm not aware of any practical use for tracking optimal_io_size other than what XFS does.  It respects the hint when laying out its allocation groups (AGs).  So minimum_io_size and optimal_io_size can convey raid striping (they reflect chunksize and stripesize respectively)... giving upper layers the insight that data layout should be on a thinp blocksize boundary is useful in this context.
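(A quick way to see what a stacked device advertises - an illustrative command, not taken from this report:

lsblk -o NAME,MIN-IO,OPT-IO /dev/snapper_thinp/PV5

where MIN-IO and OPT-IO are the minimum_io_size and optimal_io_size queue limits.)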

Comment 14 Zdenek Kabelac 2019-10-23 14:59:50 UTC
(In reply to Mike Snitzer from comment #13)
> (In reply to Zdenek Kabelac from comment #12)
> > Personally I'd like to see some way how to actually completely 'free'
> > relation between Virtually sized LV (like thinLV is)  and Logically sized LV
> > - ATM we have those two 32bit numbers where number of extent times size of
> > an extent gives the size of LV.
> 
> The thin-pool blocksize should be a factor of the extent size (or
> vice-versa).

We have users using quite huge 'extent_size' values (i.e. even 4GiB), as these were 'smart advices' provided by the Google engine.

On the other hand, the chunk-size granularity is a 64KiB multiple.

So joining these two together will always lead to some 'corner' cases where some space will
simply be wasted.


> 
> > From users POV -  the size of LV should not be probably 'enforcing' users to
> > create  LVs them might not 'fit' the need - i.e. if I create LV bigger then
> > I want - i.e. I may cause automatic resize - so there can be cases where at
> > least existing 'extent-size' based allocation might be wanted.  But maybe
> > good enough service would be to give 'prompt' to user if he really wants to
> > make an LV smaller then a single chunk of thin-pool and losing rest of
> > chunk. i.e. User may provision space by some metric - and even if he has a
> > 'pool' underneath using bigger chunks -  it might be important to provide 
> > i.e. per MiB size increments granularity...    
> 
> Not seeing why we need to accommodate such an inefficient use of the
> underlying storage.  If you take this to the logical conclusion: you're
> wasting space because it is completely inaccessible to the user.

The main case I had in mind is: a user wants to provide a device with a precise size X, as might be required i.e. to match some particular 'image' size you download off the net.

You may later use the 'hidden/lost' space via lvextend - but I'd probably not exclude the usage of 'smaller' LVs, as there might be requirements to provide i.e. a 100MB LV - even if the thin-pool is using 512MiB chunks.

Telling users the LV is wasting 312MiB in a thin-pool is IMHO reasonably good info, and the user may decide whether it's good or bad for him.

(Of course using these huuuge chunk sizes is probably a corner case on its own....)

> > As for optimal-io size - there is also probably impact from 'zeroing' -
> > without zeroing I'd think the optimal-io size can likely by significantly
> > smaller somehow more closely matching  _tdata geometry ?
> 
> I'm not aware of any practical use for tracking optimal_io_size other than
> what XFS does.  It respects the hint when laying out its allocation groups
> (AGs).  So minimum_io_size and optimal_io_size can convey raid stripping
> (they reflect chunksize and stripesize respectively)... giving upper layers
> the insight that data layout should be on a thinp blocksize boundary is
> useful in this context.

My 'interpretation' of optimal_io_size here would be: if I go with this size, I'm optimally using the bandwidth of the provided storage - but using a 1GiB optimal io size simply doesn't look like it will bring any extra benefit over using i.e. 1MiB (with zeroing disabled).

Though I'm not really sure how 'widespread' usage of optimal_io_size is...

Comment 15 Mike Snitzer 2019-10-23 18:12:39 UTC
(In reply to Zdenek Kabelac from comment #14)
> (In reply to Mike Snitzer from comment #13)
> > (In reply to Zdenek Kabelac from comment #12)
> > > Personally I'd like to see some way how to actually completely 'free'
> > > relation between Virtually sized LV (like thinLV is)  and Logically sized LV
> > > - ATM we have those two 32bit numbers where number of extent times size of
> > > an extent gives the size of LV.
> > 
> > The thin-pool blocksize should be a factor of the extent size (or
> > vice-versa).
> 
> We have users using quite huge 'extent_size' (i.e. even 4GiB) as these were
> 'smart advices' provide by google engine.
> 
> On the other hand granularity of chunk-size is  64KiB multiple.
> 
> So joining these two together will always lead to some 'corner' cases where
> some space will
> simply be wasted.

Not if those two variables are sized with awareness of each other.  Which responsible users do; irresponsible users will rely on lvm2 to have sane defaults.

> > > From users POV -  the size of LV should not be probably 'enforcing' users to
> > > create  LVs them might not 'fit' the need - i.e. if I create LV bigger then
> > > I want - i.e. I may cause automatic resize - so there can be cases where at
> > > least existing 'extent-size' based allocation might be wanted.  But maybe
> > > good enough service would be to give 'prompt' to user if he really wants to
> > > make an LV smaller then a single chunk of thin-pool and losing rest of
> > > chunk. i.e. User may provision space by some metric - and even if he has a
> > > 'pool' underneath using bigger chunks -  it might be important to provide 
> > > i.e. per MiB size increments granularity...    
> > 
> > Not seeing why we need to accommodate such an inefficient use of the
> > underlying storage.  If you take this to the logical conclusion: you're
> > wasting space because it is completely inaccessible to the user.
> 
> The main case I had in mind is -  user want to provide device with precise
> size X - as it might be required i.e. to match some particular 'image' size
> you download out of net.
> 
> You may later use the 'hidden/lost' space by lvextend - but I'd probably not
> exclude the usage of  'smaller' LVs  as there might be requirements to
> provide i.e.
> 100MB LV  - even if thin-pool is using 512MiB chunks.
> 
> Telling users the LV is wasting 312MiB in a thin-pool is IMHO reasonable
> good info,
> and user may decided whether it's good or bad for him.
> 
> (Of course using these huuuge chunk-size is probably corner case on its
> own....)

Fair enough, if you think there is utility in it that's fine.  I suppose having lvcreate warn would suffice.

> > > As for optimal-io size - there is also probably impact from 'zeroing' -
> > > without zeroing I'd think the optimal-io size can likely by significantly
> > > smaller somehow more closely matching  _tdata geometry ?
> > 
> > I'm not aware of any practical use for tracking optimal_io_size other than
> > what XFS does.  It respects the hint when laying out its allocation groups
> > (AGs).  So minimum_io_size and optimal_io_size can convey raid stripping
> > (they reflect chunksize and stripesize respectively)... giving upper layers
> > the insight that data layout should be on a thinp blocksize boundary is
> > useful in this context.
> 
> My 'interpretation' of optimal_io_size here would be - if I go with this
> size - I'm optimally using bandwith of provided storage - but  using  1GiB
> optimal io size simple doesn't look like it will bring any extra benefit
> over using i.e. 1MiB (with zeroing disabled) 
> 
> Though I'm not really sure how 'widespread' usage of optimal_io_size is...

optimal_io_size isn't purely about the performance of an arbitrary single IO; it also serves as a useful indicator that being aligned on that boundary will yield better results.

Comment 18 RHEL Program Management 2021-02-01 07:41:02 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

