Bug 1733292

Summary: It is possible to create COW snapshot volumes larger than origin volumes
Product: Red Hat Enterprise Linux 8
Component: lvm2
lvm2 sub component: Snapshots
Version: 8.1
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: low
Priority: unspecified
Target Milestone: rc
Target Release: 8.0
Reporter: Corey Marthaler <cmarthal>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac
Last Closed: 2019-07-29 20:47:19 UTC
Type: Bug

Description Corey Marthaler 2019-07-25 15:49:15 UTC
Description of problem:
This isn't really a bug per se, more of a pointer for folks looking for a way to create a snapshot larger than the origin volume. That said, maybe the "COW maximum usable size" message could be more precise, or this behavior could be mentioned in the man page?

I wanted to be able to reencrypt an origin volume without the snapshot filling up and becoming invalid and unusable. I learned that even though the message seems to imply there's no difference, asking for a larger snapshot will give you roughly 4m more COW space, and thus it never quite fills up.

 
[root@hayes-02 ~]# lvcreate --wipesignatures y  -L 4G -n orig test
  Logical volume "orig" created.
[root@hayes-02 ~]# echo foobarglarch | cryptsetup luksFormat --type luks2 /dev/test/orig
[root@hayes-02 ~]# echo foobarglarch | cryptsetup luksOpen /dev/test/orig luks_origin

[root@hayes-02 ~]# lvcreate  -s /dev/test/orig -c 64 -n snap1 -L 4G
  Logical volume "snap1" created.
[root@hayes-02 ~]# lvcreate  -s /dev/test/orig -c 64 -n snap2 -L 5G
  Reducing COW size 5.00 GiB down to maximum usable size 4.00 GiB.
  Logical volume "snap2" created.

[root@hayes-02 ~]# lvs -a -o +devices,size --units m
  LV    VG   Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices         LSize   
  orig  test owi-aos--- 4096.00m                                                     /dev/sdb1(0)    4096.00m
  snap1 test swi-a-s--- 4096.00m      orig   0.00                                    /dev/sdb1(1024) 4096.00m
  snap2 test swi-a-s--- 4100.00m      orig   0.00                                    /dev/sdb1(2048) 4100.00m

[root@hayes-02 ~]#  lvcreate  -s /dev/test/orig -c 64 -n snap3 -L 4104.00m # try and add another 4m
  Reducing COW size <4.01 GiB down to maximum usable size 4.00 GiB.
  Logical volume "snap3" created.

[root@hayes-02 ~]# echo foobarglarch | cryptsetup luksOpen /dev/test/snap1 luks_snap1
[root@hayes-02 ~]# echo foobarglarch | cryptsetup luksOpen /dev/test/snap3 luks_snap3
[root@hayes-02 ~]# echo foobarglarch | cryptsetup reencrypt --active-name luks_origin
Finished, time 06:23.741, 4080 MiB written, speed  10.6 MiB/s   

# Snaps growing
[root@hayes-02 ~]# lvs -a -o +devices
  LV    VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  orig  test owi-aos--- 4.00g                                                     /dev/sdb1(0)   
  snap1 test swi-aos--- 4.00g      orig   9.56                                    /dev/sdb1(1024)
  snap2 test swi-a-s--- 4.00g      orig   9.55                                    /dev/sdb1(2048)
  snap3 test swi-aos--- 4.00g      orig   9.55                                    /dev/sdb1(3073)
[root@hayes-02 ~]# lvs -a -o +devices
  LV    VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  orig  test owi-aos--- 4.00g                                                     /dev/sdb1(0)   
  snap1 test swi-aos--- 4.00g      orig   12.27                                   /dev/sdb1(1024)
  snap2 test swi-a-s--- 4.00g      orig   12.26                                   /dev/sdb1(2048)
  snap3 test swi-aos--- 4.00g      orig   12.26                                   /dev/sdb1(3073)
[root@hayes-02 ~]# lvs -a -o +devices
  LV    VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  orig  test owi-aos--- 4.00g                                                     /dev/sdb1(0)   
  snap1 test swi-Ios--- 4.00g      orig   100.00                                  /dev/sdb1(1024)
  snap2 test swi-a-s--- 4.00g      orig   99.93                                   /dev/sdb1(2048)
  snap3 test swi-aos--- 4.00g      orig   99.93                                   /dev/sdb1(3073)


Jul 25 10:29:36 hayes-02 kernel: device-mapper: crypt: xts(aes) using implementation "xts-aes-aesni"
Jul 25 10:29:36 hayes-02 kernel: device-mapper: crypt: xts(aes) using implementation "xts-aes-aesni"
[87138.841597] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
Jul 25 10:29:37 hayes-02 kernel: device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
Jul 25 10:29:38 hayes-02 lvm[4479]: WARNING: Snapshot test-snap1 changed state to: Invalid and should be removed.
Jul 25 10:29:38 hayes-02 dmeventd[4479]: No longer monitoring snapshot test-snap1.


Version-Release number of selected component (if applicable):
kernel-4.18.0-121.el8    BUILT: Tue Jul 23 09:49:25 CDT 2019
lvm2-2.03.05-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019
lvm2-libs-2.03.05-2.el8    BUILT: Wed Jul 24 08:05:11 CDT 2019

Comment 1 Zdenek Kabelac 2019-07-29 13:31:59 UTC
So after brainstorming here comes the story:

snap1 was set to exactly 4G, equal to the origin size - so this snapshot does NOT have enough space for metadata, which is 'relatively' small compared to the size of the data, but you still need to store somewhere around 16 bytes per chunk.
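
For illustration, a rough estimate based on that 16 bytes per chunk figure and the 64 KiB chunk size (-c 64) used above - not the exact on-disk layout:

  chunks to track    = 4 GiB origin / 64 KiB chunk  = 65536
  exception metadata = 65536 chunks * 16 bytes      = 1 MiB
  rounded up to whole 4 MiB extents                 = ~4 MiB extra

which lines up with the 4100.00m COW size lvs reports for snap2 above.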

So snap1 needs monitoring and an autoextend threshold set to a value good enough that dmeventd can 'extend' the snapshot *before* it runs out of space. This can be challenging with today's hw speed of disk writes - in this case it's not clear whether the threshold for autoextend was left at 100%, the autoextend percent was not big enough, or dmeventd was not fast enough to match the writing speed (reencryption can easily stream over 256MB/s). And unlike with thins there is no wait/sleep - if during a write there is no space left in the COW area, the snapshot is immediately rendered Invalid.
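
For reference, that autoextend behaviour is controlled from lvm.conf; a minimal sketch with example values (not the configuration that was in use on hayes-02):

  activation {
      # start extending a monitored snapshot once it is 70% full
      # (leaving this at 100 disables automatic extension)
      snapshot_autoextend_threshold = 70
      # each extension grows the snapshot by 20% of its current size
      snapshot_autoextend_percent = 20
  }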

snap2 & snap3 are both capped at the maximum size the snapshot can ever occupy (we do not support snapshots bigger than their origins - unlike what you can create with thin volume snapshots).

So it should never be possible to overfill those 2 snapshots - they could only be filled to 100% capacity.

Also, lvm2 does not monitor snapshots whose COW is already at the maximum usable size, since there is nothing left to extend them to.
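
A quick way to see which snapshots dmeventd is still watching (an illustrative command, not part of the original report):

  lvs -a -o +seg_monitor test    # the Monitor column shows "monitored" / "not monitored" per LV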

So I hope this explains this mystery and there is no real bug.

Also, you should be able to create the full/maximum 'COW' size with "lvcreate -s -l100%ORIGIN vg/orig".

I think in the past the man page may have had a few more words about this 'extra' snapshot space, which were probably lost or shortened with the unification and automated generation of the man pages.

So only this little sentence was left in the man page:

"A small amount of the COW snapshot LV size is used to track COW block locations"

So can we consider this CLOSED?

Comment 2 Corey Marthaler 2019-07-29 19:48:32 UTC
> snap2 & snap3 are both capped at the maximum size the snapshot can ever occupy
> (we do not support snapshots bigger than their origins - unlike what you can
> create with thin volume snapshots).

Sure we do. You just need to know the "code" to create a snap bigger than the origin (i.e. give any size larger than the origin and you get an extra 4-20m larger than the origin but no more).

[root@hayes-02 ~]# lvcreate -L 4G -n origin test
  Logical volume "origin" created.
[root@hayes-02 ~]# lvcreate -s -l100%ORIGIN test/origin
  Logical volume "lvol0" created.
[root@hayes-02 ~]# lvs -a -o +devices
  LV     VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  lvol0  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(1024)
  origin test owi-a-s---  4.00g                                                     /dev/sdb1(0)   
[root@hayes-02 ~]# lvcreate -s -L 5g test/origin
  Reducing COW size 5.00 GiB down to maximum usable size <4.02 GiB.
  Logical volume "lvol1" created.
[root@hayes-02 ~]# lvs -a -o +devices
  LV     VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  lvol0  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(1024)
  lvol1  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(2053)
  origin test owi-a-s---  4.00g                                                     /dev/sdb1(0)   
[root@hayes-02 ~]# lvcreate -s -L 6g test/origin
  Reducing COW size 6.00 GiB down to maximum usable size <4.02 GiB.
  Logical volume "lvol2" created.
[root@hayes-02 ~]# lvcreate -s -L 7g test/origin
  Reducing COW size 7.00 GiB down to maximum usable size <4.02 GiB.
  Logical volume "lvol3" created.
[root@hayes-02 ~]# lvcreate -s -L 700g test/origin
  Reducing COW size 700.00 GiB down to maximum usable size <4.02 GiB.
  Logical volume "lvol4" created.
[root@hayes-02 ~]# lvs -a -o +devices
  LV     VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  lvol0  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(1024)
  lvol1  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(2053)
  lvol2  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(3082)
  lvol3  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(4111)
  lvol4  test swi-a-s--- <4.02g      origin 0.00                                    /dev/sdb1(5140)
  origin test owi-a-s---  4.00g                                                     /dev/sdb1(0)   
[root@hayes-02 ~]# lvs -a -o +devices --units m
  LV     VG   Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  lvol0  test swi-a-s--- 4116.00m      origin 0.00                                    /dev/sdb1(1024)
  lvol1  test swi-a-s--- 4116.00m      origin 0.00                                    /dev/sdb1(2053)
  lvol2  test swi-a-s--- 4116.00m      origin 0.00                                    /dev/sdb1(3082)
  lvol3  test swi-a-s--- 4116.00m      origin 0.00                                    /dev/sdb1(4111)
  lvol4  test swi-a-s--- 4116.00m      origin 0.00                                    /dev/sdb1(5140)
  origin test owi-a-s--- 4096.00m                                                     /dev/sdb1(0)   


> So it should never be possible to overfill those 2 snapshots - they could
> only be filled to 100% capacity.

Nope, only to 99.93%, since they are 4-20M larger than the origin volume (when only writing to the origin volume).

Comment 3 Zdenek Kabelac 2019-07-29 20:47:19 UTC
(In reply to Corey Marthaler from comment #2)
> > snap2 & snap3 are both capped at the maximum size the snapshot can ever occupy
> > (we do not support snapshots bigger than their origins - unlike what you can
> > create with thin volume snapshots).
> 
> Sure we do. You just need to know the "code" to create a snap bigger than
> the origin (i.e. give any size larger than the origin and you get an extra
> 4-20m larger than the origin but no more).

This was never designed to be supported - so all snapshots always present the
same 'usable' LV size as their origin.

Yes - in the lvs output we present the size of the 'COW' storage, which can be
different. But using COW storage above the addressable space of the snapshot
would effectively waste space in the VG - so lvm2 automatically truncates the
COW storage to the maximum size that can ever be used.

When users do need a snapshot with a different size than their origin,
they should switch to thin provisioning.
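
A minimal sketch of that route, with made-up pool/LV names and sizes (not commands run as part of this report):

  lvcreate --type thin-pool -L 10G -n pool test    # pool sized independently of any volume
  lvcreate --thin -V 4G -n thin_orig test/pool     # thin origin
  lvcreate -s -n thin_snap test/thin_orig          # thin snapshot; shares the pool, not limited by origin size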

Hopefully this explains things in detail - so closing this bug.