Bug 1413106

Summary: full snapshot check on boot: boot duration increases with the number of changed blocks
Product: [Community] LVM and device-mapper
Reporter: Gerben <gerbgeus>
Component: lvm2
Sub component: Snapshots
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac
Version: 2.02.133
Flags: rule-engine: lvm-technical-solution?, rule-engine: lvm-test-coverage?
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-01-14 09:41:51 UTC
Type: Bug

Description Gerben 2017-01-13 16:10:22 UTC
Description of problem:
When a snapshot exists on an LV, some form of replay is performed at boot. As the number of changed blocks grows, this replay takes longer. In my opinion a full replay should not be needed; some form of journalling should require replaying at most the last few journal entries.

Version-Release number of selected component (if applicable):
Ubuntu 16.04, but I will re-test on Fedora 25 shortly

How reproducible:
Watch iostat during boot: reads on the snapshot's COW dm device increase with the number of changed blocks.

Steps to Reproduce:
1. Generate a full filesystem (e.g. 19GB), then create a snapshot (also 19GB)
2. On every reboot change a different portion (like 350MB) of the origin
3. After reboot, log the iostat output; the read counts increase steadily

(4. When testing with a virtual machine, disable the host's file caches to observe the increasing boot times. The easiest way is to supply a block device to the guest instead of a file on the host's filesystem.)
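The steps above can be sketched roughly as follows. The volume group and LV names (vg, root) and mount point are hypothetical placeholders, not taken from the report; adjust them to your setup, and note these commands need root and a real LVM volume group:

```shell
# 1. Create an old-style (non-thin) snapshot as large as the origin:
lvcreate --snapshot --size 19G --name root_snap vg/root

# 2. Before each reboot, rewrite a ~350MB portion of the origin
#    filesystem so new blocks get copied into the COW volume
#    (here: a throwaway file on the mounted origin, path hypothetical):
dd if=/dev/urandom of=/mnt/origin/changefile bs=1M count=350 conv=fsync

# 3. After rebooting, inspect reads on the dm devices backing the
#    snapshot and its COW volume:
iostat -d 1 3
dmsetup ls   # map dm-N names back to LV names
```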


Actual results:
Blocks read from the COW volume increase steadily on every reboot, and boot times for the OS (Ubuntu) increase steadily on every reboot, until the boot process eventually stalls and I need to intervene manually.

Expected results:
Boot times should remain short. Snapshot verification should be short as well, independent of the number of blocks in the snapshot volume.

Additional info:

Comment 1 Zdenek Kabelac 2017-01-14 09:41:51 UTC
The short answer could be:

Do not use old snapshots with large COW volumes.
Switch to thin provisioning, which is simply far better.

The longer answer is: the old snapshot format was meant for temporary, consistent filesystem snapshots taken while making backups. Whoever came up with the idea of using it for multi-GB snapshots kept 'permanent' (for long periods of time) misunderstood the original purpose.

Old snapshots simply can't deliver, as the metadata is scattered across the whole COW volume, which effectively means the full COW volume has to be read (and parsed).
It also occupies a lot of RAM.

The old format is unfixable and very inefficient when taking multiple snapshots (although newer kernels carry some patches that try to speed up disk read performance).

The new format is called thin provisioning, so please use that one.
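For reference, a minimal thin-provisioning setup looks like the sketch below. The volume group name (vg) and LV names are hypothetical placeholders; the commands assume a VG with enough free space and need root:

```shell
# Create a thin pool LV inside the volume group; the pool holds the
# data and its metadata, replacing the per-snapshot COW volume:
lvcreate --type thin-pool --size 20G --name pool vg

# Create a thin LV in the pool. Its virtual size is allocated on
# demand, so snapshot metadata stays compact and boot-time replay
# of a full COW volume is no longer needed:
lvcreate --thin --virtualsize 19G --name thinvol vg/pool

# Put a filesystem on it as usual:
mkfs.ext4 /dev/vg/thinvol
```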

Comment 2 Gerben 2017-01-15 11:18:13 UTC
Hi,

Yes: thin-provisioning does show proper performance characteristics.

From what I've read, it also has nice characteristics when taking multiple (incremental) snapshots of the same source LV.

Setting up thin provisioning looks easy as well: just an extra layer of LV (the pool) from which the final LVs are defined.

When creating snapshots of a thin-provisioned LV, do NOT provide the size parameter, or the snapshot will end up outside the thin pool entirely.
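Concretely, a thin snapshot is taken by omitting the size argument, as in this sketch (vg/thinvol is a hypothetical thin LV; passing --size/-L here would instead create an old-style COW snapshot outside the pool):

```shell
# Thin snapshot: no --size/-L, so it lands inside the thin pool:
lvcreate --snapshot --name thinvol_snap vg/thinvol

# Thin snapshots carry the activation-skip flag by default; activate
# explicitly with -K to ignore that flag when you want to mount it:
lvchange -ay -K vg/thinvol_snap
```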

LVM2 keeps surprising me time and time again: nice!!

Gerben