Zdenek Kabelac put together this information to describe thin LV behavior, and it should be incorporated into the product doc.

Subject: Re: about the thinlv behavior when full
From: Zdenek Kabelac <zkabelac>
Date: 09/08/2016 04:22 AM
To: Nikhil Kshirsagar <nkshirsa>, Peter Rajnoha <prajnoha>

On 7.9.2016 at 19:47, Nikhil Kshirsagar wrote:
> Hi Peter,
>
> So today I created a thin pool of 100 MB, and a thin LV of 200 MB in it. I
> have thin_pool_autoextend_percent = 20 and thin_pool_autoextend_threshold = 100.
>
> I wrote 200 MB to a file on the mounted fs on the thin LV using dd (not direct I/O).
>
> df -h showed it full. (200 MB all written; not sure if cached, so I rebooted.)
>
> So lvs and df -h look like this after the system comes back from the reboot:

Hi Nick

If you are new to the Unix world, let me introduce you to the 'fsync()' and 'fdatasync()' system calls (man section 2). Any program on Unix that relies on its data being properly stored on disk has to call these functions (even before it exits!). If it does not call them, it never knows whether the data were actually stored on disk or thrown straight into the nearest trash can (and even if you expected this to happen automatically when the application exits, it will not!).

To show this with a real-world example: if your text editor writes your edited data and does not call this operation BEFORE close() of the file handle, it has NO idea whether the data were properly written or simply lost. The reasons for data loss can vary: you can be using a thin-provisioned device, you can have a faulty device driver, you can have a faulty hard drive... about a gazillion reasons.

This brings us to the nature of the page-cache system on Linux. Once you 'write' a page (and you are NOT using O_DIRECT), your 'memory' chunk lands as an anonymous piece of bytes in the page cache; there is absolutely NO connection between your running process and these page-cached bits. Then kflushd kicks in and, with its preconfigured behavior, ensures there are at most X dirty pages still waiting to land on disk (see the /proc/sys/vm/* settings).

There is ALSO filesystem behavior that differs between XFS and ext4 in how they react to a 'failed' write.

Hopefully you now start to see that there is no real way to answer your question. As a hint: you would need to change EVERY program on your system to use 'mmap' for ANY file operation; only then could you track which pages are still 'dirty' and which are 'clean'. That would be quite a challenging task.

And so we get to the nature of thin-pool behavior. Using a thin pool WITHOUT monitoring, at 100% capacity, is in general UNSUPPORTED, and Red Hat sees it as a major failure of system administration. Usage of thin provisioning has its rules; if a user doesn't like them, he simply has to use a different technology (i.e. a plain, fully provisioned device). (We have always clearly communicated: DO NOT USE full dm thin pools.)

A full thin pool is nowhere near equivalent to a full filesystem; it is a very, very different level of problem, simply incomparable. So whoever hopes for filesystem-like behavior here simply doesn't understand the technology and should not use it. What we support is monitored usage of a thin pool, where the pool is not allowed to pass a reasonable threshold and is properly extended in time.
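To make the fsync() point concrete, here is a minimal C sketch (the path /mnt/thin/data.txt is a hypothetical file on a thin LV's filesystem, and the payload is made up): the buffered write() can appear to succeed while the data sits only in the page cache, and it is the fsync() return value that tells you whether the data actually reached stable storage.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical file on a filesystem backed by a thin LV */
        int fd = open("/mnt/thin/data.txt",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        const char buf[] = "important payload\n";

        /* A buffered write often "succeeds" even over a full thin pool:
         * at this point the data has only landed in the page cache. */
        if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1)) {
            perror("write");
            return EXIT_FAILURE;
        }

        /* Only fsync() tells us whether the data reached stable storage;
         * on a full pool (or a faulty disk) the error surfaces here. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return EXIT_FAILURE;
        }

        if (close(fd) != 0) { perror("close"); return EXIT_FAILURE; }
        return EXIT_SUCCESS;
    }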
Trying to use a thin pool at the 100% level is plain misuse of this technology, and we simply can't support that usage, for a long list of different reasons (in short: too much work and too many fixes across the whole kernel). To compare it to a real-world case: a full thin pool behaves like a seriously faulty hard drive.

> I had even deleted the file that I had written to in the dd command, and then
> rebooted, but I still see 190 M used after reboot when I mounted it back, and
> even in the df -h output before the reboot. What's taking up the 190 MB if I
> deleted the file I had written, then rebooted, then mounted the fs back?

Here we get to another Linux command, called 'fstrim'. Once you call it on a mountpoint, you let allocated thin-pool chunks that the filesystem no longer uses be discarded (TRIM). Please note: if the 'discarded' areas are too small (e.g. discarding a 4K block while the thin-pool chunk size is 128K), obviously nothing can be released on the thin-pool side. So with a badly fragmented filesystem it is not uncommon that a lot of unused space inside the filesystem stays allocated even after fstrim.

> And when it allowed me to write 190 M to the thin LV created in a 100 MB thin
> pool LV, did it write *all* of the 190 M to the cache? Any way I can prove how
> much, if any, was going to the cache for sure?

Now the advice for anyone who wants to know EXACTLY what was written and what was not:

1. Do NOT use buffered operations (with all the page-cache logic behind them).
2. Use direct I/O operations.

I hope this gives a clear answer to your question.

Regards

Zdenek
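(As a usage note: running 'fstrim -v <mountpoint>' prints how many bytes were discarded, which helps verify whether a deleted file's chunks were actually released back to the pool.)

To illustrate the direct I/O advice, here is a hedged C sketch (the path /mnt/thin/probe.bin and the 4096-byte size are assumptions; direct I/O generally requires the buffer, length, and file offset to be aligned to the device's logical block size). Because the page cache is bypassed, a failure such as the pool running out of space surfaces directly from write() instead of being deferred to a later flush.

    #define _GNU_SOURCE             /* for O_DIRECT on glibc */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical file on the thin LV's filesystem */
        int fd = open("/mnt/thin/probe.bin",
                      O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        /* O_DIRECT needs an aligned buffer; 4096 covers common block sizes */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) return EXIT_FAILURE;
        memset(buf, 'x', 4096);

        /* With the page cache bypassed, the result of write() reflects
         * what the device actually accepted, not what was cached. */
        if (write(fd, buf, 4096) != 4096) {
            perror("write");
            return EXIT_FAILURE;
        }

        free(buf);
        if (close(fd) != 0) { perror("close"); return EXIT_FAILURE; }
        return EXIT_SUCCESS;
    }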
Zdenek: This BZ has been sitting in my queue for a long time. I had been putting it off in part because at first glance it looked as if it would require some extensive documentation of internals, but I'm now looking at it more carefully, and I think I misinterpreted what was needed here; it might just be a simple caution. I think that what we need to document is something along the lines of the following:

Caution: A thin pool cannot function at 100% capacity. For this reason, you must monitor thin pool usage and extend the size of the thin pool when it exceeds a reasonable capacity.

Question: What is a reasonable capacity that we recommend?

Question: How would you summarize the issues that arise when a thin pool approaches 100% capacity? Is it enough to say "data loss may occur"? Do we need to mention that when a thin pool reaches capacity, simply deleting files may not address the issue? Do we need to mention that if you need to monitor what was written, you must use a direct I/O operation rather than a buffered operation?

Thanks for any guidance here,

Steven
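For reference while these questions are settled: the monitored setup Zdenek describes is driven by the same lvm.conf settings Nikhil quotes at the top of this thread. A sketch of what a monitored configuration might look like (the 70 and 20 values are illustrative, not an official recommendation):

    activation {
        # Let dmeventd monitor the pool so it can react at the threshold.
        monitoring = 1

        # When the pool passes 70% full, extend it by 20% of its size.
        # (70/20 are illustrative values, not an official recommendation.
        # A threshold of 100, as in the original report, disables
        # automatic extension entirely.)
        thin_pool_autoextend_threshold = 70
        thin_pool_autoextend_percent = 20
    }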