Bug 1459646 - Pool space leak: Shrinking snapshotted volumes retains block mappings after deleting all snapshots
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware/OS: All / Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assigned To: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
Reported: 2017-06-07 13:10 EDT by bugzilla
Modified: 2017-08-17 14:31 EDT

Type: Bug


Description bugzilla 2017-06-07 13:10:48 EDT
Description of problem:

If you shrink a thin logical volume that has been snapshotted and then delete the snapshot, the space that should be freed by the snapshot deletion is not unmapped; it remains mapped to the original (shrunken) device, and lvs gives the following warning:
  WARNING: Thin volume data/usage_test maps 1.00 GiB while the size is only 512.00 MiB.

This also affects CentOS 6 systems, but instead of giving the warning, lvs shows thin LV usage above 100%.


Version-Release number of selected component (if applicable):

Linux hvtest1.ewheeler.net 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lvm2-2.02.166-1.el7_3.3.x86_64


How reproducible:

Very


Steps to Reproduce:

1. Create thin logical volume

[root@hvtest1 ~]# lvcreate -T data/pool0 -V1g -n usage_test
  Using default stripesize 64.00 KiB.
  Logical volume "usage_test" created.

2. Fully allocate the volume

[root@hvtest1 ~]# dd if=/dev/zero bs=1M of=/dev/data/usage_test
dd: error writing ‘/dev/data/usage_test’: No space left on device

1025+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 18.6354 s, 57.6 MB/s

[root@hvtest1 ~]# lvs data/usage_test
  LV         VG   Attr       LSize Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
  usage_test data Vwi-aotz-- 1.00g pool0        100.00                                 

3. Create a thin snapshot of that volume

[root@hvtest1 ~]# lvcreate -s -n usage_test_snap data/usage_test
  Using default stripesize 64.00 KiB.
  Logical volume "usage_test_snap" created.

[root@hvtest1 ~]# lvs |grep usage_test
  usage_test                                                                data       Vwi-a-tz--   1.00g pool0                                                    100.00                                 
  usage_test_snap                                                           data       Vwi---tz-k   1.00g pool0 usage_test                                                                                

4. Shrink the original volume

[root@hvtest1 ~]# lvresize -L512m data/usage_test
  WARNING: Reducing active logical volume to 512.00 MiB.
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce data/usage_test? [y/n]: y
  Size of logical volume data/usage_test changed from 1.00 GiB (256 extents) to 512.00 MiB (128 extents).
  Logical volume data/usage_test successfully resized.

[root@hvtest1 ~]# lvchange -K -ay data/usage_test_snap

[root@hvtest1 ~]# lvs |grep usage_test
  WARNING: Thin volume data/usage_test maps 1.00 GiB while the size is only 512.00 MiB.
  usage_test                                                                data       Vwi-a-tz-- 512.00m pool0                                                    100.00                                 
  usage_test_snap                                                           data       Vwi-a-tz-k   1.00g pool0 usage_test                                         100.00                                 

5. Remove the snapshot

[root@hvtest1 ~]# lvremove data/usage_test_snap
Do you really want to remove active logical volume data/usage_test_snap? [y/n]: y
  Logical volume "usage_test_snap" successfully removed

6. Note the warning. Also, if you inspect a thin_dump of the pool's tmeta, you will see mappings beyond the volume's new size (see the sketch after the listing below). On CentOS 6, the Data% column would show 200.00 instead of the warning being printed.

[root@hvtest1 ~]# lvs |grep usage_test
  WARNING: Thin volume data/usage_test maps 1.00 GiB while the size is only 512.00 MiB.
  usage_test                                                                data       Vwi-a-tz-- 512.00m pool0          100.00                 
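
For reference, one way to confirm the stale mappings is to take a metadata snapshot of the pool and dump it with thin_dump from thin-provisioning-tools. This is only a sketch; the device-mapper names below (data-pool0-tpool, data-pool0_tmeta) assume the usual lvm2 naming convention and should be verified with 'dmsetup ls' on the actual system:

  dmsetup message data-pool0-tpool 0 reserve_metadata_snap    # take a read-only metadata snapshot of the live pool
  thin_dump --metadata-snap /dev/mapper/data-pool0_tmeta      # mappings past the new 512 MiB boundary indicate the leak
  dmsetup message data-pool0-tpool 0 release_metadata_snap    # release the metadata snapshot when done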


Actual results:

Pool space is lost and not reclaimed.


Expected results:

Pool space should be freed when the snapshot is deleted.


Additional info:
Comment 2 bugzilla 2017-06-07 13:15:34 EDT
Note that deleting the origin frees 1 GiB (90 GiB * (0.1219 - 0.1108) ≈ 0.999 GiB), as expected given the nature of the bug:

[root@hvtest1 ~]# lvs data/pool0
  LV    VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  pool0 data twi-aotz-- 90.00g             12.19  5.17                            
[root@hvtest1 ~]# lvremove data/usage_test
Do you really want to remove active logical volume data/usage_test? [y/n]: y
  Logical volume "usage_test" successfully removed
[root@hvtest1 ~]# lvs data/pool0
  LV    VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  pool0 data twi-aotz-- 90.00g             11.08  5.12
Comment 3 Zdenek Kabelac 2017-06-08 04:35:20 EDT
This is not a new bug - it is basically a yet-to-be-resolved issue where lvm2 needs to figure out a nice way to do this in a configurable manner, without leaks and without holding locks for too long.

For now, whenever a user runs 'lvreduce' on a thin LV, the reduced chunk is NOT trimmed/discarded by lvm2 (so the user may revert the lvreduce step with vgcfgrestore in case of a mistake, as lvm2 normally tries to support one-command-back safe recovery).

So the current workaround is for the user to discard this reduced chunk up front by issuing the 'blkdiscard' command, as sketched below.
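
A sketch of that workaround for the reproducer above; the offset and length are illustrative and must match the region actually being cut off (here the second 512 MiB of the 1 GiB volume):

  # Discard the tail that the reduction will cut off, then perform the reduction itself.
  blkdiscard --offset $((512 * 1024 * 1024)) --length $((512 * 1024 * 1024)) /dev/data/usage_test
  lvresize -L512m data/usage_test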

The next problem on the lvm2 side is that this cannot be done atomically, so we basically need to add an internal mechanism to 'queue' some work in the lvm2 metadata - this is planned, as we need it for other pieces as well.

So the issue is known, but since not many users ever reduce device size, it is low priority at the moment - unless there is an important use case behind this where the manual workaround is not good enough.
Comment 4 Eric Wheeler 2017-07-28 15:17:05 EDT
Interesting.

You might add an option for lvresize like `--discard-reduction` that would blkdiscard the tail. This could work on traditional LVs as well; there is no reason it could not. Indeed, some users might wish to discard their shrunken LVs on SSDs and the like.

Since the user invokes it explicitly, the potentially long resize/lock times can be called out in the option's documentation.
Comment 5 Zdenek Kabelac 2017-07-28 15:26:30 EDT
For the normal case, i.e. a linear LV, reduced/released extents can be discarded immediately by enabling the issue_discards option in lvm.conf.
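
For reference, that setting lives in the devices section of /etc/lvm/lvm.conf; a minimal sketch:

  devices {
      # Send discards to the underlying device when an LV stops using the space
      # (e.g. on lvremove/lvreduce).
      issue_discards = 1
  }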

Using an option with lvresize/lvremove would probably be another way of telling the system that the user is 100% sure they will not want the TRIMmed data back.

So it is probably worth some thought.
