Bug 1321608
| Summary: | When using RHEV with thin provisioned block storage, SAN space is allocated upon creation of VM virtual disks, but never returned to the SAN after deleting virtual disks, so the storage usage never shrinks | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Greg Scott <gscott> |
| Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> |
| lvm2 sub component: | Thin Provisioning | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, amureini, fsimonce, gscott, heinzm, jbrassow, mkalinin, msnitzer, prajnoha, prockai, thornber, ylavi, zkabelac |
| Version: | 7.2 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-26 12:20:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: Greg Scott, 2016-03-28 15:24:41 UTC
WRT implementation, please consider comment 10 on bug 981626: https://bugzilla.redhat.com/show_bug.cgi?id=981626#c10 - using blkdiscard on the LV instead of "issue_discards = 1" is the preferable (safest) approach.

Yes - your comments in the BZ are quite true. There is a misunderstanding about what 'issue_discards = 1' really means. It has nothing in common with 'passing through' discards to the underlying device - that works no matter what issue_discards is set to. issue_discards only takes effect when some 'real' space in the VG is released (i.e. on lvremove), and that happens when the LV device itself is ALREADY destroyed - the individual released extent chunks are then discarded one by one (and yes, that can be a lengthy operation, preventing other lvm2 commands from taking action). With a thin pool there is no real space returned to the VG, as all the space is effectively still kept inside the data LV.

Now, this BZ effectively wants lvm2 to implement some kind of 'pre-remove command to execute'. So instead of the user executing blkdiscard, lvm2 could automatically call such an operation (on the still-active LV). By the nature of this operation there are races, since nothing prevents such an LV from being opened and written to again during that time window - but that is likely a price we need to live with. So as a workaround, until something 'clever' automates this in lvm2, calling blkdiscard prior to the 'lvremove' call will have the same effect.

Under the line: users should NOT actually use issue_discards by default - it makes 'vgcfgrestore' pointless. (I've already seen quite a few users left with just dumped LV content after they realized their lvreduce/lvremove operation was not what they really wanted.) Normally the space is TRIM-ed/discarded after a new LV is allocated and 'mkfs' is executed.
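To make the distinction above concrete, here is a minimal sketch of the two mechanisms being discussed. The VG and LV names are hypothetical, and this is illustration only, not a recommended default configuration.

```
# lvm.conf option discussed above (devices section). It only affects extents
# released back to the VG by lvremove/lvreduce; it does NOT help thin LVs,
# whose space stays inside the pool's data LV, and it defeats vgcfgrestore
# as a way to undo a mistaken lvremove.
#
#   devices {
#       issue_discards = 1
#   }

# Workaround described above, with hypothetical names: discard the
# still-active LV, then remove it. There is a small race window if
# something reopens and writes to the LV between the two commands.
blkdiscard /dev/vg_san/vm_disk_01
lvremove -y vg_san/vm_disk_01
```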
Thanks Zdenek. I set up this BZ mostly to ask for guidance on the best way to proceed, because the lost storage in this use case adds up to multiple terabytes. If blkdiscard is the best way to proceed and can work right now, I don't have strong feelings that LVM needs major surgery. For the RHEV/oVirt team and other engineering consumers of LVM: will blkdiscard achieve the goal of returning unallocated space back to the SAN, and is it easy to do? And how fast can we implement it?

I think the solution should be that the 'wipe after delete' action is converted to use 'write_same' (via blkdiscard) and fill the area with zeros.

(In reply to Yaniv Kaul from comment #22)
> I think the solution should be that the 'wipe after delete' action should be
> converted to use 'write_same' (via blkdiscard) and fill the area with zeros.

Generally true, but you can see here that we use postZeroes=0, i.e. no wiping, just a bunch of lvremoves.

OK - I'm really getting confused by all the comments and the BZ title. So if this is supposed to remain an 'lvm2' BZ, could anyone please condense what is supposedly the lvm2 problem to fix? From my current understanding it is not a problem with lvm2 thin provisioning and TRIM/discard support, and the lvm2 team has no real idea what RHEV calls thin provisioning in other contexts. We would like to see a basic summary of: the current input (the executed command), the current output (what is missing on the storage side?), and the expected fixes for lvm2.

Zdenek, I wish I could provide the info you're asking for, but from the field point of view I don't know whether we need lvm2 fixes, just RHEV fixes, or fixes to both. I don't know the internals of how this all works; I only know the overall behavior is broken and needs to be fixed yesterday. I also know there have been several BZs around this same problem and none of them have been resolved satisfactorily, so customers keep experiencing the problem I documented in the original problem statement. If RHEV does lvremoves, but the lvremove doesn't tell the SAN about it, that feels like an LVM problem to me. But I only see the whole package, not deep into its components. I was hoping this time for a meeting of the minds between RHEV, LVM2, and any other components that make up the total package so we can finally solve this problem once and for all. I'll leave the needinfo turned on because I'm not able to provide the info you need, but perhaps others from RHEV and other Engineering groups can. - Greg

Still confused... From previous comments it does look like RHEV is using lvm2 thin provisioning on top of SAN-style thin provisioning; if the SAN needs to release unused blocks, that looks like another kind of thin pool running on the SAN machine. It does not seem practical to have both technologies doing the same thing. Is there any practical reason to use lvm2 thin LVs in this case instead of plain linear LVs? (Snapshots?) As a workaround for lvm2 thins, RHEV may simply call 'blkdiscard' before calling lvremove (basically a wrapper shell script, lvremovediscardthin) - lvm2 cannot do anything better anyway, but at the moment lvm2 does not provide hooks for calling apps before lvremove.

Would this whole issue go away by using fully provisioned RHEV virtual disk images on top of SAN thin-provisioned LUNs? If that's an easy workaround and it solves the problem, I'm more than willing to try it. But if I use fully provisioned RHEV storage, does that consume more SAN space than thin-provisioned RHEV storage, even though the SAN LUN is thin provisioned?

Reading back the history of this BZ - mainly comment 7, its referenced BZ, and Federico's analysis - to get to the core, I'd probably prefer a conference call or IRC discussion to move forward more quickly here. It seems there is no lvm2 issue, as there is no thin LV in use, and for normal LVs we do send discards right to the underlying PVs (while holding the VG write lock). So does the PV device (the SAN) actually support discard? (This can easily be checked via the /sys/block/... content.) Given that the discard ioctl is synchronous and made while holding the VG write lock, as Federico correctly pointed out, it's much better to call blkdiscard before taking the lock.

I'll also explain a minor difference, as bug 981626 comment 10 is not completely correct. Issuing blkdiscard on an active LV has a small race problem: if there is still some user of the device, then after the discard operation the space may again be occupied by newly written content from that user. So lvm2 does NOT send discards on an ACTIVE LV - instead it discards areas on the PV, and for that operation lvm2 currently holds the lock so the LV cannot be activated. We may improve this eventually and deploy a polling architecture if that is requested, but it would be a non-trivial RFE.

Getting back to the BZ: where in this chain of operations is an issue with lvm2 seen? We would need the metadata and an exact trace of the individual lvm2 operations (not a mixed trace of python logging) where lvm2 does not discard a released block. We also need to see that the system supports discard on the PV device.

Can you please provide a reply on the above question?
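As a rough illustration of the two points just raised - checking whether the PV actually advertises discard support via sysfs/lsblk, and the 'blkdiscard before lvremove' wrapper idea mentioned earlier in the thread - here is a hedged sketch. The script name, arguments, and overall shape are hypothetical; it is not part of any shipped tool.

```
#!/bin/bash
# Hypothetical wrapper in the spirit of the "lvremovediscardthin" idea above.
# Usage: ./lvremove-discard.sh <vg> <lv>
set -euo pipefail

VG="$1"
LV="$2"

# 1. Find the PV(s) backing the VG and check discard support.
#    A DISC-MAX of 0 (equivalently, discard_max_bytes = 0 under
#    /sys/block/<dev>/queue/) means the device ignores discard requests.
for PV in $(pvs --noheadings -o pv_name --select "vg_name=${VG}"); do
    MAX=$(lsblk -dbno DISC-MAX "${PV}")
    if [ "${MAX}" -eq 0 ]; then
        echo "warning: ${PV} does not support discard" >&2
    fi
done

# 2. Discard the still-active LV so the SAN can reclaim the blocks.
#    As discussed above, there is a small race window if something reopens
#    and writes to the LV before it is removed.
blkdiscard "/dev/${VG}/${LV}"

# 3. Remove the LV as usual.
lvremove -y "${VG}/${LV}"
```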
Zdenek, Yaniv - I don't know how to answer Zdenek's question:
> Where in this chain of operations is an issue with lvm2 seen?
I don't know. I do know that when we delete virtual machine disk images, the space never becomes free from the SAN's point of view, and my customer put in two 18-hour days over the past weekend rolling over to new RHEV storage domains to recover some free SAN space. Hopefully we can all put our heads together and come up with a solution. I'm happy to IRC or talk on the phone or video as needed.
- Greg
I think we now know this is a duplicate of bug 981626 - we need to send DISCARD when deleting the disk (LV), so XtremIO can reclaim space.

*** This bug has been marked as a duplicate of bug 981626 ***