It would be great if TRIM commands from within the guests actually caused thin-provisioned disks on iSCSI and FC storage domains to shrink. For example:

1. Guest OS writes a 10 GB file to its local filesystem -> the thinly-provisioned disk grows by 10 GB.
2. User deletes the file from the guest file system.
3. Guest OS issues a TRIM command for the blocks occupied by the file.
4. Disk shrinks by ~10 GB.

(A rough guest-side sketch of these steps appears at the end of this comment.)

According to the discussion on rhev-tech:

> qcow2 discards are available in RHEL 7 hosts. They should work ok on NFS
> but in iscsi domains, it won't re-shrink the disk images, they just
> shouldn't grow
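For illustration only, the guest-side part of that sequence could look like this (the device and mount point names are made up, and this assumes discard passthrough is already enabled on the host side):

  dd if=/dev/zero of=/data/bigfile bs=1M count=10240   # thin disk grows by ~10 GB
  rm /data/bigfile                                      # space is freed in the guest FS only
  fstrim -v /data                                       # guest sends TRIM for the freed blocks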
There are many layers where TRIM/DISCARD has to be enabled in order to work. Common ones are:

1. Guest FS (scheduled - fstrim, or immediate - fstab's discard option)
2. Guest LVM (lvm.conf: issue_discards = 1)
---- guest/host boundary ----
3. kvm disk driver (<disk><driver discard='unmap'/><target bus='scsi'/></disk>)
4. qcow2 (enabled by default in el7 hosts)

Lower layers for iscsi storage are:

5. storage domain's LVM (issue_discards = 1). This is a host-wide setting!
6. dm-multipath
7. iscsi initiator
---- host/storage boundary ----
8. target server
9. backing storage

(A rough sketch of the guest-side and layer-3 settings appears at the end of this comment.)

Please note that there are two main use cases for TRIM/DISCARD:

1. freeing up storage space taken by deleted data:
   a) host- (RHEV-) side
   b) storage-side
2. preventing performance degradation of SSD-backed storage

and a single reason to prevent discards from reaching storage: they can block other I/O operations, thus making I/O slow.

Use case 1.a) applies to pretty much any RHEV setup and it can be enabled and working irrespective of how the lower layers are set up. Conversely, enabling discards in the lower layers by default can be tricky due to the system-wide nature of issue_discards in lvm.conf, the risk of performance regressions on existing setups, and the diversity of setup options.

Given all the above, I'd think a somewhat granular approach would make most sense:

Upper layers: expose an "enable TRIM" knob in the properties of:
* disk
* VM
* instance type
* storage domain (named "enable TRIM for new disks")

ordered from highest priority to lowest. These should be enabled by default for all but the most performance-focused VMs (instance types).

Lower layers:
A: Storage-domain-level setting for file-based SDs. These should be enabled by default as well, IMO.
B: Host-level (thus cluster- or DC-level) setting for all block domains (iSCSI, FC) in the cluster/DC. Should be off by default for the reasons mentioned above.
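As a loose sketch of the guest-side and layer-3 settings referenced above (names, paths and the exact XML are illustrative; the engine/VDSM may generate this differently):

  # Guest FS: immediate discards via the mount option ...
  mount -o discard /dev/sda1 /data
  # ... or scheduled trimming on systemd-based guests:
  systemctl enable --now fstrim.timer

  # Guest LVM (lvm.conf):
  #   devices { issue_discards = 1 }

  # Layer 3, libvirt disk definition on the host:
  #   <disk type='block' device='disk'>
  #     <driver name='qemu' type='qcow2' discard='unmap'/>
  #     <target dev='sda' bus='scsi'/>
  #   </disk>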
I'm most interested in the use case labeled as 1.a) and even a checkbox for this is fine with me. Feel free to split the remaining parts into separate bugs or open a separate bug for use case 1.a) as it suits you. I would like to see 1.a) covered in the reasonably near future :-) Thank you very much!
issue_discards = 1 in LVM has nothing to do with this RFE. It is there to issue a discard when performing lvremove. (A better idea than setting it globally would be to run 'blkdiscard' on the specific LV that we wish to discard before removing it.)
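A minimal sketch of that idea, assuming the image LV is vg_sd/lv_image (placeholder names):

  blkdiscard /dev/vg_sd/lv_image   # discard the whole LV down to the array
  lvremove -f vg_sd/lv_image       # then remove it; no global lvm.conf change needed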
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
(In reply to Yaniv Kaul from comment #5)
> issue_discards = 1 in LVM has nothing to do with this RFE.
> It is there to issue a discard when performing lvremove.
> (A better idea than setting it globally would be to run 'blkdiscard' on the
> specific LV that we wish to discard before removing it.)

Yaniv,

I'm trying to remember where I previously read this comment some months/years ago, maybe from you.

I'm most interested in this issue_discards = 1 in LVM at the storage domain level and I'd like to know if a specific BZ has already been created for that.

I know https://bugzilla.redhat.com/show_bug.cgi?id=1017284 but I doubt this will cover my use case.
(In reply to Nicolas Ecarnot from comment #10)
> (In reply to Yaniv Kaul from comment #5)
> > issue_discards = 1 in LVM has nothing to do with this RFE.
> > It is there to issue a discard when performing lvremove.
> > (A better idea than setting it globally would be to run 'blkdiscard' on
> > the specific LV that we wish to discard before removing it.)
>
> Yaniv,
>
> I'm trying to remember where I previously read this comment some
> months/years ago, maybe from you.
>
> I'm most interested in this issue_discards = 1 in LVM at the storage domain
> level and I'd like to know if a specific BZ has already been created for
> that.
>
> I know https://bugzilla.redhat.com/show_bug.cgi?id=1017284 but I doubt this
> will cover my use case.

There are several reasons why I'd prefer we use blkdiscard on that LV:

1. If security is a concern, on some arrays you really need to wipe the data. This can be done using 'write_same' writing zeros (which was just introduced into blkdiscard upstream), or with our current 'wipe after delete' method, which 'dd's zeros into it. Discarding may or may not be secure, depending on the array.
2. The assumption is that in most cases, writing zeros should also reclaim space from smart arrays (but granted, not all).
3. Lastly, regardless of whether you wiped with zeros or used blkdiscard with write_same, running blkdiscard on the LV should result in the very same effect as issue_discards - so I personally see no reason to change a global LVM setting when we can achieve the same effect with blkdiscard. Specifically, we might add lvm-cache support in the future, so changing a global LVM setting is really discouraged (or perhaps there are LVs where you don't want to do that?).
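Roughly, the two blkdiscard variants mentioned in points 1 and 3 (the LV name is a placeholder; --zeroout requires a recent util-linux):

  blkdiscard --zeroout /dev/vg_sd/lv_image   # write_same-style zeroing - deterministic wipe where supported
  blkdiscard /dev/vg_sd/lv_image             # plain discard - secure or not, depending on the array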
(In reply to Yaniv Kaul from comment #11) > (or perhaps there are LVs where you don't want to do that?). issue_discards can be configured at LV level, but by having issue_discards=0 in lvm.conf we are blocking it everywhere. Since some not so smart arrays won't react to writing all zeroes, and since some customers might not be so sensitive to security considerations, why don't we allow customers to choose which mechanism is used? (and where?) Some customers might want to enable discards so they can run 'fstrim' periodically in their long living VMs and recover some of the wasted disk space.
(In reply to Julio Entrena Perez from comment #12)
> Some customers might want to enable discards so they can run 'fstrim'
> periodically in their long-living VMs and recover some of the wasted disk
> space.

+1, that was the main motivation.
(In reply to Julio Entrena Perez from comment #12)
> (In reply to Yaniv Kaul from comment #11)
> > (or perhaps there are LVs where you don't want to do that?).
>
> issue_discards can be configured at the LV level, but by having
> issue_discards=0 in lvm.conf we are blocking it everywhere.

Really? That sounds like an LVM bug to me - and a serious one - are you sure that with it disabled it doesn't pass through discards?

> Since some not-so-smart arrays won't react to writing all zeroes, and since
> some customers might not be so sensitive to security considerations, why
> don't we allow customers to choose which mechanism is used? (and where?)
> Some customers might want to enable discards so they can run 'fstrim'
> periodically in their long-living VMs and recover some of the wasted disk
> space.

But those customers who wish to run 'fstrim' from their VMs can TODAY use the hook that enables discard passthrough on the disk (a quick way to confirm it took effect is shown below). You are confusing two different use cases for discard - during the VM's lifecycle and when you delete a disk. If you are saying that issue_discards=0 disables passthrough of such discards, please open a bug on LVM2 (and CC me!).

Lastly, we do want to allow the customer to choose:
1. Use 'shred'
2. Use 'write_same' for writing zeros
3. Use blkdiscard (with write_same before, or not)
4. Use our current behavior.

I think it's flexible enough.
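For reference, a quick guest-side check that discard passthrough is actually in effect (the device name is an example; non-zero DISC-GRAN/DISC-MAX means the virtual disk advertises discard support):

  lsblk --discard /dev/sda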
(In reply to Yaniv Kaul from comment #14)
> You are confusing two different use cases for discard - during the VM's
> lifecycle and when you delete a disk.
> If you are saying that issue_discards=0 disables passthrough of such
> discards, please open a bug on LVM2 (and CC me!).

Sorry, I'm indeed confusing those two use cases.

> Lastly, we do want to allow the customer to choose:
> 1. Use 'shred'
> 2. Use 'write_same' for writing zeros
> 3. Use blkdiscard (with write_same before, or not)
> 4. Use our current behavior.
>
> I think it's flexible enough.

Awesome, thank you Yaniv.
Note: enabling this option on the disk should be mutually exclusive with 'wipe after delete' - if you allow the guest to discard blocks at will, and trust the storage to actually wipe them, there is no sense in wiping, as theoretically those discarded blocks may already have been 'leaked' to others.
That's partially true. I am going to query the storage domain for its discard_zeroes_data value (in fact, an AND over all of its LUNs). If it's true, we can use both of them together, as the storage guarantees to deterministically return zeroes when a discarded area is read. If it's false, then I am not going to allow enabling both of them together. Thanks for noticing.
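For the record, a couple of ways to see what a host reports for a given LUN (device names are placeholders):

  cat /sys/block/dm-10/queue/discard_zeroes_data   # 1 -> reads of discarded areas return zeroes
  lsblk -D /dev/mapper/my_lun                      # the DISC-ZERO column shows the same flag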
Yaniv, this BZ shouldn't be on POST yet, as the patches are partial and were sent as drafts with no reviewers. When I send the full patches for review, I'll move it back to POST.
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If not included, please move back to MODIFIED.
Why is this ON_QA?
Hi,

from what I understand, this should also work for local-storage-backed setups?

kind regards

Sven
(In reply to Sven Kieske from comment #22)
> Hi,
>
> from what I understand, this should also work for local-storage-backed
> setups?
>
> kind regards
>
> Sven

We don't support local FC and iSCSI. If you use passthrough, it should work, but it was not tested.
A local storage domain is a file storage domain. For file domains, the Pass Discard feature works on a best-effort basis, i.e. it "...can be enabled for any disk on any file storage domain, and it will actually work only if the underlying file system and block device support it." (taken from the feature page [1]).

So generally it should work, but it was never tested for local storage AFAIK.

[1] http://www.ovirt.org/develop/release-management/features/storage/pass-discard-from-guest-to-underlying-storage/#file-storage-1
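A rough way to observe the best-effort behaviour on a file domain (the path below is illustrative, with placeholder UUIDs) is to compare the image's allocated size before and after a guest-side trim:

  IMG=/rhev/data-center/mnt/myserver:_export/<sd_uuid>/images/<img_uuid>/<vol_uuid>   # placeholder path
  du -h "$IMG"
  # run "fstrim -av" inside the guest, then:
  du -h "$IMG"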
Tested with the following code:
------------------------------------------
ovirt-engine-4.1.0.4-0.1.el7.noarch
rhevm-4.1.0.4-0.1.el7.noarch
vdsm-4.19.4-1.el7ev.x86_64

Moving to VERIFIED!