Bug 1241106 (RHEV_pass_discard_from_guest) - [RFE] Allow TRIM from within the guest to shrink thin-provisioned disks on iSCSI and FC storage domains
Summary: [RFE] Allow TRIM from within the guest to shrink thin-provisioned disks on iS...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: RHEV_pass_discard_from_guest
Product: ovirt-engine
Classification: oVirt
Component: RFEs
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.0-alpha
Target Release: 4.1.0.2
Assignee: Idan Shaby
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: RHEV_Discard_on_DirectLUN
Blocks: 1213937 1422158
 
Reported: 2015-07-08 12:57 UTC by Petr Spacek
Modified: 2017-02-15 14:59 UTC (History)
CC List: 22 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Previously, discard commands (UNMAP SCSI commands) sent from the guest were ignored by qemu and were not passed on to the underlying storage. This meant that storage space that was no longer in use could not be freed. In this release, it is now possible to pass discard commands on to the underlying storage. A new property called Pass Discard was added to the Virtual Disk window. When it is selected, and if all the restrictions are met, discard commands sent from the guest are no longer ignored by qemu and are passed on to the underlying storage. The unused blocks in the underlying storage's thinly provisioned LUNs will be marked as free, and the reported consumed space will be reduced.
Clone Of:
Environment:
Last Closed: 2017-02-15 14:59:26 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
ratamir: testing_plan_complete+
ylavi: planning_ack+
amureini: devel_ack+
ratamir: testing_ack+


Attachments


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 59225 0 master MERGED Add passDiscard to DiskAttachment 2021-01-19 07:05:44 UTC
oVirt gerrit 60628 0 master MERGED storage: add getDeviceList discard related fields 2021-01-19 07:05:44 UTC
oVirt gerrit 61897 0 master MERGED core: introduce Pass Discard support for vm disks 2021-01-19 07:05:44 UTC
oVirt gerrit 61898 0 master MERGED ui: add support for the Pass Discard property 2021-01-19 07:05:44 UTC
oVirt gerrit 64007 0 master MERGED Add discardMaxSize and discardZeroesData to LogicalUnit 2021-01-19 07:05:44 UTC
oVirt gerrit 64009 0 master MERGED core: add discard related fields to LUNs 2021-01-19 07:06:24 UTC
oVirt gerrit 64010 0 master MERGED backend: update sd luns when discard fields change 2021-01-19 07:05:45 UTC
oVirt gerrit 64011 0 master MERGED core: add calculated discard fields for storage domain 2021-01-19 07:05:45 UTC
oVirt gerrit 65751 0 master MERGED storage: add discard support for vm disks 2021-01-19 07:05:45 UTC
oVirt gerrit 66032 0 master MERGED restapi: Update to metamodel 1.1.8 2021-01-19 07:05:45 UTC

Description Petr Spacek 2015-07-08 12:57:35 UTC
It would be great if TRIM commands from within the guests actually caused thin-provisioned disks on iSCSI and FC storage domains to shrink.

For example (a guest-side command sketch follows this list):
1. Guest OS writes 10 GB file to local filesystem -> thinly-provisioned disk grows by 10 GB.
2. User deletes the file from Guest file system.
3. Guest OS issues TRIM command for blocks occupied by the file.
4. Disk shrinks by ~ 10 GB.
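
For illustration, a minimal guest-side sketch of steps 2-3 (the mount point and file name are placeholders, and the disk must be configured to pass discards through):

    # inside the guest
    rm /data/bigfile.bin    # step 2: delete the 10 GB file
    sync                    # make sure the deletion hits the filesystem
    fstrim -v /data         # step 3: ask the filesystem to issue TRIM/UNMAP for the freed blocks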

According to discussion on rhev-tech:
> qcow2 discards are available in RHEL 7 hosts. They should work ok on NFS
> but in iscsi domains, it won't re-shrink the disk images, they just
> shouldn't grow

Comment 2 David Jaša 2015-07-09 10:43:23 UTC
There are many layers where TRIM/DISCARD has to be enabled in order for it to work (verification commands for several of these layers are sketched after the lists below). Common ones are:
1. Guest FS (scheduled via fstrim, or immediate via fstab's discard mount option)
2. Guest LVM (lvm.conf: issue_discards = 1)
---- guest/host boundary ----
3. kvm disk driver (<disk><driver discard='unmap'/><target bus='scsi'/></disk>)
4. qcow2 (enabled by default in el7 hosts)

Lower layers for iscsi storage are:
5. storage domain's LVM (issue_discards = 1). This is a host-wide setting!
6. dm-multipath
7. iscsi initiator
---- host/storage boundary ----
8. target server
9. backing storage
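
For reference, a hedged sketch of commands one might use to check some of these layers on an EL7 host (the VM name and device paths are placeholders):

    # layer 3: confirm the virtual disk's driver element carries discard='unmap'
    virsh dumpxml myvm | grep -B1 -A3 "discard='unmap'"
    # layers 5-7: non-zero DISC-GRAN/DISC-MAX values mean the block device advertises discard support
    lsblk --discard /dev/mapper/my_multipath_device
    # layer 5: the host-wide LVM setting mentioned above
    grep issue_discards /etc/lvm/lvm.conf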


Please note that there are two main use cases for TRIM/DISCARD:
1. freeing up storage space taken by deleted data:
    a) host- (RHEV-) side
    b) storage-side
2. prevent performance degradation of SSD-backed storage
and there is a single reason to prevent discards from reaching the storage: they can block other I/O operations, thus making I/O slow.

Use case 1.a) applies to pretty much any RHEV setup, and it can be enabled and made to work irrespective of how the lower layers are set up. Conversely, enabling discards in the lower layers by default can be tricky due to the system-wide nature of issue_discards in lvm.conf, the risk of performance regressions on existing setups, and the diversity of setup options.


Given all the above, I'd think that a somewhat granular approach would make the most sense:
Upper layers: expose some "enable TRIM" knob in properties of:
  * disk
  * VM
  * instance type
  * storage domain (with "enable TRIM for new disks" name)
ordered from highest priority to lowest. These should be enabled by default for all but the most performance-focused VMs (instance types).

Lower layers:
A: Storage domain level settings for file-based SDs. These should be enabled by default as well, IMO.
B: Host-level (thus cluster- or DC-level) setting for all block domains (iSCSI, FC) in the cluster/DC. This should be off by default for the reasons mentioned above.

Comment 4 Petr Spacek 2015-07-13 10:14:28 UTC
I'm most interested in the use case labeled 1.a), and even a checkbox for this is fine with me. Feel free to split the remaining parts into separate bugs, or open a separate bug for use case 1.a), as it suits you. I would like to see 1.a) covered in the reasonably near future :-)

Thank you very much!

Comment 5 Yaniv Kaul 2015-11-22 11:11:16 UTC
issue_discards = 1 in LVM has nothing to do with this RFE.
It is there to issue a discard when performing lvremove.
(A better idea than setting it globally would be to run 'blkdiscard' on the specific LV that we wish to discard before removing it.)
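
A minimal sketch of that alternative (the VG/LV names are placeholders):

    # discard only this LV's blocks, then remove it - no host-wide lvm.conf change needed
    blkdiscard /dev/my_vg/my_lv
    lvremove my_vg/my_lv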

Comment 7 Red Hat Bugzilla Rules Engine 2015-12-01 15:42:48 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 10 Nicolas Ecarnot 2016-04-05 14:23:34 UTC
(In reply to Yaniv Kaul from comment #5)
> issue_discards = 1 in LVM has nothing to do with this RFE.
> It is to issue discard when performing lvremove.
> (A better idea than setting it globally would to run 'blkdiscard' on the
> specific LV that we wish to discard before removing)

Yaniv,

I'm trying to remember where I previously read this comment some month/year ago, maybe from you.

I'm most interested in this issue_discards = 1 in LVM at the storage domain level and I'd like to know if a specific BZ has already been created for that.

I know https://bugzilla.redhat.com/show_bug.cgi?id=1017284 but I doubt this will cover my use case.

Comment 11 Yaniv Kaul 2016-04-05 14:40:28 UTC
(In reply to Nicolas Ecarnot from comment #10)
> (In reply to Yaniv Kaul from comment #5)
> > issue_discards = 1 in LVM has nothing to do with this RFE.
> > It is to issue discard when performing lvremove.
> > (A better idea than setting it globally would to run 'blkdiscard' on the
> > specific LV that we wish to discard before removing)
> 
> Yaniv,
> 
> I'm trying to remember where I previously read this comment some month/year
> ago, maybe from you.
> 
> I'm most interested in this issue_discards = 1 in LVM at the storage domain
> level and I'd like to know if a specific BZ has already been created for
> that.
> 
> I know https://bugzilla.redhat.com/show_bug.cgi?id=1017284 but I doubt this
> will cover my use case.

There are several reasons why I'd prefer we use blkdiscard on that LV:
1. If security is a concern, on some arrays you really need to wipe the data. This can be done using 'write_same' writing zeros (which was just introduced into blkdiscard upstream), or with our current 'wipe after delete' method, which 'dd's zeros into it. Discarding may or may not be secure, depending on the array.
2. The assumption is that in most cases, writing zeros should also reclaim space from smart arrays (but granted, not all).
3. Lastly, regardless of whether you wiped with zeros or used blkdiscard with write_same, running blkdiscard on the LV should have the very same effect as issue_discards - so I personally see no reason to change the global LVM setting when we can achieve the same effect with blkdiscard. Specifically, we might add lvm-cache support in the future, so changing a global LVM setting is discouraged (or perhaps there are LVs where you don't want to do that?).
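
For illustration, the two blkdiscard variants mentioned above (the LV path is a placeholder; --zeroout needs a reasonably recent util-linux):

    blkdiscard --zeroout /dev/my_vg/my_lv   # zero the LV, typically via WRITE SAME - a deterministic wipe
    blkdiscard /dev/my_vg/my_lv             # plain discard - whether old data stays readable depends on the array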

Comment 12 Julio Entrena Perez 2016-04-05 14:46:12 UTC
(In reply to Yaniv Kaul from comment #11)
> (or perhaps there are LVs where you don't want to do that?).

issue_discards can be configured at LV level, but by having issue_discards=0 in lvm.conf we are blocking it everywhere.
Since some not so smart arrays won't react to writing all zeroes, and since some customers might not be so sensitive to security considerations, why don't we allow customers to choose which mechanism is used? (and where?)
Some customers might want to enable discards so they can run 'fstrim' periodically in their long living VMs and recover some of the wasted disk space.
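
A hedged example of that periodic trimming inside a guest (it only has an effect if the guest's disks pass discards through):

    # one-off: trim all mounted filesystems that support it
    fstrim -av
    # or scheduled, on guests whose util-linux ships the systemd timer
    systemctl enable --now fstrim.timer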

Comment 13 Petr Spacek 2016-04-06 08:06:04 UTC
(In reply to Julio Entrena Perez from comment #12)
> Some customers might want to enable discards so they can run 'fstrim'
> periodically in their long living VMs and recover some of the wasted disk
> space.

+1, that was the main motivation

Comment 14 Yaniv Kaul 2016-04-06 08:39:24 UTC
(In reply to Julio Entrena Perez from comment #12)
> (In reply to Yaniv Kaul from comment #11)
> > (or perhaps there are LVs where you don't want to do that?).
> 
> issue_discards can be configured at LV level, but by having issue_discards=0
> in lvm.conf we are blocking it everywhere.

Really? That sounds like an LVM bug to me - and a serious one. Are you sure that with it disabled it doesn't pass through discards?

> Since some not so smart arrays won't react to writing all zeroes, and since
> some customers might not be so sensitive to security considerations, why
> don't we allow customers to choose which mechanism is used? (and where?)
> Some customers might want to enable discards so they can run 'fstrim'
> periodically in their long living VMs and recover some of the wasted disk
> space.

But those customers who wish to run 'fstrim' from their VMs can use the hook that enables discard passthrough on the disk TODAY. You are confusing two different use cases for discard - during the VM's lifecycle and when you delete a disk.
If you are saying that issue_discards=0 disables passthrough of such discards, please open a bug on LVM2 (and CC me!).

Lastly, we do want to allow the customer to configure to choose:
1. Use 'shred'
2. Use 'write_same' for writing zeros
3. Use blkdiscard (with write_same before or not)
4. Use our current behavior.

I think it's flexible enough.
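
For reference, a rough command-level illustration of the four options (the LV path is a placeholder; this is a sketch, not the engine's actual implementation):

    shred -n 3 /dev/my_vg/my_lv                              # 1. multi-pass overwrite
    blkdiscard --zeroout /dev/my_vg/my_lv                    # 2. write zeros via WRITE SAME
    blkdiscard /dev/my_vg/my_lv                              # 3. discard (optionally preceded by the zeroing above)
    dd if=/dev/zero of=/dev/my_vg/my_lv bs=1M oflag=direct   # 4. current behavior: 'dd' zeros over the LV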

Comment 15 Julio Entrena Perez 2016-04-06 09:33:09 UTC
(In reply to Yaniv Kaul from comment #14)

> You are confusing two
> different use cases for discard - during VM lifecycle and when you delete a
> disk.
> If you are saying that issue_discards=0 disables passthrough of such
> discard, please open a bug on LVM2 (and CC me!).

Sorry, I'm confusing those two use cases indeed.

> Lastly, we do want to allow the customer to configure to choose:
> 1. Use 'shred'
> 2. Use 'write_same' for writing zeros
> 3. Use blkdiscard (with write_same before or not)
> 4. Use our current behavior.
> 
> I think it's flexible enough.

Awesome, thank you Yaniv.

Comment 16 Yaniv Kaul 2016-07-13 14:23:38 UTC
Note: enabling this option on the disk should be mutually exclusive with 'wipe after delete' - if you allow the guest to discard blocks at will, and trust the storage to actually wipe them, there is no sense in wiping, as theoretically those discarded blocks may already have been 'leaked' to others.

Comment 17 Idan Shaby 2016-07-14 05:32:00 UTC
That's partially true.
I am going to query the storage domain for its discard_zeroes_data value (in fact, an AND over all of its LUNs).
If it's true, we can use both of them together, as the storage guarantees to deterministically return zeroes when a discarded area is read.
If it's false, then I am not going to allow enabling both of them together.
Thanks for noticing.
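
A hedged sketch of how that flag can be inspected for a single LUN on a host (the device name is a placeholder; the engine obtains these values through the getDeviceList flow referenced in the linked gerrit patches rather than by reading sysfs directly):

    # 1 means reads of discarded areas deterministically return zeroes, 0 means they may not
    cat /sys/block/sdX/queue/discard_zeroes_data
    # lsblk reports the same flag in its DISC-ZERO column
    lsblk -D /dev/sdX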

Comment 18 Idan Shaby 2016-08-16 05:08:55 UTC
Yaniv, this BZ shouldn't be on POST yet, as the patches are partial and were sent as drafts with no reviewers.
When I send the full patches for review, I'll move it back to POST.

Comment 20 Sandro Bonazzola 2016-12-12 13:59:02 UTC
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.

Comment 21 Raz Tamir 2016-12-12 15:12:57 UTC
Why is this ON_QA?

Comment 22 Sven Kieske 2017-01-04 14:01:13 UTC
Hi,

from what I understand, this should also work for local storage backed setups?

kind regards

Sven

Comment 23 Yaniv Lavi 2017-01-04 14:49:34 UTC
(In reply to Sven Kieske from comment #22)
> Hi,
> 
> from what I understand, this should also work for local storage backed
> setups?
> 
> kind regards
> 
> Sven



We don't support local FC and iSCSI.
If you use passthrough, it should work, but it has not been tested.

Comment 24 Idan Shaby 2017-01-05 08:46:52 UTC
A local storage domain is a file storage domain.
For file domains, the Pass Discard feature works on a best-effort basis, i.e., it "...can be enabled for any disk on any file storage domain, and it will actually work only if the underlying file system and block device support it." (taken from the feature page [1]).
So generally it should work, but it was never tested for local storage, AFAIK.


[1] http://www.ovirt.org/develop/release-management/features/storage/pass-discard-from-guest-to-underlying-storage/#file-storage-1
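
A possible manual check of that best-effort behavior (paths are placeholders):

    # inside the guest: free space and trim
    rm /data/bigfile.bin && fstrim -v /data
    # on the host: the image's allocated size should drop if the underlying
    # file system supports hole punching
    du -h /path/to/storage/domain/images/<image-id>/<volume-id>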

Comment 25 Kevin Alon Goldblatt 2017-02-09 15:44:54 UTC
Tested with the following code:
------------------------------------------
ovirt-engine-4.1.0.4-0.1.el7.noarch 
rhevm-4.1.0.4-0.1.el7.noarch
vdsm-4.19.4-1.el7ev.x86_64

Moving to VERIFIED!

