Bug 981626 (send_discard_when_deleting_disks)

Summary: [RFE] Send discard when deleting virtual disks from block based storage domain to regain space in thin provisioned storage
Product: Red Hat Enterprise Virtualization Manager
Reporter: Julio Entrena Perez <jentrena>
Component: RFEs
Assignee: Idan Shaby <ishaby>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: high
Version: 3.2.0
CC: acanan, adevolder, amarchuk, amureini, anande, aperotti, ebenahar, fsimonce, gscott, iheim, ishaby, jentrena, kgoldbla, lpeer, mgoldboi, mtessun, nicolas, obockows, pablo.iranzo, pdwyer, perobins, rbalakri, redhat, scohen, sherold, sputhenp, tnisan, trichard, ykaul, ylavi
Target Milestone: ovirt-4.0.4
Keywords: FutureFeature, Reopened
Target Release: ---
Flags: sherold: Triaged+
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
This feature adds a new parameter to the VDSM config file, "discard_enable", which is false by default. When it is changed to true for a specific host (in /etc/vdsm/vdsm.conf), for each disk image/snapshot that is going to be removed, that host will issue a "blkdiscard" call right before removing it, in order to free the used space from the thinly provisioned LUN that the storage domain is built on. If the underlying storage does not support discard operations, the command will fail and log an error, but will not break the disk/snapshot removal.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-17 13:10:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1342919    

Comment 1 Sadique Puthen 2013-07-12 13:16:10 UTC
Adding some details from case:

AFAIK, lvm already has this feature where lvremove has a flag to discard blocks.

This bug https://bugzilla.redhat.com/show_bug.cgi?id=651493 added discard support to lvm2.

Default on rhev-h lvm.conf is

    # 1 enables; 0 disables.
    issue_discards = 0

So RHEV just needs to implement it to make this possible.
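
For illustration, enabling that option would look roughly like this (VG/LV names are placeholders):

    # /etc/lvm/lvm.conf, "devices" section
    # 1 enables; 0 disables.
    issue_discards = 1

    # with that set, lvremove also issues discards for the removed LV's physical extents
    lvremove -f vg_example/lv_example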

Comment 10 Federico Simoncelli 2014-10-08 14:05:45 UTC
To use the lvm issue_discards option we should make sure that:

1. lvm is issuing the discard ioctl *before* deallocating the segments of the lv (put differently: it sends the discard ioctl before removing the lv from the mda)

2. lvm is waiting for all the ioctls to complete before removing the lv from the mda

3. the ioctl is synchronous and the kernel is not caching the requests: from the client point of view, once the lv is removed from the mda there shouldn't be any pending io to the old segments (not even discards)

4. depending on the discard granularity this operation can take a long time (a storage server round-trip per discard-granularity chunk of the lv size); the implication for vdsm is that, to a certain extent, this could become a long operation.

5. it seems that during lvremove with discard it's not possible to execute other lvm commands on the same host (due to the write-lock) for the duration of the command: this would block normal vdsm operations


Thinking more about the issue_discards use case, it is closer to the postZero operation we perform in the deleteImage flow than to a plain lvremove: in fact, in both cases the previous data becomes unrecoverable (which also matters for security reasons).

I think the best implementation would be using blkdiscard(8) over the activated lv, in fact:

- it maps 1:1 to the postZero operation (both need to activate the lv; dd takes much more time, but blkdiscard also takes some time to complete)

- the lv device map will pass the discard ioctl down to the pvs

- once the blkdiscard operation has completed, it is possible to quickly remove the lv (no additional time required), as before (issue_discards=0)

- there's no fear of pending ioctls cached on the client side, since once blkdiscard has finished we deactivate the lv (no other io is allowed)

- blkdiscard fails explicitly (whereas issue_discards fails silently)


VDSM has enough information to decide whether to use dd or blkdiscard to postZero a volume; the question is whether we want to make it automatic or configure it on the engine side.
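
A minimal sketch of the blkdiscard-based flow described above; volume names are placeholders and error handling is omitted:

    lvchange -ay vg_example/lv_example      # activate the lv so its device node exists
    blkdiscard /dev/vg_example/lv_example   # the discard is passed down through the device map to the pvs
    lvchange -an vg_example/lv_example      # deactivate - no further io can reach the lv
    lvremove -f vg_example/lv_example       # quick removal from the mda, as with issue_discards=0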

Comment 11 Federico Simoncelli 2014-10-08 22:04:30 UTC
Given comment 10 this bug is very similar to bug 1069557.

Comment 15 Yaniv Lavi 2015-10-14 09:29:08 UTC
*** Bug 1226977 has been marked as a duplicate of this bug. ***

Comment 16 Nicolas Ecarnot 2016-02-19 15:39:23 UTC
Hi Allon,

I'm not sure I understand why this is targeted to 4.0.0, as that seems quite far away.
Could you clarify what is preventing a shorter timeline?

Thank you.

Comment 17 Allon Mureinik 2016-02-21 11:54:32 UTC
(In reply to Nicolas Ecarnot from comment #16)
> Hi Allon,
> 
> I'm not sure I understand why this is targeted to 4.0.0, as that seems
> quite far away.
> Could you clarify what is preventing a shorter timeline?
> 
> Thank you.
It is hardly targeted - notice the conditional devel nack.
Once (if?) we have a sensible design for this, we can start considering timelines.

Comment 18 Julio Entrena Perez 2016-02-22 09:23:25 UTC
As per comment 4 setting LVM's "issue_discards = 1" should do.

Comment 19 Allon Mureinik 2016-02-22 09:47:19 UTC
(In reply to Julio Entrena Perez from comment #18)
> As per comment 4 setting LVM's "issue_discards = 1" should do.
See the rest of the discussion. This is a partial solution at best.

Comment 20 Yaniv Kaul 2016-03-02 17:04:21 UTC
Dup of bug 872530 ?

Comment 21 Allon Mureinik 2016-03-06 11:59:38 UTC
(In reply to Yaniv Kaul from comment #20)
> Dup of bug 872530 ?
Nope.
Bug 872530 is about a disk that's being removed from the system. This one is about a live guest removing files from its filesystem and passing down the discard to the storage.

Comment 22 Allon Mureinik 2016-03-06 12:03:20 UTC
(In reply to Allon Mureinik from comment #21)
> (In reply to Yaniv Kaul from comment #20)
> > Dup of bug 872530 ?
> Nope.
> Bug 872530 is about a disk that's being removed from the system. This one is
> about a live guest removing files from its filesystem and passing down the
> discard to the storage.

Sorry, too many open windows. If this isn't a dup, it's at the very least closely related.

Comment 25 Yaniv Kaul 2016-04-26 14:23:18 UTC

*** This bug has been marked as a duplicate of bug 872530 ***

Comment 28 Idan Shaby 2016-05-02 08:53:17 UTC
Hi Yaniv,

IMHO, this bug is not a duplicate of bug 872530.

Bug 872530 is about the way we wipe disks.
This bug is about whether to issue a discard call or not after a disk removal.
Note that we don't wipe every disk. We wipe a disk (on a block domain, of course) only if it is marked as "wipe after delete".

The plan was to add an option to the vdsm config file to issue a discard call for a removed disk (and in the future, also expose it in the new/edit disk window, right beneath the "wipe after delete" check box, for example).

And as for bug 872530, the plan is to support new wipe methods for removed disks (might also be in the config file).

Of course, if the disk has the "wipe after delete" option enabled and the user chose (as planned for bug 872530) to issue "discard -z" on a removed disk instead of dd, another discard call won't be performed.

Therefore, I think we should reopen this bug.

Comment 29 Yaniv Kaul 2016-05-02 20:01:25 UTC
Idan is right. I forgot that one can simply discard to regain allocated space.

I wonder why not send discard always - why do we need a checkbox for it?

Comment 30 Idan Shaby 2016-05-03 05:10:39 UTC
That's because sending discard has performance implications.

From qemu's change log[1]: "'discard' commands...can cause performance degradation and fragmentation".

From libvirt's documentation[2]: "The guest time may be delayed, unless the OS has explicit handling of lost ticks".

So by default it should be false, to keep the same performance that we had previously.

[1] http://wiki.qemu.org/ChangeLog/1.5 (under the System emulation -> Block devices)
[2] https://libvirt.org/formatdomain.html (under Time keeping -> discard)

Comment 31 Yaniv Kaul 2016-05-04 14:38:32 UTC
(In reply to Idan Shaby from comment #30)
> That's because sending discard has performance implications.

That's a thing of the past in most scenarios, for two reasons:
1. Today's enterprise storages are VERY efficient in discarding. Those are not the plain old SSDs which used to do very inefficient garbage collection when discarding.
2. The discard from the guest is now rarely done in realtime. In Linux, 'fstrim' would be set on a cron job (as opposed to mounting the filesystem - XFS or EXT4 - with discard on)

In any case, this SPECIFIC RFE is about discarding when the disk is removed - which most likely won't hurt the performance of the VM.
I do agree it needs an on/off switch, possibly per storage domain. We can probably keep it off by default.
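
As an illustration of point 2 above, a typical periodic-trim setup on the guest would be something like the following (path and schedule are arbitrary examples):

    # /etc/cron.weekly/fstrim - illustrative example
    #!/bin/sh
    /usr/sbin/fstrim --all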

> 
> From qemu's change log[1]: "'discard' commands...can cause performance
> degradation and fragmentation".
> 
> From libvirt's documentation[2]: "The guest time may be delayed, unless the
> OS has explicit handling of lost ticks".
> 
> So by default it should be false, to keep the same performance that we had
> previously.
> 
> [1] http://wiki.qemu.org/ChangeLog/1.5 (under the System emulation -> Block
> devices)
> [2] https://libvirt.org/formatdomain.html (under Time keeping -> discard)

Comment 33 Yaniv Kaul 2016-05-26 12:20:16 UTC
*** Bug 1321608 has been marked as a duplicate of this bug. ***

Comment 34 Greg Scott 2016-06-05 12:16:49 UTC
> This feature adds a new parameter to the vdsm config file - "discard_enable", which is false by default.
>
> When changed to true for a specific host, for each disk/snapshot that is going to be removed, that host will issue a "blkdiscard" call right before removing it in order to free the used space from the thinly provisioned lun that the storage domain is built on.

*************

Let's say I have one or more clusters with lots of Fibre Channel hosts.  I get rid of a VM, which deletes its virtual disk.  Does every host in the data center do the appropriate LVM commands, and then I set discard_enable = true on one host so it tells the underlying storage about it?  Or do I need discard_enable = true on the SPM host, which means I need it everywhere since I won't know in advance which host takes the SPM role?

Comment 35 Idan Shaby 2016-06-05 12:45:54 UTC
Hi Greg, the second option is the right one - you need discard_enable = true on the SPM host, which means that you need it everywhere since you don't know in advance which host takes the SPM role.

Anyway, this feature is merged only on master, so it will be available only in 4.0 for now.
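
For reference, the host-side setup would look roughly like this (the section name in vdsm.conf is an assumption, not confirmed here):

    # /etc/vdsm/vdsm.conf - on every host that may become SPM
    [irs]                   # section name assumed
    discard_enable = true

    # restart vdsm so the option is picked up
    systemctl restart vdsmd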

Comment 36 Greg Scott 2016-06-05 14:44:19 UTC
OK, thanks.  If it's off by default, would it be possible to set up a choice at install time to set it?  Or some way to persist vdsm.conf so admins can set it once and it's set everywhere in the datacenter after that?

thanks

Comment 37 Yaniv Kaul 2016-06-06 03:09:33 UTC
(In reply to Greg Scott from comment #36)
> OK, thanks.  If it's off by default, would it be possible to set up a choice
> at install time to set it?  Or some way to persist vdsm.conf so admins can
> set it once and it's set everywhere in the datacenter after that?

In 4.0, you can configure (and persist) vdsm.conf via Cockpit (on RHEVH NGN).
In 4.1, we'll make it more granular (per storage domain, per disk) - similar to wipe-after-delete setting. Idan - please make sure there's a 4.1 RFE for this. 
> 
> thanks

Comment 38 Idan Shaby 2016-06-06 06:40:29 UTC
Please see bug 1342919 .

Comment 39 Yaniv Lavi 2016-07-31 10:26:27 UTC
Can you please open a test-only ticket on 4.0 describing how to enable discard, so QE can try it out? We have a customer looking to use this feature with 4.0.

Comment 40 Allon Mureinik 2016-08-03 10:22:05 UTC
(In reply to Yaniv Dary from comment #39)
> Can you please open a test only ticket on 4.0 describing how to enable
> discard, so QE can try it out? We have customer looking to use this feature
> with 4.0.
All the info is in the doctext, why do we need yet another bug?

Comment 41 Yaniv Lavi 2016-08-10 12:00:05 UTC
(In reply to Allon Mureinik from comment #40)
> (In reply to Yaniv Dary from comment #39)
> > Can you please open a test only ticket on 4.0 describing how to enable
> > discard, so QE can try it out? We have customer looking to use this feature
> > with 4.0.
> All the info is in the doctext, why do we need yet another bug?

We need a ticket describing what to test that we can target to 4.0.z.

Comment 42 Allon Mureinik 2016-08-15 15:10:53 UTC
I'll retarget. THIS bz represents the 4.0 offering. The full-fledged feature, including GUI etc., is tracked by bug 1342919.

Comment 43 Elad 2016-09-04 09:23:55 UTC
Tested the following:

1. On the storage server, created a 30G thin LUN. Exposed the LUN to the hypervisors
2. Created a VM and installed RHEL OS.
3. Created, attached and activated a direct LUN with virtIO-SCSI interface to the VM. Checked 'Enable SCSI Pass-Through', in the direct LUN attachment phase.
4. Created an ext4 file system on the attached direct LUN disk. Added it to /etc/fstab and specified 'discard' instead of 'defaults': 

/dev/sdX                    /mount_point                   ext4    discard        0 0

And executed 'mount -a'
	
5. Created a 5G file on the new disk.
6. Examined the actual LUN size on the storage server for the exposed LUN.
7. On the VM, deleted the 5G file.
8. Once the file was deleted, executed the 'sync' command on the guest.
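
For reference, steps 4-8 on the guest boil down to roughly the following (device, mount point and file size as above):

    mkfs.ext4 /dev/sdX
    echo '/dev/sdX /mount_point ext4 discard 0 0' >> /etc/fstab
    mount -a
    dd if=/dev/zero of=/mount_point/testfile bs=1M count=5120   # ~5G file
    rm -f /mount_point/testfile
    sync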

Results:
The LUN's actual size on the storage server decreased back to what it was before the file was created.



==========================
Used:
vdsm-4.18.12-1.el7ev.x86_64
rhevm-4.0.4-0.1.el7ev.noarch
XtremIO storage
RHEL7.2 guest

Comment 45 Yaniv Kaul 2016-09-04 09:36:40 UTC
(In reply to Elad from comment #43)
> Tested the following:
> 
> 1. On the storage server, created a 30G thin LUN. Exposed the LUN to the
> hyper-visors
> 2. Created a VM and installed RHEL OS.
> 3. Created, attached and activated a direct LUN with virtIO-SCSI interface
> to the VM. Checked 'Enable SCSI Pass-Through', in the direct LUN attachment
> phase.
> 4. Created an ext4 file system on the attached direct LUN disk. Added it to
> /etc/fstab and specified 'discard' instead of 'defaults': 
> 
> /dev/sdX                    /mount_point                   ext4    discard  
> 0 0
> 
> And executed 'mount -a'
> 	
> 5. Created a 5G file on the new disk.
> 6. Examined the actual LUN size on the storage server for the exposed LUN.
> 7. On the VM, deleted the 5G file.
> 8. Once the file got deleted, executed 'sync' command on the guest.
> 
> Results:
> LUN actual size, in the storage server decreased back to the size it was
> before the file creation
> 
> 
> 
> ==========================
> Used:
> vdsm-4.18.12-1.el7ev.x86_64
> rhevm-4.0.4-0.1.el7ev.noarch
> XtremIO storage
> RHEL7.2 guest

This is not the right test. The test should be per this feature - send discard when the virtual disk is DELETED, not when contents within the virtual disk are (i.e., passthrough of the DISCARD command from the guest all the way to the storage).

Comment 46 Elad 2016-09-04 10:35:07 UTC
I mistakenly tested discard for a direct LUN. Will test with a virtual disk.

Comment 47 Idan Shaby 2016-09-04 10:39:53 UTC
Also note that this virtual disk should be an image, not a direct lun.

Comment 48 Elad 2016-09-04 13:57:02 UTC
1) Added 'discard_enable = true' to vdsm.conf and restarted vdsmd service 
2) Created a block domain based on a thin LUN on a discard supported storage server
3) Created a virtual disk on the block domain and attached it to a RHEL guest
4) Checked the actual LUN size on the storage server for later comparison
5) Wrote to the virtual disk from the guest. The LUN's actual size increased
6) Deleted the virtual disk

Result: The LUN's actual size was reduced to the size it was before the virtual disk was created.

Tested with FC and iSCSI

==========================
Used:
vdsm-4.18.12-1.el7ev.x86_64
rhevm-4.0.4-0.1.el7ev.noarch
XtremIO storage
RHEL7.2 guest

Comment 50 Yaniv Kaul 2016-09-04 14:10:44 UTC
Elad - did you test with qcow2, raw, with extended disk, with multiple LUNs in the SD, with 1TB disk, multiple disk deletions, the impact on the storage (performance, bandwidth in iSCSI and FC), etc.?

Comment 51 Elad 2016-09-04 14:17:08 UTC
I tested the basic scenario here:
A raw preallocated virtual disk, not extended, single LUN in SD (35G for iSCSI and 80G for FC), a single disk deletion. 

Did not examine the impact on the storage, but other than the deletion itself, which impact should be tested here?

Comment 52 Yaniv Kaul 2016-09-04 14:20:15 UTC
(In reply to Elad from comment #51)
> I tested the basic scenario here:
> A raw preallocated virtual disk, not extended, single LUN in SD (35G for
> iSCSI and 80G for FC), a single disk deletion. 
> 
> Did not examine the impact on the storage, but other than the deletion
> itself, which impact should be tested here?

1. That the SPM doesn't choke.
2. That the interface to the storage (FC or iSCSI) doesn't choke. Compare it with the same command using the old 'dd' - how much bandwidth it's taking.

Comment 53 Idan Shaby 2016-09-04 14:23:48 UTC
(In reply to Yaniv Kaul from comment #52)
> (In reply to Elad from comment #51)
> > I tested the basic scenario here:
> > A raw preallocated virtual disk, not extended, single LUN in SD (35G for
> > iSCSI and 80G for FC), a single disk deletion. 
> > 
> > Did not examine the impact on the storage, but other than the deletion
> > itself, which impact should be tested here?
> 
> 1. That the SPM doesn't choke.
> 2. That the interface to the storage (FC or iSCSI) doesn't choke. Compare it
> with the same command using the old 'dd' - how much bandwidth it's taking.

Why compare to 'dd'? We are not wiping the disk here. We are just discarding the lv.

Comment 54 Yaniv Kaul 2016-09-04 14:31:40 UTC
(In reply to Idan Shaby from comment #53)
> (In reply to Yaniv Kaul from comment #52)
> > (In reply to Elad from comment #51)
> > > I tested the basic scenario here:
> > > A raw preallocated virtual disk, not extended, single LUN in SD (35G for
> > > iSCSI and 80G for FC), a single disk deletion. 
> > > 
> > > Did not examine the impact on the storage, but other than the deletion
> > > itself, which impact should be tested here?
> > 
> > 1. That the SPM doesn't choke.
> > 2. That the interface to the storage (FC or iSCSI) doesn't choke. Compare it
> > with the same command using the old 'dd' - how much bandwidth it's taking.
> 
> Why comparing to 'dd'? We are not wiping the disk here. We are just
> discarding the lv.

There's nothing else to compare to...

Comment 55 Idan Shaby 2016-09-04 14:41:00 UTC
(In reply to Yaniv Kaul from comment #54)
> (In reply to Idan Shaby from comment #53)
> > (In reply to Yaniv Kaul from comment #52)
> > > (In reply to Elad from comment #51)
> > > > I tested the basic scenario here:
> > > > A raw preallocated virtual disk, not extended, single LUN in SD (35G for
> > > > iSCSI and 80G for FC), a single disk deletion. 
> > > > 
> > > > Did not examine the impact on the storage, but other than the deletion
> > > > itself, which impact should be tested here?
> > > 
> > > 1. That the SPM doesn't choke.
> > > 2. That the interface to the storage (FC or iSCSI) doesn't choke. Compare it
> > > with the same command using the old 'dd' - how much bandwidth it's taking.
> > 
> > Why comparing to 'dd'? We are not wiping the disk here. We are just
> > discarding the lv.
> 
> There's nothing else to compare to...

Right, but previously we didn't do anything at all instead of discarding the lv, so I am not sure we can gain anything from comparing it to a very long process like "dd".
IMHO, we should not compare it to anything. But I agree that it's a good idea to inspect everything else that you mentioned (performance etc.).

Comment 56 Kevin Alon Goldblatt 2016-09-25 15:04:48 UTC
Tested with the following code:
----------------------------------------
rhevm-4.0.4.2-0.1.el7ev.noarch
vdsm-4.18.13-1.el7ev.x86_64


Tested with the following scenario:

Steps to Reproduce:
Tested according to Test Plan below. Tests included
Basic discard disk with Thin disk
Basic discard disk with Preallocated disk
Discard disk of multiple disks on same SD
Discard disk on SD comprised of multiple LUNs
Discard disk after extending LUN, SD and Disk
Discard disk and Performance
Discard disk failed discard operation


https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/wiki/Storage_4_0/4_1_Storage_Support%20discard%20when%20deleting%20virtual%20disks%20from%20block%20based%20storage%20domains

Actual results:
All operations were successful.

Expected results:




Moving to VERIFIED!

Comment 57 Yaniv Kaul 2016-09-25 15:21:47 UTC
Kevin - did you actually see the benefit, that the discard freed up space on the storage?

Comment 58 Nicolas Ecarnot 2016-10-04 07:17:57 UTC
(In reply to Kevin Alon Goldblatt from comment #56)
> Tested with the following code:
> ----------------------------------------
> rhevm-4.0.4.2-0.1.el7ev.noarch
> vdsm-4.18.13-1.el7ev.x86_64
> 
> 
> Tested with the following scenario:
> 
> Steps to Reproduce:
> Tested according to Test Plan below. Tests included
> Basic discard disk with Thin disk
> Basic discard disk with Preallocated disk
> Discard disk of multiple disks on same SD
> Discard disk on SD comprised of multiple LUNs
> Discard disk after extending LUN, SD and Disk
> Discard disk and Performance
> Discard disk failed discard operation
> 
> 
> https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/wiki/
> Storage_4_0/
> 4_1_Storage_Support%20discard%20when%20deleting%20virtual%20disks%20from%20bl
> ock%20based%20storage%20domains
> 
> Actual results:
> All operations were successfull.
> 
> Expected results:
> 
> 
> 
> 
> Moving to VERIFIED!

Hello,

For mere users like me, who have been waiting for this feature for years [1], it would be nice if we could have access to the test results (Polarion...).

On our block storage, we don't mind about security considerations, so every task related to conscientiously wiping out every bit of an LV before deleting is not as important to us as what Idan tested in the first place, i.e. making sure the block storage thin LUN shows its available space again after LV deletion.

[1] http://lists.ovirt.org/pipermail/users/2014-November/029595.html

Comment 59 Idan Shaby 2016-10-05 10:38:32 UTC
Aharon, I am not sure if and what we can expose here -> your call.

Nicolas, if security is not a priority for you, I would use wipe after delete = false for my disks and discard_enable = true in the vdsm conf file, for now.
In the future we will make it possible to configure discard per storage domain [1].
For more info, please refer to this [2] feature page.

[1] Bug 1342919 - (configure_discard_per_domain_and_disk) [RFE] Make discard configurable by a storage domain rather than a host
[2] http://www.ovirt.org/develop/release-management/features/storage/discard-after-delete/

Comment 60 Kevin Alon Goldblatt 2016-11-29 13:04:14 UTC
(In reply to Yaniv Kaul from comment #57)
> Kevin - did you actually see the benefit, that the discard freed up space on
> the storage?
Yes, I confirmed via the storage provider (XtremIO) that the space-in-use on the thinly provisioned LUN is reclaimed after each of the discard operations.