Bug 1367806

Summary: [RFE] - Add "blkdiscard" as a new method to zero volumes
Product: [oVirt] vdsm
Component: RFEs
Version: 4.18.15
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Idan Shaby <ishaby>
Assignee: Idan Shaby <ishaby>
QA Contact: Elad <ebenahar>
Docs Contact:
CC: amureini, apinnick, bugs, ishaby, ratamir, ykaul, ylavi
Keywords: FutureFeature
Target Milestone: ovirt-4.2.0
Target Release: 4.20.9
Flags: rule-engine: ovirt-4.2+
       ratamir: testing_plan_complete-
       ylavi: planning_ack+
       rule-engine: devel_ack+
       ratamir: testing_ack+
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text: A new VDSM parameter enables a host to remove a disk/snapshot on block storage, where "Wipe After Delete" is enabled, in much less time than the "dd" command, especially if the storage supports "Write same."
Story Points: ---
Clone Of:
Clones: 1475780 (view as bug list)
Environment:
Last Closed: 2017-12-20 10:53:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1327886
Bug Blocks: 872530, 1314382, 1475780, 1487151

Description Idan Shaby 2016-08-17 14:16:49 UTC
Description of problem:
Today, vdsm uses the Linux "dd" command to wipe volumes.

The problem with using "dd" to wipe volumes is that it is very slow (~7 minutes to wipe a 10GB volume on a NetApp array in my environment).
To zero volumes more efficiently, vdsm can use the "blkdiscard" command from the util-linux package, which can run up to ~10 times faster.
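
For illustration, the two approaches look roughly like this from a shell (a minimal sketch; the device path, block size and dd flags are illustrative, not the exact command line vdsm runs):

# Current approach: overwrite the whole volume with zeros.
dd if=/dev/zero of=/dev/<vg_name>/<lv_name> bs=1M oflag=direct

# Proposed approach: let the device zero itself (uses "write same" when supported).
blkdiscard --zero /dev/<vg_name>/<lv_name>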

Version-Release number of selected component (if applicable):
7cf1dbe1b669e9dab203b33baae34192bf01e114

Steps to Reproduce:
1. Create a disk on a block storage domain.
2. Set its "Wipe After Delete" property to true.
3. Remove the disk and observe in the vdsm log that the wipe is very slow (see the sketch after these steps).
You can measure this by calculating the time that passes between the log message "Zero volume thread started for volume <volume_id>" and the log message "Zero volume <volume_id> task <task_id> completed".
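
One way to measure it (a rough sketch; /var/log/vdsm/vdsm.log is the usual vdsm log location, and the exact message wording may differ between versions):

# Pull the start and completion messages for the volume and compare their
# timestamps (substitute the real volume UUID for <volume_id>).
grep "Zero volume" /var/log/vdsm/vdsm.log | grep "<volume_id>"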

Actual results:
In my environment it takes ~ 7 minutes to wipe a 10GB disk.

Expected results:
It should be quicker, if possible.

Additional info:
Calling "blkdiscard -z <block_device>" should work at least fast as "dd", and up to ~ 10 times faster, as it calls "write same" if the device supports it.

Comment 3 Idan Shaby 2016-11-07 06:12:42 UTC
No. I tried to figure out what we should do so that it won't happen, but we decided to finish bug 1241106 first and then come back to this one.
The last thing I saw was that the command failed with timeouts only when I ran it from vdsm. I guess that we should run it with a higher priority, or maybe run it in a different way (not the way we run dd today).
Anyway, right now there's no need for a bug.
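
Purely as an illustration of the "higher priority" idea (hypothetical; whether this is the right fix for the timeouts was still open at this point), the I/O scheduling priority of the zeroing process could be raised with util-linux's ionice:

# Run the zeroing in the best-effort class at the highest priority (0)
# instead of the default.
ionice -c 2 -n 0 blkdiscard -z /dev/<vg_name>/<lv_name>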

Comment 4 Idan Shaby 2017-07-27 10:44:59 UTC
Bug 1475780 was opened to switch the default zero method to "blkdiscard".

Comment 5 Raz Tamir 2017-08-31 09:52:58 UTC
Tested on ovirt-engine-4.2.0-0.0.master.20170828065003.git0619c76.el7.centos

wipe after delete took 1.5 minutes for a 10GB disk on a block domain

Is there any chance to move the messages
Zero volume thread started for volume <VOL_ID>
and
Zero volume thread finished for volume <VOL_ID>

to be on INFO logger level or should I open a new bug for that request?

Comment 6 Allon Mureinik 2017-08-31 09:58:50 UTC
(In reply to Raz Tamir from comment #5)
> Tested on ovirt-engine-4.2.0-0.0.master.20170828065003.git0619c76.el7.centos
> 
> wipe after delete took 1.5 minutes for a 10GB disk on a block domain
On the same setup/storage, how long does it take to delete a 10GB disk with the "old" method?

> 
> Is there any change to move the messages
> Zero volume thread started for volume <VOL_ID>
> and
> Zero volume thread finished for volume <VOL_ID>
> 
> to be on INFO logger level or should I open a new bug for that request?
Let's have a new BZ for this please

Comment 7 Raz Tamir 2017-08-31 10:36:00 UTC
(In reply to Allon Mureinik from comment #6)
> (In reply to Raz Tamir from comment #5)
> > Tested on ovirt-engine-4.2.0-0.0.master.20170828065003.git0619c76.el7.centos
> > 
> > wipe after delete took 1.5 minutes for a 10GB disk on a block domain
> On the same setup/storage, how long does it take to delete a 10GB disk with
> the "old" method?
~ 5 minutes
> 
> > 
> > Is there any change to move the messages
> > Zero volume thread started for volume <VOL_ID>
> > and
> > Zero volume thread finished for volume <VOL_ID>
> > 
> > to be on INFO logger level or should I open a new bug for that request?
> Let's have a new BZ for this please
https://bugzilla.redhat.com/show_bug.cgi?id=1487151

Comment 8 Allon Mureinik 2017-08-31 11:45:10 UTC
(In reply to Raz Tamir from comment #7)
> (In reply to Allon Mureinik from comment #6)
> > (In reply to Raz Tamir from comment #5)
> > > Tested on ovirt-engine-4.2.0-0.0.master.20170828065003.git0619c76.el7.centos
> > > 
> > > wipe after delete took 1.5 minutes for a 10GB disk on a block domain
> > On the same setup/storage, how long does it take to delete a 10GB disk with
> > the "old" method?
> ~ 5 minutes
A x3 improvement for a 10GB disk? I'll take that!

> > 
> > > 
> > > Is there any change to move the messages
> > > Zero volume thread started for volume <VOL_ID>
> > > and
> > > Zero volume thread finished for volume <VOL_ID>
> > > 
> > > to be on INFO logger level or should I open a new bug for that request?
> > Let's have a new BZ for this please
> https://bugzilla.redhat.com/show_bug.cgi?id=1487151
Thanks!

Comment 9 Yaniv Kaul 2017-08-31 20:49:28 UTC
(In reply to Raz Tamir from comment #7)
> (In reply to Allon Mureinik from comment #6)
> > (In reply to Raz Tamir from comment #5)
> > > Tested on ovirt-engine-4.2.0-0.0.master.20170828065003.git0619c76.el7.centos
> > > 
> > > wipe after delete took 1.5 minutes for a 10GB disk on a block domain

That's a bit slow - ~113MBps for discard?

> > On the same setup/storage, how long does it take to delete a 10GB disk with
> > the "old" method?
> ~ 5 minutes

And that's VERY slow! 34MBps?!?!

Something wrong with that storage. Or my math.
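
For reference, the arithmetic behind those figures, assuming 10 GB here means 10 GiB = 10240 MiB spread over ~90 s and ~300 s respectively:

$ echo "blkdiscard -z: $((10240 / 90)) MiB/s, dd: $((10240 / 300)) MiB/s"
blkdiscard -z: 113 MiB/s, dd: 34 MiB/s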

My laptop (SSD, 16.5G):
[ykaul@ykaul sosreport-dvrhvm01.cbec.gov.in-20170831074327]$ time sudo blkdiscard --zero /dev/sda3

real	1m47.121s
user	0m0.005s
sys	0m0.167s


But:
[ykaul@ykaul sosreport-dvrhvm01.cbec.gov.in-20170831074327]$ time sudo blkdiscard  /dev/sda3
[sudo] password for ykaul: 

real	0m4.236s
user	0m0.019s
sys	0m0.017s

So perhaps my SSD doesn't support write_same? 

[ykaul@ykaul sosreport-dvrhvm01.cbec.gov.in-20170831074327]$ sudo sg_inq -p 0xb0 /dev/sda
VPD INQUIRY: Block limits page (SBC)
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch transfer length: 0 blocks
  Maximum unmap LBA count: 0
  Maximum unmap block descriptor count: 0
  Optimal unmap granularity: 1
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x3fffc0 blocks
  Maximum atomic transfer length: 0
  Atomic alignment: 0
  Atomic transfer length granularity: 0
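
Another way to check what the kernel will actually do (assumption: this is the generic block-layer sysfs attribute, nothing specific to this setup):

# 0 here means the kernel will not issue WRITE SAME to the device, and
# zeroing falls back to plain writes.
cat /sys/block/sda/queue/write_same_max_bytes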


> > 
> > > 
> > > Is there any change to move the messages
> > > Zero volume thread started for volume <VOL_ID>
> > > and
> > > Zero volume thread finished for volume <VOL_ID>
> > > 
> > > to be on INFO logger level or should I open a new bug for that request?
> > Let's have a new BZ for this please
> https://bugzilla.redhat.com/show_bug.cgi?id=1487151

Comment 10 Sandro Bonazzola 2017-12-20 10:53:54 UTC
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.