Bug 1057371 - Using LVM for nova instances causes slow instance deletion
Summary: Using LVM for nova instances causes slow instance deletion
Keywords:
Status: CLOSED DUPLICATE of bug 1062377
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 3.0
Hardware: All
OS: All
medium
low
Target Milestone: Upstream M3
Target Release: 5.0 (RHEL 7)
Assignee: Xavier Queralt
QA Contact: Ami Jeain
URL: https://blueprints.launchpad.net/nova...
Whiteboard: upstream_milestone_icehouse-3 upstrea...
Duplicates: 1047421
Depends On:
Blocks:
 
Reported: 2014-01-23 22:30 UTC by Daniel Kwon
Modified: 2023-09-18 09:58 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-19 22:10:12 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1059744 1 None None None 2021-01-20 06:05:38 UTC

Internal Links: 1059744

Description Daniel Kwon 2014-01-23 22:30:34 UTC
Description of problem:

Here's the issue from the customer:
Hi everyone, we've found an OpenStack+LVM+SSDs corner case with low I/O performance. I'd like to draw your attention to it; maybe someone knows of a workaround/fix for it.

We're currently running Grizzly with nova.conf libvirt_images_type=default, which has resulted in instances in qcow2 files in ext4 filesystems on SSDs.
We've found in the past that with this configuration, IO to/from the qcow2 (from either guest or host) is much slower than the SSD's theoretical performance.

Consequently we're investigating libvirt_images_type=lvm, with libvirt_images_volume_group set to a LV backed by the same SSDs.
We've found that this yields much better iops than the original libvirt_images_type=default. It seems eliminating the qcow layer significantly increases performance.

However, we've found that with libvirt_images_type=lvm, we have a different problem: deleting instances is slow.
It looks like this is a known problem:
https://blueprints.launchpad.net/nova/+spec/lvm-clear-option

Sure enough, we do see a lot of 'dd' processes during a parallel 'nova delete' operation:
# ps aux | grep /bin/dd
root      62982  1.8  0.0 106212  1680 ?        D    19:01 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000de_disk seek=0 count=20480 oflag=direct
root      62991  1.9  0.0 106212  1680 ?        D    19:01 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000e1_disk seek=0 count=20480 oflag=direct
root      62998  1.8  0.0 106212  1680 ?        D    19:01 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000d2_disk seek=0 count=20480 oflag=direct
root      63013  1.9  0.0 106212  1684 ?        D    19:01 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000cf_disk seek=0 count=20480 oflag=direct
root      63289  1.7  0.0 106212  1680 ?        D    19:02 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000d5_disk seek=0 count=20480 oflag=direct
root      63292  1.7  0.0 106212  1680 ?        D    19:02 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000d8_disk seek=0 count=20480 oflag=direct
root      63295  1.6  0.0 106212  1684 ?        D    19:02 0:01 /bin/dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000db_disk seek=0 count=20480 oflag=direct
root      63322  0.0  0.0 103248   860 pts/1    S+   19:03 0:00 grep /bin/dd
#

And it looks like this is due to Grizzly's /usr/lib/python2.6/site-packages/nova/virt/libvirt/utils.py which does an unconditional 'dd if=/dev/zero' of LVs upon deletion:

def clear_logical_volume(path):
    """Obfuscate the logical volume.

    :param path: logical volume path
    """
    # TODO(p-draigbrady): We currently overwrite with zeros
    # but we may want to make this configurable in future
    # for more or less security conscious setups.

I perceive multiple less-than-ideal aspects of this, most of them outside OpenStack itself, but impacting it. Here's my list:

a. Implementing the TODO, i.e. adding a boolean option to nova.conf to disable LV erasure on delete, would solve the immediate performance problem. However, it would introduce potential inter-tenant information disclosure when enabled;
b. With knowledge of the partitions/filesystems in use on the LV, a more efficient but still reasonably secure method could be used. For example overwrite all copies of the superblock and the journal with random garbage;
c. Best of all would be if the LV layer itself knew how to securely destroy its LVs (think "lvdestroy --secure"), not a userland 'dd' process running as root that somehow 'knows' the block device corresponding to the LV that it wants to securely erase;
d. Separately, when destroying LVs backed by SSDs, instead of zeroing blocks, the vendor-specific 'secure delete' facility should be used (most SSD firmwares offer one), or mass discard/trim/unmap commands (a rough command-level sketch of (b) and (d) follows this list).
'dd' shouldn't have to know directly about the block layer, so this too seems to me to belong in the LVM layer.
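
For illustration only, here is a rough command-level sketch of what (b) and (d) could look like on the compute node. The LV path is copied from the dd listing above; blkdiscard and wipefs are standard util-linux tools, but whether discarded blocks are guaranteed to read back as zeroes depends on the particular SSD, so neither command is a drop-in replacement for the current zeroing:

# (d) discard the whole LV instead of writing zeroes; requires the device to
#     support TRIM/discard (check with "lsblk --discard" first)
blkdiscard -v /dev/novainstances/instance-000000de_disk

# (b) weaker/cheaper variant: erase only the filesystem/partition signatures,
#     not the data blocks themselves
wipefs --all /dev/novainstances/instance-000000de_disk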

Questions:
- Can you advise us of a way to avoid both the original qcow2 overhead and the new dd overhead?
- Which (if any) of the above ideas are worth advancing? How should we best do that?
- What have I totally missed here?


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Use LVM for instance storage: libvirt_images_type=lvm
2. Delete instances and check for dd processes with ps aux | grep dd

Actual results:
Every instance deletion spawns a dd process that zeroes out the entire LV, which makes deletion slow.


Expected results:
Instance deletion completes without the overhead of dd-ing zeros over the whole LV.

Additional info:

Comment 2 Matt 2014-01-23 23:51:04 UTC
What we'd like (at least in the short term) is an additional option in nova.conf which would give us a way to disable the dd before LV delete, analogous to the existing option for Cinder.

- It's sufficient for us if the option is just a boolean (at least for now);

- But it would be better if it did more than merely disable the dd, like the Cinder option already does.

It looks to me like the Cinder code could be factored out quite easily.
This would give the new option more of a raison d'etre than merely re-introducing vulns fixed years ago.
 
- The help text for the new option should mention CVE-2012-5625

This is the vulnerability which would be re-introduced if there were no zeroing or shredding of the LV before deletion. This vuln was the original motivation for adding the dd.
See:
http://lists.openstack.org/pipermail/openstack-announce/2012-December/000059.html

- The new option should be as consistent as possible with the existing analogous cinder.conf option: volume_clear=none|zero|shred

Longer term, I personally feel it's the LVM layer's responsibility to delete LVs securely upon request from userspace, not OpenStack's responsibility. I'm imagining something like "lvremove --zero", analogous to the existing "lvcreate --zero".
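
For reference, a rough sketch of what the three values would mean at the block-device level for one of the LVs above; this is only an illustration of the semantics, not Cinder's actual code:

# volume_clear=none  -- no clearing at all; just remove the LV
#                       (this is what re-opens CVE-2012-5625)
lvremove -f novainstances/instance-000000de_disk

# volume_clear=zero  -- overwrite with zeroes first (what nova's dd does today)
dd bs=1048576 if=/dev/zero of=/dev/novainstances/instance-000000de_disk oflag=direct
lvremove -f novainstances/instance-000000de_disk

# volume_clear=shred -- overwrite with random data first
shred -n 3 /dev/novainstances/instance-000000de_disk
lvremove -f novainstances/instance-000000de_disk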

Comment 5 Daniel Berrangé 2014-01-24 13:02:14 UTC
Per this upstream review https://review.openstack.org/#/c/68507/ we believe the right solution here is not to add an option to skip the 'dd' step, but to ensure that thin-provisioned LVM can be used. Our current understanding from talking to the LVM developers is that with thin-provisioned LVM the kernel guarantees we will see zero-filled blocks when sectors in the volume become allocated. A nova setup using thin-provisioned LVM would therefore be able to skip the 'dd' step. The key is that Nova would automatically do the right thing: if thick provisioning was used it would 'dd', and if thin provisioning was used it would skip the dd. This ought to allow the benefits of using LVM over qcow2, without the performance penalty and without re-introducing the serious security flaw that the 'dd' step addressed.
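
As a minimal sketch of what that looks like at the LVM level (volume group, pool name and sizes below are made up, and nova would drive the equivalent itself rather than the operator doing it by hand): the pool's --size is real, allocated space, while each thin LV's --virtualsize is what the guest sees, and unwritten blocks read back as zeroes:

# create a thin pool that owns all of the SSD-backed space up front
lvcreate --size 800G --thinpool novathinpool novainstances

# create a thin LV inside the pool; --virtualsize is what the instance sees;
# as long as the sum of all virtual sizes stays at or below the pool's --size,
# nothing is over-provisioned
lvcreate --virtualsize 20G --thin --name instance-000000de_disk novainstances/novathinpool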

Comment 9 Stephen Gordon 2014-01-28 19:38:30 UTC
*** Bug 1047421 has been marked as a duplicate of this bug. ***

Comment 10 Matt 2014-01-30 03:58:46 UTC
I understand your concerns about giving OS admins the ability to re-introduce the security flaw, although personally, I'm used to being root and possessing many loaded guns (dd if=/dev/zero of=/dev/sda, fdisk, rm -rf /, cat /home/someguy/.ssh/id_rsa, etc) and consequently being really, really careful what I do with them :-)

I understand that thin LVs would help us. However, we don't want to over-provision storage of any sort, ever. Is it possible to have thin LVs that are really fat?
My reading of 'man lvcreate' didn't shed any light on this, but I think I'm not understanding the interaction between --size and --virtualsize.

Regarding our expectations, we had previously hoped to have a fix for this issue backported to Havana, so that we could deploy Havana with LVM and without dd. It's my understanding that allowing disablement of the dd, while arguably unwise, would have been a small, unobtrusive change which could have made it into Havana.

Now that you're all intending to use thin provisioning, I understand that that's a larger change which couldn't be backported to Havana, and I see it's currently targeted at Icehouse. There's a possibility that we'll skip Havana and go straight from Grizzly to Icehouse (our discussions on this are ongoing). If we do go straight to Icehouse, and if a fix makes it into Icehouse, then our next RHOS deployment could use LVM instead of QCOW2, which would satisfy us.

What would definitely not be acceptable to us is if the fix (whatever it ends up being) were to slip to Juno. That would mean that we wouldn't get a fix for this issue for some time, and would give us undesired motivation to skip Icehouse as well as Havana.

Comment 11 Xavier Queralt 2014-02-19 22:10:12 UTC

*** This bug has been marked as a duplicate of bug 1062377 ***

Comment 12 Pádraig Brady 2014-02-19 22:54:22 UTC
There is a related bug 1059744 discussing performance issues with KVM images _within_ file systems; that slowness is the main reason for using LVM-based images.

Summarizing... To speed up file system based images, there are two things worth trying:

1. If using qcow2 images, creating them with preallocation=metadata helps significantly, as benchmarked here:
  https://blueprints.launchpad.net/nova/+spec/preallocated-images
Note that this is currently incompatible with CoW from a base image to instance images (a qemu-img sketch follows this list).

2. Currently, xfs may be significantly faster than ext4 due to more efficient handling of larger I/O sizes.
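
As an illustration of point 1, this is roughly how a qcow2 image gets created with metadata preallocation outside of nova (the path and size here are made up):

# preallocate the qcow2 metadata (L1/L2 tables) up front; data clusters are
# still allocated on demand, but guest writes no longer pay the
# metadata-allocation cost on first write to each cluster
qemu-img create -f qcow2 -o preallocation=metadata /var/lib/nova/instances/example/disk 20G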

