Bug 1383014
| Summary: | nova instance performance issues while using ceph backend | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | VIKRANT <vaggarwa> |
| Component: | ceph | Assignee: | Ben England <bengland> |
| Status: | CLOSED NOTABUG | QA Contact: | Warren <wusui> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 8.0 (Liberty) | CC: | abond, bengland, dshaks, dwilson, jdurgin, jharriga, jomurphy, jtaleric, lhh, myllynen, nlevine, rsibley, rsussman, srevivo, twilkins, vaggarwa |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-18 21:17:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
VIKRANT
2016-10-09 06:36:03 UTC
Performed a test on a RHEL 7 setup: spawned two instances, one using a qcow2 image and the second using a raw image. The raw image created a parent/child relation:

~~~
# rbd -p images children 04aad515-1fb8-4f79-8838-71d38dabba1f@snap
vms/ca68fce0-76d2-4f85-b886-e4e5e02ccbff_disk
~~~

Now, while using the dd command to perform the test, I can see that the instance spawned from the qcow2 image is giving twice the performance of the instance spawned from the raw image.

I had thought that Ceph with Nova requires use of a raw image, correct? Because Ceph is doing the "backing image" in RBD that formerly was done using qcow2, correct? But evidently not so.

Is Ceph imposing some sort of copy-on-write overhead for the Nova image that it doesn't need to impose? For example, is it reading from the snapshot the 4-KB filesystem blocks that dd is writing to? It is necessary to read the backing image if you insert data into the middle of a block, but if you are writing out the entire block, in theory it should be unnecessary to read the block from the snapshot first, since it doesn't matter what was stored there before.

As the post suggests, we should be able to perform this test with an RBD volume backed by a snapshot vs. an RBD volume not backed by a snapshot, using the librbd engine in fio, and see whether it is related to the use of RBD snapshots. If you do this same test on a Cinder volume, which is not backed by a snapshot, what do you get?

Another interesting test would be to repeat the dd test on the same exact file, using "conv=notrunc", so you would be writing to the same physical blocks in storage. This would be a "re-write" test. There should be no copy-on-write overhead at this point because the Nova image has already diverged from the backing snapshot.

Note that if the above hypothesis about copy-on-write overhead is correct, then the qcow2 image is much smaller than the raw image (i.e. sparse?), so there is less reading to do, which might explain the difference in performance.
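The snapshot-backed vs. flat comparison suggested above can be driven directly with fio's librbd engine, bypassing the guest entirely. A minimal sketch, assuming a `vms` pool reachable with the `admin` keyring and two pre-created test images whose names here (`clone_from_snap`, a clone of a protected snapshot, and `flat_image`, a plain image) are hypothetical:

```shell
# Sketch only: compare write bandwidth to a snapshot-backed clone vs. a
# flat RBD image using fio's librbd engine. Pool, client, and image names
# are assumptions; adjust for the actual cluster.
compare_rbd_write() {
    for img in clone_from_snap flat_image; do
        fio --name="$img" --ioengine=rbd --clientname=admin --pool=vms \
            --rbdname="$img" --rw=write --bs=1024k --size=1g --direct=1
    done
}

# Run only where fio (built with librbd support) and a live cluster exist:
# compare_rbd_write
```

If the clone is markedly slower than the flat image on the first full write, that points at copy-up overhead from the parent snapshot rather than anything Nova-specific.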
Strangely, I have not seen the two-fold difference this time. Here are the test results from the OSP 7 setup.

Commands used:

~~~
# dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync
# dd if=/dev/zero of=file1 bs=1024k count=1024 conv=notrunc
~~~

| conv | qcow2 image | raw image |
|---|---|---|
| fdatasync | 104 MB/s | 85.8 MB/s |
| notrunc | 158 MB/s | 136 MB/s |

Thanks, can you try "conv=fdatasync,notrunc"? fdatasync is important because otherwise the data may not have reached persistent storage.

Hello, I am facing some issues with the test setup. I will be sure to update you once the setup is functional again.

Sorry for the delayed response. It's really difficult to get hold of a physical setup. This time I have used different HW with an OSP 10 setup to reproduce the issue.

Step 1: Spawned two instances, one using a qcow2 image and the other using a raw image.

~~~
[root@overcloud-controller-0 ~]# nova list
+--------------------------------------+-----------------+--------+------------+-------------+-----------------------+
| ID                                   | Name            | Status | Task State | Power State | Networks              |
+--------------------------------------+-----------------+--------+------------+-------------+-----------------------+
| 85814a47-2215-467e-a47f-63b191171c33 | qcow2-instance1 | ACTIVE | -          | Running     | internal1=10.10.10.11 |
| 2b40849a-a99e-492f-93c1-db4c4b2ff80e | raw-instance1   | ACTIVE | -          | Running     | internal1=10.10.10.10 |
+--------------------------------------+-----------------+--------+------------+-------------+-----------------------+
~~~

Step 2: Verified that the disks are created on the Ceph backend.
~~~
[root@overcloud-controller-0 ~]# rbd -p images ls -l
NAME                                      SIZE   PARENT FMT PROT LOCK
450293f9-8a49-4688-9667-85d2ee0a0fb8      10240M        2
450293f9-8a49-4688-9667-85d2ee0a0fb8@snap 10240M        2   yes
c1728a18-f914-4222-93a6-692ff252eb6f      539M          2
c1728a18-f914-4222-93a6-692ff252eb6f@snap 539M          2   yes
[root@overcloud-controller-0 ~]# rbd -p vms ls -l
NAME                                      SIZE   PARENT                                            FMT PROT LOCK
2b40849a-a99e-492f-93c1-db4c4b2ff80e_disk 20480M images/450293f9-8a49-4688-9667-85d2ee0a0fb8@snap 2        excl
85814a47-2215-467e-a47f-63b191171c33_disk 20480M                                                  2
~~~

Step 3: Running tests.

Instance created using a qcow2 image:

~~~
[root@host-10-10-10-11 ~]# time dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.6126 s, 101 MB/s

real    0m10.614s
user    0m0.000s
sys     0m0.532s
~~~

Instance created using a raw image:

~~~
[root@host-10-10-10-10 ~]# time dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 15.1281 s, 71.0 MB/s

real    0m15.130s
user    0m0.000s
sys     0m0.628s
~~~

The difference is still significant. I can run more tests if you want me to.

Vikrant, I'd like to see whether this behaves the same in our own configuration. Thanks for your help. It's on my to-do list.

This is a significant perf. difference that you are observing. I didn't see the RHCS version you were using; is it in here? If not, could you provide it? Also, did you run these tests more than once on the same volume, and was there any difference in performance the 2nd-Nth times? With RBD create, it's not actually allocating or initializing the volume when it creates it, AFAIK, and this causes performance measurements for the first write to differ from measurements for subsequent writes. That's why Tim and I dd to the entire Cinder volume and treat that as a separate test from measuring its steady-state performance.
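The first-write vs. steady-state methodology discussed above can be made repeatable inside the guest. A sketch, assuming root (for /proc/sys/vm/drop_caches); the target path and sizes are placeholders:

```shell
# Sketch: timed dd write with a page-cache drop beforehand, so a cold
# first run (where a clone may copy-on-write from its backing snapshot)
# can be compared against a steady-state rewrite of the same blocks.
run_dd_test() {
    target=$1 bs=$2 count=$3
    sync
    # Drop the page cache so each run starts cold (needs root; skipped otherwise).
    [ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches
    # fdatasync: data must reach storage; notrunc: rewrite blocks in place.
    dd if=/dev/zero of="$target" bs="$bs" count="$count" \
       conv=fdatasync,notrunc 2>&1 | tail -n 1
}

# Usage matching the tests in this bz (paths are placeholders):
# run_dd_test /root/file1 1024k 1024   # cold run, clone may COW from parent
# run_dd_test /root/file1 1024k 1024   # rewrite run, image already diverged
```

Running each case several times and comparing the first result against the rest separates one-time allocation/copy-up cost from steady-state write bandwidth.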
Also, I didn't see you dropping the cache in any of these tests, so this introduces variability between runs as well.

Yes, you would expect the raw image to be written to faster than the qcow2 image, since there is no backing image to account for. There may be some complex behaviors around whether the backing image is cached or not, what kind of write I/O pattern is being done, how qcow2 images differ from raw images, etc.

-ben

Ceph version installed with OSP 10:

~~~
puppet-ceph-2.2.1-3.el7ost.noarch
ceph-osd-10.2.2-41.el7cp.x86_64
ceph-common-10.2.2-41.el7cp.x86_64
python-cephfs-10.2.2-41.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64
ceph-mon-10.2.2-41.el7cp.x86_64
ceph-selinux-10.2.2-41.el7cp.x86_64
libcephfs1-10.2.2-41.el7cp.x86_64
ceph-radosgw-10.2.2-41.el7cp.x86_64
~~~

The comment 9 results were gathered from instances spawned from images. I have run the test only once.

Question: does this happen when a Ceph backend is not used? What happens when ephemeral or other storage is used? I'm trying to determine whether the behavior described in this bz has anything to do with Ceph - that determines who needs to work on it.

Question 2: why use qcow2 if you have Ceph RBD functionality to do copy-on-write? I think the answer is that the qcow2 image is really small; in the initial post it is 1/2 GB of physical space representing a 10-GB virtual image. This makes it much quicker to load and cache the entire Glance image, which can only help performance.

The key observation here is that when we flatten the Nova images (eliminating the backing image), the performance of the two images becomes the same. I think this is consistent with the hypothesis in comment 4.

qcow2 images are not supported in Ceph. To be a little more specific, qcow2 images are not supported as Glance images; see http://docs.ceph.com/docs/master/rbd/rbd-openstack/ : "Ceph doesn’t support QCOW2 for hosting a virtual machine disk. Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot from volume), the Glance image format must be RAW." Josh Durgin and Jason Dillaman confirmed this.

I do agree that we are not supporting qcow2 when using the ceph backend. The customer used qcow2 just to show the difference in performance results between the two disk types.
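The flatten operation referred to above (which made the two instances perform the same) is `rbd flatten`: it copies all parent data into the clone and detaches it from the Glance snapshot. A sketch using the pool and image names from the rbd listings in this bz; it should only be run against the live cluster:

```shell
# Sketch: detach the raw-instance disk from its Glance parent snapshot.
# Pool and image names are taken from the rbd listings in this bz.
POOL=vms
IMAGE=2b40849a-a99e-492f-93c1-db4c4b2ff80e_disk

# Guarded so this is a no-op where no configured Ceph client is present.
if command -v rbd >/dev/null 2>&1; then
    rbd flatten "$POOL/$IMAGE"      # copy parent blocks into the clone
    rbd -p "$POOL" info "$IMAGE"    # the 'parent:' line should now be gone
fi
```

After flattening, first-write performance no longer includes copy-up from the parent, at the cost of the extra space the clone previously shared with the snapshot.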