Bug 1383014

Summary: nova instance performance issues while using ceph backend
Product: Red Hat OpenStack
Component: ceph
Version: 8.0 (Liberty)
Target Release: 10.0 (Newton)
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: VIKRANT <vaggarwa>
Assignee: Ben England <bengland>
QA Contact: Warren <wusui>
CC: abond, bengland, dshaks, dwilson, jdurgin, jharriga, jomurphy, jtaleric, lhh, myllynen, nlevine, rsibley, rsussman, srevivo, twilkins, vaggarwa
Keywords: Reopened
Type: Bug
Last Closed: 2017-01-18 21:17:11 UTC

Description VIKRANT 2016-10-09 06:36:03 UTC
Description of problem:

nova instance performance issues while using ceph backend

Version-Release number of selected component (if applicable):

RHEL OSP 8 

How reproducible:
Every time for the customer.

Steps to Reproduce:

1. Spawn an instance using a qcow2 image, or using a volume created from a qcow2 image.
2. Spawn an instance using a raw image, or using a volume created from a raw image. A parent/child relationship is created in both cases.
3. The instance spawned in Step 1 gives twice the performance of the Step 2 instance when running a dd test from inside the instance.

~~~
Qcow2 image : 

[root@dawcow2image ~]# dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.18062 s, 492 MB/s

raw image : 

[root@darawimage ~]# dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.21842 s, 206 MB/s
[root@darawimage ~]#
~~~

Once we flatten the Step 2 instance's backing image on the Ceph side, both instances give equal performance.
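
For reference, the flatten step looks roughly like this (a hedged sketch; the pool and image names are placeholders, not taken from this setup):

~~~
# Copy all blocks from the parent snapshot into the child so the clone
# no longer depends on it; <instance_uuid> is a placeholder.
rbd flatten vms/<instance_uuid>_disk
# The PARENT column should now be empty:
rbd -p vms ls -l
~~~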

Actual results:
The performance difference is very large.

Expected results:
The performance difference should not be so large. Also, what changed in the driver code that creates a parent/child relationship for Ceph volumes from a raw image on OSP 8 but not on OSP 7?



Additional info:

+++++++++++
OSP 7 Setup
+++++++++++

No parent/child relationship exists when creating a Cinder volume from a raw image.


Step 1 : Checking the image status in glance. 

Raw image : 

~~~
# rbd -p images ls -l | grep bd0f6352-ca64-4476-94e0-a132d56b4399
bd0f6352-ca64-4476-94e0-a132d56b4399      10240M          2           
bd0f6352-ca64-4476-94e0-a132d56b4399@snap 10240M          2 yes       

# qemu-img info rbd:images/bd0f6352-ca64-4476-94e0-a132d56b4399
image: rbd:images/bd0f6352-ca64-4476-94e0-a132d56b4399
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: unavailable
cluster_size: 8388608
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
snap      snap                    10G 1970-01-01 05:30:00   00:00:00.000
~~~

qcow2 image : 

~~~
# rbd -p images ls -l | grep 7c764b1f-4726-4a92-834d-e117be22d0bf
7c764b1f-4726-4a92-834d-e117be22d0bf        452M          2           
7c764b1f-4726-4a92-834d-e117be22d0bf@snap   452M          2 yes  

# qemu-img info rbd:images/7c764b1f-4726-4a92-834d-e117be22d0bf
image: rbd:images/7c764b1f-4726-4a92-834d-e117be22d0bf
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16     
~~~

Step 2 : Created volumes using the images.

Raw volume: same size as the original raw image.

~~~
# rbd -p volumes ls -l | grep 884273be-4361-4f62-b211-b620a64e0a76
volume-884273be-4361-4f62-b211-b620a64e0a76 10240M          2      
~~~

qcow2 volume: larger than the original qcow2 image.

~~~
# rbd -p volumes ls -l | grep 9c7034c3-0f5c-427e-a414-0798ccdf2694
volume-9c7034c3-0f5c-427e-a414-0798ccdf2694 40960M          2     
~~~

Step 3 : Spawned instances using volumes.

# nova list
+--------------------------------------+-----------------+--------+------------+-------------+-------------------------+
| ID                                   | Name            | Status | Task State | Power State | Networks                |
+--------------------------------------+-----------------+--------+------------+-------------+-------------------------+
| 0cd131d5-8295-4c04-b159-671a6dda1c33 | qcow2-instance1 | ACTIVE | -          | Running     | internal=192.168.122.73 |
| ee5bcf8e-d921-4948-84a9-0d6131ae07af | raw-instance1   | ACTIVE | -          | Running     | internal=192.168.122.72 |
+--------------------------------------+-----------------+--------+------------+-------------+-------------------------+

Step 4 : Checked the children of both image snapshots; nothing is displayed for either.

qcow2 image : 

# rbd -p images children 7c764b1f-4726-4a92-834d-e117be22d0bf@snap

raw image : 

# rbd -p images children bd0f6352-ca64-4476-94e0-a132d56b4399@snap


+++++++++++
OSP 8 Setup
+++++++++++

A parent/child relationship exists when creating a Cinder volume from a raw image.


Step 1 : Created two glance images qcow2 and raw. 

~~~
$ rbd -p images ls -l
NAME                                        SIZE PARENT FMT PROT LOCK 
212af2e3-14fc-4c93-a8a1-75fa0449abb8      10240M          2           
212af2e3-14fc-4c93-a8a1-75fa0449abb8@snap 10240M          2 yes       
e954573b-5e75-4612-a80b-290134e07905        472M          2           
e954573b-5e75-4612-a80b-290134e07905@snap   472M          2 yes      
~~~

Step 2 : No children are present by default.

~~~
$ rbd -p images children 212af2e3-14fc-4c93-a8a1-75fa0449abb8@snap
$ rbd -p images children e954573b-5e75-4612-a80b-290134e07905@snap
~~~

Step 3 : Created ceph volumes using both images. 

~~~
$ cinder create --image-id e954573b-5e75-4612-a80b-290134e07905 --display-name qcow2-volume1 20
+---------------------------------------+--------------------------------------+
|                Property               |                Value                 |
+---------------------------------------+--------------------------------------+
|              attachments              |                  []                  |
|           availability_zone           |                 nova                 |
|                bootable               |                false                 |
|          consistencygroup_id          |                 None                 |
|               created_at              |      2016-10-05T04:08:16.000000      |
|              description              |                 None                 |
|               encrypted               |                False                 |
|                   id                  | 1247c6d1-f7e2-49ef-b055-91002d1f817e |
|                metadata               |                  {}                  |
|            migration_status           |                 None                 |
|              multiattach              |                False                 |
|                  name                 |            qcow2-volume1             |
|         os-vol-host-attr:host         | hostgroup@tripleo_ceph#tripleo_ceph  |
|     os-vol-mig-status-attr:migstat    |                 None                 |
|     os-vol-mig-status-attr:name_id    |                 None                 |
|      os-vol-tenant-attr:tenant_id     |   743020b644804528b898ae3fd6a5a558   |
|   os-volume-replication:driver_data   |                 None                 |
| os-volume-replication:extended_status |                 None                 |
|           replication_status          |               disabled               |
|                  size                 |                  20                  |
|              snapshot_id              |                 None                 |
|              source_volid             |                 None                 |
|                 status                |               creating               |
|                user_id                |   b18953ac5c7f40b1953beb64f086c55b   |
|              volume_type              |                 None                 |
+---------------------------------------+--------------------------------------+

$ cinder create --image-id 212af2e3-14fc-4c93-a8a1-75fa0449abb8 --display-name raw-volume1 20
+---------------------------------------+--------------------------------------+
|                Property               |                Value                 |
+---------------------------------------+--------------------------------------+
|              attachments              |                  []                  |
|           availability_zone           |                 nova                 |
|                bootable               |                false                 |
|          consistencygroup_id          |                 None                 |
|               created_at              |      2016-10-05T04:16:42.000000      |
|              description              |                 None                 |
|               encrypted               |                False                 |
|                   id                  | 7ef90ad8-1ba9-45fc-b3ea-d9ac27eeb18b |
|                metadata               |                  {}                  |
|            migration_status           |                 None                 |
|              multiattach              |                False                 |
|                  name                 |             raw-volume1              |
|         os-vol-host-attr:host         | hostgroup@tripleo_ceph#tripleo_ceph  |
|     os-vol-mig-status-attr:migstat    |                 None                 |
|     os-vol-mig-status-attr:name_id    |                 None                 |
|      os-vol-tenant-attr:tenant_id     |   743020b644804528b898ae3fd6a5a558   |
|   os-volume-replication:driver_data   |                 None                 |
| os-volume-replication:extended_status |                 None                 |
|           replication_status          |               disabled               |
|                  size                 |                  20                  |
|              snapshot_id              |                 None                 |
|              source_volid             |                 None                 |
|                 status                |               creating               |
|                user_id                |   b18953ac5c7f40b1953beb64f086c55b   |
|              volume_type              |                 None                 |
+---------------------------------------+--------------------------------------+
~~~

Step 4 : A parent/child relationship is shown for the raw image only.

~~~
# rbd -p images children e954573b-5e75-4612-a80b-290134e07905@snap
# rbd -p images children 212af2e3-14fc-4c93-a8a1-75fa0449abb8@snap
volumes/volume-7ef90ad8-1ba9-45fc-b3ea-d9ac27eeb18b
~~~
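
The parent relationship can also be confirmed with rbd info; a quick sketch using the volume name from the output above:

~~~
# The "parent:" field should point at the Glance image snapshot.
rbd info volumes/volume-7ef90ad8-1ba9-45fc-b3ea-d9ac27eeb18b
~~~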

==========================================

Tests run directly with rbd do not show such a huge difference; however, a large difference in operations per second is seen between the compute and controller nodes.

+++++++++++++++++
From compute node
+++++++++++++++++

CASE 1

[root@overcloud-compute-0 ~]# rbd create volumes/data-disk1 -s 204800 --image-format 2
[root@overcloud-compute-0 ~]# rbd bench-write volumes/data-disk1 --io-size 4096 --io-threads 16 --io-total 10000000000 --io-pattern seq
bench-write  io_size 4096 io_threads 16 bytes 10000000000 pattern seq
SEC       OPS   OPS/SEC   BYTES/SEC
1    133616  133642.59  547400032.10
2    273940  136983.43  561084135.83
3    413982  138003.01  565260308.65
4    554961  138746.85  568307080.88
5    699292  139863.75  572881913.64
6    839915  141229.15  578474586.94
7    983485  141909.04  581259434.77
8   1124749  142153.32  582259991.72
9   1267997  142607.29  584119447.63
10   1410098  142161.22  582292350.78
11   1550587  142165.28  582308990.86
12   1693153  141933.57  581359897.00
13   1834038  141857.92  581050028.05
14   1974950  141390.69  579136247.40
15   2118564  141693.16  580375181.91
16   2263242  142530.95  583806782.64
17   2398967  141162.67  578202276.85
elapsed:    17  ops:  2441407  ops/sec: 140094.59  bytes/sec: 573827429.96


CASE 2

[root@overcloud-compute-0 ~]# rbd create volumes/data-disk2 -s 204800 --image-format 2
[root@overcloud-compute-0 ~]# rbd snap create volumes/data-disk2@snap
[root@overcloud-compute-0 ~]# rbd snap protect volumes/data-disk2@snap
[root@overcloud-compute-0 ~]# rbd clone volumes/data-disk2@snap volumes/data-disk3
[root@overcloud-compute-0 ~]# rbd -p volumes ls -l
2016-10-06 07:48:30.235024 7f6310b0f7c0 -1 librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory
2016-10-06 07:48:30.592282 7f6310b0f7c0 -1 librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory
NAME                                                                                                  SIZE PARENT                                                                                                     FMT PROT LOCK
data-disk1                                                                                            200G                                                                                                              2
data-disk2                                                                                            200G                                                                                                              2
data-disk2@snap                                                                                       200G                                                                                                              2 yes
data-disk3                                                                                            200G volumes/data-disk2@snap                                                                                      2
volume-00c9b6d3-df75-4df3-a6ee-c20d7c5846ff                                                         10240M                                                                                                              2
volume-1177c82e-7e39-439a-a71f-f9b723c7cb52                                                         10240M                                                                                                              2
volume-183f6b49-16b9-4eaf-a5ac-63112a33be24                                                        102400M                                                                                                              2
volume-1b77b178-2540-436b-a14d-083e3a5a87c2                                                         20480M                                                                                                              2
volume-1d417105-10e4-44bc-91ce-7f8a712e49ca                                                         10240M                                                                                                              2
volume-26b256cb-ed41-4569-ae3f-d5e800070a1c                                                         26624M volumes/volume-c45d9aac-76f1-4417-b208-853f3ce078e8   2
volume-26d4c4b3-1ee4-46fe-bf3a-f63c59553d8b                                                         10240M                                                                                                              2
volume-2f959a39-7010-4da8-8106-b5716e04c648                                                         10240M                                                                                                              2
volume-36d68b80-d0e4-4a48-b919-b43815a04054                                                         10240M                                                                                                              2
volume-3a9573d7-1df5-4cbe-914c-48275a6e530c                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-44212b31-06e6-4f2c-ad35-e8156fe46dae                                                         10240M                                                                                                              2
volume-4d96e569-6711-4ac0-85e1-3a4c13258a4e                                                         10240M                                                                                                              2
volume-4ff2d560-74bd-4df1-bc3e-aee7bd6ecc2b                                                         20480M                                                                                                              2
volume-52eb29c3-0bc0-4e2b-885d-27beabf95733                                                         10240M                                                                                                              2
volume-55b01632-cc3d-4200-93fd-a2b326f39711                                                         10240M                                                                                                              2
volume-56fb6c23-c8f2-475e-9960-8cc187b1cedd                                                         10240M                                                                                                              2
volume-615b4bfa-51ce-4529-ae65-4361e67d0252                                                         10240M                                                                                                              2
volume-644cfdda-9d94-4784-ba76-7487e1f4ae6b                                                         20480M                                                                                                              2
volume-661a8823-fbf2-40f2-b8ac-1789e3c8bb3f                                                         10240M                                                                                                              2
volume-661b765e-68ec-4b9b-9482-6289c8f0d470                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-686931f1-0c97-4a28-a23a-e21470c40dbe                                                         10240M                                                                                                              2
volume-6931d441-6dbe-4b51-81d4-208be35d27e0                                                         10240M                                                                                                              2
volume-6d5b6019-5055-4683-af12-a47dd88d2c51                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-6d887507-a804-41e3-a633-4a5b0f651a90                                                         10240M                                                                                                              2
volume-6e98a776-0679-417c-88b2-590394fa5bdc                                                         10240M                                                                                                              2
volume-6ecc3b25-0cfb-46fb-8a10-2648ce1cc849                                                         20480M                                                                                                              2
volume-72990018-260f-4880-939a-1b0566c2a262                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-88c88720-8963-4ecb-8dc3-6557681b98f1                                                         20480M                                                                                                              2
volume-9b3c1ec4-7832-485e-94df-020b4b1fcd70                                                         20480M                                                                                                              2
volume-9de05763-535a-4e28-8013-a28c3bd5dce1                                                         10240M                                                                                                              2
volume-9debd6ff-760c-4c85-bcfa-7beec0614e48                                                         10240M                                                                                                              2
volume-9ffa1c5f-f679-4261-b8d5-22371218f095                                                         20480M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2
volume-a87cc971-a70e-49a9-bfcb-e519652b6e9a                                                         20480M                                                                                                              2
volume-adcc668d-8ae7-4e79-bf86-4ea5b9054fe2                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-b14f9c4f-4d05-4509-8482-adf63d4702a5                                                         10240M                                                                                                              2
volume-b60e26e1-a7f9-425a-8eca-5b8ee0fe76df                                                         40960M                                                                                                              2
volume-b765d604-81b6-4386-88d7-df9fa1ff76c5                                                         20480M                                                                                                              2
volume-b7ba67ef-c247-456d-9558-db04fd91799d                                                        102400M                                                                                                              2
volume-c30438fb-238b-4768-8080-85479576fd41                                                         30720M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2
volume-c45d9aac-76f1-4417-b208-853f3ce078e8                                                         26624M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2
volume-c45d9aac-76f1-4417-b208-853f3ce078e8  26624M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2 yes
volume-c9e89560-41bb-426b-9b8e-a06e9bcbb0e4                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-ce4ddc0b-31ce-4e6f-a481-73bd339befa7                                                         10240M                                                                                                              2
volume-d125897c-b3d4-4547-825b-69b3ee08a2b6                                                         10240M                                                                                                              2
volume-d40c5dd1-3ed8-4d3c-b628-1adc062a4d7b                                                         10240M                                                                                                              2
volume-da3b0d29-3706-48e5-8628-044856794d78                                                        102400M                                                                                                              2
volume-db03cd9d-d5d6-4b09-8c2e-a9d0be7bbb47                                                         10240M                                                                                                              2
volume-e08c721e-059f-41c6-8f55-cb79797688ea                                                         40960M                                                                                                              2
volume-e08c721e-059f-41c6-8f55-cb79797688ea@snapshot-bfecf3c0-0835-4fc1-a710-685e1f944acc           40960M                                                                                                              2 yes
volume-e49a946d-15e2-4c63-9573-7a7fb3f2938b                                                         10240M                                                                                                              2
volume-e62c6254-e54b-4a40-a979-96a25862b5b2                                                         10240M                                                                                                              2
volume-e77208c6-50bc-4174-bb21-4089067c6c29                                                         10240M                                                                                                              2
volume-edb013b3-7682-4129-97f7-23d5f483783f                                                         20480M                                                                                                              2
volume-f3a7432b-c6c3-4966-97e5-a6866950c24b                                                         20480M                                                                                                              2
volume-f4f66f86-c41b-4914-abf4-a77d0bbefa49                                                         10240M                                                                                                              2
volume-fa06ffb5-600f-4a7b-9019-65d9ec1729d4                                                         10240M                                                                                                              2
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]# rbd bench-write volumes/data-disk3 --io-size 4096 --io-threads 16 --io-total 10000000000 --io-pattern seq
bench-write  io_size 4096 io_threads 16 bytes 10000000000 pattern seq
SEC       OPS   OPS/SEC   BYTES/SEC
1    145978  145999.49  598013923.68
2    288173  144092.62  590203353.91
3    420567  140134.09  573989252.78
4    555840  138965.49  569202660.08
5    699641  139920.23  573113262.20
6    826637  136076.33  557368644.26
7    967951  135928.91  556764816.49
8   1045903  125062.80  512257217.08
9   1188959  126570.31  518431998.55
10   1318842  123812.25  507134975.81
11   1431096  120918.36  495281616.36
12   1537583  113870.90  466415215.19
13   1648178  120491.21  493531984.39
14   1784090  119054.84  487648620.73
15   1914959  119260.89  488492608.94
16   2052732  124350.70  509340450.06
17   2193732  131321.36  537892292.11
18   2337242  137812.97  564481936.55
elapsed:    18  ops:  2441407  ops/sec: 130009.18  bytes/sec: 532517618.61


++++++++++++++++++++
From controller node
++++++++++++++++++++

CASE 1

[root@overcloud-controller-0 ~]# rbd create volumes/data-disk1 -s 204800 --image-format 2
[root@overcloud-controller-0 ~]# rbd bench-write volumes/data-disk1 --io-size 4096 --io-threads 16 --io-total 10000000000 --io-pattern seq
bench-write  io_size 4096 io_threads 16 bytes 10000000000 pattern seq
SEC       OPS   OPS/SEC   BYTES/SEC
1     96516  96538.25  395420656.90
2    174398  87208.69  357206774.84
3    262656  87559.26  358642739.55
4    344730  86182.22  353002358.55
5    437353  87474.84  358296955.56
6    520981  84892.94  347721462.55
7    605124  86145.58  352852312.00
8    686550  84778.77  347253830.83
9    764774  84013.07  344117524.11
10    855034  83536.25  342164460.11
11    934666  82736.99  338890727.33
12   1017128  82400.82  337513775.69
13   1107463  84182.51  344811566.17
14   1180503  83145.92  340565679.11
15   1262974  81587.36  334181825.41
16   1332059  79478.62  325544410.06
17   1437295  84033.42  344200907.94
18   1565676  91642.72  375368578.86
19   1683208  100540.86  411815379.39
20   1802435  107892.99  441929675.70
21   1920824  117752.98  482316214.98
22   2009194  114379.85  468499869.47
23   2104777  107820.23  441631655.53
24   2191861  101730.73  416689082.30
25   2292499  98012.88  401460768.52
26   2386867  93208.64  381782600.61
elapsed:    28  ops:  2441407  ops/sec: 86784.91  bytes/sec: 355470991.53
[root@overcloud-controller-0 ~]#

CASE 2

[root@overcloud-controller-0 ~]# rbd create volumes/data-disk2 -s 204800 --image-format 2
[root@overcloud-controller-0 ~]# rbd snap create volumes/data-disk2@snap
[root@overcloud-controller-0 ~]# rbd snap protect volumes/data-disk2@snap
[root@overcloud-controller-0 ~]# rbd clone volumes/data-disk2@snap volumes/data-disk3
[root@overcloud-controller-0 ~]# rbd -p volumes ls -l
2016-10-05 13:10:38.858010 7f8080a067c0 -1 librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory
2016-10-05 13:10:39.007380 7f8080a067c0 -1 librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory
NAME                                                                                                  SIZE PARENT                                                                                                     FMT PROT LOCK
data-disk1                                                                                            200G                                                                                                              2
data-disk2                                                                                            200G                                                                                                              2
data-disk2@snap                                                                                       200G                                                                                                              2 yes
data-disk3                                                                                            200G volumes/data-disk2@snap                                                                                      2
volume-00c9b6d3-df75-4df3-a6ee-c20d7c5846ff                                                         10240M                                                                                                              2
volume-1177c82e-7e39-439a-a71f-f9b723c7cb52                                                         10240M                                                                                                              2
volume-1d417105-10e4-44bc-91ce-7f8a712e49ca                                                         10240M                                                                                                              2
volume-26b256cb-ed41-4569-ae3f-d5e800070a1c                                                         26624M volumes/volume-c45d9aac-76f1-4417-b208-853f3ce078e8   2
volume-26d4c4b3-1ee4-46fe-bf3a-f63c59553d8b                                                         10240M                                                                                                              2
volume-2f959a39-7010-4da8-8106-b5716e04c648                                                         10240M                                                                                                              2
volume-36d68b80-d0e4-4a48-b919-b43815a04054                                                         10240M                                                                                                              2
volume-3a9573d7-1df5-4cbe-914c-48275a6e530c                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-44212b31-06e6-4f2c-ad35-e8156fe46dae                                                         10240M                                                                                                              2
volume-4d96e569-6711-4ac0-85e1-3a4c13258a4e                                                         10240M                                                                                                              2
volume-52eb29c3-0bc0-4e2b-885d-27beabf95733                                                         10240M                                                                                                              2
volume-55b01632-cc3d-4200-93fd-a2b326f39711                                                         10240M                                                                                                              2
volume-56fb6c23-c8f2-475e-9960-8cc187b1cedd                                                         10240M                                                                                                              2
volume-615b4bfa-51ce-4529-ae65-4361e67d0252                                                         10240M                                                                                                              2
volume-661a8823-fbf2-40f2-b8ac-1789e3c8bb3f                                                         10240M                                                                                                              2
volume-661b765e-68ec-4b9b-9482-6289c8f0d470                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-686931f1-0c97-4a28-a23a-e21470c40dbe                                                         10240M                                                                                                              2
volume-6931d441-6dbe-4b51-81d4-208be35d27e0                                                         10240M                                                                                                              2
volume-6d5b6019-5055-4683-af12-a47dd88d2c51                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-6d887507-a804-41e3-a633-4a5b0f651a90                                                         10240M                                                                                                              2
volume-6e98a776-0679-417c-88b2-590394fa5bdc                                                         10240M                                                                                                              2
volume-6ecc3b25-0cfb-46fb-8a10-2648ce1cc849                                                         20480M                                                                                                              2
volume-72990018-260f-4880-939a-1b0566c2a262                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-88c88720-8963-4ecb-8dc3-6557681b98f1                                                         20480M                                                                                                              2
volume-9b3c1ec4-7832-485e-94df-020b4b1fcd70                                                         20480M                                                                                                              2
volume-9de05763-535a-4e28-8013-a28c3bd5dce1                                                         10240M                                                                                                              2
volume-9debd6ff-760c-4c85-bcfa-7beec0614e48                                                         10240M                                                                                                              2
volume-9ffa1c5f-f679-4261-b8d5-22371218f095                                                         20480M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2
volume-adcc668d-8ae7-4e79-bf86-4ea5b9054fe2                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-b14f9c4f-4d05-4509-8482-adf63d4702a5                                                         10240M                                                                                                              2
volume-b60e26e1-a7f9-425a-8eca-5b8ee0fe76df                                                         40960M                                                                                                              2
volume-c30438fb-238b-4768-8080-85479576fd41                                                         30720M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2
volume-c45d9aac-76f1-4417-b208-853f3ce078e8                                                         26624M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2
volume-c45d9aac-76f1-4417-b208-853f3ce078e8  26624M images/acd4bd27-dff9-4d75-9c20-fed7d31adb28@snap                                                             2 yes
volume-c9e89560-41bb-426b-9b8e-a06e9bcbb0e4                                                         20480M images/37dc9447-11c4-4b58-99f8-e50305a02b06@snap                                                             2
volume-ce4ddc0b-31ce-4e6f-a481-73bd339befa7                                                         10240M                                                                                                              2
volume-d125897c-b3d4-4547-825b-69b3ee08a2b6                                                         10240M                                                                                                              2
volume-d40c5dd1-3ed8-4d3c-b628-1adc062a4d7b                                                         10240M                                                                                                              2
volume-da3b0d29-3706-48e5-8628-044856794d78                                                        102400M                                                                                                              2
volume-db03cd9d-d5d6-4b09-8c2e-a9d0be7bbb47                                                         10240M                                                                                                              2
volume-e08c721e-059f-41c6-8f55-cb79797688ea                                                         40960M                                                                                                              2
volume-e08c721e-059f-41c6-8f55-cb79797688ea@snapshot-bfecf3c0-0835-4fc1-a710-685e1f944acc           40960M                                                                                                              2 yes
volume-e49a946d-15e2-4c63-9573-7a7fb3f2938b                                                         10240M                                                                                                              2
volume-e62c6254-e54b-4a40-a979-96a25862b5b2                                                         10240M                                                                                                              2
volume-e77208c6-50bc-4174-bb21-4089067c6c29                                                         10240M                                                                                                              2
volume-f4f66f86-c41b-4914-abf4-a77d0bbefa49                                                         10240M                                                                                                              2
volume-fa06ffb5-600f-4a7b-9019-65d9ec1729d4                                                         10240M                                                                                                              2
[root@overcloud-controller-0 ~]#
[root@overcloud-controller-0 ~]# rbd info volumes/data-disk3
rbd image 'data-disk3':
size 200 GB in 51200 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.4d73c3d1b58ba
format: 2
features: layering
flags:
parent: volumes/data-disk2@snap
overlap: 200 GB
[root@overcloud-controller-0 ~]#
[root@overcloud-controller-0 ~]# rbd bench-write volumes/data-disk3 --io-size 4096 --io-threads 16 --io-total 10000000000 --io-pattern seq
bench-write  io_size 4096 io_threads 16 bytes 10000000000 pattern seq
SEC       OPS   OPS/SEC   BYTES/SEC
1     99560  99580.71  407882596.19
2    200988  100413.26  411292721.31
3    255062  84975.27  348058726.17
4    326363  81595.70  334216002.90
5    400676  80094.24  328066014.35
6    465726  73232.75  299961331.04
7    539850  67796.92  277696179.89
8    609898  70993.29  290788509.75
9    687556  72238.57  295889165.59
10    743935  68690.23  281355182.58
11    807579  68370.98  280047550.12
12    868570  65744.03  269287546.04
13    927884  63588.56  260458745.64
14    977269  57938.93  237317852.52
15   1050866  61386.27  251438151.73
16   1135875  65640.19  268862226.02
17   1207778  67841.66  277879432.47
18   1280540  70540.73  288934842.32
19   1340824  72715.62  297843184.23
20   1411871  72200.93  295734990.68
21   1501737  73193.58  299800898.93
22   1568769  72198.20  295723812.41
23   1631478  70187.53  287488134.47
24   1684841  68803.34  281818473.38
25   1766260  70877.86  290315716.85
26   1838098  67272.24  275547088.59
27   1922422  70730.60  289712551.74
28   1991644  72033.30  295048414.43
29   2051437  73318.63  300313125.68
30   2125091  71738.08  293839181.94
31   2181269  68634.12  281125370.31
32   2250781  65660.90  268947027.05
33   2334020  68471.92  280460984.39
34   2389046  67522.39  276571699.69
elapsed:    35  ops:  2441407  ops/sec: 69155.75  bytes/sec: 283261938.87


++++++++++
conclusion
++++++++++

~~~
==> Qcow2 image : 

a) controller node : 

elapsed:    28  ops:  2441407  ops/sec: 86784.91  bytes/sec: 355470991.53

b) compute node : 

elapsed:    17  ops:  2441407  ops/sec: 140094.59  bytes/sec: 573827429.96

==> raw image : 

a) controller node : 

elapsed:    35  ops:  2441407  ops/sec: 69155.75  bytes/sec: 283261938.87

b) compute node : 

elapsed:    18  ops:  2441407  ops/sec: 130009.18  bytes/sec: 532517618.61
~~~

A large difference is seen between controller and compute node performance. Still, the relative difference measured from the compute node is not twofold.

Comment 1 VIKRANT 2016-10-09 09:15:01 UTC
Performed the test on a RHEL 7 setup:

Spawned two instances, one using a qcow2 image and the second using a raw image.

The raw image created a parent/child relationship.

~~~
# rbd -p images children 04aad515-1fb8-4f79-8838-71d38dabba1f@snap
vms/ca68fce0-76d2-4f85-b886-e4e5e02ccbff_disk
~~~

Now, running the dd test, I can see that the instance spawned from the qcow2 image gives twice the performance of the instance spawned from the raw image.

Comment 4 Ben England 2016-10-19 16:48:47 UTC
I had thought that Ceph with Nova requires the use of a raw image, correct? Because Ceph does the "backing image" in RBD that was formerly done using qcow2, correct? But evidently not so.

Is Ceph imposing some sort of copy-on-write overhead for the Nova image that it doesn't need to impose?  For example, is it reading from the snapshot the 4-KB filesystem blocks that dd is writing to?   It is necessary to read the backing image if you insert data into the middle of a block, but if you are writing out the entire block, in theory it should be unnecessary to read the block from the snapshot first - since it doesn't matter what was stored there before.

As the post suggests, we should be able to perform this test with an RBD volume backed by a snapshot vs. an RBD volume not backed by a snapshot, using the librbd engine in fio, and see if it has something to do with the use of RBD snapshots. If you do this same test on a Cinder volume that is not backed by a snapshot, what do you get?
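
As one way to run that comparison, here is a hedged fio job-file sketch (the pool, client, and image names are assumptions, reusing the data-disk images created in the cases above):

~~~
# fio job: plain RBD volume vs. snapshot-backed clone, run sequentially.
[global]
ioengine=rbd
clientname=admin
pool=volumes
rw=write
bs=4k
iodepth=16
size=1G

[plain-volume]
rbdname=data-disk1

[cloned-volume]
stonewall
rbdname=data-disk3
~~~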

Another interesting test would be to repeat the dd test on the exact same file, using "conv=notrunc", so you would be writing to the same physical blocks in storage. This would be a "re-write" test. There should be no copy-on-write overhead at this point because the Nova image has already diverged from the backing snapshot.
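
A minimal sketch of that re-write test, run inside the guest against the same file as the earlier runs:

~~~
# Overwrite file1 in place: notrunc keeps the existing blocks instead of
# truncating the file, and fdatasync flushes data before dd exits.
dd if=/dev/zero of=file1 bs=1024k count=1024 conv=notrunc,fdatasync
~~~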

Note that if the above hypothesis about copy-on-write overhead is correct, then the qcow2 image is much smaller than the raw image (i.e. sparse?), so there is less reading to do, which might explain the difference in performance.

Comment 5 VIKRANT 2016-10-20 05:13:06 UTC
Strangely, I have not seen the twofold difference this time.

Here are the test results from the OSP 7 setup.

Commands used : 

# dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync
# dd if=/dev/zero of=file1 bs=1024k count=1024 conv=notrunc

------------------------------------------
conv      | qcow2 image | raw image
------------------------------------------
fdatasync | 104 MB/s    | 85.8 MB/s
notrunc   | 158 MB/s    | 136 MB/s
------------------------------------------

Comment 6 Ben England 2016-10-20 13:13:53 UTC
Thanks, can you try "conv=fdatasync,notrunc" ?  fdatasync is important because otherwise the data may not have reached persistent storage.

Comment 7 VIKRANT 2016-10-24 04:44:25 UTC
Hello, 

I am facing some issues with the test setup. I will update you once the setup is functional again.

Comment 9 VIKRANT 2016-11-23 10:52:47 UTC
Sorry for the delayed response. It's really difficult to get hold of the physical setup.

This time I have used different hardware with an OSP 10 setup to reproduce the issue:

Step 1 : Spawned two instances, one using a qcow2 image and the other using a raw image.

~~~
[root@overcloud-controller-0 ~]# nova list
+--------------------------------------+-----------------+--------+------------+-------------+-----------------------+
| ID                                   | Name            | Status | Task State | Power State | Networks              |
+--------------------------------------+-----------------+--------+------------+-------------+-----------------------+
| 85814a47-2215-467e-a47f-63b191171c33 | qcow2-instance1 | ACTIVE | -          | Running     | internal1=10.10.10.11 |
| 2b40849a-a99e-492f-93c1-db4c4b2ff80e | raw-instance1   | ACTIVE | -          | Running     | internal1=10.10.10.10 |
+--------------------------------------+-----------------+--------+------------+-------------+-----------------------+
~~~

Step 2 : Verified that the disks are created on the Ceph backend.

~~~
[root@overcloud-controller-0 ~]# rbd -p images ls -l
NAME                                        SIZE PARENT FMT PROT LOCK 
450293f9-8a49-4688-9667-85d2ee0a0fb8      10240M          2           
450293f9-8a49-4688-9667-85d2ee0a0fb8@snap 10240M          2 yes       
c1728a18-f914-4222-93a6-692ff252eb6f        539M          2           
c1728a18-f914-4222-93a6-692ff252eb6f@snap   539M          2 yes       


[root@overcloud-controller-0 ~]# rbd -p vms ls -l
NAME                                        SIZE PARENT                                           FMT PROT LOCK 
2b40849a-a99e-492f-93c1-db4c4b2ff80e_disk 20480M images/450293f9-8a49-4688-9667-85d2ee0a0fb8@snap   2      excl 
85814a47-2215-467e-a47f-63b191171c33_disk 20480M                                                    2           
~~~

Step 3 : Running tests

Instance created using qcow2 image. 

~~~
[root@host-10-10-10-11 ~]# time dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.6126 s, 101 MB/s

real	0m10.614s
user	0m0.000s
sys	0m0.532s
~~~

Instance created using raw image.

~~~
[root@host-10-10-10-10 ~]# time dd if=/dev/zero of=file1 bs=1024k count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 15.1281 s, 71.0 MB/s

real	0m15.130s
user	0m0.000s
sys	0m0.628s
~~~

The difference is still significant. I can run more tests if you want me to.

Comment 11 Ben England 2016-11-25 16:20:54 UTC
Vikrant, 

I'd like to see whether this behaves the same in our own configuration; thanks for your help. It's on my to-do list. This is a significant performance difference that you are observing.

I didn't see the RHCS version you were using; is it in here? If not, could you provide it?

Also, did you run these tests more than once on the same volume, and was there any difference in performance the 2nd through Nth times? With rbd create, the volume is not actually allocated or initialized at creation time AFAIK, and this causes performance measurements for the first write to differ from measurements for subsequent writes. That's why Tim and I dd to the entire Cinder volume first and treat this as a separate test from measuring its steady-state performance.
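
For example, a hedged pre-conditioning pass over an attached volume might look like this (/dev/vdb is a placeholder for the attached Cinder volume):

~~~
# One full sequential write so later runs measure steady-state behavior
# rather than first-write allocation; /dev/vdb is a placeholder device.
dd if=/dev/zero of=/dev/vdb bs=1M oflag=direct
~~~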

Also, I didn't see you dropping caches in any of these tests, so this introduces variability between runs as well.
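
A typical cache-drop step between runs, for reference:

~~~
# Flush dirty pages, then drop the page cache, dentries, and inodes
# (run as root on the guest and/or the Ceph hosts between runs).
sync
echo 3 > /proc/sys/vm/drop_caches
~~~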

Yes, you would expect the raw image to be written to faster than the qcow2 image, since there is no backing image to account for.    There may be some complex behaviors around whether the backing image is cached or not, what kind of write I/O pattern is being done, how qcow2 images differ from raw images, etc.

-ben

Comment 13 VIKRANT 2016-11-25 16:39:36 UTC
Ceph version which is installed with OSP 10 : 

~~~
puppet-ceph-2.2.1-3.el7ost.noarch
ceph-osd-10.2.2-41.el7cp.x86_64
ceph-common-10.2.2-41.el7cp.x86_64
python-cephfs-10.2.2-41.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64
ceph-mon-10.2.2-41.el7cp.x86_64
ceph-selinux-10.2.2-41.el7cp.x86_64
libcephfs1-10.2.2-41.el7cp.x86_64
ceph-radosgw-10.2.2-41.el7cp.x86_64
~~~

The results in comment 9 were gathered while spawning instances from images. I have run the test only once.

Comment 17 Ben England 2017-01-04 21:15:12 UTC
Question: does this happen when a Ceph backend is not used? What happens when ephemeral or other storage is used? I'm trying to determine whether the behavior described in this bz has anything to do with Ceph; that determines who needs to work on it.

Question 2: why use qcow2 if you have Ceph RBD functionality to do copy-on-write? I think the answer is that the qcow2 image is really small; in the initial post it takes 1/2 GB of physical space to represent a 10-GB virtual image. This makes it much quicker to load and cache the entire Glance image, which can only help performance.

The key observation here is that when we flatten the Nova images (eliminating the backing image), the performance of the two images becomes the same. I think this is consistent with the hypothesis in comment 4.

Comment 18 Dave Wilson 2017-01-18 21:17:11 UTC
qcow2 images are not supported in Ceph.

Comment 19 Ben England 2017-01-18 21:21:28 UTC
To be a little more specific, qcow2 images are not supported as Glance images, see

http://docs.ceph.com/docs/master/rbd/rbd-openstack/

"Ceph doesn’t support QCOW2 for hosting a virtual machine disk. Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot from volume), the Glance image format must be RAW." 

Josh Durgin and Jason Dillaman confirmed this.
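
For completeness, a hedged sketch of converting a qcow2 image to raw before uploading it to Glance (file and image names are placeholders):

~~~
# Convert qcow2 to raw so RBD can clone it with copy-on-write.
qemu-img convert -f qcow2 -O raw myimage.qcow2 myimage.raw
glance image-create --name myimage-raw --disk-format raw \
    --container-format bare --file myimage.raw
~~~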

Comment 20 VIKRANT 2017-01-23 12:56:43 UTC
I do agree that qcow2 is not supported when using the Ceph backend. The customer used it just to show the difference in performance results between the two disk types.