Bug 1389112 - RBD Python bindings script is taking 3x more time to import a file compared to the RBD CLI
Summary: RBD Python bindings script is taking 3x more time to import a file compared...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 1.3.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 4.0
Assignee: Jason Dillaman
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On: 1389529 1647041 1919160
Blocks:
 
Reported: 2016-10-26 21:32 UTC by Vikhyat Umrao
Modified: 2022-08-11 08:31 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-06 18:10:27 UTC
Embargoed:


Links
Red Hat Issue Tracker RHCEPH-5064 (last updated 2022-08-11 08:31:11 UTC)

Description Vikhyat Umrao 2016-10-26 21:32:44 UTC
Description of problem:
The RBD Python bindings script takes 3x more time to import a file compared to the rbd CLI.

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.z

How reproducible:
Always

Steps to Reproduce:
- Import the image with the rbd CLI, then import the same file again with the Python bindings.

---- 

- Initially we thought the issue was http://tracker.ceph.com/issues/17178 and https://bugzilla.redhat.com/show_bug.cgi?id=1373643, since we were focusing on slow Glance image uploads.

- As mentioned in http://tracker.ceph.com/issues/17178, the RBD Python bindings still do not support skipping fully zeroed blocks when importing an image into an RBD pool (a sketch of how the chunk loop could skip such blocks is shown after the script below).

- We thought that was why a normal import with the rbd CLI completes in about 20 seconds for a 1.1G raw image while the Python bindings take nearly 4 minutes for the same image.

----
# du -sh /root/rhel-guest-image-7.2-20151102.0.x86_64.raw
1.1G	/root/rhel-guest-image-7.2-20151102.0.x86_64.raw


# time python image-upload.py rhel7.1 /root/rhel-guest-image-7.2-20151102.0.x86_64.raw 8

[......]


Writing data at offset 10645143552(MB: 10152)
Writing data at offset 10653532160(MB: 10160)
Writing data at offset 10661920768(MB: 10168)
Writing data at offset 10670309376(MB: 10176)
Writing data at offset 10678697984(MB: 10184)
Writing data at offset 10687086592(MB: 10192)
Writing data at offset 10695475200(MB: 10200)
Writing data at offset 10703863808(MB: 10208)
Writing data at offset 10712252416(MB: 10216)
Writing data at offset 10720641024(MB: 10224)
Writing data at offset 10729029632(MB: 10232)
done

real	4m25.849s
user	0m4.523s
sys	0m7.037s
[root@dell-per630-11 ceph]# rbd info images/rhel7.1
rbd image 'rhel7.1':
	size 10240 MB in 1280 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.17632b6238e1f29
	format: 2
	features: layering
	flags: 

# time rbd --id=glance --image-format=2 -p images import /root/rhel-guest-image-7.2-20151102.0.x86_64.raw rhel7 
Importing image: 100% complete...done.

real	0m20.950s
user	0m9.691s
sys	0m3.212s

# rbd info images/rhel7
rbd image 'rhel7':
	size 10240 MB in 2560 objects
	order 22 (4096 kB objects)
	block_name_prefix: rbd_data.1743256238e1f29
	format: 2
	features: layering
	flags:
---
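
Note that the two images above also differ in object size: the script derives the order from its 8 MB chunk size, while the rbd CLI used its default of order 22. A quick way to check the order / object-size relationship, using only the values from the two "rbd info" outputs above:

# Relationship between "order" and object size, as reported by rbd info.
image_size = 10240 * 2**20                # size 10240 MB

for order in (22, 23):
    object_size = 2**order                # object size in bytes = 2^order
    print("order %d -> %d kB objects, %d objects"
          % (order, object_size // 1024, image_size // object_size))

# order 22 -> 4096 kB objects, 2560 objects   (rbd CLI default)
# order 23 -> 8192 kB objects, 1280 objects   (the script below computes this as int(math.log(chunk_size, 2)) with an 8 MB chunk)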

---- The script we used for the Python bindings test; it is adapted from the RBD Glance driver ----------------------------------------------------------------------


import os
import sys
import math

from oslo_utils import units

try:
    import rados
    import rbd
except ImportError:
    rados = None
    rbd = None


if len(sys.argv) < 4:
    sys.exit('Usage: %s <image_name> <image_path> <chunk_size_in_MB>' % sys.argv[0])
image_name = sys.argv[1]
image_path = sys.argv[2]
chunk = int(sys.argv[3])

chunk_size = chunk * units.Mi
pool = 'images'
#pool = 'svl-fab1-aos-glance-pool-01'
user = 'glance'
#user = 'svl-fab1-aos-glance-usr'
conf_file = '/etc/ceph/ceph.conf'
radosid= 'glance'
#radosid='svl-fab1-aos-glance-usr'
connect_timeout = 0
image_size = os.path.getsize(image_path)
order = int(math.log(chunk_size, 2))

print image_name
print image_path
print image_size
print chunk
print chunk_size
print pool
print user
print conf_file
print connect_timeout
print radosid

with rados.Rados(conffile=conf_file, rados_id=radosid) as cluster:
    with cluster.open_ioctx(pool) as ioctx:
        rbd_inst = rbd.RBD()
        size = image_size
        # features=1 enables only the layering feature; order controls the object size
        rbd_inst.create(ioctx, image_name, size, order, old_format=False, features=1)
        with rbd.Image(ioctx, image_name) as image:
          f = open(image_path, "rb")
          try:
              offset = 0
              data = f.read(chunk_size)
              while data != "":
                  print "Writing data at offset " + str(offset) + "(MB: " + str(offset / units.Mi) + ")"
                  offset += image.write(data,offset)
                  data = f.read(chunk_size)
          finally:
              f.close()

print "done"

-----------
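
As a point of comparison for the zero-block limitation mentioned in the description, here is a minimal, hypothetical variant of the chunk loop above that skips chunks which are entirely zero instead of writing them, leaving those extents sparse (this is only a sketch, not what the rbd CLI does internally). It assumes ioctx, image_name, image_path and chunk_size are set up exactly as in the script:

zero_chunk = b'\0' * chunk_size

with rbd.Image(ioctx, image_name) as image:
    with open(image_path, 'rb') as f:
        offset = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            # Only issue a write when the chunk contains non-zero data;
            # unwritten extents of the freshly created image read back as zeros.
            if data != zero_chunk[:len(data)]:
                image.write(data, offset)
            offset += len(data)

-----------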

Comment 1 Vikhyat Umrao 2016-10-26 21:38:38 UTC
- To avoid the zero-block issue, we created 9.3G and 1G files from /dev/urandom and reproduced the test again. The results are mostly the same; the only difference is that the rbd CLI no longer has the same performance, since it now has to process the whole file.

-------------------------------------

- I created a 9.3G testfile from /dev/urandom

# du -sh /etc/ceph/testfile 
9.3G	/etc/ceph/testfile

# time rbd --id=glance --image-format=2 --order=23 -p images import /etc/ceph/testfile testfile.rbd.cli 
Importing image: 100% complete...done.

real	1m41.499s
user	0m7.554s
sys	0m9.129s

# rbd info images/testfile.rbd.cli
rbd image 'testfile.rbd.cli':
	size 9462 MB in 1183 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.1743a3f238e1f29
	format: 2
	features: layering
	flags: 


^^ The rbd CLI, which was taking 20 seconds, now takes 1 minute 41 seconds, but the Python bindings result is nearly the same.

# time python image-upload.py testfile.rbd.python.bindings /etc/ceph/testfile 8

[......]
Writing data at offset 9873391616(MB: 9416)
Writing data at offset 9881780224(MB: 9424)
Writing data at offset 9890168832(MB: 9432)
Writing data at offset 9898557440(MB: 9440)
Writing data at offset 9906946048(MB: 9448)
Writing data at offset 9915334656(MB: 9456)
done

real	4m4.008s
user	0m4.093s
sys	0m6.611s

# rbd info images/testfile.rbd.python.bindings
rbd image 'testfile.rbd.python.bindings':
	size 9462 MB in 1183 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.1763af9238e1f29
	format: 2
	features: layering
	flags: 
-----------------------------

For 1G file:
------------

# du -sh /etc/ceph/testfile.1G 
1.0G	/etc/ceph/testfile.1G

# time rbd --id=glance --order=23 --image-format=2 -p images import /etc/ceph/testfile.1G testfile.rbd.cli.1G 
Importing image: 100% complete...done.

real	0m11.504s
user	0m0.870s
sys	0m1.432s

# rbd info images/testfile.rbd.cli.1G
rbd image 'testfile.rbd.cli.1G':
	size 1024 MB in 128 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.176787e238e1f29
	format: 2
	features: layering
	flags: 


# time python image-upload.py testfile.rbd.python.bindings.1G /etc/ceph/testfile.1G 8

[.....]
Writing data at offset 1031798784(MB: 984)
Writing data at offset 1040187392(MB: 992)
Writing data at offset 1048576000(MB: 1000)
Writing data at offset 1056964608(MB: 1008)
Writing data at offset 1065353216(MB: 1016)
done

real	0m26.687s
user	0m0.578s
sys	0m0.816s

# rbd info images/testfile.rbd.python.bindings.1G
rbd image 'testfile.rbd.python.bindings.1G':
	size 1024 MB in 128 objects
	order 23 (8192 kB objects)
	block_name_prefix: rbd_data.17678ab238e1f29
	format: 2
	features: layering
	flags:
---------------------------------------------------------

Comment 2 Jason Dillaman 2016-10-27 19:18:21 UTC
There are two factors at play: (1) the rbd CLI will skip zeroed, object-size extents, and (2) the rbd CLI uses AIO to keep 10 concurrent read/write requests in flight (controlled by the "rbd concurrent management ops" config value). Therefore, this BZ is not comparing apples to apples.
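
For illustration, below is a rough sketch of how application code could keep a bounded number of writes in flight, similar to what the rbd CLI does with its default of 10 concurrent ops. It assumes a version of the Python bindings that exposes Image.aio_write (present in newer Ceph releases; the bindings shipped with RHCS 1.3 may not provide it) and reuses ioctx, image_name, image_path and chunk_size from the script in the description:

import threading

MAX_IN_FLIGHT = 10                # mirrors the rbd CLI default for concurrent management ops
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def oncomplete(completion):
    # librbd invokes this from its callback thread when a write completes
    slots.release()

with rbd.Image(ioctx, image_name) as image:
    with open(image_path, 'rb') as f:
        offset = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            slots.acquire()       # blocks while MAX_IN_FLIGHT writes are pending
            image.aio_write(data, offset, oncomplete)
            offset += len(data)
        # wait for all queued writes to be committed before closing the image
        image.flush()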

Comment 3 Andreas Karis 2016-10-27 19:24:00 UTC
Hi Jason! Well, for users, the end result is important. Glance is leveraging the python bindings, and we needed to know where this performance difference came from. Thanks for the explanation! How could one increase the concurrency of the python bindings? Would this be an RFE for the library code, or should the user of python rbd bindings somehow take care of the threading to create concurrent read/write requests in the application code? Meaning that this should be improved in glance's code?

Comment 4 Vikhyat Umrao 2016-10-27 19:43:34 UTC
(In reply to Andreas Karis from comment #3)
> Hi Jason! Well, for users, the end result is important. Glance is leveraging
> the python bindings, and we needed to know where this performance difference
> came from. Thanks for the explanation! How could one increase the
> concurrency of the python bindings? Would this be an RFE for the library
> code, or should the user of python rbd bindings somehow take care of the
> threading to create concurrent read/write requests in the application code?
> Meaning that this should be improved in glance's code?

Hey Andreas, I had a discussion with Jason and, right, we need to open an RFE to enhance the Glance RBD driver import code to make use of concurrency.

Comment 5 Andreas Karis 2016-10-27 20:02:21 UTC
Created RFE against Glance: https://bugzilla.redhat.com/show_bug.cgi?id=1389529

Comment 6 Jason Dillaman 2017-09-06 18:10:27 UTC
BZ1389529 covers the ability to sparse upload images to RBD-backed Glance.

