Description of problem:

Glance image uploads into Ceph are noticeably slower than direct uploads via the rbd CLI.

"There are two factors at play: (1) the rbd CLI will skip zeroed, object-size extents and (2) the rbd CLI uses AIO to keep up to 10 concurrent read/write requests in flight (controlled by the "rbd concurrent management ops" config value). Therefore, this BZ is not comparing apples to apples."

Version-Release number of selected component (if applicable):
all versions

How reproducible:
Verify the procedure and comments here: https://bugzilla.redhat.com/show_bug.cgi?id=1389112

Steps to Reproduce:
----
# du -sh /root/rhel-guest-image-7.2-20151102.0.x86_64.raw
1.1G    /root/rhel-guest-image-7.2-20151102.0.x86_64.raw

# time python image-upload.py rhel7.1 /root/rhel-guest-image-7.2-20151102.0.x86_64.raw 8
[......]
Writing data at offset 10645143552(MB: 10152)
Writing data at offset 10653532160(MB: 10160)
Writing data at offset 10661920768(MB: 10168)
Writing data at offset 10670309376(MB: 10176)
Writing data at offset 10678697984(MB: 10184)
Writing data at offset 10687086592(MB: 10192)
Writing data at offset 10695475200(MB: 10200)
Writing data at offset 10703863808(MB: 10208)
Writing data at offset 10712252416(MB: 10216)
Writing data at offset 10720641024(MB: 10224)
Writing data at offset 10729029632(MB: 10232)
done

real    4m25.849s
user    0m4.523s
sys     0m7.037s

[root@dell-per630-11 ceph]# rbd info images/rhel7.1
rbd image 'rhel7.1':
        size 10240 MB in 1280 objects
        order 23 (8192 kB objects)
        block_name_prefix: rbd_data.17632b6238e1f29
        format: 2
        features: layering
        flags:

# time rbd --id=glance --image-format=2 -p images import /root/rhel-guest-image-7.2-20151102.0.x86_64.raw rhel7
Importing image: 100% complete...done.

real    0m20.950s
user    0m9.691s
sys     0m3.212s

# rbd info images/rhel7
rbd image 'rhel7':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.1743256238e1f29
        format: 2
        features: layering
        flags:
----

The script we used with the Python bindings, modeled on the RBD Glance driver:
----------------------------------------------------------------------
import os
import sys
import math

from oslo_utils import units

try:
    import rados
    import rbd
except ImportError:
    rados = None
    rbd = None

if len(sys.argv) < 4:
    sys.exit('Usage: %s <image_name> <image_path> <chunk_size_in_MB>' % sys.argv[0])

image_name = sys.argv[1]
image_path = sys.argv[2]
chunk = int(sys.argv[3])
chunk_size = chunk * units.Mi

pool = 'images'
user = 'glance'
conf_file = '/etc/ceph/ceph.conf'
radosid = 'glance'
connect_timeout = 0

image_size = os.path.getsize(image_path)
order = int(math.log(chunk_size, 2))

print image_name
print image_path
print image_size
print chunk
print chunk_size
print pool
print user
print conf_file
print connect_timeout
print radosid

with rados.Rados(conffile=conf_file, rados_id=radosid) as cluster:
    with cluster.open_ioctx(pool) as ioctx:
        rbd_inst = rbd.RBD()
        size = image_size
        # create a format-2 image with only the layering feature enabled
        rbd_inst.create(ioctx, image_name, size, order,
                        old_format=False, features=1)
        with rbd.Image(ioctx, image_name) as image:
            f = open(image_path, "rb")
            try:
                offset = 0
                data = f.read(chunk_size)
                while data != "":
                    print "Writing data at offset " + str(offset) + "(MB: " + str(offset / units.Mi) + ")"
                    offset += image.write(data, offset)
                    data = f.read(chunk_size)
            finally:
                f.close()
print "done"
----------------------------------------------------------------------

Additional info:

This is the Glance code (glance_store/_drivers/rbd.py):

    try:
        with self.store.get_connection(conffile=self.conf_file,
                                       rados_id=self.user) as conn:
            with conn.open_ioctx(self.pool) as ioctx:
                with rbd.Image(ioctx, self.name,
                               snapshot=self.snapshot) as image:
                    img_info = image.stat()
                    size = img_info['size']
                    bytes_left = size
                    while bytes_left > 0:
                        length = min(self.chunk_size, bytes_left)
                        data = image.read(size - bytes_left, length)
                        bytes_left -= len(data)
                        yield data
                    raise StopIteration()
    except rbd.ImageNotFound:
        raise exceptions.NotFound(
            _('RBD image %s does not exist') % self.name)
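To illustrate factor (1) from the description: a minimal sketch of skipping zeroed extents during upload. The write_sparse() helper is hypothetical, not the rbd CLI's actual implementation (which is C++ and skips zeroed, object-size extents); it only shows the idea in the same Python-bindings style as the reproduction script above.
----
def write_sparse(image, fileobj, chunk_size):
    # Compare each chunk against a preallocated zero buffer; chunks that
    # are entirely zero are skipped, leaving the corresponding extents of
    # the RBD image unallocated (sparse) instead of writing zeroes.
    zero_chunk = b'\0' * chunk_size
    offset = 0
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break
        if data != zero_chunk[:len(data)]:
            image.write(data, offset)
        offset += len(data)
    return offset
----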
The main reason here is that the rbd CLI uses AIO to keep up to 10 read/write requests in flight concurrently (controlled by the "rbd concurrent management ops" config value). The Python API bindings for RBD do not use AIO; thus, threading would need to be implemented in the client code (i.e. Glance) to achieve comparable performance.
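For illustration, a minimal sketch of that threading approach with the synchronous bindings. The threaded_upload() helper, the worker count, and the locking scheme are hypothetical, not Glance code; it assumes librbd allows concurrent writes to non-overlapping extents of a single Image handle.
----
import threading

NUM_WRITERS = 10  # mirrors the rbd CLI's default of 10 concurrent ops

def threaded_upload(image, fileobj, chunk_size):
    # File reads and the shared offset are serialized under a lock; each
    # worker then issues its blocking image.write() outside the lock, so
    # up to NUM_WRITERS writes proceed in parallel.
    state = {'offset': 0}
    lock = threading.Lock()

    def worker():
        while True:
            with lock:
                data = fileobj.read(chunk_size)
                if not data:
                    return
                offset = state['offset']
                state['offset'] += len(data)
            image.write(data, offset)

    threads = [threading.Thread(target=worker) for _ in range(NUM_WRITERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
----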
@Sean: why was this closed? There is an associated upstream Glance review in progress for this feature [1].

[1] https://review.openstack.org/#/c/430641/
Just to rehash this BZ once more: the RBD Python bindings now offer AIO interfaces (since the Kraken release) [1], so there is no need to use threading to solve this performance bottleneck. Instead, offer a new "max concurrent IOs"-like config option (we use 10 in the rbd CLI) and issue up to the configured limit of concurrent IOs when reading from and writing to RBD from Glance.

[1] https://github.com/ceph/ceph/blob/luminous/src/pybind/rbd/rbd.pyx#L2534
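For illustration, a minimal sketch of that approach against the AIO interface linked above. The aio_upload() helper, MAX_IN_FLIGHT constant, and semaphore-based throttle are hypothetical (not Glance code); it assumes Image.aio_write(data, offset, oncomplete) returning a Completion with wait_for_complete_and_cb(), per rbd.pyx [1].
----
import threading

MAX_IN_FLIGHT = 10  # the proposed "max concurrent IOs"-like config value

def aio_upload(image, fileobj, chunk_size):
    # A semaphore bounds the number of outstanding writes; each completion
    # callback frees a slot so the reader can queue the next chunk.
    slots = threading.Semaphore(MAX_IN_FLIGHT)
    completions = []

    def on_complete(completion):
        slots.release()

    offset = 0
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break
        slots.acquire()  # blocks once MAX_IN_FLIGHT writes are pending
        completions.append(image.aio_write(data, offset, on_complete))
        offset += len(data)

    # Drain: wait for every completion (and its callback) to finish.
    for c in completions:
        c.wait_for_complete_and_cb()
----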
> (we use 10 in the rbd CLI)

@Jason: could you point us to the relevant code in the rbd CLI?
(In reply to Cyril Roelandt from comment #12)
> > (we use 10 in the rbd CLI)
>
> @Jason: could you point us to the relevant code in the rbd CLI?

The rbd CLI is written in C++, not Python, but it's here [1].

[1] https://github.com/ceph/ceph/blob/master/src/tools/rbd/action/Import.cc#L743
Closing as a duplicate of the sparse image RFE. This RFE is meant to improve Glance RBD image upload, which will be addressed by:

1. Support for sparse images - https://bugzilla.redhat.com/show_bug.cgi?id=1647041
2. Ramp up rbd resize to avoid excessive calls - https://bugzilla.redhat.com/show_bug.cgi?id=1690726
3. Set default rbd concurrent management ops = 20 - https://bugzilla.redhat.com/show_bug.cgi?id=1886175

plus Comment 11, mentioning that the "RBD Python bindings now offer AIO interfaces".

Feel free to reopen this RFE and add more input if the above does not meet expectations.

*** This bug has been marked as a duplicate of bug 1647041 ***