Bug 1794704

Summary: Luminous Ceph client takes 10X as long to export image as mimic or nautilus
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tupper Cole <tcole>
Component: RBD Assignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA QA Contact: Madhavi Kasturi <mkasturi>
Severity: medium Docs Contact: Karen Norteman <knortema>
Priority: medium    
Version: 3.3 CC: ceph-eng-bugs, jdillama, mkasturi, pasik, rmandyam, sergio.carvalho, tserlin
Target Milestone: z4   
Target Release: 3.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.12-91.el7cp Ubuntu: ceph_12.2.12-88redhat1 Doc Type: Bug Fix
Doc Text:
.The Ceph v12 client (luminous) takes significantly longer to export an image from the Ceph cluster than do the Ceph v13 (mimic) or v14 (nautilus) clients on the same cluster. The `rbd export` command in the v12 client had a read queue depth of 1, meaning the command issued only one read request at a time to the cluster when exporting to STDOUT. The v12 client now supports up to 10 concurrent read requests to the cluster, resulting in a significant increase in export speed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-06 08:27:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1726135    
Attachments:
Python script to read an RBD image from Ceph, similar to the RBD driver for Glance (flags: none)

Description Tupper Cole 2020-01-24 11:35:09 UTC
Description of problem: Customer uses Glance to export RBD images. Using the luminous client, a ~6GB image takes roughly a minute to export the first time (subsequent reads get faster, presumably due to caching). Using mimic or nautilus, the export takes ~6 seconds.


Version-Release number of selected component (if applicable):

ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
vs.
ceph version 14.2.4-29.0.TEST.bz1777064.el7cp (7680734a26329b5cbae64651a22ac1605344d785) nautilus (stable)

(Also tested with mimic, with similar results to nautilus.)

How reproducible: consistent


Steps to Reproduce:
1. List Glance images
2. Export an image

Actual results:
Luminous takes ~60 seconds, vs ~6 seconds for mimic or nautilus.

Expected results:
Consistent behavior across versions. 

Additional info:

Comment 1 Sergio de Carvalho 2020-01-24 12:06:26 UTC
I did some further tests where I generated a new random 6.7GB file and did an RBD import & export with each client, to avoid any caching interfering with the results. The three clients connected to the same Ceph cluster (running Luminous) using the same client configuration, same credentials, same pool. The luminous client is approximately 10 times slower on the first RBD export than both Mimic and Nautilus clients.


Ceph cluster
============

ceph version 12.2.8-128.el7cp (030358773c5213a14c1444a5147258672b2dc15f) luminous (stable)
8 nodes, 72 OSDs

Client config
=============

# Same config for all clients
$ cat ceph.conf_devprod
[global]
fsid = 45e17393-84f4-4d31-bce3-0765a92314cb
mon host = s-65kw2w2.sys.az1.eng.pdx.wd,s-65dy2w2.sys.az1.eng.pdx.wd,s-65l13w2.sys.az1.eng.pdx.wd

Luminous client
===============

$ ceph --version
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)

# Create a random 6.7GB file
$ dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) bs=1M count=6800 iflag=fullblock of=/tmp/large.image
6800+0 records in
6800+0 records out
7130316800 bytes (7.1 GB) copied, 10.2746 s, 694 MB/s

# Import file as an RBD image
$ uuid=$(uuidgen); time rbd import -c /etc/ceph/ceph.conf_devprod -p wpc4-pg1-glance --user wpc4-pg1-glance -k /etc/ceph/ceph.client.wpc4-pg1-glance.keyring_devprod /tmp/large.image $uuid
rbd: --pool is deprecated for import, use --dest-pool
Importing image: 100% complete...done.

real	0m5.836s
user	0m4.387s
sys	0m4.525s

# Export RBD image (not saving to disk)
$ time rbd export -c /etc/ceph/ceph.conf_devprod -p wpc4-pg1-glance --user wpc4-pg1-glance -k /etc/ceph/ceph.client.wpc4-pg1-glance.keyring_devprod $uuid - > /dev/null
Exporting image: 100% complete...done.

real	1m9.896s
user	0m1.401s
sys	0m6.136s

Mimic client
============

$ ceph --version
ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)

# Create a random 6.7GB file
$ dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) bs=1M count=6800 iflag=fullblock of=/tmp/large.image
6800+0 records in
6800+0 records out
7130316800 bytes (7.1 GB) copied, 8.93855 s, 798 MB/s

# Import file as an RBD image
$ uuid=$(uuidgen); time rbd import -c ceph.conf_devprod --dest-pool wpc4-pg1-glance --user wpc4-pg1-glance -k ceph.client.wpc4-pg1-glance.keyring_devprod /tmp/large.image $uuid
Importing image: 100% complete...done.

real	0m5.571s
user	0m3.943s
sys	0m4.021s

# Export RBD image (not saving to disk)
$ time rbd export -c ceph.conf_devprod -p wpc4-pg1-glance --user wpc4-pg1-glance -k ceph.client.wpc4-pg1-glance.keyring_devprod $uuid - > /dev/null
Exporting image: 100% complete...done.

real	0m8.800s
user	0m1.033s
sys	0m6.756s

Nautilus client
===============

$ ceph --version
ceph version 14.2.4-29.0.TEST.bz1777064.el7cp (7680734a26329b5cbae64651a22ac1605344d785) nautilus (stable)

# Create a random 6.7GB file
$ dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) bs=1M count=6800 iflag=fullblock of=/tmp/large.image
6800+0 records in
6800+0 records out
7130316800 bytes (7.1 GB) copied, 8.0649 s, 884 MB/s

# Import file as an RBD image
$ uuid=$(uuidgen); time rbd import -c ceph.conf_devprod --dest-pool wpc4-pg1-glance --user wpc4-pg1-glance -k ceph.client.wpc4-pg1-glance.keyring_devprod /tmp/large.image $uuid
Importing image: 100% complete...done.

real	0m5.742s
user	0m3.810s
sys	0m5.302s

# Export RBD image (not saving to disk)
$ time rbd export -c ceph.conf_devprod -p wpc4-pg1-glance --user wpc4-pg1-glance -k ceph.client.wpc4-pg1-glance.keyring_devprod $uuid - > /dev/null
Exporting image: 100% complete...done.

real	0m9.494s
user	0m1.502s
sys	0m11.228s

Comment 2 Sergio de Carvalho 2020-01-24 12:18:10 UTC
I also repeated the tests pointing the three clients to an 8-node, 72-OSD Red Hat Ceph Storage 4 cluster (ceph version 14.2.4-29.0.TEST.bz1777064.el7cp (7680734a26329b5cbae64651a22ac1605344d785) nautilus (stable)), and noticed the same slowdown with the Nautilus client.

Comment 3 Jason Dillaman 2020-01-24 13:30:20 UTC
Older releases of Ceph (including older Mimic and Nautilus releases) limited the read queue depth to 1 when exporting to STDOUT. This was improved in later point releases (for example, v14.2.3 was the first Nautilus release to include the improvement).

Comment 4 Sergio de Carvalho 2020-01-24 13:52:47 UTC
I'm also seeing a significant slowdown when using the Python RBD library to read the image. I wrote a simple Python script that simulates what the Glance RBD driver does when reading an image from Ceph. I'll attach the Python script to this ticket. It is based on the Glance RBD driver:

https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/rbd.py#L223

I ran this script from one of the Ceph nodes inside this RHCS 3 (Luminous) cluster, and it also took more than a minute to read a new 7GB image in 4MB chunks.
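
The attached script itself is not reproduced here, but a minimal sketch of the sequential read loop it describes (one 4MB read outstanding at a time, i.e. a read queue depth of 1) might look like the following. The pool name, conf path, and function name are illustrative assumptions, not taken from the attachment:

#!/usr/bin/env python
# Minimal sketch (an assumption, not the attached script): read an RBD image
# sequentially in 4 MB chunks with a single read in flight at a time, the way
# the Glance RBD driver does.
import sys
import rados
import rbd

CHUNK = 4 * 1024 * 1024  # 4 MB chunks

def read_image(image_name, pool='wpc4-pg1-glance', conf='/etc/ceph/ceph.conf'):
    cluster = rados.Rados(conffile=conf)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            with rbd.Image(ioctx, image_name, read_only=True) as image:
                size = image.size()
                print('Image size: %d bytes' % size)
                offset = 0
                while offset < size:
                    # Queue depth 1: wait for each chunk before requesting the next.
                    data = image.read(offset, min(CHUNK, size - offset))
                    offset += len(data)
                print('Image read successfully')
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

if __name__ == '__main__':
    read_image(sys.argv[1])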

# ceph --version
ceph version 12.2.8-128.el7cp (030358773c5213a14c1444a5147258672b2dc15f) luminous (stable)

# dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) bs=1M count=6800 iflag=fullblock of=/tmp/large.image
6800+0 records in
6800+0 records out
7130316800 bytes (7.1 GB) copied, 8.14506 s, 875 MB/s

# uuid=$(uuidgen); time rbd import --dest-pool wpc4-pg1-glance /tmp/large.image $uuid
Importing image: 100% complete...done.

real	0m6.675s
user	0m3.405s
sys	0m4.231s

# time ./rbdread.py $uuid
Image size: 7130316800 bytes
Image read successfully

real	1m1.907s
user	0m2.314s
sys	0m5.409s

Comment 5 Sergio de Carvalho 2020-01-24 13:54:04 UTC
Created attachment 1655080 [details]
Python script to read an RBD image from Ceph similar to the RBD driver for Glance

Comment 6 Jason Dillaman 2020-01-24 14:08:43 UTC
(In reply to Sergio de Carvalho from comment #5)
> Created attachment 1655080 [details]
> Python script to read an RBD image from Ceph similar to the RBD driver for
> Glance

This is essentially a queue depth of 1, just like Glance, and it therefore suffers the same performance problem: client<->OSD read latency with no parallelism to hide it. See BZ 1389529 for the RFE against Glance to support a queue depth greater than 1 for its import operations.
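
For illustration only (this is not the luminous fix and not a Glance patch): a rough sketch of how the same read loop could keep up to 10 reads in flight using the python-rbd aio_read() API, assuming bindings recent enough to expose it. The pool name, conf path, chunk size, and queue depth are placeholders:

#!/usr/bin/env python
# Rough sketch (an assumption, not shipped code): overlap up to QUEUE_DEPTH
# reads so client<->OSD latency is hidden instead of paid once per 4 MB chunk.
import sys
import threading
import rados
import rbd

CHUNK = 4 * 1024 * 1024   # read size per request
QUEUE_DEPTH = 10          # maximum reads kept in flight

def read_image_parallel(image_name, pool='wpc4-pg1-glance',
                        conf='/etc/ceph/ceph.conf'):
    cluster = rados.Rados(conffile=conf)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            with rbd.Image(ioctx, image_name, read_only=True) as image:
                size = image.size()
                slots = threading.Semaphore(QUEUE_DEPTH)
                completions = []

                def on_read(completion, data):
                    # Called from a librbd thread when a chunk arrives;
                    # free a slot so the submit loop can issue another read.
                    slots.release()

                offset = 0
                while offset < size:
                    slots.acquire()  # blocks once QUEUE_DEPTH reads are pending
                    length = min(CHUNK, size - offset)
                    completions.append(image.aio_read(offset, length, on_read))
                    offset += length
                for c in completions:
                    c.wait_for_complete_and_cb()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

if __name__ == '__main__':
    read_image_parallel(sys.argv[1])

The semaphore caps the number of outstanding requests, so the client overlaps up to ten round trips to the OSDs instead of serializing them, which is the same idea the later rbd export point releases apply when writing to STDOUT.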

Comment 8 Sergio de Carvalho 2020-01-24 14:47:03 UTC
That makes sense now, Jason.

I did a test exporting the image to a file and noticed an improvement in download speed. Even with the additional time spent writing the file to disk, the total export time dropped from 70s to about 35s.

I see what you mean about the Glance RBD driver being inefficient. I guess there isn't much we can do there other than write a threaded version ourselves...

Thanks very much for your help.

Comment 16 errata-xmlrpc 2020-04-06 08:27:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1320