Bug 1813960 - [RFE] improve IO throughput and latency of librbd
Summary: [RFE] improve IO throughput and latency of librbd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0
Assignee: Ilya Dryomov
QA Contact: Harish Munjulur
Docs Contact: Ranjini M N
URL:
Whiteboard:
Duplicates: 1906857
Depends On:
Blocks: 1897572 1959686
 
Reported: 2020-03-16 15:01 UTC by Jason Dillaman
Modified: 2024-03-05 13:00 UTC
CC List: 22 users

Fixed In Version: ceph-16.0.0-8633.el8cp
Doc Type: Enhancement
Doc Text:
.Improved librbd small I/O performance
Previously, in an NVMe-based Ceph cluster, limitations in the internal threading architecture meant that a single librbd client struggled to achieve more than 20K 4KiB IOPS. With this release, librbd switches to an asynchronous reactor model on top of the new ASIO-based neorados API, potentially increasing small I/O throughput several-fold and reducing latency.
Clone Of:
Environment:
Last Closed: 2021-08-30 08:23:52 UTC
Embargoed:


Attachments: None


Links
Red Hat Issue Tracker RHCEPH-1137 (last updated 2021-08-30 00:12:48 UTC)
Red Hat Product Errata RHBA-2021:3294 (last updated 2021-08-30 08:24:33 UTC)

Internal Links: 1825035

Description Jason Dillaman 2020-03-16 15:01:06 UTC
In flash-based Ceph clusters, a single librbd client would struggle to achieve more than 20K 4KiB IOPS, while a krbd client could achieve nearly 4x the single-client throughput. Improvements to the librbd IO path have increased small IO throughput and reduced latency, vastly narrowing the gap between krbd and librbd in such clusters.
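For reference, the sketch below shows one way the single-client 4KiB IOPS figure quoted above can be measured through the librbd Python bindings (the rados and rbd modules), by keeping a fixed number of asynchronous writes in flight. The pool name, image name, queue depth, and runtime are illustrative placeholders rather than values from this report, and the image is assumed to already exist and be larger than the IO size.

#!/usr/bin/env python3
# Hedged sketch: measure single-client 4KiB random-write IOPS through the
# librbd Python bindings. Pool, image, queue depth and runtime are
# illustrative placeholders, not values taken from this bug.
import random
import threading
import time

import rados
import rbd

POOL = 'rbd'            # assumed pool
IMAGE = 'perftest'      # assumed pre-created image
IO_SIZE = 4096          # 4KiB writes, matching the numbers quoted above
QUEUE_DEPTH = 32        # number of in-flight asynchronous writes
RUNTIME = 30            # seconds

def main():
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)
    image = rbd.Image(ioctx, IMAGE)
    try:
        size = image.size()
        data = b'\0' * IO_SIZE
        done = 0
        lock = threading.Lock()
        slots = threading.Semaphore(QUEUE_DEPTH)
        deadline = time.time() + RUNTIME

        def on_complete(_completion):
            # Called from librbd's completion thread for each finished write.
            nonlocal done
            with lock:
                done += 1
            slots.release()

        start = time.time()
        while time.time() < deadline:
            slots.acquire()
            offset = random.randrange(0, size // IO_SIZE) * IO_SIZE
            image.aio_write(data, offset, on_complete)

        # Drain outstanding writes before reporting.
        for _ in range(QUEUE_DEPTH):
            slots.acquire()
        elapsed = time.time() - start
        print(f'{done / elapsed:.0f} write IOPS at qd={QUEUE_DEPTH}')
    finally:
        image.close()
        ioctx.close()
        cluster.shutdown()

if __name__ == '__main__':
    main()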

Comment 1 RHEL Program Management 2020-03-16 15:01:14 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 David Hill 2020-11-05 21:19:12 UTC
Hey! How close to krbd performance will this improvement bring us? It looks like using krbd instead of librados gives us at least 2x the performance. Will it be like this [1]?


[1] https://access.redhat.com/solutions/5514611

Comment 3 Jason Dillaman 2020-11-05 21:49:35 UTC
I think you'll need to test it against specific clusters and workloads to know for sure. There isn't a one-size-fits-all answer unfortunately. The 20K IOPS wall was a known issue in librbd when the cache was enabled, so I'd expect the results to be much closer if retested.
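For anyone re-running that comparison, one way to take the librbd cache out of the picture is to override rbd_cache for the test client at connect time. Below is a minimal sketch using the rados/rbd Python bindings; the pool and image names are placeholders.

import rados
import rbd

# Sketch: connect with the librbd writeback cache disabled for this client
# only, so a retest measures the raw librbd IO path rather than the cache.
# Pool ('rbd') and image ('perftest') names are placeholders.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf',
                      conf={'rbd_cache': 'false'})
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
image = rbd.Image(ioctx, 'perftest')
try:
    image.write(b'\0' * 4096, 0)   # with the cache off, the write goes straight to the OSDs
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()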

Comment 9 Harish Munjulur 2021-04-25 12:14:15 UTC
QA verified. 

Testing was completed with the Teuthology librbd tests.

Comment 10 Yaniv Kaul 2021-04-25 12:21:14 UTC
(In reply to Harish Munjulur from comment #9)
> QA verified. 
> 
> Testing was completed with the Teuthology librbd tests.

That's great functionality-wise, but this BZ is about performance - we need to ensure we deliver better throughput and lower latency here.
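To cover the latency side as well as throughput, a simple queue-depth-1 timing loop like the sketch below can report per-op write latency through librbd. The sample count, pool, and image names are illustrative assumptions, not values from this bug.

import random
import statistics
import time

import rados
import rbd

SAMPLES = 10000        # illustrative sample count
IO_SIZE = 4096         # 4KiB, matching the workload discussed in this bug

# Sketch: time individual synchronous 4KiB random writes (queue depth 1) to
# capture per-op latency, complementing an IOPS/throughput run.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')          # placeholder pool
image = rbd.Image(ioctx, 'perftest')       # placeholder image
try:
    size = image.size()
    data = b'\0' * IO_SIZE
    lat_us = []
    for _ in range(SAMPLES):
        offset = random.randrange(0, size // IO_SIZE) * IO_SIZE
        t0 = time.perf_counter()
        image.write(data, offset)
        lat_us.append((time.perf_counter() - t0) * 1e6)
    lat_us.sort()
    print(f'avg {statistics.mean(lat_us):.0f} us, '
          f'p99 {lat_us[int(len(lat_us) * 0.99)]:.0f} us')
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()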

Comment 17 Giulio Fidente 2021-04-27 10:25:37 UTC
*** Bug 1906857 has been marked as a duplicate of this bug. ***

Comment 35 errata-xmlrpc 2021-08-30 08:23:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294

