Bug 1225171 - librbd: aio calls may block
Summary: librbd: aio calls may block
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 1.2.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.2.4
Assignee: Jason Dillaman
QA Contact: Warren
URL:
Whiteboard:
Depends On:
Blocks: 1228319 1231969
Reported: 2015-05-26 18:31 UTC by Josh Durgin
Modified: 2017-07-30 15:26 UTC

Fixed In Version: ceph-0.80.8-10.el6cp ceph-0.80.8-10.el7cp
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1225172 1225188 1225189 1231969
Environment:
Last Closed: 2015-08-05 22:18:06 UTC
Embargoed:


Links
System                    ID     Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker  11056  0        None      None    None     Never
Ceph Project Bug Tracker  11769  0        None      None    None     Never

Description Josh Durgin 2015-05-26 18:31:21 UTC
QEMU runs librbd's aio calls from its own threads, so a call that blocks makes the vcpu appear locked up from the guest's perspective. Linux guests then log messages like:

BUG: soft lockup - CPU#1 stuck for 520s!

One observed cause is Objecter-level throttling of requests:

With caching disabled, the throttle blocks the caller directly in aio_read() or aio_write().

With caching enabled, the cache performs I/O while holding its lock. If that I/O is throttled, for example in the cache's flusher thread, readx() and writex() block waiting for the cache lock.
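
The two cases above can be illustrated with a small standalone sketch. This is not librbd code: the Throttle class, the function names, and the 100 ms delay are all hypothetical stand-ins chosen for illustration. The first function waits on a full throttle in the caller's own thread. The second shows a caller stuck on a lock that a throttled flusher thread is holding.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

using std::chrono::milliseconds;
using std::chrono::steady_clock;

// Hypothetical stand-in for a request throttle: take() blocks until a slot is free.
class Throttle {
public:
  explicit Throttle(int slots) : free_(slots) {}
  void take() {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [this] { return free_ > 0; });
    --free_;
  }
  void put() {
    {
      std::lock_guard<std::mutex> l(m_);
      ++free_;
    }
    cv_.notify_one();
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  int free_;
};

static milliseconds since(steady_clock::time_point start) {
  return std::chrono::duration_cast<milliseconds>(steady_clock::now() - start);
}

// Case 1 (no cache): the submitting thread itself waits on the throttle.
milliseconds uncached_submit_delay() {
  Throttle throttle(1);
  throttle.take();  // one request is already in flight
  auto start = steady_clock::now();
  std::thread completer([&] {
    std::this_thread::sleep_for(milliseconds(100));
    throttle.put();  // the in-flight request completes
  });
  throttle.take();  // the "aio" submit blocks here until the slot frees up
  milliseconds waited = since(start);
  completer.join();
  return waited;
}

// Case 2 (cache): a flusher is throttled while holding the cache lock,
// so a reader that only needs the lock is stuck behind it.
milliseconds cached_read_delay() {
  Throttle throttle(1);
  std::mutex cache_lock;
  throttle.take();  // one request is already in flight
  std::promise<void> flusher_has_lock;
  std::future<void> locked = flusher_has_lock.get_future();
  std::thread flusher([&] {
    std::lock_guard<std::mutex> l(cache_lock);
    flusher_has_lock.set_value();
    throttle.take();  // throttled while the cache lock is held
    throttle.put();
  });
  locked.wait();  // make sure the flusher owns the lock first
  auto start = steady_clock::now();
  std::thread completer([&] {
    std::this_thread::sleep_for(milliseconds(100));
    throttle.put();
  });
  {
    std::lock_guard<std::mutex> l(cache_lock);  // the "read" blocks here
  }
  milliseconds waited = since(start);
  flusher.join();
  completer.join();
  return waited;
}
```

Calling both functions from a main() and printing the returned counts should show each caller stalled for about as long as the simulated request stayed in flight. That stall is what a guest sees as a stuck CPU.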

Comment 1 Ken Dreyer (Red Hat) 2015-06-02 21:12:24 UTC
Josh, is https://github.com/ceph/ceph/pull/4364 good to take downstream?

Comment 2 Josh Durgin 2015-06-02 21:49:32 UTC
I'd like to add the first and last commits of https://github.com/ceph/ceph/pull/4827 to that too, in case some workload we haven't tried has a performance regression from the added workqueue.
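
For readers unfamiliar with the approach, a work queue lets the submitting thread hand off a request and return at once, while a worker thread absorbs any blocking. The sketch below shows only that general idea under assumed names (WorkQueue, submit_returns_before_completion). It is not the code from either pull request, and the 500 ms sleep merely stands in for a throttled request.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-worker queue: queue() never waits on the work itself.
class WorkQueue {
public:
  WorkQueue() : worker_([this] { run(); }) {}
  ~WorkQueue() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }
  void queue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(m_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }
private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // any blocking happens here, on the worker thread
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};

// Returns true if queue() came back while the slow request was still pending.
bool submit_returns_before_completion() {
  std::promise<void> done;
  std::future<void> completion = done.get_future();
  WorkQueue wq;  // declared after the promise so the worker is joined first
  wq.queue([&done] {
    std::this_thread::sleep_for(std::chrono::milliseconds(500));  // simulated throttling
    done.set_value();
  });
  bool still_pending =
      completion.wait_for(std::chrono::seconds(0)) != std::future_status::ready;
  completion.wait();  // the completion arrives later, from the worker
  return still_pending;
}
```

The trade-off is an extra thread hop per request, which is where a workload-dependent performance regression could come from.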

Comment 3 Ken Dreyer (Red Hat) 2015-06-03 20:11:53 UTC
Jason closed https://github.com/ceph/ceph/pull/4364 today, so we won't take that one downstream. I'll continue monitoring the upstream tickets for progress.

Comment 4 Jason Dillaman 2015-06-04 00:09:17 UTC
New upstream PR is https://github.com/ceph/ceph/pull/4854

Comment 5 Ken Dreyer (Red Hat) 2015-06-10 01:35:56 UTC
Jason, I'm having trouble cherry-picking the changes in https://github.com/ceph/ceph/pull/4854 onto the downstream version of v0.80.8. It's the "rhcs-0.80.8" branch on GitHub.

$ git checkout rhcs-0.80.8
$ git cherry-pick 3fea27c7f6b1b1403bce4d7736367975798a8634~..upstream/wip-11769-firefly
[rhcs-v0.80.8 8afe9dd] CephContext: Add AssociatedSingletonObject to allow CephContext's singleton
 Author: Haomai Wang <haomaiwang>
 Date: Mon Dec 1 23:54:16 2014 +0800
 2 files changed, 26 insertions(+)
[rhcs-v0.80.8 6153821] common/ceph_context: don't import std namespace
 Author: Sage Weil <sage>
 Date: Fri Dec 5 14:21:08 2014 -0800
 1 file changed, 2 insertions(+), 3 deletions(-)
error: could not apply 9faaeae... librbd: add task pool / work queue for AIO requests
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

Hopefully I'm doing that right?

Can you please cherry-pick these changes onto this branch?

Comment 11 Ken Dreyer (Red Hat) 2015-08-05 22:18:06 UTC
The Ceph builds shipped in https://access.redhat.com/errata/RHBA-2015:1527 include the fix for this bug.

