QEMU runs these from its threads, so blocking here causes the vcpu to lock up from the guest's perspective, resulting in messages in Linux guests such as "BUG: soft lockup - CPU#1 stuck for 520s!". One observed cause is Objecter-level throttling of requests. With caching disabled, this blocks directly in aio_read/write(). With caching enabled, the cache does I/O while holding its lock, so if it is throttled (in its flusher thread, for example), readx() and writex() will block waiting on the cache lock.
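To make the blocking pattern concrete, here is a minimal sketch in Python. It is not librbd code. It models the throttle as a semaphore and shows why handing a request to a work queue lets the submitting thread return while the throttle is exhausted. The names Throttle, AioWorkQueue, submit and demo are hypothetical and chosen only for illustration.

```python
import queue
import threading


class Throttle:
    """Hypothetical stand-in for request throttling: at most `limit` in flight."""

    def __init__(self, limit):
        self._sem = threading.Semaphore(limit)

    def acquire(self):
        # Blocks the calling thread while the throttle is exhausted.
        self._sem.acquire()

    def release(self):
        self._sem.release()


class AioWorkQueue:
    """Hypothetical work queue: only the worker thread ever waits on the throttle."""

    def __init__(self, throttle):
        self._throttle = throttle
        self._queue = queue.Queue()  # unbounded, so put() does not block
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, name, completed):
        # The caller (the vcpu thread in the scenario above) only enqueues.
        self._queue.put((name, completed))

    def join(self):
        # Wait until every queued request has been processed.
        self._queue.join()

    def _run(self):
        while True:
            name, completed = self._queue.get()
            self._throttle.acquire()  # the wait happens here, off the caller's thread
            try:
                completed.append(name)  # stands in for issuing the I/O
            finally:
                self._throttle.release()
                self._queue.task_done()


def demo():
    throttle = Throttle(limit=1)
    throttle.acquire()  # simulate a throttle that is already exhausted
    completed = []
    wq = AioWorkQueue(throttle)
    wq.submit("write-1", completed)  # returns even though the throttle is held
    returned_while_throttled = completed == []
    throttle.release()  # throttle frees up; the worker can proceed
    wq.join()
    return returned_while_throttled, completed


if __name__ == "__main__":
    print(demo())
```

Calling throttle.acquire() directly from the submitting thread in demo() would hang at that point instead, which is the analogue of aio_read/write() blocking with caching disabled.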
Josh, is https://github.com/ceph/ceph/pull/4364 good to take downstream?
I'd like to add the first and last commits of https://github.com/ceph/ceph/pull/4827 to that too, in case some workload we haven't tried has a performance regression from the added workqueue.
Jason closed https://github.com/ceph/ceph/pull/4364 today, so we won't take that one downstream. I'll continue monitoring the upstream tickets for progress.
New upstream PR is https://github.com/ceph/ceph/pull/4854
Jason, I'm having trouble cherry-picking the changes in https://github.com/ceph/ceph/pull/4854 onto the downstream version of v0.80.8. It's branch "rhcs-0.80.8" in GitHub.

$ git checkout rhcs-0.80.8
$ git cherry-pick 3fea27c7f6b1b1403bce4d7736367975798a8634~..upstream/wip-11769-firefly
[rhcs-v0.80.8 8afe9dd] CephContext: Add AssociatedSingletonObject to allow CephContext's singleton
 Author: Haomai Wang <haomaiwang>
 Date: Mon Dec 1 23:54:16 2014 +0800
 2 files changed, 26 insertions(+)
[rhcs-v0.80.8 6153821] common/ceph_context: don't import std namespace
 Author: Sage Weil <sage>
 Date: Fri Dec 5 14:21:08 2014 -0800
 1 file changed, 2 insertions(+), 3 deletions(-)
error: could not apply 9faaeae... librbd: add task pool / work queue for AIO requests
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

Hopefully I'm doing that right? Can you please cherry-pick these changes onto this branch?
The Ceph builds in https://access.redhat.com/errata/RHBA-2015:1527 include the fix for this bug.