Description of problem:

During stress testing with a high I/O load, the system appeared to
deadlock with an ionice process having the following backtrace:

PID: 7275   TASK: ffff810ce88d9040  CPU: 15  COMMAND: "ionice"
 #0 [ffff81087fc66f20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff81087fc66f40] do_nmi at ffffffff8006688a
 #2 [ffff81087fc66f50] nmi at ffffffff80065eef
    [exception RIP: .text.lock.spinlock+2]
    RIP: ffffffff80065bfc  RSP: ffff810d4188be80  RFLAGS: 00000086
    RAX: 0000000000000000  RBX: ffff810c7eda5428  RCX: 0000000000000220
    RDX: ffff810e08dd8860  RSI: ffff810c7eda5428  RDI: ffffffff8032c680
    RBP: ffff81047fb6f000   R8: 0000000000003bd2   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000202  R12: ffff810c59c70c30
    R13: 0000000000000220  R14: ffff810e08dd8860  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff810d4188be80] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff810d4188be80] cfq_drop_dead_cic at ffffffff8014cbf9
 #5 [ffff810d4188bea0] cfq_cic_rb_lookup at ffffffff8014cc6f
 #6 [ffff810d4188bec0] cfq_get_queue at ffffffff8014d1bd
 #7 [ffff810d4188bf00] cfq_ioc_set_ioprio at ffffffff8014e82d
 #8 [ffff810d4188bf20] set_task_ioprio at ffffffff800f5a9c
 #9 [ffff810d4188bf40] sys_ioprio_set at ffffffff800f5d0b
#10 [ffff810d4188bf80] tracesys at ffffffff8005e28d (via system_call)
    RIP: 000000358f6d07c9  RSP: 00007fff399969a8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: ffffffff8005e28d  RCX: ffffffffffffffff
    RDX: 0000000000004002  RSI: 0000000000003bd2  RDI: 0000000000000001
    RBP: 0000000000003bd2   R8: 0000000000000005   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000202  R12: 0000000000000002
    R13: 0000000000004002  R14: 0000000000000001  R15: 0000000000000005
    ORIG_RAX: 00000000000000fb  CS: 0033  SS: 002b

This appears to be an attempt to acquire cfq_exit_lock (*2, *3)
recursively while holding tasklist_lock (*1):

sys_ioprio_set()
  read_lock_irq(&tasklist_lock)                --(*1)
  set_task_ioprio()
    cfq_ioc_set_ioprio()
      spin_lock(&cfq_exit_lock)                --(*2) <------------*
      changed_ioprio()                                             |
        cfq_get_queue()                                            |
          cfq_cic_rb_lookup()                                      |
            cfq_drop_dead_cic()                                    |
              spin_lock(&cfq_exit_lock)       --(*3) --------------*

Version-Release number of selected component (if applicable):
2.6.18-194.el5

How reproducible:
Very difficult: one test run hit the problem after ~4 hours of
testing, while another ran for 60 hours without hitting the deadlock.

Steps to Reproduce:
1. Present a high I/O load to a system using the cfq I/O scheduler.
2. Repeatedly run ionice.

Actual results:
Deadlock shown above.

Expected results:
No deadlock, even under high load.

Additional info:
Upstream removed cfq_exit_lock in the following commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fc46379daf90dce57bf765c81d3b39f55150aac2
There seemed to be a few sticking points in backporting the change
referenced in comment #0, so I'm getting this into BZ so that others
can comment.

The changes modify the io_context struct:

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a1e2880..79cb9fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -90,7 +90,7 @@ struct io_context {
 	atomic_t refcount;
 	struct task_struct *task;
 
-	int (*set_ioprio)(struct io_context *, unsigned int);
+	unsigned int ioprio_changed;
 
 	/*
 	 * For request batching

Not sure how much this matters, since allocation seems to be done
centrally and I couldn't find any of the symbols this would change on
the kABI lists.

The commit in comment #0 does seem to rely on Jens' cfq cleanups,
however:

commit 89850f7ee905410c89f9295e89dc4c33502a34ac
Author: Jens Axboe <axboe>
Date:   Sat Jul 22 16:48:31 2006 +0200

    [PATCH] cfq-iosched: cleanups, fixes, dead code removal

    A collection of little fixes and cleanups:

    - We don't use the 'queued' sysfs exported attribute, since the
      may_queue() logic was rewritten. So kill it.

    - Remove dead defines.

    - cfq_set_active_queue() can be rewritten cleaner with else if
      conditions.

    - Several places had cfq_exit_cfqq() like logic, abstract that out
      and use that.

    - Annotate the cfqq kmem_cache_alloc() so the allocator knows that
      this is a repeat allocation if it fails with __GFP_WAIT set.
      Allows the allocator to start freeing some memory, if needed.
      CFQ already loops for this condition, so might as well pass the
      hint down.

    - Remove cfqd->rq_starved logic. It's not needed anymore after we
      dropped the crq allocation in cfq_set_request().

    - Remove unneeded parameter passing.

    Signed-off-by: Jens Axboe <axboe>

I'll look at this again when I have more time to sit down and
concentrate on it, but wanted to get it into BZ without too much delay.
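For reference, the general shape of the upstream fix is to stop calling a set_ioprio callback under cfq_exit_lock and instead have the setter merely mark the io_context, letting the I/O submission path (which already holds the queue lock) notice the mark and reprioritize lazily. A simplified sketch of that flag handoff, with illustrative function bodies rather than the exact kernel code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified io_context: ioprio_changed replaces the old
 * set_ioprio callback, matching the diff above.  The bodies
 * below are illustrative, not the actual kernel implementation. */
struct io_context {
	atomic_uint ioprio_changed;
	unsigned int ioprio;
};

/* Setter side: no cfq_exit_lock needed; just record the new
 * priority and raise the flag. */
static void set_ioprio_lazy(struct io_context *ioc, unsigned int ioprio)
{
	ioc->ioprio = ioprio;
	atomic_store(&ioc->ioprio_changed, 1);
}

/* I/O path: consume the flag; returns nonzero if cfq should
 * re-evaluate the queue's priority for this context. */
static int check_ioprio_changed(struct io_context *ioc)
{
	return atomic_exchange(&ioc->ioprio_changed, 0);
}
```

Because the flag is consumed with an atomic exchange, a concurrent priority change is never lost: either this pass of the I/O path observes it, or a later one will. That removes any need to take a cfq-internal lock from sys_ioprio_set(), which is what made the recursive acquisition possible.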
This is already fixed.

*** This bug has been marked as a duplicate of bug 582435 ***