Bug 618211 - Running ionice under I/O load may deadlock on cfq_exit_lock
Summary: Running ionice under I/O load may deadlock on cfq_exit_lock
Keywords:
Status: CLOSED DUPLICATE of bug 582435
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-07-26 12:31 UTC by Bryn M. Reeves
Modified: 2010-07-26 20:33 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-26 20:33:16 UTC
Target Upstream Version:
Embargoed:



Description Bryn M. Reeves 2010-07-26 12:31:41 UTC
Description of problem:
During stress testing with a high I/O load, the system appeared to deadlock, with an ionice process showing the following backtrace:

PID: 7275   TASK: ffff810ce88d9040  CPU: 15  COMMAND: "ionice"
 #0 [ffff81087fc66f20] crash_nmi_callback at ffffffff8007bf44
 #1 [ffff81087fc66f40] do_nmi at ffffffff8006688a
 #2 [ffff81087fc66f50] nmi at ffffffff80065eef
    [exception RIP: .text.lock.spinlock+2]
    RIP: ffffffff80065bfc  RSP: ffff810d4188be80  RFLAGS: 00000086
    RAX: 0000000000000000  RBX: ffff810c7eda5428  RCX: 0000000000000220
    RDX: ffff810e08dd8860  RSI: ffff810c7eda5428  RDI: ffffffff8032c680
    RBP: ffff81047fb6f000   R8: 0000000000003bd2   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000202  R12: ffff810c59c70c30
    R13: 0000000000000220  R14: ffff810e08dd8860  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff810d4188be80] .text.lock.spinlock at ffffffff80065bfc (via _spin_lock)
 #4 [ffff810d4188be80] cfq_drop_dead_cic at ffffffff8014cbf9
 #5 [ffff810d4188bea0] cfq_cic_rb_lookup at ffffffff8014cc6f
 #6 [ffff810d4188bec0] cfq_get_queue at ffffffff8014d1bd
 #7 [ffff810d4188bf00] cfq_ioc_set_ioprio at ffffffff8014e82d
 #8 [ffff810d4188bf20] set_task_ioprio at ffffffff800f5a9c
 #9 [ffff810d4188bf40] sys_ioprio_set at ffffffff800f5d0b
#10 [ffff810d4188bf80] tracesys at ffffffff8005e28d (via system_call)
    RIP: 000000358f6d07c9  RSP: 00007fff399969a8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: ffffffff8005e28d  RCX: ffffffffffffffff
    RDX: 0000000000004002  RSI: 0000000000003bd2  RDI: 0000000000000001
    RBP: 0000000000003bd2   R8: 0000000000000005   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000202  R12: 0000000000000002
    R13: 0000000000004002  R14: 0000000000000001  R15: 0000000000000005
    ORIG_RAX: 00000000000000fb  CS: 0033  SS: 002b

The ionice task appears to be attempting to acquire cfq_exit_lock recursively (*2, *3) while holding tasklist_lock (*1):

  sys_ioprio_set()
     read_lock_irq(&tasklist_lock) --(*1)
     set_task_ioprio()
         cfq_ioc_set_ioprio()
             spin_lock(&cfq_exit_lock) --(*2) <------------*
             changed_ioprio()                              |
                  cfq_get_queue()                           |
                     cfq_cic_rb_lookup()                   |
                         cfq_drop_dead_cic()               |
                             spin_lock(&cfq_exit_lock) --(*3)
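
To make the locking picture concrete, here is a hand-written miniature of the pattern above (an illustration only, not the RHEL5 source): cfq_exit_lock is an ordinary non-recursive spinlock, so the second acquisition in cfq_drop_dead_cic() spins forever against the first one still held in cfq_ioc_set_ioprio(), all while tasklist_lock remains read-held.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(cfq_exit_lock);          /* single global lock in 2.6.18 cfq */

static void drop_dead_cic_sketch(void)
{
        spin_lock(&cfq_exit_lock);              /* (*3) second acquisition: spins forever */
        /* ... unlink the stale cfq_io_context ... */
        spin_unlock(&cfq_exit_lock);
}

static void ioc_set_ioprio_sketch(void)
{
        spin_lock(&cfq_exit_lock);              /* (*2) first acquisition */
        /* changed_ioprio() -> cfq_get_queue() -> cfq_cic_rb_lookup() */
        drop_dead_cic_sketch();                 /* never returns */
        spin_unlock(&cfq_exit_lock);            /* never reached */
}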

Version-Release number of selected component (if applicable):
2.6.18-194.el5

How reproducible:
Very difficult; during one test run the problem was seen after ~4 hours of testing, while in another run the system ran for 60 hours without hitting the deadlock.

Steps to Reproduce:
1. Present a high I/O load to a system using cfq
2. Repeatedly run ionice (a rough driver sketch follows)
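
A rough driver for step 2, purely illustrative (not the harness used in the original testing): it hammers sys_ioprio_set the way repeated ionice invocations would. Syscall 251 matches ORIG_RAX=0xfb in the backtrace on x86_64, and the ioprio constants are copied from the kernel headers since glibc of that era does not export them.

#include <sys/syscall.h>
#include <unistd.h>

/* kernel ioprio encoding, copied here because glibc does not provide it */
#define IOPRIO_WHO_PROCESS      1
#define IOPRIO_CLASS_BE         2
#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_VALUE(cl, d) (((cl) << IOPRIO_CLASS_SHIFT) | (d))

int main(void)
{
        /* loop forever, cycling through the best-effort priority levels 0-7 */
        for (;;) {
                int prio;

                for (prio = 0; prio < 8; prio++)
                        syscall(__NR_ioprio_set, IOPRIO_WHO_PROCESS, getpid(),
                                IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, prio));
        }
        return 0;
}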
  
Actual results:
Deadlock shown above.

Expected results:
No deadlock even under high load.

Additional info:
Upstream killed cfq_exit_lock in the following commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fc46379daf90dce57bf765c81d3b39f55150aac2

Comment 2 Bryn M. Reeves 2010-07-26 12:52:11 UTC
There seemed to be a few sticking points in backporting the change referenced in comment #0, so I'm getting this into BZ to let others comment. The change modifies the io_context struct:

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a1e2880..79cb9fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -90,7 +90,7 @@ struct io_context {
       atomic_t refcount;
       struct task_struct *task;

-       int (*set_ioprio)(struct io_context *, unsigned int);
+       unsigned int ioprio_changed;

       /*
        * For request batching

I'm not sure how much this matters, since allocation seems to be done centrally and I couldn't find any of the symbols this would change on the kABI lists.
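
For reference, a paraphrased sketch of how the new field ends up being used upstream (approximated from the commit in comment #0; field and function names may differ slightly, and this is not a drop-in RHEL5 backport): set_task_ioprio() only marks the io_context, and cfq consumes the flag lazily on its next queue lookup under locks it already holds, which is what lets cfq_exit_lock, and with it the recursive acquisition from comment #0, disappear.

#include <linux/sched.h>
#include <linux/blkdev.h>

/* ioprio_set() side: no call back into the I/O scheduler, just mark the context */
static int set_task_ioprio_sketch(struct task_struct *task, int ioprio)
{
        struct io_context *ioc = task->io_context;

        task->ioprio = ioprio;
        if (ioc)
                ioc->ioprio_changed = 1;        /* picked up lazily by cfq */
        return 0;
}

/* cfq side: notice the flag on the next queue lookup and re-prioritise
 * the queues there, under locking cfq already holds */
static void cfq_note_ioprio_sketch(struct io_context *ioc)
{
        if (unlikely(ioc->ioprio_changed)) {
                /* walk ioc's cfq_io_contexts and reset their priorities */
                ioc->ioprio_changed = 0;
        }
}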

The commit in comment #0 does seem to rely on Jens' cfq cleanups, however:

commit 89850f7ee905410c89f9295e89dc4c33502a34ac
Author: Jens Axboe <axboe>
Date:   Sat Jul 22 16:48:31 2006 +0200

   [PATCH] cfq-iosched: cleanups, fixes, dead code removal
  
   A collection of little fixes and cleanups:
  
   - We don't use the 'queued' sysfs exported attribute, since the
     may_queue() logic was rewritten. So kill it.
  
   - Remove dead defines.
  
   - cfq_set_active_queue() can be rewritten cleaner with else if conditions.
  
   - Several places had cfq_exit_cfqq() like logic, abstract that out and
     use that.
  
   - Annotate the cfqq kmem_cache_alloc() so the allocator knows that this
     is a repeat allocation if it fails with __GFP_WAIT set. Allows the
     allocator to start freeing some memory, if needed. CFQ already loops for
     this condition, so might as well pass the hint down.
  
   - Remove cfqd->rq_starved logic. It's not needed anymore after we dropped
     the crq allocation in cfq_set_request().
  
   - Remove unneeded parameter passing.
  
   Signed-off-by: Jens Axboe <axboe>

I'll look at this again when I have more time to sit down and concentrate on it, but I wanted to get it into BZ without too much delay.

Comment 3 Jeff Moyer 2010-07-26 20:33:16 UTC
This is already fixed.

*** This bug has been marked as a duplicate of bug 582435 ***

