Bug 818220

Summary: qla2xxx does a spinlock with interrupts disabled
Product: Red Hat Enterprise MRG
Reporter: David Sommerseth <davids>
Component: realtime-kernel
Assignee: John Kacur <jkacur>
Status: CLOSED ERRATA
QA Contact: David Sommerseth <davids>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2.1
CC: bhu, lgoncalv, ovasik, williams
Target Milestone: 2.1.8
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the qla2x00_poll() function did the local_irq_save() call before calling qla24xx_intr_handler(), which had a spinlock. Since spinlocks are sleepable in the real-time kernel, it is not allowed to call them with interrupts disabled. This scenario produced error messages and could cause a system deadlock. With this update, the local_irq_save_nort(flags) function is used to save flags without disabling interrupts, which prevents potential deadlocks and removes the error messages.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-15 23:09:56 UTC
Type: Bug

Description David Sommerseth 2012-05-02 13:45:34 UTC
Description of problem:
When booting kernel-rt-debug-3.0.25-rt44.57.el6rt on some boxes with the qla2xxx adapter, the following splat can be observed:

[   10.349737] BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 
[   10.349744] in_atomic(): 0, irqs_disabled(): 1, pid: 2845, name: work_for_cpu 
[   10.349753] Pid: 2845, comm: work_for_cpu Not tainted 3.0.25-rt44.57.el6rt.x86_64.debug #1 
[   10.349759] Call Trace: 
[   10.349786]  [<ffffffff8103dbde>] __might_sleep+0xce/0xf0 
[   10.349801]  [<ffffffff814d0734>] rt_spin_lock+0x24/0x50 
[   10.349856]  [<ffffffffa02163fb>] qla24xx_intr_handler+0x5b/0x370 [qla2xxx] 
[   10.349871]  [<ffffffff814ce644>] ? wait_for_common+0x144/0x1a0 
[   10.349884]  [<ffffffff814d3efd>] ? sub_preempt_count+0x9d/0xd0 
[   10.349920]  [<ffffffffa0208da4>] qla2x00_poll+0x44/0x50 [qla2xxx] 
[   10.349952]  [<ffffffffa0209173>] qla2x00_mailbox_command+0x3c3/0x8c0 [qla2xxx] 
[   10.349986]  [<ffffffffa020be55>] qla2x00_mbx_reg_test+0x65/0xf0 [qla2xxx] 
[   10.350000]  [<ffffffff81307c8a>] ? __dev_printk+0x3a/0x90 
[   10.350006]  [<ffffffff81307fc5>] ? dev_printk+0x45/0x50 
[   10.350006]  [<ffffffffa0201494>] qla24xx_chip_diag+0x64/0xc0 [qla2xxx] 
[   10.350006]  [<ffffffffa020660d>] qla2x00_initialize_adapter+0x2fd/0x3a0 [qla2xxx] 
[   10.350006]  [<ffffffffa01f9e34>] ? kzalloc+0x14/0x20 [qla2xxx] 
[   10.350006]  [<ffffffffa0234043>] qla2x00_probe_one+0xd5e/0x1d1b [qla2xxx] 
[   10.350006]  [<ffffffff814d0ad3>] ? _raw_spin_lock+0x23/0x30 
[   10.350006]  [<ffffffff8126251f>] local_pci_probe+0x5f/0xd0 
[   10.350006]  [<ffffffff8106d4b0>] ? cpumask_weight+0x20/0x20 
[   10.350006]  [<ffffffff8106d4c8>] do_work_for_cpu+0x18/0x30 
[   10.350006]  [<ffffffff81075ee6>] kthread+0xa6/0xb0 
[   10.350006]  [<ffffffff810419fc>] ? finish_task_switch+0x6c/0xf0 
[   10.350006]  [<ffffffff814d8f34>] kernel_thread_helper+0x4/0x10 
[   10.350006]  [<ffffffff81075e40>] ? kthreadd+0x180/0x180 
[   10.350006]  [<ffffffff814d8f30>] ? gs_change+0xb/0xb 

Version-Release number of selected component (if applicable):
3.0.25-rt44.57.el6rt

How reproducible:
Always on affected hardware.

Steps to Reproduce:
1. Install kernel-rt-debug-3.0.25-rt44.57.el6rt
2. Boot kernel without 'quiet' and 'rhgb' in the kernel command line
3. Observe the console output, or check dmesg after logging in.
  
Additional info:
The underlying issue is also present on all other kernel variants, but only kernel-rt-debug complains about a sleeping function being called from an invalid context.

Comment 3 David Sommerseth 2012-05-09 10:57:51 UTC
Verified by booting kernel-rt-debug kernels on a box with a qla2xxx adapter.

Double-checked that 3.0.25-rt44.57.el6rt.x86_64.debug still produces the backtrace. After upgrading to 3.0.30-rt50.62.el6rt.x86_64.debug, the backtrace no longer appears and the issue is solved.

-> VERIFIED

Comment 4 John Kacur 2012-05-09 20:33:08 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_hand which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.

Consequence: BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 reported multiple times.

Fix: Use local_irq_save_nort(flags) to save flags without disabling interrupts.

Result: Potential deadlock is avoided, and the error message goes away.
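
The cause and fix above can be illustrated with a small user-space model. This is only a sketch, not the actual qla2xxx patch: every name ending in _sim is a hypothetical stand-in, the interrupt state is a plain boolean, and the _nort variant is modeled on the assumption that, on an RT kernel, local_irq_save_nort() only records the flags and leaves interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_off; /* simulated "interrupts disabled" state */
static int splats;    /* number of simulated "sleeping function" reports */

/* Stand-in for rt_spin_lock(): on RT the lock may sleep, so taking it
 * while interrupts are disabled is counted as a violation. */
static void rt_spin_lock_sim(void)
{
    if (irqs_off)
        splats++;
}

/* Stand-in for qla24xx_intr_handler(): takes the sleepable lock. */
static void intr_handler_sim(void)
{
    rt_spin_lock_sim();
    /* ... handle the interrupt, then release the lock ... */
}

/* Models local_irq_save(): remember the old state, then disable. */
static void irq_save_sim(unsigned long *flags)
{
    *flags = irqs_off ? 1UL : 0UL;
    irqs_off = true;
}

/* Models local_irq_restore(): put the remembered state back. */
static void irq_restore_sim(unsigned long flags)
{
    irqs_off = (flags != 0);
}

/* Assumed RT behaviour of local_irq_save_nort(): remember the state
 * but do NOT disable interrupts. */
static void irq_save_nort_sim(unsigned long *flags)
{
    *flags = irqs_off ? 1UL : 0UL;
}

/* Assumed RT behaviour of local_irq_restore_nort(): nothing to undo. */
static void irq_restore_nort_sim(unsigned long flags)
{
    (void)flags;
}

/* Stand-in for qla2x00_poll(); returns the number of violations seen. */
int poll_sim(bool use_nort)
{
    unsigned long flags;

    irqs_off = false;
    splats = 0;

    if (use_nort)
        irq_save_nort_sim(&flags);
    else
        irq_save_sim(&flags);

    intr_handler_sim();

    if (use_nort)
        irq_restore_nort_sim(flags);
    else
        irq_restore_sim(flags);

    assert(!irqs_off); /* interrupt state must be back to enabled */
    return splats;
}
```

In this model, poll_sim(false) counts one violation (the Consequence above) and poll_sim(true) counts none (the Result above). On a non-RT kernel the _nort variants are expected to behave like the plain ones, which this sketch does not model.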

Comment 5 Murray McAllister 2012-05-15 06:14:35 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1,4 +1,4 @@
-Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_hand which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.
+Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.
 
 Consequence: BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 reported multiple times.

Comment 6 errata-xmlrpc 2012-05-15 23:09:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0670.html

Comment 7 Tomas Capek 2012-05-16 11:21:51 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.
+Previously, the qla2x00_poll() function did the local_irq_save() call before calling qla24xx_intr_handler(), which had a spinlock. Since spinlocks are sleepable in the real-time kernel, it is not allowed to call them with interrupts disabled. This scenario produced error messages and could cause a system deadlock. With this update, the local_irq_save_nort(flags) function is used to save flags without disabling interrupts, which prevents potential deadlocks and removes the error messages.
-
-Consequence: BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 reported multiple times.
-
-Fix: Use local_irq_save_nort(flags) to save flags without disabling interrupts.
-
-Result: Potential deadlock is avoided, and the error message goes away.