Bug 818220 - qla2xxx does a spinlock with interrupts disabled
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 2.1
Hardware: x86_64 Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 2.1.8
Assigned To: John Kacur
QA Contact: David Sommerseth
Reported: 2012-05-02 09:45 EDT by David Sommerseth
Modified: 2016-05-22 19:34 EDT
Doc Type: Bug Fix
Doc Text:
Previously, the qla2x00_poll() function did the local_irq_save() call before calling qla24xx_intr_handler(), which had a spinlock. Since spinlocks are sleepable in the real-time kernel, it is not allowed to call them with interrupts disabled. This scenario produced error messages and could cause a system deadlock. With this update, the local_irq_save_nort(flags) function is used to save flags without disabling interrupts, which prevents potential deadlocks and removes the error messages.
Last Closed: 2012-05-15 19:09:56 EDT
Type: Bug


Description David Sommerseth 2012-05-02 09:45:34 EDT
Description of problem:
When booting kernel-rt-debug-3.0.25-rt44.57.el6rt on some boxes with the qla2xxx adapter, the following splat can be observed:

[   10.349737] BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 
[   10.349744] in_atomic(): 0, irqs_disabled(): 1, pid: 2845, name: work_for_cpu 
[   10.349753] Pid: 2845, comm: work_for_cpu Not tainted 3.0.25-rt44.57.el6rt.x86_64.debug #1 
[   10.349759] Call Trace: 
[   10.349786]  [<ffffffff8103dbde>] __might_sleep+0xce/0xf0 
[   10.349801]  [<ffffffff814d0734>] rt_spin_lock+0x24/0x50 
[   10.349856]  [<ffffffffa02163fb>] qla24xx_intr_handler+0x5b/0x370 [qla2xxx] 
[   10.349871]  [<ffffffff814ce644>] ? wait_for_common+0x144/0x1a0 
[   10.349884]  [<ffffffff814d3efd>] ? sub_preempt_count+0x9d/0xd0 
[   10.349920]  [<ffffffffa0208da4>] qla2x00_poll+0x44/0x50 [qla2xxx] 
[   10.349952]  [<ffffffffa0209173>] qla2x00_mailbox_command+0x3c3/0x8c0 [qla2xxx] 
[   10.349986]  [<ffffffffa020be55>] qla2x00_mbx_reg_test+0x65/0xf0 [qla2xxx] 
[   10.350000]  [<ffffffff81307c8a>] ? __dev_printk+0x3a/0x90 
[   10.350006]  [<ffffffff81307fc5>] ? dev_printk+0x45/0x50 
[   10.350006]  [<ffffffffa0201494>] qla24xx_chip_diag+0x64/0xc0 [qla2xxx] 
[   10.350006]  [<ffffffffa020660d>] qla2x00_initialize_adapter+0x2fd/0x3a0 [qla2xxx] 
[   10.350006]  [<ffffffffa01f9e34>] ? kzalloc+0x14/0x20 [qla2xxx] 
[   10.350006]  [<ffffffffa0234043>] qla2x00_probe_one+0xd5e/0x1d1b [qla2xxx] 
[   10.350006]  [<ffffffff814d0ad3>] ? _raw_spin_lock+0x23/0x30 
[   10.350006]  [<ffffffff8126251f>] local_pci_probe+0x5f/0xd0 
[   10.350006]  [<ffffffff8106d4b0>] ? cpumask_weight+0x20/0x20 
[   10.350006]  [<ffffffff8106d4c8>] do_work_for_cpu+0x18/0x30 
[   10.350006]  [<ffffffff81075ee6>] kthread+0xa6/0xb0 
[   10.350006]  [<ffffffff810419fc>] ? finish_task_switch+0x6c/0xf0 
[   10.350006]  [<ffffffff814d8f34>] kernel_thread_helper+0x4/0x10 
[   10.350006]  [<ffffffff81075e40>] ? kthreadd+0x180/0x180 
[   10.350006]  [<ffffffff814d8f30>] ? gs_change+0xb/0xb 

Version-Release number of selected component (if applicable):
3.0.25-rt44.57.el6rt

How reproducible:
Always on affected hardware.

Steps to Reproduce:
1. Install kernel-rt-debug-3.0.25-rt44.57.el6rt
2. Boot kernel without 'quiet' and 'rhgb' in the kernel command line
3. Observe the console output, or check dmesg after logging in.
  
Additional info:
The core issue is also present in all other kernel variants, but kernel-rt-debug is the only one that reports a sleeping function being called from an invalid context.
Comment 3 David Sommerseth 2012-05-09 06:57:51 EDT
Verified by booting kernel-rt-debug kernels on a box with a qla2xxx adapter.

Double-checked that 3.0.25-rt44.57.el6rt.x86_64.debug still produces the backtrace. After upgrading to 3.0.30-rt50.62.el6rt.x86_64.debug, the issue no longer occurs.

-> VERIFIED
Comment 4 John Kacur 2012-05-09 16:33:08 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_hand which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.

Consequence: BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 reported multiple times.

Fix: Use local_irq_save_nort(flags) to save flags without disabling interrupts.

Result: Potential deadlock is avoided, and the error message goes away.
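
For illustration only, the cause and fix above can be modelled in a small userspace C program. This is a rough sketch, not the qla2xxx driver code or the actual patch: every model_* function, both poll_* functions, and count_splats() are hypothetical stand-ins, and the only behaviour assumed is what the note states, namely that on rt a spinlock may sleep and that local_irq_save_nort(flags) saves flags without disabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for kernel state; none of this is driver code. */
static bool irqs_off;     /* models irqs_disabled() */
static int splat_count;   /* counts "sleeping function" reports */

typedef unsigned long flags_t;

/* Models local_irq_save(): remember the old state, then disable interrupts. */
void model_irq_save(flags_t *flags) { *flags = irqs_off; irqs_off = true; }

/* Models local_irq_restore(): put the remembered state back. */
void model_irq_restore(flags_t flags) { irqs_off = (flags != 0); }

/* Models local_irq_save_nort() on rt: remember the state, change nothing. */
void model_irq_save_nort(flags_t *flags) { *flags = irqs_off; }

/* Models the matching restore on rt (assumed to be a no-op here). */
void model_irq_restore_nort(flags_t flags) { (void)flags; }

/* Models a sleeping spinlock: taking it with interrupts off is a bug. */
void model_rt_spin_lock(void)
{
    if (irqs_off) {
        splat_count++;
        fprintf(stderr, "BUG: sleeping function called from invalid context\n");
    }
}

/* Models the interrupt handler, which takes the spinlock. */
void model_intr_handler(void) { model_rt_spin_lock(); }

/* Poll path as described under "Cause". */
void poll_before_fix(void)
{
    flags_t flags;

    model_irq_save(&flags);
    model_intr_handler();
    model_irq_restore(flags);
}

/* Poll path as described under "Fix". */
void poll_after_fix(void)
{
    flags_t flags;

    model_irq_save_nort(&flags);
    model_intr_handler();
    model_irq_restore_nort(flags);
}

/* Runs one poll variant from a clean state and returns the number of reports. */
int count_splats(void (*poll)(void))
{
    splat_count = 0;
    irqs_off = false;
    poll();
    assert(!irqs_off);   /* both variants must leave interrupts enabled */
    return splat_count;
}
```

In this model, count_splats(poll_before_fix) yields one report and count_splats(poll_after_fix) yields none, which mirrors the "Consequence" and "Result" lines above.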
Comment 5 Murray McAllister 2012-05-15 02:14:35 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,4 @@
-Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_hand which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.
+Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.
 
 Consequence: BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 reported multiple times.
Comment 6 errata-xmlrpc 2012-05-15 19:09:56 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0670.html
Comment 7 Tomas Capek 2012-05-16 07:21:51 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled.
+Previously, the qla2x00_poll() function did the local_irq_save() call before calling qla24xx_intr_handler(), which had a spinlock. Since spinlocks are sleepable in the real-time kernel, it is not allowed to call them with interrupts disabled. This scenario produced error messages and could cause a system deadlock. With this update, the local_irq_save_nort(flags) function is used to save flags without disabling interrupts, which prevents potential deadlocks and removes the error messages.
-
-Consequence: BUG: sleeping function called from invalid context at kernel/rtmutex.c:646 reported multiple times.
-
-Fix: Use local_irq_save_nort(flags) to save flags without disabling interrupts.
-
-Result: Potential deadlock is avoided, and the error message goes away.
