Bug 2041772

Summary: [ark] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
Product: [Fedora] Fedora Reporter: Bruno Goncalves <bgoncalv>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: acaringi, adscvr, airlied, alciregi, bskeggs, czhong, hdegoede, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, omosnace, ptalbert, steved, tglx
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-5.18.0-0.rc5.20220506gitfe27d189e3f42e3.44.fc37
Last Closed: 2022-05-12 13:55:30 UTC Type: Bug

Description Bruno Goncalves 2022-01-18 09:24:18 UTC
1. Please describe the problem:
While booting the machine with kernel 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.aarch64 we hit:

[   72.875140] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
[   72.883581] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/18
[   72.891668] preempt_count: 10001, expected: 0
[   72.896021] INFO: lockdep is turned off.
[   72.899940] irq event stamp: 42820
[   72.903338] hardirqs last  enabled at (42819): [<ffff8000091b2300>] default_idle_call+0x58/0xe0
[   72.912040] hardirqs last disabled at (42820): [<ffff8000091a0e38>] el1_interrupt+0x28/0xb0
[   72.920396] softirqs last  enabled at (42802): [<ffff800008060704>] __do_softirq+0x2cc/0x384
[   72.928833] softirqs last disabled at (42793): [<ffff80000810c96c>] __irq_exit_rcu+0x100/0x1c0
[   72.937449] CPU: 18 PID: 0 Comm: swapper/18 Tainted: G        W        --------- ---  5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.aarch64 #1
[   72.950572] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019
[   72.957965] Call trace:
[   72.960408]  dump_backtrace+0xfc/0x14c
[   72.964157]  show_stack+0x24/0x70
[   72.967468]  dump_stack_lvl+0x7c/0xa0
[   72.971129]  dump_stack+0x18/0x44
[   72.974446]  __might_resched+0x224/0x234
[   72.978370]  __might_sleep+0x54/0x84
[   72.981945]  __mutex_lock_common+0x54/0xbd4
[   72.986131]  mutex_lock_nested+0x5c/0x68
[   72.990054]  msi_get_virq+0x78/0xf0
[   72.993545]  pci_irq_vector+0x30/0x5c
[   72.997208]  nic_mbx_intr_handler+0x44/0x13c [nicpf]
[   73.002178]  __handle_irq_event_percpu+0xc8/0x1d0
[   73.006883]  handle_irq_event+0x54/0x15c
[   73.010807]  handle_fasteoi_irq+0x100/0x1d4
[   73.014995]  generic_handle_domain_irq+0x48/0x6c
[   73.019612]  gic_handle_irq+0x58/0x11c
[   73.023365]  call_on_irq_stack+0x2c/0x38
[   73.027289]  el1_interrupt+0x7c/0xb0
[   73.030860]  el1h_64_irq_handler+0x18/0x24
[   73.034953]  el1h_64_irq+0x7c/0x80
[   73.038355]  arch_local_irq_enable+0xc/0x18
[   73.042538]  default_idle_call+0x6c/0xe0
[   73.046461]  do_idle+0x114/0x2b4
[   73.049695]  cpu_startup_entry+0x30/0x34
[   73.053619]  secondary_start_kernel+0x1a8/0x1dc
[   73.058152]  __secondary_switched+0x94/0x98
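
A note on reading the header of the splat above: the preempt_count value is printed in hexadecimal, and its bit fields identify the context the sleeping call was made from. The sketch below decodes it in userspace. It assumes the field layout documented in include/linux/preempt.h (8 bits preempt, 8 bits softirq, 4 bits hardirq, 4 bits NMI) applies to this kernel; the struct and function names are illustrative and are not kernel APIs.

```c
#include <stdint.h>

/* Field masks as documented in include/linux/preempt.h
 * (assumed to match the kernel that produced the log). */
#define PC_PREEMPT_MASK 0x000000ffu
#define PC_SOFTIRQ_MASK 0x0000ff00u
#define PC_HARDIRQ_MASK 0x000f0000u
#define PC_NMI_MASK     0x00f00000u

/* Illustrative container for the decoded fields (not a kernel type). */
struct pc_fields {
    unsigned preempt;  /* preempt_disable() nesting depth */
    unsigned softirq;  /* raw softirq field value */
    unsigned hardirq;  /* hard interrupt nesting depth */
    unsigned nmi;      /* NMI nesting depth */
};

/* Split a raw preempt_count value into its bit fields. */
static struct pc_fields decode_preempt_count(uint32_t pc)
{
    struct pc_fields f;

    f.preempt = pc & PC_PREEMPT_MASK;
    f.softirq = (pc & PC_SOFTIRQ_MASK) >> 8;
    f.hardirq = (pc & PC_HARDIRQ_MASK) >> 16;
    f.nmi     = (pc & PC_NMI_MASK) >> 20;
    return f;
}
```

With the value from the log, decode_preempt_count(0x10001) gives hardirq = 1 and preempt = 1, which is consistent with the call being made from a hard interrupt handler, as the call trace shows.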

2. What is the Version-Release number of the kernel:
kernel 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.aarch64

Comment 2 Ondrej Mosnacek 2022-04-24 10:30:27 UTC
This seems to have been caused by the combination of these commits:

commit 495c66aca3da704e063fa373fdbe371e71d3f4ee
Author: Thomas Gleixner <tglx>
Date:   Mon Dec 6 23:51:45 2021 +0100

    genirq/msi: Convert to new functions

commit 82ff8e6b78fc4587a4255301f0a283506daf11b6
Author: Thomas Gleixner <tglx>
Date:   Fri Dec 10 23:19:25 2021 +0100

    PCI/MSI: Use msi_get_virq() in pci_get_vector()

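Together, these make pci_irq_vector() call msi_get_virq(), which calls msi_lock_descs() and therefore mutex_lock(). mutex_lock() may sleep, so it is not allowed in interrupt context, which is where nic_mbx_intr_handler() in nicpf calls pci_irq_vector() (see the call trace in the description). CC'ing Thomas, who will hopefully know how to fix it.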
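
To illustrate the call pattern described in this comment, here is a small userspace model, not kernel code. Every name in it (model_mutex_lock, model_pci_irq_vector, struct model_dev, MODEL_BASE_VIRQ, and so on) is hypothetical. It contrasts a handler that performs the vector lookup itself with one that relies on IRQ numbers cached at probe time, which is one common way to keep sleeping locks out of a handler. It is a sketch under those assumptions and does not claim to be the change that actually fixed this bug.

```c
#include <stdbool.h>

/* Userspace model only: none of these are kernel APIs. */
static bool in_hardirq;       /* true while a modeled handler is running */
static int  sleep_violations; /* sleeping-lock acquisitions seen in hardirq */

/* Stands in for mutex_lock(): it may sleep, so it is illegal in hardirq. */
static void model_mutex_lock(void)
{
    if (in_hardirq)
        sleep_violations++;
}

static void model_mutex_unlock(void)
{
}

#define MODEL_BASE_VIRQ 100 /* hypothetical first Linux IRQ of the device */
#define MODEL_NUM_MBX   2   /* hypothetical number of mailbox vectors */

/* Stands in for pci_irq_vector() -> msi_get_virq(): the lookup takes
 * the descriptor lock before translating an index to an IRQ number. */
static int model_pci_irq_vector(int index)
{
    int virq;

    model_mutex_lock();
    virq = MODEL_BASE_VIRQ + index;
    model_mutex_unlock();
    return virq;
}

/* Hypothetical per-device state, filled in from process context. */
struct model_dev {
    int mbx_virq[MODEL_NUM_MBX];
};

static void model_probe(struct model_dev *dev)
{
    for (int i = 0; i < MODEL_NUM_MBX; i++)
        dev->mbx_virq[i] = model_pci_irq_vector(i);
}

/* Problematic pattern: the handler does the locked lookup itself. */
static int run_lookup_in_hardirq(int irq)
{
    int mbx = -1;

    in_hardirq = true;
    for (int i = 0; i < MODEL_NUM_MBX && mbx < 0; i++)
        if (model_pci_irq_vector(i) == irq)
            mbx = i;
    in_hardirq = false;
    return mbx;
}

/* Alternative: the handler only compares against cached numbers. */
static int run_cached_in_hardirq(const struct model_dev *dev, int irq)
{
    int mbx = -1;

    in_hardirq = true;
    for (int i = 0; i < MODEL_NUM_MBX && mbx < 0; i++)
        if (dev->mbx_virq[i] == irq)
            mbx = i;
    in_hardirq = false;
    return mbx;
}
```

In this model, calling model_probe() and then run_cached_in_hardirq() leaves sleep_violations at zero, while run_lookup_in_hardirq() increments it once per lookup, mirroring the might_sleep() warning in the report.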