Bug 1468720 - Boot causes BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
Status: NEW
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified Unspecified
Priority: unspecified, Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-07-07 14:04 EDT by John Bieren
Modified: 2017-07-26 09:27 EDT

Type: Bug
Description John Bieren 2017-07-07 14:04:55 EDT
Description of problem:
During boot of kernel-4.13.0-0.rc0.git1.1.fc27.x86_64, dmesg reports:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 
[    0.115000] in_atomic():


Version-Release number of selected component (if applicable):
kernel-4.13.0-0.rc0.git1.1.fc27.x86_64


How reproducible:
Always


Steps to Reproduce:
1. Boot host

Actual results:
[    0.112477] smpboot: Max logical packages: 6 
[    0.115000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 
[    0.115000] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0 
[    0.115000] no locks held by swapper/0/1. 
[    0.115000] irq event stamp: 314 
[    0.115000] hardirqs last  enabled at (313): [<ffffffff8698eb36>] _raw_spin_unlock_irqrestore+0x36/0x60 
[    0.115000] hardirqs last disabled at (314): [<ffffffff87392466>] enable_IR_x2apic+0x75/0x1af 
[    0.115000] softirqs last  enabled at (146): [<ffffffff86994312>] __do_softirq+0x382/0x4ed 
[    0.115000] softirqs last disabled at (141): [<ffffffff860b7a2f>] irq_exit+0x10f/0x120 
[    0.115000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-0.rc0.git1.1.fc27.x86_64 #1 
[    0.115000] Hardware name:          PowerEdge C6145 /040N24, BIOS 3.0.0 09/10/2012 
[    0.115000] Call Trace: 
[    0.115000]  dump_stack+0x8e/0xcd 
[    0.115000]  ___might_sleep+0x164/0x250 
[    0.115000]  __might_sleep+0x4a/0x80 
[    0.115000]  __mutex_lock+0x59/0x9f0 
[    0.115000]  ? iommu_completion_wait.part.20+0x81/0xc0 
[    0.115000]  ? iommu_flush_all_caches+0x141/0x150 
[    0.115000]  ? iommu_flush_all_caches+0x141/0x150 
[    0.115000]  mutex_lock_nested+0x1b/0x20 
[    0.115000]  ? mutex_lock_nested+0x1b/0x20 
[    0.115000]  register_syscore_ops+0x26/0x70 
[    0.115000]  iommu_go_to_state+0xa46/0x121f 
[    0.115000]  amd_iommu_enable+0x13/0x23 
[    0.115000]  irq_remapping_enable+0x20/0x37 
[    0.115000]  enable_IR_x2apic+0x8e/0x1af 
[    0.115000]  default_setup_apic_routing+0x16/0x6c 
[    0.115000]  native_smp_prepare_cpus+0x294/0x310 
[    0.115000]  kernel_init_freeable+0x12f/0x2a8 
[    0.115000]  ? rest_init+0xe0/0xe0 
[    0.115000]  kernel_init+0xe/0x110 
[    0.115000]  ret_from_fork+0x2a/0x40 
[    0.115008] Switched APIC routing to physical flat. 
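
For anyone triaging similar logs, here is a minimal Python sketch that pulls the context flags and the reliable call-trace frames out of a report like the one above. The function name parse_might_sleep, the regexes, and the shortened SAMPLE excerpt are illustrative assumptions, not part of any kernel tooling. Frames prefixed with "?" are ones the unwinder could not confirm, so the sketch skips them.

```python
import re

# Context line printed with the "sleeping function called from invalid context" report.
HEADER_RE = re.compile(
    r"in_atomic\(\): (\d+), irqs_disabled\(\): (\d+), pid: (\d+), name: (\S+)"
)
# One call-trace frame: optional "? " (unreliable frame), then symbol+offset/size.
FRAME_RE = re.compile(r"\]\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")

# Shortened excerpt of the log in this report (hypothetical test input).
SAMPLE = """\
[    0.115000] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[    0.115000] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
[    0.115000] Call Trace:
[    0.115000]  __mutex_lock+0x59/0x9f0
[    0.115000]  ? iommu_flush_all_caches+0x141/0x150
[    0.115000]  mutex_lock_nested+0x1b/0x20
[    0.115000]  register_syscore_ops+0x26/0x70
[    0.115000]  enable_IR_x2apic+0x8e/0x1af
[    0.115008] Switched APIC routing to physical flat.
"""


def parse_might_sleep(log):
    """Return the context flags and the reliable frames of a might_sleep report."""
    report = {"in_atomic": None, "irqs_disabled": None,
              "pid": None, "name": None, "frames": []}
    in_trace = False
    for line in log.splitlines():
        header = HEADER_RE.search(line)
        if header:
            report["in_atomic"] = int(header.group(1))
            report["irqs_disabled"] = int(header.group(2))
            report["pid"] = int(header.group(3))
            report["name"] = header.group(4)
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                in_trace = False          # first non-frame line ends the trace
            elif not frame.group(1):      # keep only frames without the "?" marker
                report["frames"].append(frame.group(2))
    return report


if __name__ == "__main__":
    result = parse_might_sleep(SAMPLE)
    print(result["irqs_disabled"], result["frames"])
```

In this report the flags show irqs_disabled(): 1 with in_atomic(): 0, i.e. the mutex is taken while interrupts are off, and the reliable frames run from enable_IR_x2apic up through register_syscore_ops to __mutex_lock.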


Expected results:
No failure


Additional info:
This failure also reproduced with kernel 4.13.0-0.rc0.git2.1.fc27.x86_64.
It looks similar to the error addressed in this patch:
https://patchwork.kernel.org/patch/9670379/
However, that patch is specific to nvdimm, and the error hit here does not seem to be. Perhaps both nvdimm and amd_iommu call some common function that is broken.
Comment 2 John Bieren 2017-07-07 14:09:52 EDT
I also found the following thread, which mentions the exact line number where the problem is reported (747). It may or may not be useful:
https://lkml.org/lkml/2017/4/15/81
Comment 3 Artem Savkov 2017-07-25 10:01:37 EDT
The issue has been there on AMD machines for a while, but was masked before this commit:
  1c3c5ea sched/core: Enable might_sleep() and smp_processor_id() checks early

I've reported this upstream: https://lkml.org/lkml/2017/7/25/407
Comment 4 Artem Savkov 2017-07-26 09:27:36 EDT
Patch available: https://lkml.org/lkml/2017/7/26/292
The fix should make it into 4.13-rc3.
