Bug 877366 - EFI pstore: BUG: scheduling while atomic, and possible circular locking dependency
Summary: EFI pstore: BUG: scheduling while atomic, and possible circular locking depen...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Unspecified
medium
medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-11-16 11:10 UTC by Lingzhu Xiang
Modified: 2014-01-06 13:44 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 879086 (view as bug list)
Environment:
Last Closed: 2014-01-06 13:44:37 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Screenshot of kernel oops (188.73 KB, image/jpeg)
2012-11-23 19:20 UTC, Rolf Fokkens
no flags Details

Description Lingzhu Xiang 2012-11-16 11:10:38 UTC
Description of problem:

The kernel prints lots of these messages during pstore read and write. When the kernel panics and dumps kmsg to pstore with EFI backend, the limited space of NVRAM storage is flushed with such messages, defeating the purpose of pstore dump. 

Version-Release number of selected component (if applicable):
3.7.0-0.rc0.git2.4.fc19
3.7.0-0.rc0.git5.2.fc19
3.7.0-0.rc5.git1.3.fc19

How reproducible:
Always.

Not reproduced in 3.6.6-4.fc18 and earlier ones.


Steps to Reproduce:
0. configure kernel parameters: pstore.backend=efi panic=3
1. mkdir -p /dev/pstore
2. mount -t pstore -o kmsg_bytes=8000 - /dev/pstore
3. echo c >/proc/sysrq-trigger

And after reboot

1. mkdir -p /dev/pstore
2. mount -t pstore -o kmsg_bytes=8000 - /dev/pstore
3. ls -l /dev/pstore
  
Actual results:

Repeatedly printing this during the crash, and mount

[   29.330599] BUG: sleeping function called from invalid context at mm/slub.c:930
[   29.331985] in_atomic(): 1, irqs_disabled(): 0, pid: 558, name: mount
[   29.333153] 3 locks held by mount/558:
[   29.333816]  #0:  (&type->s_umount_key#38/1){+.+.+.}, at: [<ffffffff811d37e0>] sget+0x3b0/0x670
[   29.335586]  #1:  (&psinfo->read_mutex){+.+.+.}, at: [<ffffffff812c59fb>] pstore_get_records+0x3b/0x130
[   29.337414]  #2:  (&(&efivars->lock)->rlock){+.+.+.}, at: [<ffffffff8155bf0d>] efi_pstore_open+0x1d/0x40
[   29.339248] Pid: 558, comm: mount Tainted: G        W    3.7.0-0.rc0.git2.4.fc19.x86_64 #1
[   29.340739] Call Trace:
[   29.341238]  [<ffffffff810a399a>] __might_sleep+0x18a/0x240
[   29.342267]  [<ffffffff811b7767>] __kmalloc+0x67/0x2d0
[   29.343216]  [<ffffffff8155bc57>] ? efi_pstore_read+0x1b7/0x210
[   29.344300]  [<ffffffff8155bc57>] efi_pstore_read+0x1b7/0x210
[   29.345352]  [<ffffffff812c5a51>] pstore_get_records+0x91/0x130
[   29.346445]  [<ffffffff812e926c>] ? selinux_d_instantiate+0x1c/0x20
[   29.347576]  [<ffffffff812c5070>] ? pstore_get_inode+0x50/0x50
[   29.348858]  [<ffffffff812c5112>] pstore_fill_super+0xa2/0xc0
[   29.349904]  [<ffffffff811d4832>] mount_single+0xb2/0xe0
[   29.350880]  [<ffffffff812c4ef8>] pstore_mount+0x18/0x20
[   29.352068]  [<ffffffff811d4a53>] mount_fs+0x43/0x1b0
[   29.352968]  [<ffffffff811f27c3>] vfs_kern_mount+0x73/0x110
[   29.353994]  [<ffffffff811f3264>] do_kern_mount+0x54/0x110
[   29.355014]  [<ffffffff811f4e15>] do_mount+0x315/0x8e0
[   29.355929]  [<ffffffff811f496a>] ? copy_mount_options+0x3a/0x180
[   29.357043]  [<ffffffff811f546d>] sys_mount+0x8d/0xe0
[   29.357956]  [<ffffffff816f3469>] system_call_fastpath+0x16/0x1b
[   29.359048] BUG: scheduling while atomic: mount/558/0x10000002
[   29.360136] 3 locks held by mount/558:
[   29.360804]  #0:  (&type->s_umount_key#38/1){+.+.+.}, at: [<ffffffff811d37e0>] sget+0x3b0/0x670
[   29.362626]  #1:  (&psinfo->read_mutex){+.+.+.}, at: [<ffffffff812c59fb>] pstore_get_records+0x3b/0x130
[   29.364407]  #2:  (&(&efivars->lock)->rlock){+.+.+.}, at: [<ffffffff8155bf0d>] efi_pstore_open+0x1d/0x40
[   29.366487] Modules linked in: iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack vfat fat microcode virtio_net i2c_piix4 i2c_core
[   29.369180] Pid: 558, comm: mount Tainted: G        W    3.7.0-0.rc0.git2.4.fc19.x86_64 #1
[   29.370660] Call Trace:
[   29.371133]  [<ffffffff816dd1f1>] __schedule_bug+0x68/0x76
[   29.372147]  [<ffffffff816e7ede>] __schedule+0x99e/0xa00
[   29.373128]  [<ffffffff810a623a>] __cond_resched+0x2a/0x40
[   29.374137]  [<ffffffff816e7fbf>] _cond_resched+0x2f/0x40
[   29.375130]  [<ffffffff811b776c>] __kmalloc+0x6c/0x2d0
[   29.376080]  [<ffffffff8155bc57>] ? efi_pstore_read+0x1b7/0x210
[   29.377166]  [<ffffffff8155bc57>] efi_pstore_read+0x1b7/0x210
[   29.378260]  [<ffffffff812c5a51>] pstore_get_records+0x91/0x130
[   29.379341]  [<ffffffff812e926c>] ? selinux_d_instantiate+0x1c/0x20
[   29.380488]  [<ffffffff812c5070>] ? pstore_get_inode+0x50/0x50
[   29.381512]  [<ffffffff812c5112>] pstore_fill_super+0xa2/0xc0
[   29.382565]  [<ffffffff811d4832>] mount_single+0xb2/0xe0
[   29.383706]  [<ffffffff812c4ef8>] pstore_mount+0x18/0x20
[   29.384667]  [<ffffffff811d4a53>] mount_fs+0x43/0x1b0
[   29.385593]  [<ffffffff811f27c3>] vfs_kern_mount+0x73/0x110
[   29.386618]  [<ffffffff811f3264>] do_kern_mount+0x54/0x110
[   29.387629]  [<ffffffff811f4e15>] do_mount+0x315/0x8e0
[   29.388577]  [<ffffffff811f496a>] ? copy_mount_options+0x3a/0x180
[   29.389695]  [<ffffffff811f546d>] sys_mount+0x8d/0xe0
[   29.390632]  [<ffffffff816f3469>] system_call_fastpath+0x16/0x1b

Expected results:

No such messages are printed. 


Additional info:

Tested with QEMU/KVM instances and a Thinkpad T520 laptop.

Comment 1 Josh Boyer 2012-11-16 14:51:03 UTC
Can you report this to the usptream pstore developers?

Comment 2 Lingzhu Xiang 2012-11-19 09:45:18 UTC
I will, but I should investigate a bit more.

The 3.7 kernels grabbed from koji happen to be debug version so the debug log is here, while 3.6 ones are not.

And this at least dates back to pre-3.0, as I tested. So not a regression.

Comment 3 Rolf Fokkens 2012-11-23 19:19:11 UTC
This might be related. It happens frequently, not always during boot after upgrading the kernel, so full details will be shown in a screenshot I'll post as well.

kernel: 3.6.6-1.fc16.x86_64
System: ASUS P8Z77-V, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

Frequently during boot the following messages scroll over the display, I'm not able to see all details. In the end it all stops with:

Kernel panic - not syncing: Fatal exception in interrupt
panic occurred, switch back to text console

Some highlights (full details in the screenshot) of the kernel oops:
__schedule_bug
? netlink_broadcast_filtered
__cond_resched
...
efi_pstore_write
...
x86_start_kernel

Comment 4 Rolf Fokkens 2012-11-23 19:20:51 UTC
Created attachment 650622 [details]
Screenshot of kernel oops

Comment 5 Lingzhu Xiang 2012-11-26 02:10:45 UTC
It is not the same problem. Your kernel panic was caused by something else. pstore will try to save some dmesg when kernel panics, so you can
    mkdir -p /dev/pstore && mount -t pstore - /dev/pstore && ls /dev/pstore
to have a look if anything was saved.

The backtrace ended in pstore routines because there is some unsafe locking, but pstore was not causing the panic.

BTW upstream discussion on this bug: http://www.spinics.net/lists/linux-efi/msg00586.html

Comment 6 Rolf Fokkens 2012-11-26 20:42:14 UTC
Thx, I'll report new bug.

Comment 7 Fedora End Of Life 2013-04-03 18:17:12 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 8 Justin M. Forbes 2013-04-05 16:57:17 UTC
Is this still a problem with 3.9 based F19 kernels?

Comment 9 Lingzhu Xiang 2013-04-15 02:21:49 UTC
(In reply to comment #8)
> Is this still a problem with 3.9 based F19 kernels?

Yes. I tested 3.9.0-0.rc6.git2.1.fc19.x86_64. Part of it was resolved. There are some other problems. Upstream report http://thread.gmane.org/gmane.linux.kernel.efi/1075

Comment 11 Michele Baldessari 2013-11-17 15:05:16 UTC
Hi Lingzhu,

is still an issue (after your report) with kernels 3.11.x?

thanks,
Michele

Comment 12 Michele Baldessari 2013-12-25 18:17:08 UTC
Hi Lingzhu,

is still an issue (after your report) with kernels 3.11.x or with the newly 3.12.x?
I've seen some patches were proposed on linux-efi but I am not 100% sure what
the state now is.

thanks and regards,
Michele

Comment 14 Michele Baldessari 2013-12-26 18:10:21 UTC
Thanks. Currently it is in 3.13-rc4 but it was copied CCed to stable
so I assume it will trickle down in the 3.12.x sometime.

Let me know if you need a build with that patch backported to confirm 100% that
you don't see the issue anymore.

Comment 15 Josh Boyer 2014-01-06 13:44:37 UTC
Should be in 3.12.5.  If you are still seeing this with kernels newer than that, please reopen.


Note You need to log in before you can comment on or make changes to this bug.