949721 – Convert UEFI boot failure data from /sys/fs/pstore to abrt problem data

Summary: Convert UEFI boot failure data from /sys/fs/pstore to abrt problem data

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	abrt
Sub Component:
Version:	19
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Denys Vlasenko
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-04-08 20:13 UTC by Josh Boyer
Modified:	2013-09-15 00:54 UTC
CC List:	14 users
Fixed In Version:	gnome-abrt-0.3.1-1.fc19
Clone Of:
Environment:
Last Closed:	2013-09-15 00:54:04 UTC
Type:	Bug
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	947142	0	unspecified	CLOSED	write to /sys/firmware/efi/vars/new_var results in ENOSPC	2021-02-22 00:41:40 UTC

Internal Links: 947142

Description Josh Boyer 2013-04-08 20:13:48 UTC

Description of problem:

In Fedora 19, the pstore filesystem is automatically mounted on EFI systems at /sys/fs/pstore.  If the kernel crashes, it stores dump files under this location, which can often be used to get failure data for hard-to-diagnose problems.  This is often the only source of crash information for "doesn't boot" or "hung machine" type bugs. It would be beneficial for ABRT to look in this directory upon startup, assemble a backtrace from the files if they exist, and report a bug.

Additionally, ABRT should delete these files from the /sys/fs/pstore/ directory.  They are saved in UEFI NVRAM, which is a very limited resource.  Leaving these around will eventually prevent new UEFI variables from being created.
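
A minimal Python sketch of the "save, then delete" step requested above. It is illustrative only and is not ABRT's code: the function name harvest_dmesg_records, the destination directory, and the "dmesg-" name filter are assumptions made for the example. It also assumes that unlinking the file is what releases the NVRAM record, as the description implies.

```python
import shutil
from pathlib import Path


def harvest_dmesg_records(pstore_dir, dest_dir):
    """Copy every dmesg-* file out of pstore_dir, then unlink the original.

    Returns the sorted list of file names that were saved.
    ASSUMPTION: only names starting with "dmesg-" are crash records;
    anything else in the directory is left untouched.
    """
    src = Path(pstore_dir)
    dst = Path(dest_dir)
    dst.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in sorted(src.iterdir()):
        if not entry.is_file() or not entry.name.startswith("dmesg-"):
            continue
        shutil.copyfile(entry, dst / entry.name)
        # Unlink only after the copy succeeded, so a failed copy never
        # loses the only copy of the crash data.
        entry.unlink()
        saved.append(entry.name)
    return saved


# Hypothetical usage (needs root on a real system):
#   harvest_dmesg_records("/sys/fs/pstore", "/var/tmp/pstore-saved")
```

Copying before unlinking means a crash or I/O error in the middle of the run leaves the remaining records in place for the next boot.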

Comment 1 Peter Jones 2013-04-08 20:18:40 UTC

Just FYI, this will trigger behaviour like 928982 even once we fix the problem that's currently causing that bug, so it is pretty important.

Comment 2 Adam Williamson 2013-04-08 21:14:20 UTC

Given "They are saved in UEFI NVRAM, which is a very limited resource", should we perhaps consider disabling this until abrt is actually ready to handle the dumps?

Comment 3 Denys Vlasenko 2013-04-16 14:27:04 UTC

> It would be beneficial for ABRT to look in this directory upon startup, assemble a backtrace from the files if they exist, and report a bug.

Please give me a few examples of /sys/fs/pstore contents.
Re deletion: what should be deleted there? Everything?

Comment 4 Josh Boyer 2013-04-16 18:42:30 UTC

(In reply to comment #3)
> > It would be beneficial for ABRT to look in this directory upon startup, assemble a backtrace from the files if they exist, and report a bug.
> 
> Please give me a few examples of /sys/fs/pstore contents.
> Re deletion: what should be deleted there? Everything?

Anaconda deletes everything under /sys/fs/pstore on installation, but that probably isn't the proper thing to do for ABRT.  The files ABRT will be interested in are of the type:

dmesg-efi-N

where N is a number.  E.g.

[jwboyer@sb ~]$ ls -l /sys/fs/pstore/
total 0
-r--r--r--. 1 root root  998 Apr 16 14:14 dmesg-efi-1
-r--r--r--. 1 root root  978 Apr 16 14:14 dmesg-efi-10
-r--r--r--. 1 root root  976 Apr 16 14:14 dmesg-efi-11
-r--r--r--. 1 root root  977 Apr 16 14:14 dmesg-efi-2
-r--r--r--. 1 root root 1022 Apr 16 14:14 dmesg-efi-3
-r--r--r--. 1 root root 1005 Apr 16 14:14 dmesg-efi-4
-r--r--r--. 1 root root  980 Apr 16 14:14 dmesg-efi-5
-r--r--r--. 1 root root  981 Apr 16 14:14 dmesg-efi-6
-r--r--r--. 1 root root 1016 Apr 16 14:14 dmesg-efi-7
-r--r--r--. 1 root root  897 Apr 16 14:14 dmesg-efi-8
-r--r--r--. 1 root root  946 Apr 16 14:14 dmesg-efi-9
[jwboyer@sb ~]$

The contents contained within are records for backtraces that look like this:

[jwboyer@sb ~]$ cat /sys/fs/pstore/dmesg-efi-1
Panic#2 Part1
<4>[  793.321074] Call Trace:
<4>[  793.323855]  [<ffffffff8144be47>] __handle_sysrq+0x127/0x190
<4>[  793.326659]  [<ffffffff8144beb0>] ? __handle_sysrq+0x190/0x190
<4>[  793.329474]  [<ffffffff8144befa>] write_sysrq_trigger+0x4a/0x50
<4>[  793.332285]  [<ffffffff812529ac>] proc_reg_write+0x7c/0xc0
<4>[  793.335102]  [<ffffffff811e108f>] vfs_write+0xaf/0x190
<4>[  793.337915]  [<ffffffff811e1545>] sys_write+0x55/0xa0
<4>[  793.340720]  [<ffffffff81740519>] system_call_fastpath+0x16/0x1b
<4>[  793.343536] Code: e1 f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 66 66 66 66 90 55 c7 05 3c d9 ce 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 48 89 e5 53 48 
<1>[  793.349823] RIP  [<ffffffff8144b616>] sysrq_handle_crash+0x16/0x20
<4>[  793.352888]  RSP <ffff880104a01e28>
<4>[  793.355937] CR2: 0000000000000000
<4>[  793.372892] ---[ end trace 1675b8f5d63a931e ]---
<0>[  793.534721] Kernel panic - not syncing: Fatal exception
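
The record above has two layers: a header line ("Panic#2 Part1") and kernel log lines, each prefixed with a log level and timestamp ("<4>[  793.321074] "). A small Python sketch of splitting those layers apart follows. The function name parse_record and the exact regular expressions are assumptions derived from this one sample, not a specification of the pstore format.

```python
import re

# Header as seen in the sample: "<reason>#<count> Part<part>"
_HEADER_RE = re.compile(r"^(?P<reason>[A-Za-z]+)#(?P<count>\d+) Part(?P<part>\d+)$")
# Per-line prefix as seen in the sample: "<4>[  793.321074] "
_PREFIX_RE = re.compile(r"^<\d>\[\s*\d+\.\d+\] ?")


def parse_record(text):
    """Return (reason, count, part, body_lines) for one pstore record.

    body_lines has the "<level>[timestamp] " prefix removed from each line.
    Raises ValueError if the first line is not a recognised header.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty record")
    m = _HEADER_RE.match(lines[0].strip())
    if m is None:
        raise ValueError("unrecognised header: %r" % lines[0])
    body = [_PREFIX_RE.sub("", line) for line in lines[1:]]
    return m.group("reason"), int(m.group("count")), int(m.group("part")), body
```

Only one space after the closing bracket is consumed, so the indentation of call-trace entries is preserved.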

Comment 5 Matthew Garrett 2013-04-16 20:20:10 UTC

Don't assume that they'll contain -efi- in the filename - different backends will have different strings there, and they should probably all be handled.
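
One way to follow this advice is to treat the middle of the file name as an opaque backend string. The Python sketch below assumes the general shape is dmesg-<backend>-<N>, which is inferred from the listing in comment 4. The helper names parse_record_name and sort_records are made up for illustration.

```python
import re

# dmesg-<backend>-<N>: the backend part is matched generically, never as "efi".
_NAME_RE = re.compile(r"^dmesg-(?P<backend>.+)-(?P<id>\d+)$")


def parse_record_name(name):
    """Return (backend, numeric_id), or None if the name does not match."""
    m = _NAME_RE.match(name)
    if m is None:
        return None
    return m.group("backend"), int(m.group("id"))


def sort_records(names):
    """Drop non-matching names and sort the rest by backend, then by number."""
    parsed = [(parse_record_name(n), n) for n in names]
    return [n for _, n in sorted(p for p in parsed if p[0] is not None)]
```

Sorting on the integer avoids the lexicographic order shown by ls above, where dmesg-efi-10 is listed before dmesg-efi-2.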

Comment 6 Denys Vlasenko 2013-04-24 11:57:08 UTC

That's it? No kernel version? No explanation of why the kernel panicked?

Here is an example of a "regular" oops:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<f88dec25>] :radeon:radeon_cp_init_ring_buffer+0x90/0x302
*pde = 6f5c6067
Oops: 0000 [#1] SMP.
Modules linked in: r8169 mii fuse nfsd lockd nfs_acl auth_rpcgss exportfs bridge stp bnep sco l2cap bl
Pid: 8003, comm: Xorg Not tainted (2.6.27.9-159.fc10.i686 #1)
                                   ^^^^^^^^^^^^^^^^^^^^^^ important!
EIP: 0060:[<f88dec25>] EFLAGS: 00213246 CPU: 1
EIP is at radeon_cp_init_ring_buffer+0x90/0x302 [radeon]
EAX: 00000000 EBX: f78b4000 ECX: f78b4000 EDX: 00000000
ESI: f5dbe800 EDI: 00006458 EBP: f0a0cf18 ESP: f0a0cf08
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process Xorg (pid: 8003, ti=f0a0c000 task=f2380000 task.ti=f0a0c000)
Stack: f0a0cf18 f78b4000 f5dbe800 00006458 f0a0cf28 f88e11c7 f8911a24 00000000.
      f0a0cf4c f88745f8 f30c3ba0 f5dbe800 f88e114a f5dbe828 f890fd78 f097ac00.
      00000000 f0a0cf68 c049b1c0 00000000 00006458 f097ac00 f097ac00 00000000.
Call Trace:
[<f88e11c7>] radeon_cp_resume+0x7d/0xbc [radeon]
[<f88745f8>] drm_ioctl+0x1b0/0x225 [drm]
[<f88e114a>] radeon_cp_resume+0x0/0xbc [radeon]
[<c049b1c0>] vfs_ioctl+0x50/0x69
[<c049b414>] do_vfs_ioctl+0x23b/0x247
[<c0460a56>] audit_syscall_entry+0xf9/0x123
[<c049b460>] sys_ioctl+0x40/0x5c
[<c0403c76>] syscall_call+0x7/0xb
=======================
Code: 66 31 d2 09 c2 89 d8 e8 fc e7 ff ff 8b 83 cc 00 00 00 8b 53 34 03 10 8b 86 70 02 00 00 2b  50 44
EIP: [<f88dec25>] radeon_cp_init_ring_buffer+0x90/0x302 [radeon] SS:ESP 0068:f0a0cf08


The example you provided looks like merely the *tail* of such an oops.
It does not even include the kernel version.
Imagine that we (the abrt reporter tool) are trying to file a BZ. Under which version should we file it?

Is it really that bad?
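
To make the version question concrete, here is a Python sketch that pulls the kernel version out of an oops shaped like the "regular" example above, where it appears as "(2.6.27.9-159.fc10.i686 #1)". The function name kernel_version and the regular expression are assumptions fitted to that one sample. Other kernels may print the version differently, in which case this returns None.

```python
import re

# Matches "(<version> #<build>)" as in "(2.6.27.9-159.fc10.i686 #1)".
_VERSION_RE = re.compile(r"\((\d+\.\d+\.\d+\S*) #\d+\)")


def kernel_version(oops_text):
    """Return the kernel version string found in oops_text, or None."""
    m = _VERSION_RE.search(oops_text)
    return m.group(1) if m else None
```

Run against a fragment that lacks this line, such as the tail posted in comment 4, it returns None, which is the situation described here.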

Comment 7 Matthew Garrett 2013-04-24 11:59:49 UTC

No. Josh provided the end of the oops. You need to reassemble the files in order to obtain the full backtrace.
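
A rough Python sketch of that reassembly, grouping records by their header and joining the parts. It assumes that records sharing the same "<reason>#<count>" belong to one event, and that higher part numbers hold earlier text. The second assumption is inferred from the sample in comment 4, where Part1 is the end of the oops. The function name reassemble is hypothetical.

```python
import re
from collections import defaultdict

_HEADER_RE = re.compile(r"^(\w+)#(\d+) Part(\d+)$")


def _with_newline(body):
    """Make sure a non-empty chunk ends with a newline before joining."""
    if body and not body.endswith("\n"):
        return body + "\n"
    return body


def reassemble(records):
    """Join pstore record texts into one text per crash event.

    records: iterable of file contents, each starting with a header line.
    Returns {(reason, count): full_text}. Texts without a recognised
    header are skipped.
    """
    groups = defaultdict(list)
    for text in records:
        header, _, body = text.partition("\n")
        m = _HEADER_RE.match(header.strip())
        if m is None:
            continue
        groups[(m.group(1), int(m.group(2)))].append((int(m.group(3)), body))
    result = {}
    for key, parts in groups.items():
        # ASSUMPTION: Part1 is the tail, so emit the highest part first.
        parts.sort(key=lambda p: p[0], reverse=True)
        result[key] = "".join(_with_newline(body) for _, body in parts)
    return result
```

If the part ordering turns out to be the other way round on some backend, only the reverse=True argument needs to change.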

Comment 8 Denys Vlasenko 2013-04-25 10:51:51 UTC

(In reply to comment #7)
> No. Josh provided the end of the oops. You need to reassemble the files in
> order to obtain the full backtrace.

Please give me a complete example (as in: set of /sys/fs/pstore files which constitutes one complete oops).

Comment 9 Denys Vlasenko 2013-05-06 11:42:25 UTC

Committed to upstream git:

commit 79d189a1e0a35108954f1476dd4ddbd46760b0fe
Author: Denys Vlasenko <dvlasenk>
Date:   Fri Apr 26 17:33:19 2013 +0200

    tests/runtests/uefioops: new test

    Signed-off-by: Denys Vlasenko <dvlasenk>
    Signed-off-by: Martin Milata <mmilata>

commit c8f39aacf8356cd70763fc2802b544da38a24fbb
Author: Denys Vlasenko <dvlasenk>
Date:   Fri Apr 26 17:33:18 2013 +0200

    specfile: hook up abrt-uefioops service

    Signed-off-by: Denys Vlasenko <dvlasenk>
    Signed-off-by: Martin Milata <mmilata>

commit 865f7b88b391f2fbd71baa1aa3d6320455d4f519
Author: Denys Vlasenko <dvlasenk>
Date:   Fri Apr 26 17:33:17 2013 +0200

    abrt-uefioops: new service

    This service, once per boot, scans /sys/fs/pstore/* and creates
    oops problem dirs from this data.

    Signed-off-by: Denys Vlasenko <dvlasenk>
    Signed-off-by: Martin Milata <mmilata>

Comment 10 Matthew Garrett 2013-05-06 14:47:59 UTC

Pstore isn't UEFI-specific - calling it uefioops probably isn't the best idea.

Comment 11 Denys Vlasenko 2013-08-23 11:43:15 UTC

Fixed in abrt git.

Comment 12 Denys Vlasenko 2013-08-26 13:25:35 UTC

uefi-oops renamed to pstore-oops in commit 205a7d29cdb91bec9a7174dad90e5c39d466434e

Comment 13 Fedora Update System 2013-09-13 13:09:07 UTC

gnome-abrt-0.3.1-1.fc19,abrt-2.1.7-1.fc19,libreport-2.1.7-1.fc19,satyr-0.9-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/gnome-abrt-0.3.1-1.fc19,abrt-2.1.7-1.fc19,libreport-2.1.7-1.fc19,satyr-0.9-1.fc19

Comment 14 Fedora Update System 2013-09-14 02:39:09 UTC

Package gnome-abrt-0.3.1-1.fc19, abrt-2.1.7-1.fc19, libreport-2.1.7-1.fc19, satyr-0.9-1.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gnome-abrt-0.3.1-1.fc19 abrt-2.1.7-1.fc19 libreport-2.1.7-1.fc19 satyr-0.9-1.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-16707/gnome-abrt-0.3.1-1.fc19,abrt-2.1.7-1.fc19,libreport-2.1.7-1.fc19,satyr-0.9-1.fc19
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2013-09-15 00:54:04 UTC

gnome-abrt-0.3.1-1.fc19, abrt-2.1.7-1.fc19, libreport-2.1.7-1.fc19, satyr-0.9-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
