Bug 1860616 - abrt-server errors when processing zstd compressed core dumps produced by systemd-246~rc1-1.fc33
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: libreport
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: abrt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: BetaBlocker, F33BetaBlocker
Reported: 2020-07-25 19:18 UTC by Matt Fagnani
Modified: 2020-08-03 17:07 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-26 09:00:56 UTC
Type: Bug



Description Matt Fagnani 2020-07-25 19:18:59 UTC
Description of problem:

Starting with systemd-246~rc1-1.fc33, core dumps and journals are zstd-compressed. coredumpctl info had problems processing zstd-compressed core dumps; those problems were fixed in systemd-246~rc2-1.fc33 (https://bugzilla.redhat.com/show_bug.cgi?id=1856037).

For every crash I've seen with systemd-246~rc1 and 246~rc2, abrt-server logged errors to the journal indicating that the core dump was not a valid ELF file.
The following example is from kwalletd5 aborting when I logged out of Plasma on Wayland; kwalletd5 is one of several KDE programs that crash each time I do so.

Jul 13 11:54:00 abrt-server[7256]: Error: File './coredump' is not a coredump
Jul 13 11:54:00 abrt-server[7256]: eu-readelf: failed reading 'coredump': not a valid ELF file
Jul 13 11:54:01 abrt-server[7256]: Error while running gdb:
Jul 13 11:54:01 abrt-server[7256]: /usr/libexec/gdb: warning: Couldn't determine a path for the index cache directory.
Jul 13 11:54:01 abrt-server[7256]: "/var/spool/abrt/ccpp-2020-07-13-11:53:55.832972-1127/./coredump" is not a core dump: file format not recognized
Jul 13 11:54:01 abrt-server[7256]: Python Exception <class 'ValueError'> invalid literal for int() with base 10: '':
Jul 13 11:54:01 abrt-server[7256]: Error occurred in Python: invalid literal for int() with base 10: ''
Jul 13 11:54:01 abrt-server[7256]: eu-unstrip: cannot read ELF core file: not a valid ELF file
Jul 13 11:54:01 abrt-server[7256]: Can't open file 'core_backtrace' for reading: No such file or directory
Jul 13 11:54:03 abrt-server[7256]: Deleting problem directory ccpp-2020-07-13-11:53:55.832972-1127 (dup of ccpp-2020-07-10-05:13:54.484968-1137)
Jul 13 11:54:04 abrt-notification[7330]: Process 1137 (kwalletd5) crashed in ??()

The full journal with these abrt-server errors is at https://bugzilla.redhat.com/show_bug.cgi?id=1856037#c2

All crashes that occurred with systemd-246~rc1-1.fc33 and 246~rc2-1 installed showed up in gnome-abrt, but their entries contain no backtraces or core dumps. abrt may need to be updated to support zstd-compressed core dumps.
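The eu-readelf, gdb and eu-unstrip messages above are consistent with those tools being handed compressed data instead of an ELF image, although this report does not confirm what the ./coredump file actually contains. One way to check is to look at the first four bytes: ELF files start with 0x7F 'E' 'L' 'F', and a zstd frame starts with the magic number 0xFD2FB528 stored little-endian (bytes 28 B5 2F FD). The Python sketch below is a hypothetical helper (classify_core_file is an illustrative name), not code from abrt, libreport or systemd.

```python
import sys

# Leading magic bytes. ELF files begin with 0x7F 'E' 'L' 'F'; a zstd frame
# begins with the magic number 0xFD2FB528 stored little-endian (28 B5 2F FD).
ELF_MAGIC = b"\x7fELF"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def classify_core_file(path):
    """Return "elf", "zstd" or "unknown" based on the file's first four bytes.

    Hypothetical helper for illustration only; it is not part of abrt or
    libreport and does not validate anything beyond the magic number.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head == ELF_MAGIC:
        return "elf"
    if head == ZSTD_MAGIC:
        return "zstd"
    return "unknown"


if __name__ == "__main__":
    # Print one classification line per path given on the command line.
    for name in sys.argv[1:]:
        print(f"{name}: {classify_core_file(name)}")
```

For example, saving it under a hypothetical name such as classify_core.py and running it on the coredump file in a /var/spool/abrt/ccpp-* problem directory would print one line per file. A file reported as zstd would have to be decompressed (for instance with zstd -d) before eu-readelf or gdb could read it.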

Version-Release number of selected component (if applicable):
systemd-246~rc2-1.fc33.x86_64
abrt-2.14.2-3.fc33.x86_64

How reproducible:
The abrt-server errors have occurred every time systemd-246~rc1-1 or 246~rc2-1 created a core dump.

Steps to Reproduce:
1. Boot a Fedora Rawhide KDE Plasma spin installation, updated to 2020-07-25, with systemd-246~rc1-1.fc33 or 246~rc2-1, kwin-wayland, plasma-workspace-wayland and their dependencies installed
2. Log in to Plasma 5.19.3 on Wayland from sddm
3. Log out of Plasma. Each time I've done this, several KDE programs aborted with errors saying the Wayland connection was broken.
4. Log in to Plasma
5. Start konsole
6. Run journalctl -b in konsole
7. Start gnome-abrt
8. Select the new crash entries in gnome-abrt and look for their backtraces and core dumps

I've also reproduced this problem by running kill -6 $(pidof <process>) against processes other than those reported above, for example:
1. Run man abrt in konsole
2. Open another tab in konsole and switch to it
3. Run kill -6 $(pidof man)
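The kill -6 step can also be scripted. Signal 6 is SIGABRT, the same signal that pkill -ABRT sends, and its default action terminates the process and produces a core dump. The Python sketch below uses a hypothetical helper (abort_child is an illustrative name, not part of abrt) with a sleeping Python process standing in for man or gnome-calculator. It only checks that the child was killed by SIGABRT; whether the core dump then reaches systemd-coredump and abrt depends on the system's kernel.core_pattern and on abrt being installed, which the sketch does not verify.

```python
import signal
import subprocess
import sys


def abort_child(argv):
    """Start argv as a child, send it SIGABRT (what kill -6 does), and return
    the exit status reported by subprocess.

    Hypothetical helper for illustration. On POSIX systems a negative return
    code -N means the child was terminated by signal N.
    """
    child = subprocess.Popen(argv)
    child.send_signal(signal.SIGABRT)  # equivalent to kill -6 <pid>
    return child.wait()


if __name__ == "__main__":
    # A long-running stand-in for "man abrt" or gnome-calculator.
    status = abort_child([sys.executable, "-c", "import time; time.sleep(60)"])
    if status == -signal.SIGABRT:
        print("child terminated by SIGABRT")
    else:
        print(f"unexpected exit status {status}")
```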

Actual results:
For every crash I've seen with systemd-246~rc1 and 246~rc2, abrt-server logged errors to the journal indicating that the core dump was not a valid ELF file.
gnome-abrt didn't show backtraces or core dumps for any of those crashes.

Expected results:
abrt-server processes the core dumps correctly, and gnome-abrt shows the backtraces and core dump files as usual.

Additional info:
The systemd-246~rc1 NEWS file notes that "coredumps collected by systemd-coredump may now be compressed using the zstd algorithm." (https://raw.githubusercontent.com/systemd/systemd/v246-rc1/NEWS)

Zbigniew Jędrzejewski-Szmek wrote "Please open a new bug report against abrt. This one here was a real bug that has been fixed and it'd be confusing to reuse this for another issue." (https://bugzilla.redhat.com/show_bug.cgi?id=1856037#c8). I've created this report following Zbigniew's suggestion.

Comment 1 Ernestas Kulik 2020-07-26 09:00:56 UTC
Please create an issue upstream: https://github.com/abrt/abrt/issues/new

Comment 2 Matt Fagnani 2020-07-27 23:03:39 UTC
(In reply to Ernestas Kulik from comment #1)
> Please create an issue upstream: https://github.com/abrt/abrt/issues/new

I'd prefer not to use an account on github. Free free to open an issue upstream with the information in this report though. Thanks.

Comment 3 Matt Fagnani 2020-07-27 23:12:23 UTC
Feel free to open an issue upstream with the information in this report though (is what I meant to write). Sorry.

Comment 4 Kamil Páral 2020-07-29 17:26:40 UTC
I'm reopening this for the purpose of a blocker bug discussion. I've already hit this bug in bug 1861700 and it has dire consequences. If I'm not mistaken, it basically means that users can't easily report bugs against Fedora 33, which means we'll get *substantially* fewer reports, which is likely to negatively impact release quality. I believe we need to fix this ASAP, because it's extremely important for QA.

I'm proposing this to be a F33 blocker, at least according to this criterion:
https://fedoraproject.org/wiki/Fedora_33_Final_Release_Criteria#Default_application_functionality

If ABRT can't report any crashes because it doesn't understand zstd-compressed coredumps, it doesn't withstand a basic functionality test.

Note that I think this should be accepted against Beta rather than Final, but I can't find a fitting criterion. I think we had some paragraph about substantially reducing QA coverage somewhere, but I can't find it.

Comment 5 Ernestas Kulik 2020-07-30 07:44:22 UTC
https://github.com/abrt/libreport/pull/656 should help with this.

Comment 6 Chris Murphy 2020-08-02 17:42:45 UTC
https://fedoraproject.org/wiki/Fedora_32_Beta_Release_Criteria#Beta_Blocker_Bugs
"A bug is considered a Beta blocker bug if any of the following criteria are met:
* Bug hinders execution of required Beta test plans or dramatically reduces test coverage"

Comment 7 Kamil Páral 2020-08-03 10:35:37 UTC
Thank you, Chris, that's what I was looking for. I believe this issue dramatically reduces test coverage, because the QA community at large can't easily report most program crashes.

The simplest reproducer is:
1. Run gnome-calculator
2. Run pkill -ABRT -f gnome-calculator
3. See the abrt notification pop up and try to report the crash; reporting fails because of this issue

Comment 8 Geoffrey Marr 2020-08-03 17:07:12 UTC
Discussed during the 2020-08-03 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"Bug hinders execution of required Beta test plans or dramatically reduces test coverage" [1].

The impact of this bug could leave users with no ability to automatically report bugs, which would reduce the amount of testing coverage. As such, we find it warrants blocker status.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-03/f33-blocker-review.2020-08-03-16.02.txt
[1] https://fedoraproject.org/wiki/Fedora_32_Beta_Release_Criteria#Beta_Blocker_Bugs

