Bug 1860616

Summary: abrt-server errors when processing zstd compressed core dumps produced by systemd-246~rc1-1.fc33
Product: [Fedora] Fedora
Reporter: Matt Fagnani <matthew.fagnani>
Component: libreport
Assignee: abrt <abrt-devel-list>
Status: VERIFIED ---
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 33
CC: abrt-devel-list, alciregi, awilliam, bcotton, bugzilla, drusek, ekulik, gmarr, jakub, jmilan, jpesco, kparal, lruzicka, mcatanza, mfabik, mhabrnal, michal.toman, mkutlak, mmarusak, robatino
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: AcceptedBlocker
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-26 09:00:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1766775

Description Matt Fagnani 2020-07-25 19:18:59 UTC
Description of problem:

Core dumps and journals produced by systemd-246~rc1-1.fc33 are now zstd compressed. coredumpctl (info) had problems processing zstd compressed core dumps; those problems were fixed in systemd-246~rc2-1.fc33 (https://bugzilla.redhat.com/show_bug.cgi?id=1856037).

abrt-server errors indicating that the core dump wasn't a valid ELF file were shown in the journal for each crash I've seen with systemd-246~rc1 and 246~rc2.
The following example is from kwalletd5 aborting when logging out of Plasma on Wayland which is one of several such crashes of KDE programs each time I've done so.

Jul 13 11:54:00 abrt-server[7256]: Error: File './coredump' is not a coredump
Jul 13 11:54:00 abrt-server[7256]: eu-readelf: failed reading 'coredump': not a valid ELF file
Jul 13 11:54:01 abrt-server[7256]: Error while running gdb:
Jul 13 11:54:01 abrt-server[7256]: /usr/libexec/gdb: warning: Couldn't determine a path for the index cache directory.
Jul 13 11:54:01 abrt-server[7256]: "/var/spool/abrt/ccpp-2020-07-13-11:53:55.832972-1127/./coredump" is not a core dump: file format not recognized
Jul 13 11:54:01 abrt-server[7256]: Python Exception <class 'ValueError'> invalid literal for int() with base 10: '':
Jul 13 11:54:01 abrt-server[7256]: Error occurred in Python: invalid literal for int() with base 10: ''
Jul 13 11:54:01 abrt-server[7256]: eu-unstrip: cannot read ELF core file: not a valid ELF file
Jul 13 11:54:01 abrt-server[7256]: Can't open file 'core_backtrace' for reading: No such file or directory
Jul 13 11:54:03 abrt-server[7256]: Deleting problem directory ccpp-2020-07-13-11:53:55.832972-1127 (dup of ccpp-2020-07-10-05:13:54.484968-1137)
Jul 13 11:54:04 abrt-notification[7330]: Process 1137 (kwalletd5) crashed in ??()

The full journal with these abrt-server errors is at https://bugzilla.redhat.com/show_bug.cgi?id=1856037#c2

The crashes which happened with systemd-246~rc1-1.fc33 and 246~rc2-1 installed all showed up in gnome-abrt, but there aren't any traces or core dumps in the gnome-abrt entries. abrt might need to be updated to support zstd compressed core dumps.
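
As a rough illustration of why eu-readelf, eu-unstrip and gdb reject the file: an uncompressed core dump begins with the ELF magic bytes (7F 45 4C 46), while a zstd frame begins with 28 B5 2F FD. The sketch below only sniffs those four bytes. The function name is hypothetical, and this is not how abrt or libreport actually handle the file.

```python
# Illustrative sketch only; not ABRT or libreport code.
ELF_MAGIC = b"\x7fELF"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic 0xFD2FB528, stored little-endian


def sniff_coredump_format(path):
    """Return 'elf', 'zstd', or 'unknown' based on the first four bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == ELF_MAGIC:
        return "elf"
    if head == ZSTD_MAGIC:
        return "zstd"
    return "unknown"
```

Run against the 'coredump' file in a problem directory under /var/spool/abrt, a result of 'zstd' would match the "not a valid ELF file" errors above.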

Version-Release number of selected component (if applicable):
systemd-246~rc2-1.fc33.x86_64
abrt-2.14.2-3.fc33.x86_64

How reproducible:
The abrt-server errors have happened every time a core dump was created by systemd-246~rc1-1 and 246~rc2-1.

Steps to Reproduce:
1. Boot a Fedora Rawhide KDE Plasma spin installation updated to 2020-07-25 with systemd-246~rc1-1.fc33 or 246~rc2-1, kwin-wayland, plasma-workspace-wayland and their dependencies installed
2. Log in to Plasma 5.19.3 on Wayland from sddm
3. Log out of Plasma. Several KDE programs aborted with errors that the Wayland connection broke each time I've done this.
4. Log in to Plasma
5. start konsole
6. journalctl -b (in konsole)
7. start gnome-abrt
8. select the entries of the new crashes in gnome-abrt looking for the traces and core dumps

I've also reproduced this problem using kill -6 $(pidof <process>) on processes other than those I reported:
1. man abrt (in konsole)
2. start another tab in konsole and switch to the other terminal
3. kill -6 $(pidof man)
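
For reference, signal 6 is SIGABRT on Linux, so kill -6 aborts the target the same way an abort() call would. A minimal Python sketch of the same idea, assuming a POSIX system, is below. The helper name is made up for illustration, and whether the resulting core dump reaches abrt depends on the system's core dump configuration, which this sketch does not touch.

```python
import signal
import subprocess
import sys


def abort_child():
    """Start a sleeping child process, send it SIGABRT, and return its exit status."""
    # The child is just a Python interpreter sleeping for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # Same effect as `kill -6 <pid>` from a shell.
    child.send_signal(signal.SIGABRT)
    # On POSIX, a negative return code -N means the child was killed by signal N.
    return child.wait(timeout=10)
```

A negative return value equal to -signal.SIGABRT indicates that the child was terminated by the abort signal rather than exiting normally.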

Actual results:
abrt-server errors indicating that the core dump wasn't a valid ELF file were shown in the journal for each crash I've seen with systemd-246~rc1 and 246~rc2.
gnome-abrt didn't show the traces or core dumps for any of those crashes. 

Expected results:
abrt-server would process the core dumps correctly. gnome-abrt would show the traces and core dump files normally.

Additional info:
The changelog for systemd-246~rc1 noted "coredumps collected by systemd-coredump may now be compressed using the zstd algorithm." at https://raw.githubusercontent.com/systemd/systemd/v246-rc1/NEWS

Zbigniew Jędrzejewski-Szmek wrote "Please open a new bug report against abrt. This one here was a real bug that has been fixed and it'd be confusing to reuse this for another issue." at https://bugzilla.redhat.com/show_bug.cgi?id=1856037#c8 I've created this report based on Zbigniew's suggestion.

Comment 1 Ernestas Kulik 2020-07-26 09:00:56 UTC
Please create an issue upstream: https://github.com/abrt/abrt/issues/new

Comment 2 Matt Fagnani 2020-07-27 23:03:39 UTC
(In reply to Ernestas Kulik from comment #1)
> Please create an issue upstream: https://github.com/abrt/abrt/issues/new

I'd prefer not to use an account on github. Free free to open an issue upstream with the information in this report though. Thanks.

Comment 3 Matt Fagnani 2020-07-27 23:12:23 UTC
Feel free to open an issue upstream with the information in this report though (is what I meant to write). Sorry.

Comment 4 Kamil Páral 2020-07-29 17:26:40 UTC
I'm reopening this for the purpose of a blocker bug discussion. I've already hit this bug in bug 1861700 and it has dire consequences. If I'm not mistaken, it basically means that users can't easily report bugs against Fedora 33, which means we'll get *substantially* fewer reports, which is likely to negatively impact release quality. I believe we need to fix this ASAP, because it's extremely important for QA.

I'm proposing this to be a F33 blocker, at least according to this criterion:
https://fedoraproject.org/wiki/Fedora_33_Final_Release_Criteria#Default_application_functionality

If ABRT can't report any crashes because it doesn't understand zstd-compressed coredumps, it doesn't withstand a basic functionality test.

Note that I think this should be accepted against Beta rather than Final, but I can't find a fitting criterion. I think we had some paragraph about substantially reducing QA coverage somewhere, but I can't find it.

Comment 5 Ernestas Kulik 2020-07-30 07:44:22 UTC
https://github.com/abrt/libreport/pull/656 should help with this.

Comment 6 Chris Murphy 2020-08-02 17:42:45 UTC
https://fedoraproject.org/wiki/Fedora_32_Beta_Release_Criteria#Beta_Blocker_Bugs
"A bug is considered a Beta blocker bug if any of the following criteria are met:
* Bug hinders execution of required Beta test plans or dramatically reduces test coverage"

Comment 7 Kamil Páral 2020-08-03 10:35:37 UTC
Thank you, Chris, that's what I was looking for. I believe this issue dramatically reduces test coverage, because the QA community at large can't easily report most program crashes.

The simplest reproducer is:
1. run gnome-calculator
2. pkill -ABRT -f gnome-calculator
3. see abrt notification pop up, try to report it, it can't, because of this issue

Comment 8 Geoffrey Marr 2020-08-03 17:07:12 UTC
Discussed during the 2020-08-03 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"Bug hinders execution of required Beta test plans or dramatically reduces test coverage" [1].

The impact of this bug could leave users with no ability to automatically report bugs, which would reduce the amount of testing coverage. As such, we find it warrants blocker status.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-08-03/f33-blocker-review.2020-08-03-16.02.txt
[1] https://fedoraproject.org/wiki/Fedora_32_Beta_Release_Criteria#Beta_Blocker_Bugs

Comment 9 Adam Williamson 2020-08-10 17:00:33 UTC
since a PR was referenced, setting POST.

Comment 10 Adam Williamson 2020-08-10 18:33:31 UTC
abrt folks, can you please either cut a new release of libreport and package it for f33, or backport the fix to f33? thanks!

Comment 11 Ernestas Kulik 2020-08-11 05:44:46 UTC
(In reply to Adam Williamson from comment #10)
> abrt folks, can you please either cut a new release of libreport and package
> it for f33, or backport the fix to f33? thanks!

Yeah, that’s in the works.

Comment 12 Ben Cotton 2020-08-11 13:49:30 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 13 Adam Williamson 2020-08-17 18:51:25 UTC
So there was a new libreport version cut and packages built, but there were issues with that:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/HCDKM64RKM3H27WGCDV66S4KSHCEJX2A/
and the libreport 2.14.0 builds are being untagged until its dependencies are rebuilt. I'm trying to get the abrt team to take care of this at present.

Thanks!

Comment 14 Adam Williamson 2020-08-24 21:36:07 UTC
So there's now a libreport 2.14.0 with a matching abrt rebuild in a side tag. But...I installed them, and this still doesn't seem to work:

[adamw@adam tmp]$ report-cli -e analyze_LocalGDB ccpp-2020-08-24-09\:52\:58.756288-2021/
Analyzing coredump 'coredump'
eu-unstrip: cannot read ELF core file: not a valid ELF file
Can't get build ids from coredump
('analyze_LocalGDB' exited with 1)
[adamw@adam tmp]$ file ccpp-2020-08-24-09\:52\:58.756288-2021/coredump 
ccpp-2020-08-24-09:52:58.756288-2021/coredump: Zstandard compressed data (v0.8+), Dictionary ID: None

Comment 15 Ernestas Kulik 2020-08-25 06:11:40 UTC
(In reply to Adam Williamson from comment #14)
> So there's now a libreport 2.14.0 with a matching abrt rebuild in a side tag.
> But...I installed them, and this still doesn't seem to work:
> 
> [adamw@adam tmp]$ report-cli -e analyze_LocalGDB
> ccpp-2020-08-24-09\:52\:58.756288-2021/
> Analyzing coredump 'coredump'
> eu-unstrip: cannot read ELF core file: not a valid ELF file
> Can't get build ids from coredump
> ('analyze_LocalGDB' exited with 1)
> [adamw@adam tmp]$ file ccpp-2020-08-24-09\:52\:58.756288-2021/coredump 
> ccpp-2020-08-24-09:52:58.756288-2021/coredump: Zstandard compressed data
> (v0.8+), Dictionary ID: None

Trigger a new crash and try again. This is not going to retroactively extract core dumps correctly.

Comment 16 Adam Williamson 2020-08-25 06:37:28 UTC
Huh. Why wouldn't it? If it's capable of decompressing zstd now, why not just...do it? Does something have to happen at the time the dump is created?

Comment 17 Fedora Update System 2020-08-26 18:15:28 UTC
FEDORA-2020-59e144acee has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-59e144acee

Comment 18 Fedora Update System 2020-08-26 18:18:05 UTC
FEDORA-2020-59e144acee has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-59e144acee

Comment 19 Kamil Páral 2020-08-27 08:14:20 UTC
(In reply to Fedora Update System from comment #18)
> FEDORA-2020-59e144acee has been submitted as an update to Fedora 33.
> https://bodhi.fedoraproject.org/updates/FEDORA-2020-59e144acee

This update caused bug 1873029 for me. I guess this particular issue is fixed, but I can't confirm it.

Comment 20 Fedora Update System 2020-08-27 19:04:36 UTC
FEDORA-2020-59e144acee has been pushed to the Fedora 33 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-59e144acee`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-59e144acee

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 Kamil Páral 2020-09-07 14:18:05 UTC
*** Bug 1875418 has been marked as a duplicate of this bug. ***

Comment 22 Lukas Ruzicka 2020-09-08 06:59:16 UTC
This is still happening on the latest compose available for testing -> 20200906.

Comment 23 Adam Williamson 2020-09-08 17:29:41 UTC
Well, yeah, we didn't push the update stable because of the issue kparal reported. The top-line status here, for me, is "waiting for abrt team to provide a set of packages that actually allows bugs to be successfully reported". I guess it would help if more people could try reporting crashes with the update installed and see if they run into the same bug kparal did - https://bugzilla.redhat.com/show_bug.cgi?id=1873029 - or not.

Comment 24 Fedora Update System 2020-09-11 21:33:51 UTC
FEDORA-2020-fd3d0e6879 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-fd3d0e6879

Comment 25 Kamil Páral 2020-09-15 07:15:45 UTC
This seems to be fixed with this update:
https://bodhi.fedoraproject.org/updates/FEDORA-2020-444a3363f0
meaning:
abrt-2.14.4-5.fc33.x86_64
gnome-abrt-1.3.6-5.fc33.x86_64
libreport-2.14.0-8.fc33.x86_64

However, that update still causes bug 1878317, so it's not clear whether it'll get pushed stable.

Comment 26 Fedora Update System 2020-09-15 22:31:01 UTC
FEDORA-2020-444a3363f0 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-444a3363f0

Comment 27 Fedora Update System 2020-09-16 18:55:56 UTC
FEDORA-2020-444a3363f0 has been pushed to the Fedora 33 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-444a3363f0`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-444a3363f0

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.