Bug 2135778
| Field | Value |
|---|---|
| Summary | systemd-coredump times out while processing a crash, gdb can't attach to a stuck process |
| Product | [Fedora] Fedora |
| Component | systemd |
| Version | 37 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Kamil Páral <kparal> |
| Assignee | systemd-maint |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | awilliam, fedoraproject, filbranden, flepied, lnykryn, msekleta, robatino, ryncsn, ssahani, s, systemd-maint, yuwatana, zbyszek |
| Whiteboard | RejectedBlocker AcceptedFreezeException |
| Fixed In Version | systemd-251.7-611.fc37 systemd-250.9-1.fc36 |
| Last Closed | 2022-11-02 19:53:14 UTC |
| Type | Bug |
| Bug Blocks | 2009540 |

Description
Kamil Páral
2022-10-18 12:00:23 UTC
Created attachment 1918736: system journal
Created attachment 1918737: rpm -qa output
I'd like to raise awareness of this issue and potentially propose it as a blocker. It seems to me that this could be an issue somewhere in the lower stack and could cause us some trouble. It definitely seems to break ABRT, because it can't notify the user about a crash and offer to report it when not even systemd-coredump completes. However, that seems to be happening just for this gnome-calendar crash. When I send SIGSEGV e.g. to gnome-calculator, the whole systemd-coredump -> ABRT workflow works correctly.

Seems to be a duplicate of RHBZ#2134741 and already fixed upstream by https://github.com/systemd/systemd/commit/f6e88aac2c30392a934507591d70a35ca1ea7acf .

No, I don't think it can be the same issue. Here we're observing a timeout, not a crash.

Actually, systemd-251.6 doesn't have the bug fixed by f6e88aac2c30392a934507591d70a35ca1ea7acf. So it's definitely a different issue. I managed to get gnome-calendar to crash as described, and systemd-coredump (from git) runs fine. But with 251.6 I can reproduce the issue.

Phew, it's just a straightforward deadlock. We fork the child and wait for it to exit. The child tries to write 69596 bytes to a pipe, and after the first 64k it blocks on the parent because the pipe is full.

Zbigniew: can you give us a sense of whether this is likely to be quite a general problem (will it affect a lot of crashes?) or is it quite an unusual scenario that likely won't affect many?

I don't think it'll be super-common. You need a program linked to way too many libraries for the issue to occur.

https://github.com/systemd/systemd/pull/25055 should fix the issue.

-4 blocker / +3 FE in https://pagure.io/fedora-qa/blocker-review/issue/978 , marking rejected blocker, accepted FE.

I just found a crash in gnome-control-center, which is also affected by this systemd bug. So it's not that rare, probably.

@zbyszek.pl We delayed the F37 release by a week. Would it be possible to backport a patch to fix this?
Or is a new systemd release planned soon after F37 is go (to at least shorten the window where people can't report certain crashes)? Thanks.

kparal: I did a scratch build with a backport of the fix and found it still didn't work as the coredump generation timed out. If you want to try it, it's at https://koji.fedoraproject.org/koji/taskinfo?taskID=93196487 .

(In reply to Adam Williamson from comment #13)
> kparal: I did a scratch build with a backport of the fix and found it still
> didn't work as the coredump generation timed out. If you want to try it,
> it's at https://koji.fedoraproject.org/koji/taskinfo?taskID=93196487 .

Well, it worked for me. I reproduced the calendar crash again, and it got printed to the system journal and is visible in coredumpctl.

I'm fairly confident that the patch fixes the issue, though not 100%. Please test the build that will be started soon.

FEDORA-2022-c72fd8b071 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-c72fd8b071

FEDORA-2022-c72fd8b071 has been pushed to the Fedora 37 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-c72fd8b071` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-c72fd8b071 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

(In reply to Fedora Update System from comment #16)
> FEDORA-2022-c72fd8b071 has been submitted as an update to Fedora 37.
> https://bodhi.fedoraproject.org/updates/FEDORA-2022-c72fd8b071

Thanks, it is fixed for me. (Note: While systemd-coredump works now, ABRT doesn't, for that particular crash. I reported that as bug 2137249.)

FEDORA-2022-c72fd8b071 has been pushed to the Fedora 37 stable repository. If the problem still persists, please make note of it in this bug report.
FEDORA-2022-ef4f57b072 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-ef4f57b072

FEDORA-2022-ef4f57b072 has been pushed to the Fedora 36 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-ef4f57b072` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-ef4f57b072 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2022-ef4f57b072 has been pushed to the Fedora 36 stable repository. If the problem still persists, please make note of it in this bug report.
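
For anyone trying to picture the deadlock diagnosed in the thread (parent forks a child and waits for it to exit, child blocks after the first 64k of a 69596-byte pipe write), here is a minimal Python sketch. It is not systemd's code, and the function names are made up for illustration. The first function uses a non-blocking pipe so it can show where the write stalls without actually hanging; the exact capacity is an assumption about the platform default. The second shows one common way to avoid this class of deadlock, reading the pipe to EOF before waiting on the child, which is not necessarily what the linked pull request does.

```python
import os

# Size of the payload the child tried to write, taken from the report.
PAYLOAD_SIZE = 69596


def bytes_accepted_without_reader(size: int) -> int:
    """Write into a pipe that nobody reads and return how many bytes fit.

    The write end is non-blocking, so instead of hanging (as the child did
    in the report) the write raises BlockingIOError once the pipe is full.
    """
    r, w = os.pipe()
    os.set_blocking(w, False)
    data = b"x" * size
    written = 0
    try:
        while written < size:
            try:
                written += os.write(w, data[written:])
            except BlockingIOError:
                break  # pipe is full; a blocking writer would stall here
    finally:
        os.close(r)
        os.close(w)
    return written


def drain_then_wait(size: int) -> tuple[int, int]:
    """Fork a child that writes `size` bytes; read to EOF, then wait.

    Calling waitpid() before reading would reproduce the deadlock pattern:
    the child cannot exit until someone empties the pipe.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(r)
            data = b"x" * size
            offset = 0
            while offset < size:
                offset += os.write(w, data[offset:])
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    os.close(w)  # parent must close its write end to ever see EOF
    total = 0
    while chunk := os.read(r, 65536):
        total += len(chunk)
    os.close(r)
    _, wait_status = os.waitpid(pid, 0)
    return total, os.waitstatus_to_exitcode(wait_status)


if __name__ == "__main__":
    print("accepted with no reader:", bytes_accepted_without_reader(PAYLOAD_SIZE))
    print("drained, exit code:", drain_then_wait(PAYLOAD_SIZE))
```

On a typical Linux system the first number should come out below 69596, which is why only crashes with a large enough payload (programs linked to many libraries, per the comment above) hit the problem.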