Bug 1787222 - Coredumps are broken since Fedora 26
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.0
Target Release: 4.40.7
Assignee: Marcin Sobczyk
QA Contact: Petr Kubica
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-01 17:38 UTC by Nir Soffer
Modified: 2020-05-20 20:03 UTC

Fixed In Version: vdsm-4.40.7
Clone Of:
Environment:
Last Closed: 2020-05-20 20:03:39 UTC
oVirt Team: Infra
Embargoed:
mperina: ovirt-4.4?


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 106048 0 master MERGED init: Enable coredumps 2020-04-23 07:56:50 UTC
oVirt gerrit 107514 0 master MERGED abrt: Fix generating core dumps on el8 2020-04-23 07:56:50 UTC

Description Nir Soffer 2020-01-01 17:38:52 UTC
Description of problem:

Vdsm is overriding /proc/sys/kernel/core_pattern during startup to:

    |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %e %i

But /usr/libexec/abrt-hook-ccpp is not installed since Fedora 26. So
when a program crashes, the core dump is dropped.
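
The failure mode above can be checked mechanically. What follows is a minimal sketch, not vdsm code; the helper names `pipe_handler` and `handler_is_installed` are hypothetical. It assumes only the documented kernel behavior that a core_pattern starting with `|` names a program the core is piped to.

```python
import os


def pipe_handler(core_pattern):
    """Return the handler path if core_pattern pipes to a program, else None."""
    pattern = core_pattern.strip()
    if not pattern.startswith("|"):
        return None  # plain file name template, no helper program involved
    fields = pattern[1:].split()
    return fields[0] if fields else None


def handler_is_installed(core_pattern):
    """False only when the pattern names a pipe handler that is missing
    or not executable (hypothetical helper, for illustration)."""
    handler = pipe_handler(core_pattern)
    if handler is None:
        return True
    return os.path.isfile(handler) and os.access(handler, os.X_OK)
```

On a host, passing the contents of /proc/sys/kernel/core_pattern to `handler_is_installed` should return False in the situation described here, where the pattern points at a hook that is not installed.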

This was added in this commit:

commit 893ac2a4d610791e26f6debdab8b06f8c36bc18d
Author: Yeela Kaplan <ykaplan>
Date:   Mon Jul 6 18:27:47 2015 +0300

    Adding abrt dependency and introduce configurator for it

This is wrong; core_pattern is configured using
/usr/lib/sysctl.d/*.conf, and on Fedora it is configured by:

$ cat /usr/lib/sysctl.d/50-coredump.conf
...
kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

Removing the bad configuration restores core dumps.
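
To see which value a given sysctl.d file would set, its text can be scanned for the kernel.core_pattern key. This is a simplified sketch under stated assumptions: the function name `core_pattern_from_sysctl` is hypothetical, and the parser handles only `key=value` lines and `#`/`;` comments, not every syntax that sysctl.d(5) allows.

```python
def core_pattern_from_sysctl(text):
    """Return the last kernel.core_pattern value found in sysctl.d-style
    text, or None if the key is not set (simplified parser)."""
    value = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.core_pattern":
            value = val.strip()  # later assignments override earlier ones
    return value
```

For example, feeding it the contents of /usr/lib/sysctl.d/50-coredump.conf shown above should return the systemd-coredump pipe pattern.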

Version-Release number of selected component (if applicable):
v4.40.0

How reproducible:
Always

Steps to Reproduce:
1. Run a VM

2. Kill the VM:

   kill -ABRT pid

Actual results:
No coredump is generated

Expected results:
A coredump is generated

Setting to urgent since we must have coredumps to debug issues with qemu.
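
The reproduction steps can be tried without a VM by aborting a throwaway process. The sketch below is an assumption-laden stand-in for the qemu case: it uses a sleeping Python child instead of a VM, and `abort_child` is a hypothetical helper name. It only confirms that the process died from SIGABRT; whether a core was stored must still be checked in whatever handler core_pattern names.

```python
import signal
import subprocess
import sys


def abort_child():
    """Start a sleeping child process, send it SIGABRT and return its
    exit status as reported by subprocess."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Equivalent to `kill -ABRT <pid>` from the steps above.
    child.send_signal(signal.SIGABRT)
    # subprocess reports death by signal N as the return code -N.
    return child.wait()
```

A return value equal to `-signal.SIGABRT` means the child was terminated by the signal, which is the point at which the kernel consults core_pattern.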

Comment 2 Nir Soffer 2020-01-08 16:52:38 UTC
Testing RHEL 8.1

We depend on abrt-addon-ccpp in vdsm spec:

    Requires: abrt-addon-ccpp

And abrt-ccpp.service is running:

# systemctl status abrt-ccpp
● abrt-ccpp.service - Install ABRT coredump hook
   Loaded: loaded (/usr/lib/systemd/system/abrt-ccpp.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2020-01-08 18:37:50 IST; 6min ago
  Process: 968 ExecStart=/usr/sbin/abrt-install-ccpp-hook install (code=exited, status=0/SUCCESS)
 Main PID: 968 (code=exited, status=0/SUCCESS)

Jan 08 18:37:50 host3 systemd[1]: Starting Install ABRT coredump hook...
Jan 08 18:37:50 host3 systemd[1]: Started Install ABRT coredump hook.

But core_pattern is configured to use coredumpctl:

$ cat /proc/sys/kernel/core_pattern 
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

Restarting abrt-ccpp changes the core pattern:

# systemctl restart abrt-ccpp.service

# systemctl status abrt-ccpp.service 
● abrt-ccpp.service - Install ABRT coredump hook
   Loaded: loaded (/usr/lib/systemd/system/abrt-ccpp.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2020-01-08 18:02:32 IST; 2s ago
  Process: 7013 ExecStop=/usr/sbin/abrt-install-ccpp-hook uninstall (code=exited, status=0/SUCCESS)
  Process: 7018 ExecStart=/usr/sbin/abrt-install-ccpp-hook install (code=exited, status=0/SUCCESS)
 Main PID: 7018 (code=exited, status=0/SUCCESS)

# cat /proc/sys/kernel/core_pattern 
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %P %I %h %e

So it looks like abrt-ccpp.service is broken on RHEL 8.1.

To use abrt-ccpp on RHEL 8.1 we can install a sysctl drop-in configuration
(only on RHEL8.1):

# cat /usr/lib/sysctl.d/60-vdsmd.conf 
# Install by vdsm

kernel.core_pattern=|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %P %I %h %e

This is kind of ugly because this is what abrt-ccpp.service should do, or it
should at least be handled by abrt-ccpp, but it is a quick fix to continue to
use abrt on RHEL 8.1.

But all this trouble tells me that we need to move to systemd-coredump also
on RHEL 8.1. This is probably another RHEL 8 porting task that we missed,
because we did not follow upstream changes in Fedora.

If we want to keep using abrt-ccpp on RHEL 8.1, we need to file an abrt bug
for this.

Comment 3 Marcin Sobczyk 2020-03-13 09:15:37 UTC
> And abrt-ccpp.service is running
> ...
> But core_pattern is configured to use coredumpctl
> ...
> Restarting abrt-ccpp changes the core pattern
> ...
> So it looks like abrt-ccpp.service is broken on RHEL 8.1.

Not really - this is simply how abrt-ccpp.service works. A fragment of 'abrt-ccpp.service' file:

[Service]
Type=oneshot
ExecStart=/usr/sbin/abrt-install-ccpp-hook install
ExecStop=/usr/sbin/abrt-install-ccpp-hook uninstall
RemainAfterExit=yes

It's a 'oneshot' service - it installs the hook and says goodbye.
Systemd doesn't give up so easily though, and on many occasions it will try to restore
its original coredump handler. Therefore, per the sysctl.d manual,
we also need to mask systemd's '50-coredump.conf' configuration by creating
a symlink at '/etc/sysctl.d/50-coredump.conf' pointing to '/dev/null'.
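
The masking rule can be illustrated with a small resolver. This is a sketch based on my reading of sysctl.d(5), not on systemd's code: a file in /etc/sysctl.d overrides a file of the same name in /run/sysctl.d or /usr/lib/sysctl.d, and a symlink to /dev/null disables that name entirely. The names `is_masked` and `effective_sysctl_files` are hypothetical.

```python
import os

# Search path in decreasing priority (assumption: the three standard dirs).
SYSCTL_DIRS = ("/etc/sysctl.d", "/run/sysctl.d", "/usr/lib/sysctl.d")


def is_masked(path):
    """True if path resolves to the null device, i.e. the name is masked."""
    return os.path.realpath(path) == os.path.realpath(os.devnull)


def effective_sysctl_files(dirs=SYSCTL_DIRS):
    """Map each *.conf file name to the path that wins, omitting masked names."""
    winners = {}
    for directory in dirs:  # highest priority first
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.endswith(".conf"):
                # Keep the first (highest-priority) path seen for each name.
                winners.setdefault(name, os.path.join(directory, name))
    return {
        name: path
        for name, path in sorted(winners.items())
        if not is_masked(path)
    }
```

With the symlink described above in place, '50-coredump.conf' should be absent from the result, so nothing reapplies the systemd-coredump pattern from that file.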

> 
> To use abrt-ccpp on RHEL 8.1 we can install a sysctl drop-in configuration
> (only on RHEL8.1):
> ...

Turns out that core dumps are currently also broken on RHEL/CentOS!
This is because the pattern that we used to inject in 'vdsmd_init_common.sh'
is wrong (it lacks some % fields that newer abrt versions added).
This only strengthens my belief that we should definitely not try to define
the core pattern ourselves.
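
The mismatch is easy to see by comparing the % fields of the two patterns quoted in this report: the one from the description and the one abrt-install-ccpp-hook wrote in Comment 2. This is a minimal sketch; the names `specifiers`, `HARDCODED` and `FROM_ABRT` are only for illustration.

```python
import re


def specifiers(core_pattern):
    """Return the %-specifiers of a core_pattern in order of appearance."""
    return re.findall(r"%[A-Za-z]", core_pattern)


# Pattern quoted in the description of this bug.
HARDCODED = "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %e %i"
# Pattern installed by abrt-install-ccpp-hook on RHEL 8.1 (Comment 2).
FROM_ABRT = "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %P %I %h %e"

# Specifiers are case-sensitive: %p and %P are different fields.
missing = [s for s in specifiers(FROM_ABRT) if s not in specifiers(HARDCODED)]
extra = [s for s in specifiers(HARDCODED) if s not in specifiers(FROM_ABRT)]

print("missing:", missing)  # missing: ['%P', '%I', '%h']
print("extra:", extra)      # extra: ['%i']
```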

I wrote a test module for the OST basic suite that crashes a VM on purpose and checks
that a core dump is generated, so we have no regressions in this area.

> But all this trouble tells me that we need to move to systemd-coredump also
> on RHEL 8.1. This is probably another RHEL 8 porting task that we missed,
> because we did not follow upstream changes in Fedora.

Given the rather short timeline to 4.4 GA I think we should stick with
the known-and-tried abrt-ccpp for now, but switch to systemd-coredump in 4.5.

Comment 4 Marcin Sobczyk 2020-03-16 13:01:09 UTC
There is actually a bug filed for abrt on this issue [1], but since everyone else wants
to jump on the 'systemd-coredump' train it didn't get much attention.

After a chat with the 'abrt' maintainers, it's probably too late to get it fixed on their
side before 4.4 GA, but it remains a possibility for the future.

In the long term we should also switch to systemd-coredump, but from what I've learned,
the feature that avoids core dump duplication is still at the planning stage.
It would be nice to influence the 'abrt' team on this and prioritize appropriately.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1657158

Comment 5 Petr Kubica 2020-04-24 11:34:04 UTC
Verified in vdsm-4.40.13-1.el8ev.x86_64

Comment 6 Sandro Bonazzola 2020-05-20 20:03:39 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

