Bug 1899162

Summary: RFE: let the admin configure the coredump naming
Product: Red Hat Enterprise Linux 8
Reporter: Renaud Métrich <rmetrich>
Component: systemd
Assignee: systemd maint <systemd-maint>
Status: NEW
QA Contact: Frantisek Sumsal <fsumsal>
Severity: high
Priority: high
Version: 8.3
CC: bhershbe, christoph.obexer, dtardon, fkrohn, jseunghw, kpelc, systemd-maint-list, zbyszek
Target Milestone: rc
Keywords: FutureFeature
Target Release: 8.0
Hardware: All
OS: Linux
Type: Story

Description Renaud Métrich 2020-11-18 16:12:25 UTC
Description of problem:

On RHEL 8, the naming of core dumps in /var/lib/systemd/coredump is hardcoded to "core.<COMM>.<UID>.<bootid>.<PID>.<TS>000000".

This should be enhanced to let the admin name the core dumps as needed, typically by adding <HOSTNAME> (the host name or container name), which is already available to systemd-coredump when the kernel invokes it through kernel.core_pattern.

This would help a lot when analyzing OCP (OpenShift Container Platform) issues, for example.


Version-Release number of selected component (if applicable):

systemd-239, and apparently upstream as well (at least from reading the code)


How reproducible:

ALWAYS

Steps to Reproduce:
1. Create a minimal root filesystem (miniroot) that can run "bash"

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# yum -y install --installroot=/tmp/test --releasever=/ bash
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Spawn a container and crash bash in the container

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# systemd-nspawn -D /tmp/test bash
Spawning container test on /tmp/test.
Press ^] three times within 1s to kill container.

bash-4.4# ulimit -c unlimited
bash-4.4# function foo {
foo
}
bash-4.4# foo
Container test terminated by signal SEGV.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Check the core filename (it doesn't show the container name, which would be useful)

Actual results:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# ls /var/lib/systemd/coredump
core.bash.0.33efdaea7a3c4cce86184cbbb6c28368.2343.1605715274000000
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
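For illustration, the hardcoded name can be split back into the fields listed in the description. The sketch below is a hypothetical helper (`parse_core_name` is not part of systemd); it assumes <COMM> contains no '.' and ignores a trailing compression suffix such as ".zst". Per Comment 5 the name is not a stable API, so this only shows what the name carries today: there is no host or container name among the fields. The last field is the crash time in seconds with six zeros appended, i.e. microseconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a systemd-coredump file name into its fields.
# Assumes the layout core.<COMM>.<UID>.<bootid>.<PID>.<TS>000000[.<ext>]
# and that <COMM> contains no '.'; the name is not a stable interface.
parse_core_name() {
    local name=${1##*/}   # strip any leading directory
    local prefix comm uid boot pid ts ext
    # Split on '.'; anything after the timestamp (e.g. "zst") lands in $ext.
    IFS=. read -r prefix comm uid boot pid ts ext <<<"$name"
    [ "$prefix" = core ] || return 1   # not a systemd-coredump file name
    printf 'comm=%s uid=%s boot_id=%s pid=%s ts_usec=%s\n' \
        "$comm" "$uid" "$boot" "$pid" "$ts"
}

parse_core_name /var/lib/systemd/coredump/core.bash.0.33efdaea7a3c4cce86184cbbb6c28368.2343.1605715274000000
```

Dividing ts_usec by 1000000 gives a Unix time; 1605715274 corresponds to 2020-11-18 16:01:14 UTC, shortly before this report was filed.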

Expected results:

A configurable file name built from the available properties (e.g. including the container name)

Comment 1 David Tardon 2020-11-19 09:28:33 UTC
AFAIK the hostname is available in the metadata, so what you want should already be possible with coredumpctl, like this:

# coredumpctl list _HOSTNAME=<hostname>

Comment 2 Renaud Métrich 2020-11-23 07:59:39 UTC
This doesn't work. coredumpctl always shows Hostname as the name of the host ("vm-rhel8" in my case), even though the %h passed is "test" (the name of my container).

Additionally, even though this could be a possibility, it wouldn't be convenient, since admins would have to use coredumpctl commands to find what they are interested in.

Comment 3 David Tardon 2020-11-23 09:46:27 UTC
(In reply to Renaud Métrich from comment #2)
> This doesn't work. coredumpctl always shows Hostname as the name of the host
> ("vm-rhel8" in my case), even though the %h passed is "test" (the name of my
> container).

Yeah, I didn't check... The right field is COREDUMP_HOSTNAME.

> Additionally, even though this could be a possibility, it wouldn't be
> convenient, since admins would have to use coredumpctl commands to find
> what they are interested in.

IMHO that's what they should do anyway. It allows matching on all available metadata, and it is more reliable than parsing coredump filenames with ad hoc globs or regexes.

Comment 4 Renaud Métrich 2020-11-23 09:57:48 UTC
Indeed this works.
However, this isn't "reliable", since coredumpctl relies on the journal:
- if the journal is rotated or vacuumed, the information may be gone
- if the journal is not persistent, the information is lost after a reboot

Comment 5 Zbigniew Jędrzejewski-Szmek 2020-12-01 10:28:40 UTC
To summarize the discussion during the upstream meeting today:
The file name is supposed to be unique and somewhat informative, but the details are not fixed API.
It does include the machine id, so files from different containers are somewhat segregated.

Two "official" interfaces exist:
1. the journal entry which has all the metadata
2. the extended attributes on the file

$ sudo getfattr --absolute-names -d /var/lib/systemd/coredump/core.systemd.0.b07239dbd2264cc2bf9b070929ead2a7.697927.1606814763000000.zst
# file: var/lib/systemd/coredump/core.systemd.0.b07239dbd2264cc2bf9b070929ead2a7.697927.1606814763000000.zst
user.coredump.comm="systemd"
user.coredump.exe="/usr/lib/systemd/systemd"
user.coredump.gid="0"
user.coredump.hostname="rawhide"
user.coredump.pid="697927"
user.coredump.rlimit="18446744073709551615"
user.coredump.signal="11"
user.coredump.timestamp="1606814763000000"
user.coredump.uid="0"

The attributes obviously stay behind even if the journal entry is removed.
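Because the extended attributes outlive the journal entry, they can also answer the original question (which cores came from container "test"?) without parsing file names. A minimal sketch, assuming `getfattr` from the attr package is installed and that, like COREDUMP_HOSTNAME in Comments 3 and 4, the `user.coredump.hostname` attribute records the %h hostname passed by the kernel; `core_hostname` and `cores_for_host` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: select core files by the user.coredump.hostname
# extended attribute that systemd-coredump stores on each file.
# Requires getfattr from the attr package.

# Print the hostname recorded on one core file (empty if absent/unreadable).
core_hostname() {
    getfattr --absolute-names --only-values -n user.coredump.hostname -- "$1" 2>/dev/null
}

# List cores in DIR (default /var/lib/systemd/coredump) whose hostname is HOST.
cores_for_host() {
    local host=$1 dir=${2:-/var/lib/systemd/coredump} f
    for f in "$dir"/core.*; do
        [ -e "$f" ] || continue   # glob matched nothing
        if [ "$(core_hostname "$f")" = "$host" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

With the reproducer from the description, `cores_for_host test` would be expected to list the bash core; root privileges may be needed to read the attributes, as with the `sudo getfattr` call above.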

Comment 12 Christoph Obexer 2022-05-12 09:20:54 UTC
A list of all the required information for coredump analysis:
 * Kubernetes namespace
 * Pod name
 * Container name
 * Executable name
 * Signal
 * Timestamp
 * User
 * Group
 * Image used to run the binary
 * SELinux information
 * possibly more

Putting all of these in the filename is not going to work...

How about we instead get as much of that metadata as possible into an appropriate place, such as the system journal, the core dump file's extended attributes, or a file next to the dump?

Pros:
 * less stuff to configure
 * fewer things to coordinate across teams
 * more relevant information available
 * defined API

Cons:
 * None

Comment 13 Renaud Métrich 2022-05-12 11:10:21 UTC
Some of this metadata is already stored as extended attributes.