Bug 1927640 - enhancement/debug: Option to generate core dump without killing the process
Summary: enhancement/debug: Option to generate core dump without killing the process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 7
Assignee: Vinayak Hariharmath
QA Contact: milind
URL:
Whiteboard:
Depends On:
Blocks: 1841608
Reported: 2021-02-11 08:37 UTC by Vinayak Hariharmath
Modified: 2021-10-06 08:38 UTC
CC List: 8 users

Fixed In Version: glusterfs-6.0-57
Doc Type: Enhancement
Doc Text:
With this update, the option to generate a core dump without killing the process is introduced.
Clone Of:
Environment:
Last Closed: 2021-10-05 07:55:30 UTC
Embargoed:
vharihar: needinfo-
shilpsha: needinfo+


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:3728 0 None None None 2021-10-05 07:55:33 UTC

Internal Links: 1959348

Description Vinayak Hariharmath 2021-02-11 08:37:04 UTC
On production systems we sometimes see a log message saying that an assertion
has failed, but it's hard to track down why it failed without additional
information. On debug builds, GF_ASSERT() generates a core dump and kills the
process, so the dump can be used to debug the issue. However, we can often
reproduce assertion failures only on production systems, where GF_ASSERT() just
logs a message and continues.

In other cases we have a core dump caused by a bug, but the crash doesn't
necessarily happen at the point where the bug occurred. Sometimes the crash comes
so much later that the conditions that triggered the bug are lost. In these cases
we can add more assertions in the code paths that touch the likely culprits, but
all we'll get is a log message, which may not be enough.

One solution would be to always generate a core dump on assertion failure, but
this was already discussed and judged too drastic. Instead, a new macro,
GF_ABORT(), was created for cases where a core dump is really needed, while
GF_ASSERT() continues not to kill the process on production systems.

I propose modifying GF_ASSERT() on production builds so that it conditionally
raises a signal when a debugger is attached. When this happens, the debugger
generates a core dump and lets the process continue as if nothing had happened.
If no debugger is attached, GF_ASSERT() behaves as it always has.

My idea is to use SIGCONT for this. The signal is harmless to a running process
(its default action only resumes a stopped process), so we can unmask it (we
currently mask all unneeded signals) and raise it inside GF_ASSERT() when some
global variable is set to true.
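
The mechanism could look roughly like the sketch below. It is only an
illustration of the idea, not the glusterfs code: the SKETCH_ASSERT macro, the
sketch_signal_on_assert flag and the helper functions are hypothetical names,
and the logging is reduced to fprintf(). It also uses sigprocmask(), which
covers only the single-threaded case; a multithreaded daemon would need
pthread_sigmask() in each thread instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Hypothetical flag: left at 0 in normal operation. A debugger session
 * (or some admin tool) would set it to 1 to ask for a signal on every
 * assertion failure. */
static volatile sig_atomic_t sketch_signal_on_assert = 0;

/* Failure counter, kept only so the sketch is easy to observe. */
static int sketch_assert_failures = 0;

/* Make sure SIGCONT is not blocked, so that raising it is visible to an
 * attached debugger. Returns 0 on success, -1 on error. */
static int sketch_unmask_sigcont(void)
{
    sigset_t set;

    if (sigemptyset(&set) != 0 || sigaddset(&set, SIGCONT) != 0)
        return -1;
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Production-style failure path: always log, never abort. If the flag is
 * set, additionally raise SIGCONT. With the default disposition this does
 * nothing to a running process; a debugger that stops on SIGCONT gets the
 * chance to dump core and then resume the process. */
static void sketch_assert_failed(const char *expr, const char *file, int line)
{
    sketch_assert_failures++;
    fprintf(stderr, "Assertion failed: %s (%s:%d)\n", expr, file, line);
    if (sketch_signal_on_assert)
        raise(SIGCONT);
}

#define SKETCH_ASSERT(cond)                                        \
    do {                                                           \
        if (!(cond))                                               \
            sketch_assert_failed(#cond, __FILE__, __LINE__);       \
    } while (0)
```

With the flag at 0, a failed SKETCH_ASSERT() only logs. With the flag at 1 and
no debugger attached, the extra raise(SIGCONT) has no visible effect, which
matches the "behaves as always" case described above.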

To produce the core dump, run the script extras/debug/gfcore.py in another
terminal. gdb breaks and produces a core dump when GF_ASSERT() is hit.

Comment 27 errata-xmlrpc 2021-10-05 07:55:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3728

