Bug 2185784 - [OCP Tracker] [RDR] Rook should be pre-configured to collect auto-generated coredumps in case of crash events
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Subham Rai
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 2098118
Reported: 2023-04-11 07:41 UTC by Aman Agrawal
Modified: 2023-08-14 05:09 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-02 09:08:12 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OCPBUGS-16786 | 0 | None | None | None | 2023-07-26 06:35:09 UTC
Red Hat Issue Tracker | RHSTOR-4573 | 0 | None | None | None | 2023-05-30 06:14:51 UTC

Comment 2 Subham Rai 2023-04-24 07:23:57 UTC
In Rook, if you enable the `logCollector`, the core dumps should be collected under `/var/lib/systemd/coredump` (check with `ls -lhsa /var/lib/systemd/coredump`). Could you double-check whether `logCollector` is enabled, and also how the process was terminated?
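For illustration only, a minimal Python sketch of the check suggested above. It assumes the host uses the default systemd-coredump storage directory `/var/lib/systemd/coredump`, that the script runs on the node itself (for example from an `oc debug node/<node>` shell after `chroot /host`), and that a Python 3 interpreter is available there. The function name `list_coredumps` is hypothetical; it lists `core.*` files newest first, roughly what `ls -lhsa` shows when sorted by time.

```python
from pathlib import Path

# Default systemd-coredump storage directory on the host. This is an
# assumption: the real location depends on the OS and on coredump.conf.
COREDUMP_DIR = Path("/var/lib/systemd/coredump")


def list_coredumps(directory=COREDUMP_DIR, prefix="core."):
    """Return (file name, size in bytes) for coredump files, newest first.

    systemd-coredump typically names files
    core.<comm>.<uid>.<boot-id>.<pid>.<timestamp>[.zst|.lz4|.xz],
    so filtering on a "core." prefix picks them up. Pass a narrower prefix
    such as "core.ceph-osd" to select a single daemon type.
    Returns an empty list if the directory does not exist.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file() and p.name.startswith(prefix)]
    # Sort by modification time so the most recent crash comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_size) for p in files]


if __name__ == "__main__":
    for name, size in list_coredumps():
        print(f"{size:>12}  {name}")
```

An empty result only means no cores landed in that directory; whether they are written there at all is decided by the host configuration, not by Rook.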

Please check this upstream comment https://github.com/rook/rook/issues/10788#issuecomment-1280809186, where we confirmed that the core dump is collected once the process is terminated.

Also, see the `logCollector` entry under https://rook.github.io/docs/rook/latest/CRDs/Cluster/ceph-cluster-crd/#cluster-settings:

```
logCollector: The settings for log collector daemon.
enabled: if set to true, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option log_to_file will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. In case a daemon terminates with a segfault, the coredump files will commonly be generated in /var/lib/systemd/coredump directory on the host, depending on the underlying OS location.
```
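As a hedged sketch of the setting described in the quoted docs, the Python snippet below builds a JSON merge patch that turns on `spec.logCollector` in a CephCluster resource. The `periodicity` and `maxLogSize` values mirror Rook's upstream example `cluster.yaml`; the function name `log_collector_patch` is hypothetical. In an ODF deployment the CephCluster is managed by the operator, so a manual patch may be reverted and the supported route may differ; treat this as an illustration of the field layout, not a procedure.

```python
import json


def log_collector_patch(enabled=True, periodicity="daily", max_log_size="500M"):
    """Build a JSON merge patch for spec.logCollector of a CephCluster."""
    return {
        "spec": {
            "logCollector": {
                # Run the log collector side-car next to each Ceph daemon.
                "enabled": enabled,
                # Rotation schedule; Rook's upstream example lists
                # hourly, daily, weekly, or monthly.
                "periodicity": periodicity,
                # Size threshold for rotation, with an M or G suffix.
                "maxLogSize": max_log_size,
            }
        }
    }


if __name__ == "__main__":
    # Compact JSON, e.g. for:
    #   oc patch cephcluster <name> -n <namespace> --type merge -p '<json>'
    print(json.dumps(log_collector_patch(), separators=(",", ":")))
```

Enabling the log collector only turns on file logging and rotation; where a crashing daemon's core ends up is still decided by the host, as the quoted docs note.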

Comment 4 Subham Rai 2023-04-25 10:22:24 UTC
Given the short time (May 2 is DF), this requires a good amount of work and testing, IMO. We can try to get it into 4.13.z.

@tnielsen Thoughts on this?

Comment 5 Travis Nielsen 2023-04-25 14:07:57 UTC
What changes are needed in Rook? If I follow the links from the conversation above, they point to changes in systemd:
https://bugzilla.redhat.com/show_bug.cgi?id=2098118#c63

That may work for RHCS, but not in an OCP environment; at least, Rook doesn't have the ability to modify systemd.
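To see on a given node whether crashes are handed to systemd-coredump at all (the host-level setting Rook cannot change), one can read the kernel's `kernel.core_pattern`. Per the core(5) man page, a value starting with `|` pipes the core to a user-space helper program instead of writing a file. The following is a minimal, read-only Python sketch, assuming it runs on the node (for example via `oc debug node/<node>` and `chroot /host`); both function names are hypothetical.

```python
from pathlib import Path

# Kernel setting that decides what happens to a core dump on this host.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def uses_systemd_coredump(pattern: str) -> bool:
    """Return True if a core_pattern value pipes cores to systemd-coredump.

    A leading "|" means the kernel pipes the core to a helper program; on
    systemd hosts that helper is usually systemd-coredump.
    """
    pattern = pattern.strip()
    return pattern.startswith("|") and "systemd-coredump" in pattern


def read_core_pattern(path=CORE_PATTERN) -> str:
    """Read the current core_pattern value, without the trailing newline."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    value = read_core_pattern()
    print(value)
    print("systemd-coredump handler:", uses_systemd_coredump(value))
```

If the handler is not systemd-coredump, cores will not appear under `/var/lib/systemd/coredump`, and changing that is a node (OS) configuration task rather than something Rook can do.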

Comment 6 Subham Rai 2023-04-25 15:15:11 UTC
I was not aware that Rook can't modify systemd. Since this only requires changes to systemd, I think we can move it to another component.

cc @muagarwa

Comment 8 Bipin Kunal 2023-05-15 06:17:53 UTC
Subham, Travis, and Mudit, what do you think?

Comment 10 Subham Rai 2023-05-15 10:04:35 UTC
(In reply to Bipin Kunal from comment #8)
> Subham, Travis, and Mudit, what do you think?

It makes sense to open a Jira issue for the OCP team and, in the meantime, have the documentation ready.

Comment 12 Mudit Agarwal 2023-05-30 06:14:51 UTC
We should keep it open and mark it as a tracker for the OCP BZ.

Comment 13 Blaine Gardner 2023-07-25 15:22:47 UTC
@muagarwa could you help by opening the OCP BZ? I'm not sure what the right process here is since they are using Jira now.

Comment 14 Mudit Agarwal 2023-07-26 06:35:09 UTC
I converted the Jira issue raised by Subham into an OCP bug; please check whether the component is correct.

https://issues.redhat.com/browse/OCPBUGS-16786

Comment 15 Subham Rai 2023-08-02 09:08:12 UTC
Closing this BZ since the Jira issue mentioned above is closed.

cc @muagarwa

