Bug 2185784

Summary: [OCP Tracker] [RDR] Rook should be pre-configured to collect auto-generated coredumps in case of crash events
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Aman Agrawal <amagrawa>
Component: rook
Assignee: Subham Rai <srai>
Status: CLOSED NOTABUG
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.13
CC: bkunal, brgardne, ebenahar, kseeger, muagarwa, ocs-bugs, odf-bz-bot, srai, tnielsen
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-02 09:08:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2098118

Comment 2 Subham Rai 2023-04-24 07:23:57 UTC
In Rook, if you enable the `logCollector`, the core dump should be collected under `/var/lib/systemd/coredump` on the host (check with `ls -lhsa /var/lib/systemd/coredump`). Could you double-check whether `logCollector` is enabled, and also how the process was terminated?

Please check this upstream comment, https://github.com/rook/rook/issues/10788#issuecomment-1280809186, where we confirmed that the core dump is collected once the process is terminated.

Also, you can read the `logCollector` entry at https://rook.github.io/docs/rook/latest/CRDs/Cluster/ceph-cluster-crd/#cluster-settings:

```
logCollector: The settings for log collector daemon.
enabled: if set to true, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option log_to_file will be turned on, meaning Ceph daemons will log on files in addition to still logging to container's stdout. These logs will be rotated. In case a daemon terminates with a segfault, the coredump files will commonly be generated in the /var/lib/systemd/coredump directory on the host, depending on the underlying OS location.
```
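
For reference, a minimal sketch of how the setting quoted above might look in a CephCluster CR. The `metadata` name and namespace are placeholders (assumptions, not taken from this cluster), and the `periodicity` and `maxLogSize` values are only illustrative; the relevant part is `logCollector.enabled`.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph          # placeholder name, adjust to the actual cluster
  namespace: rook-ceph     # placeholder namespace, adjust to the actual cluster
spec:
  logCollector:
    enabled: true          # runs the log collector side-car next to each Ceph daemon
    periodicity: daily     # illustrative log rotation interval
    maxLogSize: 500M       # illustrative size cap before rotation
```

Whether coredumps actually land in `/var/lib/systemd/coredump` still depends on the host OS configuration, as the documentation text notes.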

Comment 4 Subham Rai 2023-04-25 10:22:24 UTC
Given the short time (DF is the 2nd of May), this needs a fair amount of work and testing too, IMO. We can try to get this into 4.13.z.

@tnielsen Thoughts on this?

Comment 5 Travis Nielsen 2023-04-25 14:07:57 UTC
What changes are needed in Rook? If I follow the links from the conversation above, this mentions changes to systemd: 
https://bugzilla.redhat.com/show_bug.cgi?id=2098118#c63

That may work for RHCS, but not in an OCP environment, or at least Rook doesn't have the ability to modify systemd.

Comment 6 Subham Rai 2023-04-25 15:15:11 UTC
I was not aware that Rook doesn't have the ability to modify systemd. So, I think we can move this to another component, since it only requires changes to systemd.

cc @muagarwa

Comment 8 Bipin Kunal 2023-05-15 06:17:53 UTC
Subham, Travis, and Mudit, what do you think?

Comment 10 Subham Rai 2023-05-15 10:04:35 UTC
(In reply to Bipin Kunal from comment #8)
> Subham, Travis, and Mudit, what do you think?

It makes sense to open a Jira with the OCP team and, in the meantime, have the documentation ready.

Comment 12 Mudit Agarwal 2023-05-30 06:14:51 UTC
We should keep it open and mark it as a tracker for the OCP BZ.

Comment 13 Blaine Gardner 2023-07-25 15:22:47 UTC
@muagarwa could you help by opening the OCP BZ? I'm not sure what the right process here is since they are using Jira now.

Comment 14 Mudit Agarwal 2023-07-26 06:35:09 UTC
I converted the Jira raised by Subham to an OCP bug. Please check whether the component is correct.

https://issues.redhat.com/browse/OCPBUGS-16786

Comment 15 Subham Rai 2023-08-02 09:08:12 UTC
Closing this bz since the Jira mentioned above is closed.

cc @muagarwa