Bug 1869406 - must-gather should include historical pod logs
Summary: must-gather should include historical pod logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: must-gather
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Pulkit Kundra
QA Contact: Aviad Polak
URL:
Whiteboard:
Depends On: 1901134
Blocks:
Reported: 2020-08-17 21:18 UTC by Josh Durgin
Modified: 2021-05-19 09:16 UTC
CC List: 10 users

Fixed In Version: v4.7.0-249.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:14:56 UTC
Embargoed:

Links
System: Red Hat Product Errata; ID: RHSA-2021:2041; Last Updated: 2021-05-19 09:16:07 UTC

Description Josh Durgin 2020-08-17 21:18:02 UTC
If a pod is restarted, the prior logs are lost.

In case of a crash, for example, we lose all context leading up to the crash, which is critical for debugging.

One example is here: 

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vuf1cs33-p/jnk-vuf1cs33-p_20200815T005845/logs/failed_testcase_ocs_logs_1597456807/test_fio_workload_simple[CephBlockPool-sequential]_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-191e1a9fadc5b379104a64cc6516b8712acaf72f7c1ec31ad80263f5a3ba8128/ceph/namespaces/openshift-storage/pods/

osd.1 crashed, but the only log for it is from after the crash, when it was restarted.

Comment 2 Mudit Agarwal 2020-08-18 12:48:18 UTC
Doesn't look like a 4.5 candidate to me, moving it to 4.6. Please retarget if required.

Comment 7 Travis Nielsen 2020-10-07 20:36:42 UTC
K8s only keeps the logs of one previous container instance after a failure, so logs are available only for the current container and the one before it. must-gather already collects both of these logs.

I haven't found a way to configure K8s to store more logs. It is unfortunate when a pod is crashlooping, as you have found, since the logs from the original crash are quickly lost.
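To make the retention limit concrete, here is a minimal Python sketch. `oc_log_commands` builds the two `oc logs` invocations (without and with `--previous`) that together cover everything retrievable for one container; it is an illustration, not the actual must-gather script. `ToyRetention` is a hypothetical toy model of the "current plus one previous instance" limit described above, not kubelet code. The pod and container names are placeholders.

```python
from collections import deque


def oc_log_commands(namespace: str, pod: str, container: str) -> list[list[str]]:
    """Build the two `oc logs` calls that cover what is retrievable for a
    container: the running instance, and (with --previous) the one before it."""
    base = ["oc", "logs", pod, "-c", container, "-n", namespace]
    return [list(base), base + ["--previous"]]


class ToyRetention:
    """Toy model of the retention limit, NOT kubelet code: only the current
    container instance and the one before it keep their logs."""

    def __init__(self) -> None:
        # maxlen=2 holds [previous, current]; appending a third evicts the oldest.
        self._logs = deque(maxlen=2)

    def restart(self, log: str) -> None:
        """Record a new container instance (e.g. after a crash) and its log."""
        self._logs.append(log)

    def retrievable(self) -> list[str]:
        """Logs still reachable via `oc logs` and `oc logs --previous`."""
        return list(self._logs)


if __name__ == "__main__":
    # Hypothetical pod/container names, for illustration only.
    for cmd in oc_log_commands("openshift-storage", "rook-ceph-osd-1-example", "osd"):
        print(" ".join(cmd))

    history = ToyRetention()
    history.restart("instance-0: original crash")
    for i in range(1, 4):
        history.restart(f"instance-{i}: crashloop restart")
    # The log of the original crash (instance-0) is no longer retrievable.
    print(history.retrievable())
```

In the toy run, three crashloop restarts are enough to push the original crash log out of reach, which is the situation described for osd.1 in the bug description.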

Comment 8 Mudit Agarwal 2020-10-08 03:52:18 UTC
Doesn't look like something we can fix in OCS, can we close it as WONT_FIX? 

@karthick: In any case, shouldn't be a blocker for 4.6

Comment 9 krishnaram Karthick 2020-10-08 04:32:22 UTC
(In reply to Mudit Agarwal from comment #8)
> Doesn't look like something we can fix in OCS, can we close it as WONT_FIX? 
> 
> @karthick: In any case, shouldn't be a blocker for 4.6

Ack, removing the blocker flag for 4.6 
Thanks Travis & Pulkit for the explanation.

Comment 10 Mudit Agarwal 2020-10-08 05:21:09 UTC
Closing it as there is no good way to fix this.

Comment 11 Josh Durgin 2020-10-08 05:29:15 UTC
Re-opening as this renders the product unsupportable. It may not be easy, but there must be a way to get logs from kubernetes. Will discuss more with Travis and Sebastien.

Comment 13 Mudit Agarwal 2020-10-09 11:04:59 UTC
Moving it to 4.7 as it needs more discussion, please bring it back if we can fix it in 4.6 timeframe.

Comment 16 Travis Nielsen 2020-12-18 19:56:52 UTC
It should be under the dataDirHostPath, which for OCS should be /var/lib/rook/openshift-storage.
Note that the sidecar is not enabled by default, see Seb's PR for the setting to enable: https://github.com/rook/rook/pull/6679
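As a rough illustration of picking up logs persisted on the host, the Python sketch below walks a directory (defaulting to the /var/lib/rook/openshift-storage path from the comment above) and returns files whose names contain `.log`, newest first. The helper `find_persisted_logs` is hypothetical, and the `*.log*` filename pattern (meant to also catch rotated files) is an assumption about what the sidecar leaves there; check the actual layout produced with the setting from the linked PR before relying on it.

```python
import os
import tempfile
from pathlib import Path


def find_persisted_logs(root: str = "/var/lib/rook/openshift-storage") -> list[Path]:
    """Hypothetical helper: return files under `root` whose names contain
    '.log' (e.g. ceph-osd.1.log or a rotated ceph-osd.1.log.1.gz), newest first.
    The filename pattern is an assumption, not a documented layout."""
    base = Path(root)
    if not base.is_dir():
        return []  # nothing persisted here (e.g. the sidecar is not enabled)
    files = [p for p in base.rglob("*.log*") if p.is_file()]
    # Sort newest first by modification time.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Self-contained demo on a temporary directory instead of a real node.
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "log"
        log_dir.mkdir()
        current = log_dir / "ceph-osd.1.log"
        rotated = log_dir / "ceph-osd.1.log.1.gz"
        current.write_text("after restart\n")
        rotated.write_bytes(b"before crash")
        (log_dir / "README.txt").write_text("not a log\n")
        os.utime(rotated, (1_000, 1_000))  # older
        os.utime(current, (2_000, 2_000))  # newer
        for path in find_persisted_logs(tmp):
            print(path.name)
```

Unlike `oc logs --previous`, files written to the host survive any number of container restarts, so the context before the first crash would still be available for collection.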

Comment 18 Mudit Agarwal 2021-02-02 12:04:08 UTC
Pulkit, https://bugzilla.redhat.com/show_bug.cgi?id=1901134 is now fixed. Do we have plans to fix this one in 4.7?

Comment 26 errata-xmlrpc 2021-05-19 09:14:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041
