1869406 – must-gather should include historical pod logs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	must-gather
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.7.0
Assignee:	Pulkit Kundra
QA Contact:	Aviad Polak
Docs Contact:
URL:
Whiteboard:
Depends On:	1901134
Blocks:

Reported:	2020-08-17 21:18 UTC by Josh Durgin
Modified:	2021-05-19 09:16 UTC
CC List:	10 users
Fixed In Version:	v4.7.0-249.ci
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-05-19 09:14:56 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:2041	0	None	None	None	2021-05-19 09:16:07 UTC

Description Josh Durgin 2020-08-17 21:18:02 UTC

If a pod is restarted, the prior logs are lost.

In case of a crash, for example, we lose all context leading up to the crash, which is critical for debugging.

One example is here: 

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vuf1cs33-p/jnk-vuf1cs33-p_20200815T005845/logs/failed_testcase_ocs_logs_1597456807/test_fio_workload_simple[CephBlockPool-sequential]_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-191e1a9fadc5b379104a64cc6516b8712acaf72f7c1ec31ad80263f5a3ba8128/ceph/namespaces/openshift-storage/pods/

osd.1 crashed, but the only log for it is from after the crash, when it was restarted.

Comment 2 Mudit Agarwal 2020-08-18 12:48:18 UTC

Doesn't look like a 4.5 candidate to me, moving it to 4.6. Please retarget if required.

Comment 7 Travis Nielsen 2020-10-07 20:36:42 UTC

K8s keeps only one previous container after a failure, so logs are available only for the current container and the one immediately before it. must-gather already collects both of these logs.

I haven't found a way to configure K8s to store more logs. As you have found, this is unfortunate when a pod is crashlooping, since you quickly lose the logs from the original crash.
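
For reference, the two logs mentioned above correspond to the `oc logs` command with and without the `--previous` flag. The following is a minimal sketch that only builds the two command lines for one container. It does not reproduce how must-gather itself collects them, and the pod and container names in the usage example are hypothetical.

```python
from typing import List


def log_commands(namespace: str, pod: str, container: str) -> List[List[str]]:
    """Build the two `oc logs` invocations for a single container.

    The first command reads the log of the currently running container.
    The second adds --previous, which reads the log of the container
    instance that ran immediately before the current one.
    """
    base = ["oc", "logs", "-n", namespace, pod, "-c", container]
    return [base, base + ["--previous"]]


if __name__ == "__main__":
    # Hypothetical pod and container names, for illustration only.
    for argv in log_commands("openshift-storage", "rook-ceph-osd-1-example", "osd"):
        print(" ".join(argv))
```

Anything older than the previous container instance is not reachable this way, which is the limitation described in this comment.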

Comment 8 Mudit Agarwal 2020-10-08 03:52:18 UTC

Doesn't look like something we can fix in OCS, can we close it as WONT_FIX? 

@karthick: In any case, shouldn't be a blocker for 4.6

Comment 9 krishnaram Karthick 2020-10-08 04:32:22 UTC

(In reply to Mudit Agarwal from comment #8)
> Doesn't look like something we can fix in OCS, can we close it as WONT_FIX? 
> 
> @karthick: In any case, shouldn't be a blocker for 4.6

Ack, removing the blocker flag for 4.6 
Thanks Travis & Pulkit for the explanation.

Comment 10 Mudit Agarwal 2020-10-08 05:21:09 UTC

Closing it as there is no good way to fix this.

Comment 11 Josh Durgin 2020-10-08 05:29:15 UTC

Re-opening as this renders the product unsupportable. It may not be easy, but there must be a way to get logs from kubernetes. Will discuss more with Travis and Sebastien.

Comment 13 Mudit Agarwal 2020-10-09 11:04:59 UTC

Moving it to 4.7 as it needs more discussion, please bring it back if we can fix it in 4.6 timeframe.

Comment 16 Travis Nielsen 2020-12-18 19:56:52 UTC

It should be under the dataDirHostPath, which for OCS should be /var/lib/rook/openshift-storage.
Note that the sidecar is not enabled by default, see Seb's PR for the setting to enable: https://github.com/rook/rook/pull/6679
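
A minimal sketch for inspecting that location on a node, assuming the files of interest are regular files somewhere under the dataDirHostPath. The layout below that directory is not stated in this bug, so the helper (a hypothetical name) simply walks the whole tree and lists files newest first.

```python
import os
from typing import List


def list_files_newest_first(root: str) -> List[str]:
    """Return every file path under root, most recently modified first.

    If root does not exist, os.walk yields nothing and the result is empty.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    # Path taken from the comment above; run this on the node itself.
    for path in list_files_newest_first("/var/lib/rook/openshift-storage"):
        print(path)
```

If the sidecar has not been enabled, the directory may not contain anything useful.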

Comment 18 Mudit Agarwal 2021-02-02 12:04:08 UTC

Pulkit, https://bugzilla.redhat.com/show_bug.cgi?id=1901134 is now fixed. Do we have plans to fix this one in 4.7?

Comment 26 errata-xmlrpc 2021-05-19 09:14:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041
