Bug 1869406

Summary: must-gather should include historical pod logs
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Josh Durgin <jdurgin>
Component: must-gather
Assignee: Pulkit Kundra <pkundra>
Status: CLOSED ERRATA
QA Contact: Aviad Polak <apolak>
Severity: high
Docs Contact:
Priority: medium
Version: 4.5
CC: assingh, bhubbard, ebenahar, kramdoss, muagarwa, ocs-bugs, pkundra, sabose, tdesala, tnielsen
Target Milestone: ---
Keywords: Automation, Reopened
Target Release: OCS 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.7.0-249.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-19 09:14:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1901134
Bug Blocks:

Description Josh Durgin 2020-08-17 21:18:02 UTC
If a pod is restarted, the prior logs are lost.

In case of a crash, for example, we lose all context leading up to the crash, which is critical for debugging.

One example is here: 

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vuf1cs33-p/jnk-vuf1cs33-p_20200815T005845/logs/failed_testcase_ocs_logs_1597456807/test_fio_workload_simple[CephBlockPool-sequential]_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-191e1a9fadc5b379104a64cc6516b8712acaf72f7c1ec31ad80263f5a3ba8128/ceph/namespaces/openshift-storage/pods/

osd.1 crashed, but the only log for it is from after the crash, when it was restarted.

Comment 2 Mudit Agarwal 2020-08-18 12:48:18 UTC
Doesn't look like a 4.5 candidate to me, moving it to 4.6. Please retarget if required.

Comment 7 Travis Nielsen 2020-10-07 20:36:42 UTC
K8s only retains one previous container after a failure, so logs are available only for the current container and the one immediately before it. must-gather already collects both of these logs.

I haven't found a way to configure K8s to store more logs. This is unfortunate when a pod is crashlooping, as you have found, since you quickly lose the logs from the original crash.
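
For illustration, the two logs mentioned above (current container and the one previous container) can be requested with `oc logs` and its `--previous` flag. The following is only a minimal sketch, not how must-gather itself is implemented; the helper name `previous_log_commands`, the pod name, and the container name are hypothetical.

```python
from typing import List


def previous_log_commands(
    namespace: str, pod: str, containers: List[str], cli: str = "oc"
) -> List[List[str]]:
    """Build argv lists that fetch the current and the previous log
    for each container of a pod. Nothing is executed here."""
    commands: List[List[str]] = []
    for container in containers:
        base = [cli, "logs", pod, "-n", namespace, "-c", container]
        commands.append(base)                   # currently running container
        commands.append(base + ["--previous"])  # the one prior, terminated container
    return commands


if __name__ == "__main__":
    # Hypothetical pod and container names, for illustration only.
    for argv in previous_log_commands(
        "openshift-storage", "rook-ceph-osd-1-example", ["osd"]
    ):
        print(" ".join(argv))
```

Anything older than the one previous container is not reachable this way, which is the limitation described in this comment.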

Comment 8 Mudit Agarwal 2020-10-08 03:52:18 UTC
Doesn't look like something we can fix in OCS, can we close it as WONT_FIX? 

@karthick: In any case, shouldn't be a blocker for 4.6

Comment 9 krishnaram Karthick 2020-10-08 04:32:22 UTC
(In reply to Mudit Agarwal from comment #8)
> Doesn't look like something we can fix in OCS, can we close it as WONT_FIX? 
> 
> @karthick: In any case, shouldn't be a blocker for 4.6

Ack, removing the blocker flag for 4.6 
Thanks Travis & Pulkit for the explanation.

Comment 10 Mudit Agarwal 2020-10-08 05:21:09 UTC
Closing it as there is no good way to fix this.

Comment 11 Josh Durgin 2020-10-08 05:29:15 UTC
Re-opening as this renders the product unsupportable. It may not be easy, but there must be a way to get logs from kubernetes. Will discuss more with Travis and Sebastien.

Comment 13 Mudit Agarwal 2020-10-09 11:04:59 UTC
Moving it to 4.7 as it needs more discussion. Please bring it back if we can fix it in the 4.6 timeframe.

Comment 16 Travis Nielsen 2020-12-18 19:56:52 UTC
It should be under the dataDirHostPath, which for OCS should be /var/lib/rook/openshift-storage.
Note that the sidecar is not enabled by default; see Seb's PR for the setting to enable it: https://github.com/rook/rook/pull/6679
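
As a rough sketch of how files under that host path could be enumerated for collection: the directory layout below the dataDirHostPath and the `.log` suffix are assumptions, and `find_log_files` is a hypothetical helper, not part of must-gather or Rook.

```python
from pathlib import Path
from typing import List


def find_log_files(root: str, suffix: str = ".log") -> List[Path]:
    """Return all regular files under `root` whose names end with `suffix`,
    sorted by path. Returns an empty list if `root` is not a directory."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*" + suffix) if p.is_file())


if __name__ == "__main__":
    # Path taken from the comment above; the ".log" suffix is an assumption.
    for path in find_log_files("/var/lib/rook/openshift-storage"):
        print(path)
```

This would have to run on the node itself (or in a pod with the host path mounted), since the files live on the host filesystem.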

Comment 18 Mudit Agarwal 2021-02-02 12:04:08 UTC
Pulkit, https://bugzilla.redhat.com/show_bug.cgi?id=1901134 is now fixed. Do we have plans to fix this one in 4.7?

Comment 26 errata-xmlrpc 2021-05-19 09:14:56 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041