Bug 1952238

Summary:	Catalog pods don't report termination logs to catalog-operator
Product:	OpenShift Container Platform
Component:	OLM
OLM sub component:	OLM
Status:	CLOSED ERRATA
Severity:	medium
Priority:	urgent
Version:	4.8
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Type:	Bug
Reporter:	Anik <anbhatta>
Assignee:	Anik <anbhatta>
QA Contact:	Jian Zhang <jiazha>
CC:	tflannag
Last Closed:	2021-07-27 23:02:36 UTC

Description Anik 2021-04-21 20:17:34 UTC

Description of problem:

When a container in a catalog pod terminates, the logs for the terminated containers are not reported back to the catalog operator.

How reproducible:

Always

Additional info:

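The pod created at https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/reconciler/reconciler.go#L105 has terminationMessagePolicy set to File (the default policy, TerminationMessageReadFile in the Go API). It should be set to FallbackToLogsOnError (TerminationMessageFallbackToLogsOnError) instead, so that the tail of the container log is used as the termination message when the container exits with an error and has written no termination message file.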
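
The proposed change can be sketched roughly as follows. This is a minimal illustration, not the actual OLM code: the types below are local stand-ins that mirror the relevant constants from k8s.io/api/core/v1 so the example compiles on its own, and registryContainer, the container name, and the image are hypothetical.

```go
package main

import "fmt"

// TerminationMessagePolicy is a local stand-in for the type of the same name
// in k8s.io/api/core/v1; real code would import that package instead.
type TerminationMessagePolicy string

const (
	// Default policy: the termination message is read only from the
	// container's termination message file.
	TerminationMessageReadFile TerminationMessagePolicy = "File"
	// If the message file is empty and the container exited with an error,
	// the tail of the container log is used as the termination message.
	TerminationMessageFallbackToLogsOnError TerminationMessagePolicy = "FallbackToLogsOnError"
)

// Container is a trimmed-down stand-in for corev1.Container.
type Container struct {
	Name                     string
	Image                    string
	TerminationMessagePolicy TerminationMessagePolicy
}

// registryContainer is a hypothetical helper (not an OLM function) that
// builds a catalog container with the fallback policy set explicitly.
func registryContainer(name, image string) Container {
	return Container{
		Name:                     name,
		Image:                    image,
		TerminationMessagePolicy: TerminationMessageFallbackToLogsOnError,
	}
}

func main() {
	// Name and image are placeholders for illustration only.
	c := registryContainer("registry-server", "example.com/catalog:latest")
	fmt.Println(c.TerminationMessagePolicy) // prints "FallbackToLogsOnError"
}
```

Leaving the field unset would result in the File policy, which is why the policy has to be set explicitly on the container spec.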

Comment 1 Anik 2021-04-21 20:20:06 UTC

Setting the priority to urgent: the logs are needed to investigate why catalog pods in the openshift-marketplace namespace are crashlooping around 20% of the time (see https://bugzilla.redhat.com/show_bug.cgi?id=1949991#c6).

Comment 3 tflannag 2021-04-22 17:42:39 UTC

Moving back to ASSIGNED: the PR this BZ is tracking was merged against the upstream repository, so QE has no way of validating these changes.

Comment 5 Jian Zhang 2021-04-26 07:27:45 UTC

Cluster version is 4.8.0-0.nightly-2021-04-25-195440
[jzhang@dhcp-140-36 ~]$ oc  -n openshift-operator-lifecycle-manager  exec catalog-operator-7b6d5b8c8f-cxscr  -- olm --version
OLM version: 0.17.0
git commit: 9fa1f1249e3acc15b1f628d5f96e7b7047e9f176

[jzhang@dhcp-140-36 ~]$ oc project
Using project "openshift-marketplace" on server "https://api.huirwang-0426a.qe.devcluster.openshift.com:6443".

[jzhang@dhcp-140-36 ~]$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
certified-operators-fwcxv              1/1     Running   0          31m
community-operators-mb2tz              1/1     Running   0          5h
marketplace-operator-5d97446c8-wlv5z   1/1     Running   0          5h7m
qe-app-registry-jnjk5                  1/1     Running   0          5h5m
redhat-marketplace-k9xts               1/1     Running   0          5h
redhat-operators-x969k                 1/1     Running   0          5h9m

[jzhang@dhcp-140-36 ~]$ for l in `oc get pod|awk 'NR == 1 {next} {print $1}'`; do oc get pod $l -o=jsonpath={.spec.containers[0].terminationMessagePolicy}; echo "-$l"; done
FallbackToLogsOnError-certified-operators-fwcxv
FallbackToLogsOnError-community-operators-mb2tz
File-marketplace-operator-5d97446c8-wlv5z
FallbackToLogsOnError-qe-app-registry-jnjk5
FallbackToLogsOnError-redhat-marketplace-k9xts
FallbackToLogsOnError-redhat-operators-x969k

LGTM, all of the CatalogSources' pods use the "FallbackToLogsOnError" termination message policy now.

Comment 8 errata-xmlrpc 2021-07-27 23:02:36 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438