Bug 1952238

Summary: Catalog pods don't report termination logs to catalog-operator
Product: OpenShift Container Platform
Reporter: Anik <anbhatta>
Component: OLM
Assignee: Anik <anbhatta>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: urgent
CC: tflannag
Version: 4.8
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 23:02:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Anik 2021-04-21 20:17:34 UTC
Description of problem:

When a container in a catalog pod terminates, its logs are not reported back to the catalog operator.

Version-Release number of selected component (if applicable):

How reproducible:


Steps to Reproduce:

Actual results:

Expected results:

Additional info:

The pod created at https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/reconciler/reconciler.go#L105 has terminationMessagePolicy set to TerminationMessageReadFile (the default policy). It should be set to TerminationMessageFallbackToLogsOnError instead.
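A minimal sketch of the proposed change, assuming simplified stand-in types instead of the real k8s.io/api/core/v1 package so that it builds with the standard library alone. The Container struct and the registryContainer helper are hypothetical and are not OLM's actual code; only the two policy values mirror the upstream Kubernetes constants.

```go
package main

import "fmt"

// TerminationMessagePolicy mirrors the string type of the same name in
// k8s.io/api/core/v1 (redeclared here so the sketch is self-contained).
type TerminationMessagePolicy string

const (
	// Default policy: read the termination message only from the message file.
	TerminationMessageReadFile TerminationMessagePolicy = "File"
	// Use the tail of the container log when the file is empty and the
	// container exited with an error.
	TerminationMessageFallbackToLogsOnError TerminationMessagePolicy = "FallbackToLogsOnError"
)

// Container is a hypothetical, trimmed-down stand-in for corev1.Container.
type Container struct {
	Name                     string
	Image                    string
	TerminationMessagePolicy TerminationMessagePolicy
}

// registryContainer is a hypothetical helper (not OLM's real function) that
// builds a catalog container with the policy requested in this bug.
func registryContainer(name, image string) Container {
	return Container{
		Name:  name,
		Image: image,
		// Set explicitly; leaving the field empty would default to "File".
		TerminationMessagePolicy: TerminationMessageFallbackToLogsOnError,
	}
}

func main() {
	// The container name and image below are placeholders for illustration.
	c := registryContainer("registry-server", "example.com/catalog:latest")
	fmt.Printf("%s: terminationMessagePolicy=%s\n", c.Name, c.TerminationMessagePolicy)
}
```

With this policy, Kubernetes falls back to the last part of the container log as the termination message when the container exits with an error and the termination message file is empty, so the crash reason shows up in the pod status.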

Comment 1 Anik 2021-04-21 20:20:06 UTC
Setting the priority to urgent, since the logs are needed to investigate why catalog pods in the openshift-marketplace namespace are crashlooping around 20% of the time: https://bugzilla.redhat.com/show_bug.cgi?id=1949991#c6

Comment 3 tflannag 2021-04-22 17:42:39 UTC
Moving back to ASSIGNED: the PR this BZ is tracking was merged against the upstream repository, so QE has no way of validating these changes.

Comment 5 Jian Zhang 2021-04-26 07:27:45 UTC
Cluster version is 4.8.0-0.nightly-2021-04-25-195440
[jzhang@dhcp-140-36 ~]$ oc  -n openshift-operator-lifecycle-manager  exec catalog-operator-7b6d5b8c8f-cxscr  -- olm --version
OLM version: 0.17.0
git commit: 9fa1f1249e3acc15b1f628d5f96e7b7047e9f176

[jzhang@dhcp-140-36 ~]$ oc project
Using project "openshift-marketplace" on server "https://api.huirwang-0426a.qe.devcluster.openshift.com:6443".

[jzhang@dhcp-140-36 ~]$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
certified-operators-fwcxv              1/1     Running   0          31m
community-operators-mb2tz              1/1     Running   0          5h
marketplace-operator-5d97446c8-wlv5z   1/1     Running   0          5h7m
qe-app-registry-jnjk5                  1/1     Running   0          5h5m
redhat-marketplace-k9xts               1/1     Running   0          5h
redhat-operators-x969k                 1/1     Running   0          5h9m

[jzhang@dhcp-140-36 ~]$ for l in $(oc get pod | awk 'NR == 1 {next} {print $1}'); do oc get pod "$l" -o=jsonpath='{.spec.containers[0].terminationMessagePolicy}'; echo "-$l"; done

LGTM, all of the CatalogSources' pods use the "FallbackToLogsOnError" termination message policy now.
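The same check can be sketched in Go against the JSON printed by `oc get pods -o json`. This is an illustration only: containersMissingFallback and the sample data are hypothetical, and unlike the shell loop above it inspects every container in each pod rather than only the first one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wantPolicy is the serialized value of the policy this bug asks for.
const wantPolicy = "FallbackToLogsOnError"

// podList keeps only the fields of a Kubernetes pod list needed for the check.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Containers []struct {
				Name                     string `json:"name"`
				TerminationMessagePolicy string `json:"terminationMessagePolicy"`
			} `json:"containers"`
		} `json:"spec"`
	} `json:"items"`
}

// containersMissingFallback returns "pod/container" for every container whose
// termination message policy differs from wantPolicy.
func containersMissingFallback(data []byte) ([]string, error) {
	var list podList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var missing []string
	for _, pod := range list.Items {
		for _, c := range pod.Spec.Containers {
			if c.TerminationMessagePolicy != wantPolicy {
				missing = append(missing, pod.Metadata.Name+"/"+c.Name)
			}
		}
	}
	return missing, nil
}

// sample is made-up input for illustration, not output from the QE cluster.
const sample = `{
  "items": [
    {"metadata": {"name": "example-catalog-a"},
     "spec": {"containers": [{"name": "registry-server", "terminationMessagePolicy": "FallbackToLogsOnError"}]}},
    {"metadata": {"name": "example-catalog-b"},
     "spec": {"containers": [{"name": "registry-server", "terminationMessagePolicy": "File"}]}}
  ]
}`

func main() {
	missing, err := containersMissingFallback([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("containers without the fallback policy:", missing)
}
```

An empty result would correspond to the outcome reported above, where every catalog pod uses the fallback policy.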

Comment 8 errata-xmlrpc 2021-07-27 23:02:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.