Bug 1952238 - Catalog pods don't report termination logs to catalog-operator
Summary: Catalog pods don't report termination logs to catalog-operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Anik
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-21 20:17 UTC by Anik
Modified: 2021-07-27 23:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:02:36 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift operator-framework-olm pull 65 | 0 | None | open | Bug 1952238: Report catalog pod termination logs to catalog operator on exit | 2021-04-22 18:08:05 UTC
Github | operator-framework operator-lifecycle-manager pull 2112 | 0 | None | open | Bug 1952238: Report catalog pod termination logs to catalog operator on exit | 2021-04-21 20:25:17 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:02:54 UTC

Description Anik 2021-04-21 20:17:34 UTC
Description of problem:

When a container in a catalog pod terminates, the logs for the terminated containers are not reported back to the catalog operator.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

The pod created at https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/reconciler/reconciler.go#L105 has terminationMessagePolicy set to File (TerminationMessageReadFile, the default policy). It should be set to FallbackToLogsOnError (TerminationMessageFallbackToLogsOnError) instead.
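
For illustration only, here is a minimal Python sketch of where the field sits in a pod manifest. It is not the OLM reconciler code, which is written in Go. The pod name, container name, and image are hypothetical placeholders. With FallbackToLogsOnError, Kubernetes uses the tail of the container log as the termination message when the termination message file is empty and the container exits with an error.

```python
import json

# The two values Kubernetes accepts for a container's terminationMessagePolicy.
POLICY_FILE = "File"  # default: only the termination message file is read
POLICY_FALLBACK = "FallbackToLogsOnError"  # fall back to the log tail on error


def catalog_pod_manifest(name: str, image: str) -> dict:
    """Build a minimal pod manifest for a hypothetical catalog pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": "openshift-marketplace"},
        "spec": {
            "containers": [
                {
                    # Container name and image are assumptions for this sketch.
                    "name": "registry-server",
                    "image": image,
                    # The setting this bug asks for, instead of the default "File".
                    "terminationMessagePolicy": POLICY_FALLBACK,
                }
            ]
        },
    }


manifest = catalog_pod_manifest("example-catalog", "example.com/catalog:latest")
print(json.dumps(manifest, indent=2))
```

The policy is set per container, so a pod with several containers would need it on each one.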

Comment 1 Anik 2021-04-21 20:20:06 UTC
Setting the priority to urgent, since the logs are needed to investigate why catalog pods in the openshift-marketplace namespace are crashlooping around 20% of the time (see https://bugzilla.redhat.com/show_bug.cgi?id=1949991#c6).

Comment 3 tflannag 2021-04-22 17:42:39 UTC
Moving back to ASSIGNED: the PR this BZ is tracking was merged against the upstream repository, so QE has no way of validating these changes yet.

Comment 5 Jian Zhang 2021-04-26 07:27:45 UTC
Cluster version is 4.8.0-0.nightly-2021-04-25-195440
[jzhang@dhcp-140-36 ~]$ oc  -n openshift-operator-lifecycle-manager  exec catalog-operator-7b6d5b8c8f-cxscr  -- olm --version
OLM version: 0.17.0
git commit: 9fa1f1249e3acc15b1f628d5f96e7b7047e9f176

[jzhang@dhcp-140-36 ~]$ oc project
Using project "openshift-marketplace" on server "https://api.huirwang-0426a.qe.devcluster.openshift.com:6443".

[jzhang@dhcp-140-36 ~]$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
certified-operators-fwcxv              1/1     Running   0          31m
community-operators-mb2tz              1/1     Running   0          5h
marketplace-operator-5d97446c8-wlv5z   1/1     Running   0          5h7m
qe-app-registry-jnjk5                  1/1     Running   0          5h5m
redhat-marketplace-k9xts               1/1     Running   0          5h
redhat-operators-x969k                 1/1     Running   0          5h9m

[jzhang@dhcp-140-36 ~]$ for l in `oc get pod|awk 'NR == 1 {next} {print $1}'`; do oc get pod $l -o=jsonpath={.spec.containers[0].terminationMessagePolicy}; echo "-$l"; done
FallbackToLogsOnError-certified-operators-fwcxv
FallbackToLogsOnError-community-operators-mb2tz
File-marketplace-operator-5d97446c8-wlv5z
FallbackToLogsOnError-qe-app-registry-jnjk5
FallbackToLogsOnError-redhat-marketplace-k9xts
FallbackToLogsOnError-redhat-operators-x969k

LGTM, all of the CatalogSources' pods use the "FallbackToLogsOnError" termination message policy now.
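
The same check can be scripted. The following Python sketch parses the "policy-podname" lines printed by the loop above and lists any pod that does not use FallbackToLogsOnError. The helper names are hypothetical, and the sample lines are copied from the output above. The one pod it flags, marketplace-operator, appears to be the operator's own pod and not a CatalogSource pod, so its File policy is outside the scope of this fix.

```python
EXPECTED = "FallbackToLogsOnError"


def parse_line(line: str) -> tuple[str, str]:
    """Split a 'policy-podname' line at the first hyphen into (pod, policy)."""
    policy, _, pod = line.strip().partition("-")
    return pod, policy


def pods_without_fallback(lines: list[str]) -> list[str]:
    """Return the names of pods whose policy is not FallbackToLogsOnError."""
    return [pod for pod, policy in map(parse_line, lines) if policy != EXPECTED]


sample = [
    "FallbackToLogsOnError-certified-operators-fwcxv",
    "FallbackToLogsOnError-community-operators-mb2tz",
    "File-marketplace-operator-5d97446c8-wlv5z",
    "FallbackToLogsOnError-qe-app-registry-jnjk5",
    "FallbackToLogsOnError-redhat-marketplace-k9xts",
    "FallbackToLogsOnError-redhat-operators-x969k",
]

print(pods_without_fallback(sample))
```

Splitting at the first hyphen works because neither policy value contains a hyphen, while pod names do.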

Comment 8 errata-xmlrpc 2021-07-27 23:02:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

