Bug 1707071 - no termination message provided by failing olm pods
Summary: no termination message provided by failing olm pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.1.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-06 18:01 UTC by Luis Sanchez
Modified: 2023-09-14 05:28 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:31 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:48:42 UTC

Description Luis Sanchez 2019-05-06 18:01:50 UTC
The OLM pods (catalog-operator, olm-operator, packageserver, etc.) do not provide a termination message, which hinders debugging when the pods are crash looping.

At a minimum, the pods' terminationMessagePolicy should be set to "FallbackToLogsOnError".

See https://kubernetes.io/docs/tasks/debug-application-cluster/determine-reason-pod-failure/#customizing-the-termination-message

Expected Results:
The termination message should appear in a pod container's .status.lastState.terminated.message field.
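
For reference, a minimal sketch of how that could be checked once a container has actually failed (the pod name is a placeholder, not taken from a real cluster):

$ oc get pod <olm-operator-pod> -n openshift-operator-lifecycle-manager \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'

With FallbackToLogsOnError, that field should be populated with the tail of the container log whenever the container exits with an error without having written anything to /dev/termination-log.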

Comment 4 Jian Zhang 2019-05-10 06:55:54 UTC
OLM version: io.openshift.build.commit.id=19e7914e33f723c6f77f7aaa0892c7684ce94ed4
Cluster version is 4.1.0-rc.2

mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-744f687cf7-vqgn2   1/1     Running   0          63m
olm-operator-d86789c4b-xk2g5        1/1     Running   0          63m
olm-operators-m4lgh                 1/1     Running   0          61m
packageserver-7f57998d79-9crwd      1/1     Running   0          60m
packageserver-7f57998d79-j75nr      1/1     Running   0          60m

mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager catalog-operator-744f687cf7-vqgn2 -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager olm-operator-d86789c4b-xk2g5 -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager packageserver-7f57998d79-9crwd  -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError

LGTM for those. But for the ConfigMap server pod, the default `File` policy is used, which means the termination message is read only from the termination message file.
However, the `/dev/termination-log` file is empty and nothing is ever written to it; is that expected?
mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager olm-operators-m4lgh   -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File

mac:beta5 jianzhang$ oc rsh olm-operators-m4lgh 
sh-4.2$ cat /dev/termination-log 

sh-4.2$ ps -elf|cat  
F S UID         PID   PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S 1001          1      0  0  80   0 - 164650 -     05:41 ?        00:00:01 configmap-server -c olm-operators -n openshift-operator-lifecycle-manager
4 S 1001      11662      0  0  80   0 -  2957 -      06:46 pts/0    00:00:00 /bin/sh
4 R 1001      11696  11662  0  80   0 - 12938 -      06:47 pts/0    00:00:00 ps -elf
0 S 1001      11697  11662  0  80   0 -  1098 -      06:47 pts/0    00:00:00 cat

Comment 5 Jian Zhang 2019-05-10 08:33:21 UTC
Aha, my misunderstanding: only the termination message is stored in the `/dev/termination-log` file, not the full logs. Correct me if I'm wrong.
LGTM, verifying it.
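
To illustrate the difference between the two policies (a generic sketch, not output from this cluster): with the default `File` policy, the message field only ever contains whatever the process itself writes to the termination message path before exiting, for example:

$ oc rsh <configmap-server-pod>          # placeholder pod name
sh-4.2$ echo "example failure reason" > /dev/termination-log

Since configmap-server never writes to that path, the file stays empty and no termination message is reported. With FallbackToLogsOnError, an empty file is instead replaced by the last portion of the container log, but only when the container exits with an error.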

Comment 7 errata-xmlrpc 2019-06-04 10:48:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 8 Red Hat Bugzilla 2023-09-14 05:28:14 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

