Bug 1707071 - no termination message provided by failing olm pods [NEEDINFO]
Summary: no termination message provided by failing olm pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.1.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-06 18:01 UTC by Luis Sanchez
Modified: 2019-12-02 16:03 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:31 UTC
Target Upstream Version:
jiazha: needinfo? (amerdler)




Links
System                   ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2019:0758  None      None    None     2019-06-04 10:48:42 UTC

Description Luis Sanchez 2019-05-06 18:01:50 UTC
The OLM pods (catalog-operator, olm-operator, packageserver, etc.) do not provide a termination message, which hinders debugging when the pods are crash looping.

At minimum, each container's terminationMessagePolicy should be set to "FallbackToLogsOnError".

See https://kubernetes.io/docs/tasks/debug-application-cluster/determine-reason-pod-failure/#customizing-the-termination-message
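The difference between the two policies can be sketched in Go. This is a simplified model of the behavior described in the Kubernetes documentation linked above, not kubelet source: the message comes from the file at `terminationMessagePath` whenever that file is non-empty; with `FallbackToLogsOnError`, an empty file plus a non-zero exit code makes the kubelet use the tail of the container log instead, limited to 2048 bytes or 80 lines, whichever is smaller. The function names `terminationMessage` and `logTail` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Documented limits for the FallbackToLogsOnError log tail.
const (
	maxFallbackBytes = 2048
	maxFallbackLines = 80
)

// terminationMessage is a hypothetical, simplified model of how the value of
// lastState.terminated.message is chosen. fileContents is what the process
// wrote to terminationMessagePath, logs is its log output, exitCode its status.
func terminationMessage(policy, fileContents, logs string, exitCode int) string {
	// A non-empty termination message file always wins.
	if fileContents != "" {
		return fileContents
	}
	// Only FallbackToLogsOnError falls back, and only after a failed exit.
	if policy != "FallbackToLogsOnError" || exitCode == 0 {
		return ""
	}
	return logTail(logs)
}

// logTail keeps the last maxFallbackLines lines, then trims the result to at
// most maxFallbackBytes from the end, so both limits hold.
func logTail(logs string) string {
	lines := strings.Split(strings.TrimRight(logs, "\n"), "\n")
	if len(lines) > maxFallbackLines {
		lines = lines[len(lines)-maxFallbackLines:]
	}
	tail := strings.Join(lines, "\n")
	if len(tail) > maxFallbackBytes {
		tail = tail[len(tail)-maxFallbackBytes:]
	}
	return tail
}

func main() {
	// Hypothetical log output from a container that crashed without
	// writing anything to its termination message file.
	logs := "starting up\nfatal: could not load configuration\n"
	fmt.Printf("File:                  %q\n", terminationMessage("File", "", logs, 1))
	fmt.Printf("FallbackToLogsOnError: %q\n", terminationMessage("FallbackToLogsOnError", "", logs, 1))
}
```

The model shows why `File` alone yields an empty message for a process that crashes without writing to the file.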

Expected Results:
The termination message should appear in each container's .status.containerStatuses[].lastState.terminated.message field.
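For a quick check, `oc get pod <name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.message}'` prints the messages directly. For scripting, the Go sketch below reads the output of `oc get pod <name> -o json` on standard input and decodes only the fields it needs. The type and function names are illustrative assumptions, not part of any OpenShift tool.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// pod is a minimal subset of the Pod JSON schema; other fields are ignored.
type pod struct {
	Status struct {
		ContainerStatuses []containerStatus `json:"containerStatuses"`
	} `json:"status"`
}

type containerStatus struct {
	Name      string `json:"name"`
	LastState struct {
		// Terminated is nil when the container has no previous termination.
		Terminated *struct {
			Message string `json:"message"`
		} `json:"terminated"`
	} `json:"lastState"`
}

// lastTerminationMessages maps container name to the message recorded for its
// previous termination; containers that never terminated are omitted.
func lastTerminationMessages(podJSON []byte) (map[string]string, error) {
	var p pod
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return nil, err
	}
	msgs := make(map[string]string)
	for _, cs := range p.Status.ContainerStatuses {
		if t := cs.LastState.Terminated; t != nil {
			msgs[cs.Name] = t.Message
		}
	}
	return msgs, nil
}

func main() {
	// Usage: oc get pod <name> -o json | go run main.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	msgs, err := lastTerminationMessages(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for name, msg := range msgs {
		fmt.Printf("%s: %q\n", name, msg)
	}
}
```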

Comment 4 Jian Zhang 2019-05-10 06:55:54 UTC
OLM version: io.openshift.build.commit.id=19e7914e33f723c6f77f7aaa0892c7684ce94ed4
Cluster version is 4.1.0-rc.2

mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-744f687cf7-vqgn2   1/1     Running   0          63m
olm-operator-d86789c4b-xk2g5        1/1     Running   0          63m
olm-operators-m4lgh                 1/1     Running   0          61m
packageserver-7f57998d79-9crwd      1/1     Running   0          60m
packageserver-7f57998d79-j75nr      1/1     Running   0          60m

mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager catalog-operator-744f687cf7-vqgn2 -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager olm-operator-d86789c4b-xk2g5 -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager packageserver-7f57998d79-9crwd  -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError

LGTM. However, the ConfigMap server pod uses the default policy, `File`, which means its termination message is read only from the termination message file.
That `/dev/termination-log` file is empty and nothing is written to it. Is that expected?
mac:beta5 jianzhang$ oc get pods -n openshift-operator-lifecycle-manager olm-operators-m4lgh   -o yaml|grep terminationMessage
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File

mac:beta5 jianzhang$ oc rsh olm-operators-m4lgh 
sh-4.2$ cat /dev/termination-log 

sh-4.2$ ps -elf|cat  
F S UID         PID   PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S 1001          1      0  0  80   0 - 164650 -     05:41 ?        00:00:01 configmap-server -c olm-operators -n openshift-operator-lifecycle-manager
4 S 1001      11662      0  0  80   0 -  2957 -      06:46 pts/0    00:00:00 /bin/sh
4 R 1001      11696  11662  0  80   0 - 12938 -      06:47 pts/0    00:00:00 ps -elf
0 S 1001      11697  11662  0  80   0 -  1098 -      06:47 pts/0    00:00:00 cat
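The per-pod `grep` checks above could be automated in one pass. The sketch below assumes its input is the output of `oc get pods -n openshift-operator-lifecycle-manager -o json` (a list with an `items` array), checks only `spec.containers` (init containers are ignored), and prints every container whose `terminationMessagePolicy` is not `FallbackToLogsOnError`. Against the pods in this comment it would be expected to flag the `olm-operators` ConfigMap server pod, which uses `File`. The type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// podList is a minimal subset of a Kubernetes List of Pods.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Containers []struct {
				Name                     string `json:"name"`
				TerminationMessagePolicy string `json:"terminationMessagePolicy"`
			} `json:"containers"`
		} `json:"spec"`
	} `json:"items"`
}

// containersWithoutFallback returns "pod/container (policy)" entries for every
// container not using FallbackToLogsOnError. An empty policy is reported as
// "File", the API server default.
func containersWithoutFallback(listJSON []byte) ([]string, error) {
	var list podList
	if err := json.Unmarshal(listJSON, &list); err != nil {
		return nil, err
	}
	var missing []string
	for _, p := range list.Items {
		for _, c := range p.Spec.Containers {
			policy := c.TerminationMessagePolicy
			if policy == "" {
				policy = "File"
			}
			if policy != "FallbackToLogsOnError" {
				missing = append(missing, fmt.Sprintf("%s/%s (%s)", p.Metadata.Name, c.Name, policy))
			}
		}
	}
	return missing, nil
}

func main() {
	// Usage: oc get pods -n openshift-operator-lifecycle-manager -o json | go run main.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	missing, err := containersWithoutFallback(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, m := range missing {
		fmt.Println(m)
	}
	if len(missing) > 0 {
		os.Exit(1) // non-zero so the check can gate a script
	}
}
```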

Comment 5 Jian Zhang 2019-05-10 08:33:21 UTC
Aha, my misunderstanding: only the termination message is stored in the `/dev/termination-log` file, not the full logs. Correct me if I'm wrong.
LGTM, marking as verified.
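With the `File` policy, the kubelet reports only what the process itself writes to `terminationMessagePath` (by default `/dev/termination-log`), so an empty file is expected for a server that never writes to it. A process that wants to report a short final status under this policy has to write it before exiting, roughly as in this sketch. The helper names and the `TERMINATION_LOG` environment override are hypothetical, added only so the example can run outside a pod; they are not part of `configmap-server` or OLM.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// defaultTerminationLog is the default terminationMessagePath in Kubernetes.
const defaultTerminationLog = "/dev/termination-log"

// terminationLogPath returns the file to write. TERMINATION_LOG is a
// hypothetical override so the sketch can be run outside a container.
func terminationLogPath() string {
	if p := os.Getenv("TERMINATION_LOG"); p != "" {
		return p
	}
	return defaultTerminationLog
}

// writeTerminationMessage overwrites path with a short final status message.
func writeTerminationMessage(path, msg string) error {
	return os.WriteFile(path, []byte(msg), 0o644)
}

func main() {
	// Hypothetical fatal condition, reported before the non-zero exit.
	err := errors.New("configuration could not be loaded")
	if werr := writeTerminationMessage(terminationLogPath(), err.Error()); werr != nil {
		fmt.Fprintln(os.Stderr, "failed to write termination message:", werr)
	}
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
```

Locally it can be tried with `TERMINATION_LOG=/tmp/termination-log go run main.go`; inside a pod the variable would be left unset so the default path is used.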

Comment 7 errata-xmlrpc 2019-06-04 10:48:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

