Bug 1838973 - Insights operator should collect pod logs
Summary: Insights operator should collect pod logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Insights Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Alexandre Vicenzi
QA Contact: Angelina Vasileva
Radek Vokál
URL:
Whiteboard:
Depends On:
Blocks: 1844413
Reported: 2020-05-22 08:14 UTC by Vadim Rutkovsky
Modified: 2020-10-27 16:00 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:00:29 UTC
Target Upstream Version:
Embargoed:


Links
GitHub openshift/insights-operator pull 115 (closed): "Bug 1838973: GatherClusterOperators: store pod logs", last updated 2020-10-21 13:16:26 UTC
Red Hat Product Errata RHBA-2020:4196, last updated 2020-10-27 16:00:54 UTC

Description Vadim Rutkovsky 2020-05-22 08:14:05 UTC
Description of problem:
The Insights operator collects the manifests of pods related to a failing operator. It should also collect the current and previous logs (at least the latest ~100 lines) of all containers in those pods, so the failure can be diagnosed without requiring a must-gather.
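The requested behavior can be approximated with `oc` for a single pod. This is a minimal sketch, not the operator's Go implementation: it assumes a logged-in `oc` session, the function name `collect_pod_logs` is hypothetical, the default of 100 lines follows the "~100 lines" figure above, and the `<namespace>/logs/<pod>/<container>_current.log` and `_previous.log` file names mirror the archive listing in Comment 3.

```shell
# Hypothetical helper: save the last N lines of the current and previous logs
# of every container in one pod, using the <ns>/logs/<pod>/<container>_*.log
# layout observed in the Insights archive. Not the operator's real code.
collect_pod_logs() {
  local ns="$1" pod="$2" dest="${3:-.}" lines="${4:-100}"
  local outdir="$dest/$ns/logs/$pod" containers c out
  mkdir -p "$outdir" || return 1
  # Container names from the pod spec, space separated
  containers=$(oc -n "$ns" get pod "$pod" -o jsonpath='{.spec.containers[*].name}') || return 1
  for c in $containers; do
    # Logs of the currently running container instance
    if out=$(oc -n "$ns" logs "$pod" -c "$c" --tail="$lines" 2>/dev/null); then
      printf '%s\n' "$out" > "$outdir/${c}_current.log"
    fi
    # Logs of the previous instance; oc fails if the container never
    # restarted, so the file is only written on success
    if out=$(oc -n "$ns" logs "$pod" -c "$c" --tail="$lines" --previous 2>/dev/null); then
      printf '%s\n' "$out" > "$outdir/${c}_previous.log"
    fi
  done
}
```

For example, `collect_pod_logs openshift-monitoring prometheus-k8s-0 ./archive 100`. Writing the `--previous` output only when the command succeeds avoids empty files for containers that have never restarted.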

Comment 3 Angelina Vasileva 2020-06-05 12:18:16 UTC
Fixed and verified in 4.6.0-0.nightly-2020-06-04-232426.

Verification steps:

1. Degrade an operator (the monitoring operator is used here)

oc -n openshift-monitoring create configmap cluster-monitoring-config
oc -n openshift-monitoring edit configmap cluster-monitoring-config

Add the following data to the ConfigMap (with an invalid value):

apiVersion: v1
data:
  config.yaml: |
    telemeterClient:
      enabled: NOT_BOOELAN
kind: ConfigMap
metadata:
...
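The `create` and `edit` commands of step 1 can also be collapsed into one non-interactive command. A sketch, assuming `oc apply` is acceptable in place of `oc create` plus `oc edit`; both function names are hypothetical, the ConfigMap name and namespace are taken from the commands above, and `NOT_BOOELAN` is the deliberately invalid value from these verification steps.

```shell
# Hypothetical helper: print a ConfigMap whose telemeterClient.enabled value is
# not a boolean, so cluster-monitoring-operator cannot parse its configuration.
invalid_monitoring_config() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    telemeterClient:
      enabled: NOT_BOOELAN
EOF
}

# Hypothetical helper: create or update the ConfigMap from stdin in one step.
apply_invalid_monitoring_config() {
  invalid_monitoring_config | oc apply -f -
}
```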

Delete the cluster-monitoring-operator-* pod:

oc get pods -n openshift-monitoring
oc delete pod cluster-monitoring-operator-7b8665747f-w2fwv -n openshift-monitoring
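Because the operator pod name carries a generated suffix (`7b8665747f-w2fwv` above), it can be more convenient to select the pod by its name prefix. A sketch with a hypothetical function name; it assumes the pod is owned by a controller that recreates it after deletion, as the generated suffix suggests.

```shell
# Hypothetical helper: delete the cluster-monitoring-operator pod(s) in
# openshift-monitoring without copying the generated name by hand.
restart_monitoring_operator() {
  local pods
  # "-o name" prints entries as pod/<name>; keep only the operator's pods
  pods=$(oc -n openshift-monitoring get pods -o name | grep '^pod/cluster-monitoring-operator-') || return 1
  # Word splitting is intended here: one argument per pod
  oc -n openshift-monitoring delete $pods
}
```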

2. Check that the operator is degraded

$ oc get co monitoring
NAME         VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring   4.6.0-0.nightly-2020-06-04-232426   False       False         True       51s
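The DEGRADED column can also be checked from a script, for example to wait until the condition is set. A sketch, assuming the standard `status.conditions` list on the ClusterOperator resource; `co_is_degraded` is a hypothetical name.

```shell
# Hypothetical helper: succeed only when the named ClusterOperator reports
# Degraded=True in its status conditions.
co_is_degraded() {
  local status
  status=$(oc get clusteroperator "$1" -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}') || return 1
  [ "$status" = "True" ]
}
```

To poll until the operator degrades: `until co_is_degraded monitoring; do sleep 10; done`.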

3. Download a fresh Insights archive from AWS S3

4. Check the contents of the archive

$ ll -R openshift-monitoring/logs/
openshift-monitoring/logs/:
total 8
drwxr-xr-x. 2 anikifor anikifor 4096 Jun  5 13:58 prometheus-k8s-0
drwxr-xr-x. 2 anikifor anikifor 4096 Jun  5 13:58 prometheus-k8s-1

openshift-monitoring/logs/prometheus-k8s-0:
total 52
-rw-r-----. 1 anikifor anikifor   512 Jun  5 13:56 kube-rbac-proxy_current.log
-rw-r-----. 1 anikifor anikifor  1036 Jun  5 13:56 prometheus-config-reloader_current.log
-rw-r-----. 1 anikifor anikifor 16734 Jun  5 13:56 prometheus_current.log
-rw-r-----. 1 anikifor anikifor  2788 Jun  5 13:56 prometheus_previous.log
-rw-r-----. 1 anikifor anikifor  4549 Jun  5 13:56 prometheus-proxy_current.log
-rw-r-----. 1 anikifor anikifor    59 Jun  5 13:56 prom-label-proxy_current.log
-rw-r-----. 1 anikifor anikifor   180 Jun  5 13:56 rules-configmap-reloader_current.log
-rw-r-----. 1 anikifor anikifor  2392 Jun  5 13:56 thanos-sidecar_current.log

openshift-monitoring/logs/prometheus-k8s-1:
total 68
-rw-r-----. 1 anikifor anikifor   512 Jun  5 13:56 kube-rbac-proxy_current.log
-rw-r-----. 1 anikifor anikifor  1035 Jun  5 13:56 prometheus-config-reloader_current.log
-rw-r-----. 1 anikifor anikifor 38134 Jun  5 13:56 prometheus_current.log
-rw-r-----. 1 anikifor anikifor  2788 Jun  5 13:56 prometheus_previous.log
-rw-r-----. 1 anikifor anikifor  1587 Jun  5 13:56 prometheus-proxy_current.log
-rw-r-----. 1 anikifor anikifor    59 Jun  5 13:56 prom-label-proxy_current.log
-rw-r-----. 1 anikifor anikifor   180 Jun  5 13:56 rules-configmap-reloader_current.log
-rw-r-----. 1 anikifor anikifor  2392 Jun  5 13:56 thanos-sidecar_current.log
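After extracting the archive, the listing above can be checked mechanically. A sketch, assuming the `<namespace>/logs/<pod>/` layout shown above; `check_pod_logs` is a hypothetical name. It fails if a pod directory has no `*_current.log` file and prints the names of containers that also have a `*_previous.log` (for the listing above, `prometheus` once per pod).

```shell
# Hypothetical helper: verify that every pod directory under <ns>/logs/ holds
# at least one *_current.log, and print containers with a *_previous.log.
check_pod_logs() {
  local logs="$1/logs" pod f found rc=0
  [ -d "$logs" ] || { echo "missing $logs" >&2; return 1; }
  for pod in "$logs"/*/; do
    pod=${pod%/}
    [ -d "$pod" ] || continue
    found=0
    for f in "$pod"/*_current.log; do
      [ -f "$f" ] && found=1
    done
    if [ "$found" -eq 0 ]; then
      echo "no current logs for ${pod##*/}" >&2
      rc=1
    fi
    # Containers that restarted have a previous log as well
    for f in "$pod"/*_previous.log; do
      [ -f "$f" ] && basename "$f" _previous.log
    done
  done
  return "$rc"
}
```

For example, `check_pod_logs openshift-monitoring` run from the extracted archive root.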

Comment 7 errata-xmlrpc 2020-10-27 16:00:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

