Bug 1838973

Summary: Insights operator should collect pod logs
Product: OpenShift Container Platform
Reporter: Vadim Rutkovsky <vrutkovs>
Component: Insights Operator
Assignee: Alexandre Vicenzi <avicenzi>
Status: CLOSED ERRATA
QA Contact: Angelina Vasileva <anikifor>
Severity: medium
Docs Contact: Radek Vokál <rvokal>
Priority: unspecified
Version: 4.5
CC: avicenzi, inecas
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-27 16:00:29 UTC
Type: Bug
Bug Blocks: 1844413    

Description Vadim Rutkovsky 2020-05-22 08:14:05 UTC
Description of problem:
The Insights Operator collects the manifests of pods related to a failing operator. It should also collect the current and previous logs (at least the last ~100 lines) of every container in those pods, so that the failure can be diagnosed without requiring a must-gather.
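
For illustration, the requested collection can be approximated from the command line. This is a sketch, not the Insights Operator's implementation: collect_pod_logs is a hypothetical helper, the 100-line tail simply follows the "~100 lines" suggestion above, and the <pod>/<container>_current.log and <container>_previous.log naming is borrowed from the archive listing in the verification comment below.

```shell
#!/usr/bin/env bash
# Sketch only, NOT the Insights Operator's code: approximates the requested
# collection with the oc CLI. collect_pod_logs is a hypothetical helper name.
collect_pod_logs() {
  local namespace="$1" pod="$2" outdir="$3" tail_lines="${4:-100}"
  local container dest
  dest="${outdir}/${pod}"
  mkdir -p "${dest}"
  # Every container declared in the pod spec (space-separated names).
  for container in $(oc -n "${namespace}" get pod "${pod}" \
      -o jsonpath='{.spec.containers[*].name}'); do
    # Last N lines of the currently running container instance.
    oc -n "${namespace}" logs "${pod}" -c "${container}" \
      --tail="${tail_lines}" > "${dest}/${container}_current.log"
    # Last N lines of the previous instance; oc fails if the container
    # never restarted, in which case the empty file is removed.
    if ! oc -n "${namespace}" logs "${pod}" -c "${container}" --previous \
        --tail="${tail_lines}" > "${dest}/${container}_previous.log" 2>/dev/null; then
      rm -f "${dest}/${container}_previous.log"
    fi
  done
}
```

Example: collect_pod_logs openshift-monitoring prometheus-k8s-0 openshift-monitoring/logs. A previous log only exists for containers that have restarted, so the sketch skips it otherwise rather than keeping an empty file.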

Comment 3 Angelina Vasileva 2020-06-05 12:18:16 UTC
Fixed and verified in 4.6.0-0.nightly-2020-06-04-232426.

Verification steps:

1. Degrade an operator (monitoring is used here):

oc -n openshift-monitoring create configmap cluster-monitoring-config
oc -n openshift-monitoring edit configmap cluster-monitoring-config

Add the following data to the ConfigMap (note the deliberately invalid value):

apiVersion: v1
data:
  config.yaml: |
    telemeterClient:
      enabled: NOT_BOOLEAN
kind: ConfigMap
metadata:
...

Delete the cluster-monitoring-operator-* pod:

oc get pods -n openshift-monitoring
oc delete pod cluster-monitoring-operator-7b8665747f-w2fwv -n openshift-monitoring
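
Step 1 can also be scripted without the interactive oc edit. A minimal sketch, assuming a logged-in cluster-admin session; degrade_monitoring is a hypothetical helper name, and oc apply creates the ConfigMap or updates it if it already exists.

```shell
#!/usr/bin/env bash
# Sketch of step 1; degrade_monitoring is a hypothetical helper name.
NS=openshift-monitoring

degrade_monitoring() {
  local pod
  # Apply the ConfigMap with a deliberately invalid boolean value.
  oc -n "${NS}" apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
data:
  config.yaml: |
    telemeterClient:
      enabled: NOT_BOOLEAN
EOF
  # Delete the operator pod(s) by name prefix; output is "pod/<name>".
  for pod in $(oc -n "${NS}" get pods -o name \
      | grep '^pod/cluster-monitoring-operator-'); do
    oc -n "${NS}" delete "${pod}"
  done
}
```

Matching on the name prefix avoids copying the generated pod suffix (7b8665747f-w2fwv above) by hand.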

2. Check that the operator is degraded:

$ oc get co monitoring
NAME         VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring   4.6.0-0.nightly-2020-06-04-232426   False       False         True       51s
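
To avoid re-running oc get co by hand, the Degraded condition can be polled. A sketch only: wait_for_degraded is a hypothetical helper, and the 300 s timeout and 10 s interval are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Sketch: poll a ClusterOperator until its Degraded condition is "True".
# wait_for_degraded is a hypothetical helper; timeout/interval are arbitrary.
wait_for_degraded() {
  local co="$1" timeout="${2:-300}" interval="${3:-10}"
  local waited=0 status
  while [ "${waited}" -lt "${timeout}" ]; do
    # Read only the status of the condition whose type is "Degraded".
    status=$(oc get clusteroperator "${co}" \
      -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}')
    if [ "${status}" = "True" ]; then
      echo "${co} is Degraded"
      return 0
    fi
    sleep "${interval}"
    waited=$((waited + interval))
  done
  echo "timed out waiting for ${co} to become Degraded" >&2
  return 1
}
```

Example: wait_for_degraded monitoring. The oc client can likely do the same with oc wait --for=condition=Degraded clusteroperator/monitoring --timeout=5m; the loop is shown because it makes the condition lookup explicit.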

3. Download a fresh Insights archive from AWS S3.

4. Check the contents of the archive:

$ ll -R openshift-monitoring/logs/
openshift-monitoring/logs/:
total 8
drwxr-xr-x. 2 anikifor anikifor 4096 Jun  5 13:58 prometheus-k8s-0
drwxr-xr-x. 2 anikifor anikifor 4096 Jun  5 13:58 prometheus-k8s-1

openshift-monitoring/logs/prometheus-k8s-0:
total 52
-rw-r-----. 1 anikifor anikifor   512 Jun  5 13:56 kube-rbac-proxy_current.log
-rw-r-----. 1 anikifor anikifor  1036 Jun  5 13:56 prometheus-config-reloader_current.log
-rw-r-----. 1 anikifor anikifor 16734 Jun  5 13:56 prometheus_current.log
-rw-r-----. 1 anikifor anikifor  2788 Jun  5 13:56 prometheus_previous.log
-rw-r-----. 1 anikifor anikifor  4549 Jun  5 13:56 prometheus-proxy_current.log
-rw-r-----. 1 anikifor anikifor    59 Jun  5 13:56 prom-label-proxy_current.log
-rw-r-----. 1 anikifor anikifor   180 Jun  5 13:56 rules-configmap-reloader_current.log
-rw-r-----. 1 anikifor anikifor  2392 Jun  5 13:56 thanos-sidecar_current.log

openshift-monitoring/logs/prometheus-k8s-1:
total 68
-rw-r-----. 1 anikifor anikifor   512 Jun  5 13:56 kube-rbac-proxy_current.log
-rw-r-----. 1 anikifor anikifor  1035 Jun  5 13:56 prometheus-config-reloader_current.log
-rw-r-----. 1 anikifor anikifor 38134 Jun  5 13:56 prometheus_current.log
-rw-r-----. 1 anikifor anikifor  2788 Jun  5 13:56 prometheus_previous.log
-rw-r-----. 1 anikifor anikifor  1587 Jun  5 13:56 prometheus-proxy_current.log
-rw-r-----. 1 anikifor anikifor    59 Jun  5 13:56 prom-label-proxy_current.log
-rw-r-----. 1 anikifor anikifor   180 Jun  5 13:56 rules-configmap-reloader_current.log
-rw-r-----. 1 anikifor anikifor  2392 Jun  5 13:56 thanos-sidecar_current.log
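
Rather than reading the listing by eye, the per-container logs in an extracted archive can be summarised with a small script. It assumes the <namespace>/logs/<pod>/<container>_current.log and _previous.log layout seen above; check_pod_logs is a hypothetical helper. The line counts make it easy to compare against the "~100 lines" asked for in the description, though the exact limit the operator applies is not stated in this bug.

```shell
#!/usr/bin/env bash
# Sketch: list each collected container log with its line count.
# Assumes the <pod>/<container>_{current,previous}.log layout from the
# listing above; check_pod_logs is a hypothetical helper name.
check_pod_logs() {
  local logs_dir="$1" found=0 f name pod
  for f in "${logs_dir}"/*/*_current.log "${logs_dir}"/*/*_previous.log; do
    [ -f "${f}" ] || continue   # skip glob patterns that matched nothing
    pod=$(basename "$(dirname "${f}")")
    name=$(basename "${f}" .log)
    # Print: pod, <container>_<current|previous>, number of lines.
    printf '%s %s %s\n' "${pod}" "${name}" "$(wc -l < "${f}" | tr -d ' ')"
    found=$((found + 1))
  done
  # Non-zero exit status if no container logs were found at all.
  [ "${found}" -gt 0 ]
}
```

Example: check_pod_logs openshift-monitoring/logs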

Comment 7 errata-xmlrpc 2020-10-27 16:00:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196