Bug 1767719 - Gather events for namespaces defined as related objects in degraded operators
Summary: Gather events for namespaces defined as related objects in degraded operators
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Insights Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.z
Assignee: Michal Fojtik
QA Contact: Angelina Vasileva
Docs Contact: Radek Vokál
URL:
Whiteboard:
Depends On: 1768298
Blocks:
 
Reported: 2019-11-01 08:15 UTC by Michal Fojtik
Modified: 2019-12-11 22:36 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1768298
Environment:
Last Closed: 2019-12-11 22:36:06 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift insights-operator pull 38 (closed): Bug 1767719: gather: include network events for namespace that has degraded operator (last updated 2020-08-03 13:19:12 UTC)
Github openshift insights-operator pull 56 (closed): Bug 1767719: Fix the event reporting in insights operator (last updated 2020-08-03 13:19:11 UTC)
Red Hat Product Errata RHBA-2019:4093 (last updated 2019-12-11 22:36:16 UTC)

Description Michal Fojtik 2019-11-01 08:15:13 UTC
Description of problem:

Events provide a timeline of when failures happen and help with debugging problems.
A concrete example is the CNI "network not ready" condition, which leaves containers stuck in Pending while the container network (SDN) is down. In the case of the kube-apiserver-operator, this can leave the installer pods stuck, effectively blocking the operator.
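
For illustration, this is roughly what the gatherer is expected to do, approximated here with the CLI (a sketch only; the operator does this in Go through the API, and "authentication" is just an example of a degraded operator):

$ for ns in $(oc get clusteroperator authentication -o jsonpath='{range .status.relatedObjects[?(@.resource=="namespaces")]}{.name}{"\n"}{end}'); do oc get events -n "$ns"; done

Each ClusterOperator lists its namespaces under status.relatedObjects, so iterating over the entries with resource "namespaces" yields exactly the namespaces whose events should end up in the archive.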


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Dmitry Misharov 2019-11-26 09:03:47 UTC
I cannot verify this on 4.2.0-0.nightly-2019-11-25-200935.

I have one degraded operator.

$ oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-11-25-200935   True        False         True       31m
...

But the Insights Operator doesn't collect any events for it.

$ oc rsh insights-operator-5f8db86747-lttn4
$ ls /var/lib/insights-operator/
$ tar -xzvf /var/lib/insights-operator/insights-2019-11-26-084959.tar.gz
config/authentication
config/clusteroperator/authentication
config/clusteroperator/cloud-credential
config/clusteroperator/cluster-autoscaler
config/clusteroperator/console
config/clusteroperator/dns
config/clusteroperator/image-registry
config/clusteroperator/ingress
config/clusteroperator/insights
config/clusteroperator/kube-apiserver
config/clusteroperator/kube-controller-manager
config/clusteroperator/kube-scheduler
config/clusteroperator/machine-api
config/clusteroperator/machine-config
config/clusteroperator/marketplace
config/clusteroperator/monitoring
config/clusteroperator/network
config/clusteroperator/node-tuning
config/clusteroperator/openshift-apiserver
config/clusteroperator/openshift-controller-manager
config/clusteroperator/openshift-samples
config/clusteroperator/operator-lifecycle-manager
config/clusteroperator/operator-lifecycle-manager-catalog
config/clusteroperator/operator-lifecycle-manager-packageserver
config/clusteroperator/service-ca
config/clusteroperator/service-catalog-apiserver
config/clusteroperator/service-catalog-controller-manager
config/clusteroperator/storage
config/featuregate
config/id
config/infrastructure
config/ingress
config/network
config/oauth
config/version

Comment 6 Angelina Vasileva 2019-12-03 11:31:54 UTC
Verified in 4.2.0-0.nightly-2019-12-02-165545


Verification steps:
1. Create a local patch.yaml with the following contents:
- op: add
  path: /spec/overrides
  value:
  - group: apps/v1
    kind: Deployment
    name: ingress-operator
    namespace: openshift-ingress-operator
    unmanaged: true

2. oc patch clusterversion version --type json -p "$(cat patch.yaml)"

3. Scale the ingress-operator deployment in the openshift-ingress-operator namespace to 0 in the web console

4. Scale the router-default deployment in the openshift-ingress namespace to 0 in the web console

5. oc delete pods --all -n openshift-authentication-operator (this re-kicks the authentication operator so you don't have to wait)

oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-12-02-165545   True        False         True       82m


6. oc delete pods --all -n openshift-insights (this restarts the Insights Operator)

7. Check the Insights archive; there is now a folder with recorded events:

$ oc project openshift-insights
$ oc get pods -n openshift-insights
NAME                                 READY   STATUS    RESTARTS   AGE
insights-operator-5dbd8b898d-pdgw7   1/1     Running   0          27s

$ oc rsh insights-operator-5dbd8b898d-pdgw7

# tar -xzvf /var/lib/insights-operator/insights-2019-12-03-100819.tar.gz
config/authentication
config/clusteroperator/authentication
config/clusteroperator/cloud-credential
config/clusteroperator/cluster-autoscaler
config/clusteroperator/console
config/clusteroperator/dns
config/clusteroperator/image-registry
config/clusteroperator/ingress
config/clusteroperator/insights
config/clusteroperator/kube-apiserver
config/clusteroperator/kube-controller-manager
config/clusteroperator/kube-scheduler
config/clusteroperator/machine-api
config/clusteroperator/machine-config
config/clusteroperator/marketplace
config/clusteroperator/monitoring
config/clusteroperator/network
config/clusteroperator/node-tuning
config/clusteroperator/openshift-apiserver
config/clusteroperator/openshift-controller-manager
config/clusteroperator/openshift-samples
config/clusteroperator/operator-lifecycle-manager
config/clusteroperator/operator-lifecycle-manager-catalog
config/clusteroperator/operator-lifecycle-manager-packageserver
config/clusteroperator/service-ca
config/clusteroperator/service-catalog-apiserver
config/clusteroperator/service-catalog-controller-manager
config/clusteroperator/storage
config/featuregate
config/id
config/infrastructure
config/ingress
config/network
config/oauth
config/version
events/openshift-authentication
events/openshift-authentication-operator
events/openshift-config
events/openshift-config-managed
events/openshift-ingress

# cat events/openshift-ingress 
{"items":[{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:02:08Z","reason":"Killing","message":"Stopping container router"},{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:02:08Z","reason":"SuccessfulDelete","message":"Deleted pod: router-default-7dbd7cbb94-f287d"},{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:02:08Z","reason":"ScalingReplicaSet","message":"Scaled down replica set router-default-7dbd7cbb94 to 1"},{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:02:10Z","reason":"Killing","message":"Stopping container router"},{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:02:10Z","reason":"SuccessfulDelete","message":"Deleted pod: router-default-7dbd7cbb94-n98gx"},{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:02:10Z","reason":"ScalingReplicaSet","message":"Scaled down replica set router-default-7dbd7cbb94 to 0"},{"namespace":"openshift-ingress","lastTimestamp":"2019-12-03T10:07:51Z","reason":"NoPods","message":"No matching pods found"}]}
 
# cat events/openshift-authentication
{"items":[{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:05:51Z","reason":"SuccessfulCreate","message":"Created pod: oauth-openshift-c4d5fdf98-j7p8k"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:05:51Z","reason":"Scheduled","message":"Successfully assigned openshift-authentication/oauth-openshift-c4d5fdf98-j7p8k to ip-10-0-128-110.ec2.internal"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:05:51Z","reason":"ScalingReplicaSet","message":"Scaled up replica set oauth-openshift-c4d5fdf98 to 1"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:05:59Z","reason":"Pulled","message":"Container image \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9eb33aaf5c732e0967454e6861cf10d0a3323fbbf2962da7e1d450b15b59a364\" already present on machine"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:05:59Z","reason":"Created","message":"Created container oauth-openshift"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:00Z","reason":"Started","message":"Started container oauth-openshift"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:04Z","reason":"SuccessfulCreate","message":"Created pod: oauth-openshift-c4d5fdf98-bj568"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:04Z","reason":"ScalingReplicaSet","message":"Scaled up replica set oauth-openshift-c4d5fdf98 to 2"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:04Z","reason":"SuccessfulDelete","message":"Deleted pod: oauth-openshift-6cf8b94896-cbdz2"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:04Z","reason":"ScalingReplicaSet","message":"Scaled down replica set oauth-openshift-6cf8b94896 to 1"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:04Z","reason":"Scheduled","message":"Successfully assigned openshift-authentication/oauth-openshift-c4d5fdf98-bj568 to ip-10-0-155-216.ec2.internal"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:04Z","reason":"Killing","message":"Stopping container oauth-openshift"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:12Z","reason":"Started","message":"Started container oauth-openshift"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:12Z","reason":"Created","message":"Created container oauth-openshift"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:12Z","reason":"Pulled","message":"Container image \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9eb33aaf5c732e0967454e6861cf10d0a3323fbbf2962da7e1d450b15b59a364\" already present on machine"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:19Z","reason":"SuccessfulDelete","message":"Deleted pod: oauth-openshift-6cf8b94896-mlxzl"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:19Z","reason":"Killing","message":"Stopping container oauth-openshift"},{"namespace":"openshift-authentication","lastTimestamp":"2019-12-03T10:06:19Z","reason":"ScalingReplicaSet","message":"Scaled down replica set oauth-openshift-6cf8b94896 to 0"}]}

Comment 8 errata-xmlrpc 2019-12-11 22:36:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4093

