Bug 1881848 - Do not crash Insights Operator if a CRD is missing
Summary: Do not crash Insights Operator if a CRD is missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Insights Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Martin Kunc
QA Contact: Pavel Šimovec
Docs Contact: Marc Muehlfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-23 08:27 UTC by Ivo Meixner
Modified: 2020-10-27 16:44 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:44:06 UTC
Target Upstream Version:
Embargoed:




Links
System                 | ID                                   | Private | Priority | Status | Summary                                                     | Last Updated
Github                 | openshift insights-operator pull 188 | 0       | None     | closed | Bug 1881848: Do not return CRD not found error, just log it | 2020-09-29 08:39:50 UTC
Red Hat Product Errata | RHBA-2020:4196                       | 0       | None     | None   | None                                                        | 2020-10-27 16:44:25 UTC

Description Ivo Meixner 2020-09-23 08:27:01 UTC
Description of problem:
Currently, if a CRD is missing from a cluster and the Insights Operator unsuccessfully attempts to collect it, the operator crashes. However, some CRDs are expected to be missing, which would cause the Insights Operator to crash repeatedly. Instead, these errors should only be logged and not reported.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
A missing CRD that is supposed to be collected will crash the Insights Operator.


Expected results:
Insights Operator should log the missing CRD and continue gathering without sending the error further down the line.
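The expected behavior (and the title of the linked pull request, "Do not return CRD not found error, just log it") can be illustrated with a minimal Go sketch. It is not the operator's actual code: crdGetter, fakeCluster, gatherCRDs and the errNotFound sentinel are hypothetical stand-ins. The real operator talks to the API server through the Kubernetes client libraries, where a NotFound error is typically detected with apierrors.IsNotFound. The log message mirrors the one seen during verification.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNotFound is a hypothetical stand-in for a Kubernetes "NotFound" API error.
// Real client-go code would test for it with apierrors.IsNotFound(err).
var errNotFound = errors.New("not found")

// crdGetter is a hypothetical interface for fetching a CRD definition by name.
type crdGetter interface {
	Get(name string) (string, error)
}

// fakeCluster is an in-memory getter used only for illustration.
type fakeCluster map[string]string

func (c fakeCluster) Get(name string) (string, error) {
	crd, ok := c[name]
	if !ok {
		return "", fmt.Errorf("customresourcedefinitions %q: %w", name, errNotFound)
	}
	return crd, nil
}

// gatherCRDs collects every requested CRD. A missing CRD is logged and
// skipped instead of aborting the whole gather; any other error is returned.
func gatherCRDs(g crdGetter, names []string) (map[string]string, error) {
	records := make(map[string]string)
	for _, name := range names {
		crd, err := g.Get(name)
		if errors.Is(err, errNotFound) {
			log.Printf("Cannot find CRD: %q", name)
			continue // move on to the next CRD
		}
		if err != nil {
			return records, err
		}
		records["config/crd/"+name] = crd
	}
	return records, nil
}

func main() {
	cluster := fakeCluster{
		"volumesnapshots.snapshot.storage.k8s.io": "{}",
	}
	names := []string{
		"volumesnapshotcontents.snapshot.storage.k8s.io", // deleted in this scenario
		"volumesnapshots.snapshot.storage.k8s.io",
	}
	records, err := gatherCRDs(cluster, names)
	fmt.Println(len(records), err) // prints "1 <nil>"
}
```

The design choice is to treat only the "not found" case as non-fatal; any other error (for example, a permissions or connectivity failure) is still returned to the caller.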

Additional info:

Comment 2 Ivo Meixner 2020-09-23 09:35:55 UTC
I'm sorry about that. I set them, but I forgot to click the "Save changes" button.

Comment 4 Pavel Šimovec 2020-09-25 14:19:32 UTC
This one was difficult to verify. I could not get rid of the default CRDs (oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io); I did not manage to delete them permanently.
However, I found a workaround:

Repeatedly run (spam):
oc delete project openshift-cluster-storage-operator ; oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io

Then run TestArchiveContains with extended logging.

Deleting the whole openshift-cluster-storage-operator namespace gave me enough time (a few minutes) to reproduce the error on the older version:
E0924 13:56:01.653098       1 periodic.go:157] config failed after 1.473s with: customresourcedefinitions.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" not found

After that, the operator did not even mention volumesnapshots.snapshot.storage.k8s.io, and the record config/crd/volumesnapshots.snapshot.storage.k8s.io.json was not in the archive.

On version 4.6.0-0.ci-2020-09-24-134740, with the same steps, repeatedly running:
oc delete project openshift-cluster-storage-operator ; oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io
Error from server (NotFound): namespaces "openshift-cluster-storage-operator" not found
customresourcedefinition.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" deleted
❯ oc delete project openshift-cluster-storage-operator ; oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io
Error from server (NotFound): namespaces "openshift-cluster-storage-operator" not found
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" not found

...
We get an info message that the CRD was not found, but no error, and the operator moves on to the second CRD, which is available:
I0925 14:10:39.783743       1 clusterconfig.go:730] Cannot find CRD: "volumesnapshotcontents.snapshot.storage.k8s.io"
I0925 14:10:39.783759       1 diskrecorder.go:66] Recording config/crd/volumesnapshots.snapshot.storage.k8s.io with fingerprint=

The error from the older version did not appear, and the archive contains the record config/crd/volumesnapshots.snapshot.storage.k8s.io.json.
Insights archive content:
...
config/configmaps/openshift-install/version
config/crd/volumesnapshots.snapshot.storage.k8s.io.json
config/featuregate.json
...
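Checking for the record can also be scripted. The following is a minimal Go sketch, assuming the Insights archive is a gzip-compressed tarball and that entry names match the record paths exactly (for example config/crd/volumesnapshots.snapshot.storage.k8s.io.json). archiveContains and buildArchive are hypothetical helpers; buildArchive only creates an in-memory sample archive so the example runs without a real cluster.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// archiveContains reports whether a gzip-compressed tar stream has an entry
// with exactly the given name.
func archiveContains(r io.Reader, name string) (bool, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return false, err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return false, nil // reached the end without a match
		}
		if err != nil {
			return false, err
		}
		if hdr.Name == name {
			return true, nil
		}
	}
}

// buildArchive creates an in-memory tar.gz with the given entry names,
// standing in for a real Insights archive in this example.
func buildArchive(names ...string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, n := range names {
		body := []byte("{}")
		if err := tw.WriteHeader(&tar.Header{Name: n, Mode: 0o644, Size: int64(len(body))}); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := buildArchive(
		"config/featuregate.json",
		"config/crd/volumesnapshots.snapshot.storage.k8s.io.json",
	)
	if err != nil {
		panic(err)
	}
	ok, err := archiveContains(buf, "config/crd/volumesnapshots.snapshot.storage.k8s.io.json")
	fmt.Println(ok, err) // prints "true <nil>"
}
```

Against a real archive, the reader would come from os.Open on the downloaded file instead of buildArchive.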


VERIFIED on version 4.6.0-0.ci-2020-09-24-134740

Comment 7 errata-xmlrpc 2020-10-27 16:44:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

