Bug 1881848

Summary: Do not crash Insights Operator if a CRD is missing
Product: OpenShift Container Platform
Reporter: Ivo Meixner <imeixner>
Component: Insights Operator
Assignee: Martin Kunc <mkunc>
Status: CLOSED ERRATA
QA Contact: Pavel Šimovec <psimovec>
Severity: high
Docs Contact: Marc Muehlfeld <mmuehlfe>
Priority: high
Version: 4.6
CC: aos-bugs, avicenzi, inecas, tremes
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:44:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ivo Meixner 2020-09-23 08:27:01 UTC
Description of problem:
Currently, if a CRD is missing from a cluster and the Insights Operator unsuccessfully attempts to collect it, the operator crashes. However, some CRDs are expected to be missing, which causes the Insights Operator to crash repeatedly. Instead, these errors should only be logged and should not be reported as gathering failures.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
A missing CRD that is supposed to be collected will crash the Insights Operator.


Expected results:
Insights Operator should log the missing CRD and continue gathering without sending the error further down the line.

Additional info:

Comment 2 Ivo Meixner 2020-09-23 09:35:55 UTC
I'm sorry about that. I set them, but I forgot to click the "Save changes" button.

Comment 4 Pavel Šimovec 2020-09-25 14:19:32 UTC
This one was difficult to verify. I could not get rid of the default CRDs ( oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io ):
I didn't manage to delete them permanently,
but I found a workaround:

Repeatedly run ("spam"):
oc delete project openshift-cluster-storage-operator ; oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io

and run TestArchiveContains with extended logging.

When I deleted the whole namespace openshift-cluster-storage-operator, it gave me enough time (a few minutes) to reproduce the error on the older version:
E0924 13:56:01.653098       1 periodic.go:157] config failed after 1.473s with: customresourcedefinitions.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" not found

After that, it didn't even mention volumesnapshots.snapshot.storage.k8s.io,

and the record config/crd/volumesnapshots.snapshot.storage.k8s.io.json was not in the archive.

On version 4.6.0-0.ci-2020-09-24-134740, with the same steps,
when spamming:
oc delete project openshift-cluster-storage-operator ; oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io
Error from server (NotFound): namespaces "openshift-cluster-storage-operator" not found
customresourcedefinition.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" deleted
❯ oc delete project openshift-cluster-storage-operator ; oc delete crd volumesnapshotcontents.snapshot.storage.k8s.io
Error from server (NotFound): namespaces "openshift-cluster-storage-operator" not found
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "volumesnapshotcontents.snapshot.storage.k8s.io" not found

...
We get an info message that the CRD was not found, but no error, and it moves on to the second CRD, which is available:
I0925 14:10:39.783743       1 clusterconfig.go:730] Cannot find CRD: "volumesnapshotcontents.snapshot.storage.k8s.io"
I0925 14:10:39.783759       1 diskrecorder.go:66] Recording config/crd/volumesnapshots.snapshot.storage.k8s.io with fingerprint=

The error from the older version did not appear, and the archive contains the record config/crd/volumesnapshots.snapshot.storage.k8s.io.json.
Insights archive content:
...
config/configmaps/openshift-install/version
config/crd/volumesnapshots.snapshot.storage.k8s.io.json
config/featuregate.json
...


VERIFIED on version 4.6.0-0.ci-2020-09-24-134740

Comment 7 errata-xmlrpc 2020-10-27 16:44:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196