Bug 2145146

Summary: CDI operator is not creating PrometheusRule resource with alerts if CDI resource is incorrect
Product: Container Native Virtualization (CNV) Reporter: João Vilaça <jvilaca>
Component: Storage Assignee: Arnon Gilboa <agilboa>
Status: VERIFIED --- QA Contact: Yan Du <yadu>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.12.0 CC: agilboa, akalenyu, alitke, jvilaca, stirabos, yadu
Target Milestone: ---   
Target Release: 4.12.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.12.1-38 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description João Vilaça 2022-11-23 10:55:36 UTC
Description of problem:

When a CDI resource is created, the operator should expose the `kubevirt_cdi_cr_ready` metric and create the `PrometheusRule` resource containing the CDI alerts. Currently, if the CDI resource is created with an incorrect infra node selector, the operator exposes the metric but does not create the `PrometheusRule`, so the alerts (notably `CDINotReady`) never fire.

See https://github.com/kubevirt/containerized-data-importer/blob/a19238ebbdadb8cc02ce91d3ed01c98935ff5475/tests/monitoring_test.go#L65 for the related test

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Delete CDI if it exists
2. Create a new CDI with an incorrect .Spec.Infra.NodeSelector (e.g. "wrong": "wrong")
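
As a rough illustration of step 2, the sketch below builds a CDI custom resource whose infra node selector matches no node and prints it as JSON. The resource name `cdi`, the `cdi.kubevirt.io/v1beta1` API version, and the helper name `build_cdi_manifest` are assumptions for illustration and are not taken from this report; adjust them to match the cluster.

```python
import json


def build_cdi_manifest(node_selector):
    """Return a CDI custom resource (as a dict) with the given infra node selector.

    Assumptions: the resource is named "cdi" and served as
    cdi.kubevirt.io/v1beta1; neither is stated in the report.
    """
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "CDI",
        "metadata": {"name": "cdi"},
        # A selector that no node satisfies, as in step 2 of the reproducer.
        "spec": {"infra": {"nodeSelector": dict(node_selector)}},
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be piped to kubectl.
    print(json.dumps(build_cdi_manifest({"wrong": "wrong"}), indent=2))
```

If the script is saved under a hypothetical name such as `make_cdi.py`, its output could be applied with `python make_cdi.py | kubectl apply -f -`, since kubectl accepts JSON manifests on standard input.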

Actual results:

> kubectl get PrometheusRule -n cdi prometheus-cdi-rules
Error from server (NotFound): prometheusrules.monitoring.coreos.com "prometheus-cdi-rules" not found

Expected results:

> kubectl get PrometheusRule -n cdi prometheus-cdi-rules
NAME                   AGE
prometheus-cdi-rules   3m40s


Additional info:

Comment 1 Alex Kalenyuk 2022-11-23 11:17:51 UTC
The operator is probably crashing because of this config error and therefore cannot deploy the resources.
If that is the case, the CDI CR status should reflect that CDI is in a "failing" state.
If not, we could look at the operator logs to understand what is happening.

Comment 3 Adam Litke 2023-02-08 14:46:33 UTC
Arnon, looks like this failed QA.  Please take a look.

Comment 4 Arnon Gilboa 2023-02-08 14:51:51 UTC
Sure Adam, I'm on it. A bug in the tier-1 test is making it fail downstream (D/S).

Comment 5 Yan Du 2023-02-14 11:10:06 UTC
Verified on CNV v4.12.1-40