Bug 2145146 - CDI operator is not creating PrometheusRule resource with alerts if CDI resource is incorrect
Keywords:
Status: VERIFIED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.12.1
Assignee: Arnon Gilboa
QA Contact: Yan Du
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-23 10:55 UTC by João Vilaça
Modified: 2023-08-08 10:31 UTC (History)

Fixed In Version: v4.12.1-38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt containerized-data-importer pull 2546 | 0 | None | Merged | Ensure Prometheus resources exist for CDINotReady | 2023-02-01 12:26:15 UTC
Github | kubevirt containerized-data-importer pull 2562 | 0 | None | Merged | [release-v1.55] Ensure Prometheus resources exist for CDINotReady | 2023-02-01 12:26:19 UTC
Github | kubevirt containerized-data-importer pull 2577 | 0 | None | Merged | Fix operator_test CDI CR Gets | 2023-02-08 14:45:14 UTC
Github | kubevirt containerized-data-importer pull 2581 | 0 | None | Merged | [release-v1.55] Fix cdi_cr_ready test CDI CR name | 2023-02-09 08:55:12 UTC
Red Hat Issue Tracker | CNV-22813 | 0 | None | None | None | 2022-11-23 11:05:45 UTC

Description João Vilaça 2022-11-23 10:55:36 UTC
Description of problem:

When we create a CDI resource, the operator should expose the `kubevirt_cdi_cr_ready` metric and create the `PrometheusRule` resource containing the CDI alerts. Currently, if we create the CDI resource with a wrong infra node selector, the operator exposes the metric but does not create the `PrometheusRule`, so the alerts (notably `CDINotReady`) never fire.

See https://github.com/kubevirt/containerized-data-importer/blob/a19238ebbdadb8cc02ce91d3ed01c98935ff5475/tests/monitoring_test.go#L65 for the related test

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Delete CDI if it exists
2. Create a new CDI with a wrong .Spec.Infra.NodeSelector (e.g. "wrong": "wrong")
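
The two steps above can be sketched as the shell helpers below. This is only an illustration, not part of the original report: the API version `cdi.kubevirt.io/v1beta1`, the CR name `cdi`, and the helper names `cdi_wrong_selector_manifest` and `reproduce_bug` are assumptions, and the commands should only be run against a disposable test cluster.

```shell
# Print a CDI CR whose infra node selector matches no node.
# Assumptions: API version cdi.kubevirt.io/v1beta1 and CR name "cdi".
cdi_wrong_selector_manifest() {
  cat <<'EOF'
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  infra:
    nodeSelector:
      wrong: wrong
EOF
}

# Step 1: delete any existing CDI CR.
# Step 2: create the misconfigured one from the manifest above.
reproduce_bug() {
  kubectl delete cdi --all
  cdi_wrong_selector_manifest | kubectl apply -f -
}
```

Calling `reproduce_bug` and then running the `kubectl get PrometheusRule` command shown under "Actual results" should show whether the rule was created.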

Actual results:

> kubectl get PrometheusRule -n cdi prometheus-cdi-rules
Error from server (NotFound): prometheusrules.monitoring.coreos.com "prometheus-cdi-rules" not found

Expected results:

> kubectl get PrometheusRule -n cdi prometheus-cdi-rules
NAME                   AGE
prometheus-cdi-rules   3m40s


Additional info:

Comment 1 Alex Kalenyuk 2022-11-23 11:17:51 UTC
The operator is probably crashing because of this config error, and thus cannot deploy the resources.
If that is the case, the CDI CR status should reflect that CDI is in a "failing" state.
If not, we could take a look at the operator logs to understand what is happening.
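
A minimal sketch of how the CR status and operator logs mentioned above could be inspected. It assumes the CR is named `cdi`, that the operator runs as a Deployment called `cdi-operator`, and that it lives in the `cdi` namespace used elsewhere in this report (overridable through the hypothetical `CDI_NAMESPACE` variable); none of these names are confirmed in this comment.

```shell
# Print the CDI CR phase followed by one "type=status reason" line per condition.
# Assumption: the CR is named "cdi".
show_cdi_status() {
  kubectl get cdi cdi \
    -o jsonpath='{.status.phase}{"\n"}{range .status.conditions[*]}{.type}={.status} {.reason}{"\n"}{end}'
}

# Print the last 100 log lines of the operator.
# Assumption: Deployment "cdi-operator" in namespace "cdi" unless CDI_NAMESPACE is set.
show_operator_logs() {
  kubectl logs -n "${CDI_NAMESPACE:-cdi}" deployment/cdi-operator --tail=100
}
```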

Comment 3 Adam Litke 2023-02-08 14:46:33 UTC
Arnon, looks like this failed QA.  Please take a look.

Comment 4 Arnon Gilboa 2023-02-08 14:51:51 UTC
Sure Adam, I'm on it. It's a bug in a tier-1 test that is failing it downstream (D/S).

Comment 5 Yan Du 2023-02-14 11:10:06 UTC
Verified on CNV v4.12.1-40

