Bug 2179991
| Summary: | VirtApiRESTErrorsBurst threshold high | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Ohad <orevah> |
| Component: | Virtualization | Assignee: | Igor Bezukh <ibezukh> |
| Status: | CLOSED NOTABUG | QA Contact: | Akriti Gupta <akrgupta> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.13.0 | CC: | acardace, kedar.lad, sradco, stirabos |
| Target Milestone: | --- | | |
| Target Release: | 4.14.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | hco-bundle-v4.14.0.rhel9-2029 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-10-05 07:47:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Ohad
2023-03-20 14:16:40 UTC
We need to drop the evaluation time.

I'm very confused here. Looking at the steps to reproduce the scenario, it appears that virt-api has been left in a non-running state? It's not clear to me what removing its role binding does after it's already running. But the REST API endpoint for this alert is virt-api itself, is it not? Shirly, you mention that we should drop the evaluation time, but it's not clear that will do anything useful. Can you help us understand what needs to be done and why?

I can't really comment on the steps to reproduce. I think this is a question for Ohad. Probably what he was trying to do is get a high percentage of the requests to fail. We need to drop the evaluation time because the expression itself already looks back 5 minutes and checks the percentage of failed requests. If the failure percentage is greater than 80%, then the alert should fire immediately and not wait for another 5m. The same applies to VirtApiRESTErrorsHigh.

Targeting this to CNV 4.15 depending upon the severity and anticipated capacity at this point.

Discussed with Virt Devs, targeting it back to the 4.14 Target Version.

Closed NOTABUG. When OLM is also disabled along with virt-operator, virt-operator does not reconcile, so the cluster role binding is not reconciled and the REST calls keep failing. With that setup I managed to trigger the alert without problems. I also added 5 more minutes to the wait for the alert, to allow for the time the setup needs to make 80% of the REST calls fail.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
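
The evaluation-time argument in this thread can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the actual alert rule: it models a 5-minute look-back window, a "greater than 80% failed" condition, and a hold time comparable to a Prometheus `for:` clause. The function name, the per-minute sampling, and the traffic numbers are all hypothetical.

```python
from collections import deque

WINDOW_MINUTES = 5   # look-back window mentioned in the thread
THRESHOLD = 0.8      # "greater than 80%" failed requests


def first_firing_minute(samples, for_minutes):
    """Return the first minute at which the simulated alert fires, or None.

    samples: list of (total_requests, failed_requests), one entry per minute.
    for_minutes: how long the condition must stay true before firing
                 (0 means fire on the first evaluation that is true).
    """
    window = deque(maxlen=WINDOW_MINUTES)  # keeps only the last 5 samples
    pending_since = None
    for minute, sample in enumerate(samples):
        window.append(sample)
        total = sum(t for t, _ in window)
        failed = sum(f for _, f in window)
        ratio = failed / total if total else 0.0
        if ratio > THRESHOLD:
            if pending_since is None:
                pending_since = minute
            if minute - pending_since >= for_minutes:
                return minute
        else:
            # Condition dropped, so the pending state is reset.
            pending_since = None
    return None


# Hypothetical traffic: 3 healthy minutes, then every request fails.
sustained = [(100, 0)] * 3 + [(100, 100)] * 20
# Hypothetical burst: 6 failing minutes, then recovery.
burst = [(100, 0)] * 3 + [(100, 100)] * 6 + [(100, 0)] * 10

immediate = first_firing_minute(sustained, for_minutes=0)
delayed = first_firing_minute(sustained, for_minutes=5)
burst_immediate = first_firing_minute(burst, for_minutes=0)
burst_delayed = first_firing_minute(burst, for_minutes=5)
print(immediate, delayed, burst_immediate, burst_delayed)
```

In this model the hold time simply adds `for_minutes` to the firing time of a sustained failure, on top of the delay the look-back window already introduces, and a shorter burst can end before the hold time elapses so that the alert never fires. That matches the reasoning given for dropping the evaluation time, and also why the verification needed an extra 5 minutes of waiting.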