Bug 2234399

Summary: VirtControllerRESTErrorsHigh not fired
Product: Container Native Virtualization (CNV) Reporter: Ohad <orevah>
Component: Metrics Assignee: João Vilaça <jvilaca>
Status: CLOSED NOTABUG QA Contact: Natalie Gavrielov <ngavrilo>
Severity: high Docs Contact:
Priority: high    
Version: 4.14.0 CC: acardace, kmajcher, stirabos
Target Milestone: ---   
Target Release: 4.15.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-10-24 17:02:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
screenshot of the alerts metrics on the same time none
screenshot of the alerts metrics on the same time none

Description Ohad 2023-08-24 10:07:43 UTC
Created attachment 1985006 [details]
screenshot of the alerts metrics on the same time

Description of problem:
The VirtControllerRESTErrorsHigh alert did not fire while the VirtControllerRESTErrorsBurst alert was firing.

VirtControllerRESTErrorsHigh fires when more than 5% of REST calls fail. It did not fire even though VirtControllerRESTErrorsBurst (more than 80% of REST calls failing) was firing.


Version-Release number of selected component (if applicable):


How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-9992

Steps to Reproduce:
1.
2.
3.

Actual results:
The alert is not fired

Expected results:
The alert should fire

Additional info:
Added screenshots of the metrics for both alerts, taken at the same time, while the test was running.

Comment 1 Ohad 2023-08-24 10:10:19 UTC
Created attachment 1985007 [details]
screenshot of the alerts metrics on the same time

Comment 2 Krzysztof Majcher 2023-09-13 10:28:35 UTC
Hi, it seems this bug should be fixed by the Virt team. Can you fix it? (We can assist if needed.)

Comment 3 Antonio Cardace 2023-09-22 09:41:36 UTC
Deferring to 4.15 due to capacity since we're in blockers-only phase.

Comment 5 Krzysztof Majcher 2023-10-11 13:11:11 UTC
@acardace - I thought that since it's a Virt alert it should be on the Virt team, but let me double-check, and if we are better suited to fix it, we'll take it. Will update the bug this week.

Comment 6 João Vilaça 2023-10-16 08:48:05 UTC
@orevah 

this behavior is totally normal

"VirtControllerRESTErrorsHigh is when more than 5% of rest calls failed" 
"VirtControllerRESTErrorsBurst ... more than 80% of the rest calls failed"

That is correct, but the time frames make all the difference here.
'VirtControllerRESTErrorsHigh' checks requests in the last hour, while
'VirtControllerRESTErrorsBurst' checks requests in the last 5 minutes.

'VirtControllerRESTErrorsBurst' is firing because all requests (or at least most)
are probably failing, since the service is facing catastrophic failures.

'VirtControllerRESTErrorsHigh' is useful to know when the service is mostly
working correctly but some endpoints/requests are failing.
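
A minimal sketch of why the two windows can disagree. Only the 5%/80% thresholds and the 1 hour/5 minute windows come from this bug; the per-minute bucket layout, the traffic numbers, and the helper names `error_ratio` and `evaluate` are made up for illustration. The real alerts are Prometheus rules, so this only mimics the arithmetic and ignores details such as how long a condition must hold before an alert fires.

```python
from typing import Dict, List, Tuple

# Thresholds and windows as quoted in this bug; everything else is illustrative.
HIGH_THRESHOLD = 0.05    # VirtControllerRESTErrorsHigh: more than 5% failed
BURST_THRESHOLD = 0.80   # VirtControllerRESTErrorsBurst: more than 80% failed
HIGH_WINDOW_MIN = 60     # last hour
BURST_WINDOW_MIN = 5     # last 5 minutes

# One bucket per minute: (total requests, failed requests). Hypothetical layout.
Bucket = Tuple[int, int]


def error_ratio(buckets: List[Bucket], window_min: int) -> float:
    """Fraction of failed requests over the most recent `window_min` buckets."""
    recent = buckets[-window_min:]
    total = sum(t for t, _ in recent)
    failed = sum(f for _, f in recent)
    return failed / total if total else 0.0


def evaluate(buckets: List[Bucket]) -> Dict[str, bool]:
    """Report which of the two threshold checks would be exceeded right now."""
    return {
        "high": error_ratio(buckets, HIGH_WINDOW_MIN) > HIGH_THRESHOLD,
        "burst": error_ratio(buckets, BURST_WINDOW_MIN) > BURST_THRESHOLD,
    }


# Assumed traffic: 55 healthy minutes, then 5 minutes where few requests
# are made and 90% of them fail.
short_outage = [(100, 0)] * 55 + [(20, 18)] * 5

# Assumed traffic: the same failure pattern sustained for the full hour.
long_outage = [(20, 18)] * 60

print(evaluate(short_outage))  # {'high': False, 'burst': True}
print(evaluate(long_outage))   # {'high': True, 'burst': True}
```

With these made-up numbers the short outage trips only the 5-minute check, because 90 failures are diluted by 5,600 requests over the hour (about 1.6%); the sustained outage trips both.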

Comment 7 Krzysztof Majcher 2023-10-17 12:56:11 UTC
Ohad, can you please retest, keeping in mind that

'VirtControllerRESTErrorsHigh' checks requests in the last hour, while
'VirtControllerRESTErrorsBurst' checks requests in the last 5 minutes.

(If the test does not run for a full hour, the outcome for 'VirtControllerRESTErrorsHigh' is not deterministic, so it cannot be verified with certainty.)

Comment 8 Ohad 2023-10-24 16:59:35 UTC
I tested it again and it seems to be working now, closing this bug.