Created attachment 1985006 [details]
screenshot of the alerts metrics on the same time

Description of problem:
The VirtControllerRESTErrorsHigh alert did not fire while the VirtControllerRESTErrorsBurst alert fired. VirtControllerRESTErrorsHigh should fire when more than 5% of REST calls fail, yet it did not fire even though VirtControllerRESTErrorsBurst (more than 80% of REST calls failing) did.

Version-Release number of selected component (if applicable):

How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-9992

Steps to Reproduce:
1.
2.
3.

Actual results:
The alert is not fired

Expected results:
The alert should fire

Additional info:
Attached screenshots of the metrics for both alerts, taken at the same time, while the test was running.
Created attachment 1985007 [details] screenshot of the alerts metrics on the same time
Hi, it seems this bug should be fixed by the Virt team. Can you fix it? (We can assist if needed.)
Deferring to 4.15 due to capacity since we're in blockers-only phase.
@acardace - I thought that since it's a Virt alert it should be on the Virt team, but let me double-check, and if we are better suited to fix it, we'll take it. Will update the bug this week.
@orevah this behavior is totally normal. It is correct that "VirtControllerRESTErrorsHigh is when more than 5% of rest calls failed" and "VirtControllerRESTErrorsBurst ... more than 80% of the rest calls failed", but the time frames make all the difference here: 'VirtControllerRESTErrorsHigh' checks requests over the last hour, while 'VirtControllerRESTErrorsBurst' checks requests over the last 5 minutes. 'VirtControllerRESTErrorsBurst' is firing because all requests (or at least most of them) are probably failing, likely because the service is facing catastrophic failures. 'VirtControllerRESTErrorsHigh' is useful to know when the service is mostly working correctly but some endpoints/requests are failing.
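To make the window difference concrete, here is a rough Python sketch (not the actual alert rules). It only reuses the 5% / 1 hour and 80% / 5 minutes figures quoted in this bug. The per-minute request counts are made-up numbers, and the helper names `error_ratio` and `alerts` are hypothetical. It assumes the request rate drops while the service is failing, which is one way the 5-minute ratio can exceed 80% while the 1-hour ratio stays under 5%.

```python
from typing import Dict, List, Tuple

# One bucket per minute: (total_requests, failed_requests).
# All numbers below are illustrative, not taken from the screenshots.
Minute = Tuple[int, int]

HIGH_THRESHOLD = 0.05   # "more than 5%" over the last hour (from this bug)
BURST_THRESHOLD = 0.80  # "more than 80%" over the last 5 minutes (from this bug)


def error_ratio(buckets: List[Minute], window_minutes: int) -> float:
    """Failed / total requests over the most recent `window_minutes` buckets."""
    recent = buckets[-window_minutes:]
    total = sum(t for t, _ in recent)
    failed = sum(f for _, f in recent)
    return failed / total if total else 0.0


def alerts(buckets: List[Minute]) -> Dict[str, bool]:
    """Hypothetical helper: which alert conditions hold for this history."""
    return {
        "VirtControllerRESTErrorsHigh": error_ratio(buckets, 60) > HIGH_THRESHOLD,
        "VirtControllerRESTErrorsBurst": error_ratio(buckets, 5) > BURST_THRESHOLD,
    }


# 55 healthy minutes, then 5 minutes where most of the (fewer) requests fail.
short_outage = [(100, 0)] * 55 + [(40, 36)] * 5
# A full hour with a steady 10% failure rate.
steady_errors = [(100, 10)] * 60

print(alerts(short_outage))
print(alerts(steady_errors))
```

In the first history only the Burst condition holds (5-minute ratio 0.9, 1-hour ratio about 0.03). In the second only the High condition holds (both ratios are 0.1). So each alert can fire without the other, depending on how long the errors have lasted and how much traffic there is.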
Ohad, can you please retest, keeping in mind that 'VirtControllerRESTErrorsHigh' checks requests over the last hour while 'VirtControllerRESTErrorsBurst' checks requests over the last 5 minutes? (If the test does not run for a full hour, the result is not deterministic and cannot be verified with certainty.)
I tested it again and it seems to be working now, closing this bug.