Bug 2238218
| Summary: | VirtHandlerRESTErrorsHigh alert in firing state during OCP z-stream upgrade 4.12.5 > 4.12.6 | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Ahmad <ahafe> |
| Component: | Installation | Assignee: | João Vilaça <jvilaca> |
| Status: | CLOSED MIGRATED | QA Contact: | Ahmad <ahafe> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.12.5 | CC: | dbasunag, kmajcher, stirabos |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.12.9 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-12-05 13:42:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Cannot reproduce at the moment. Will be reopened if needed.

During the fix we should check how many calls were happening during the whole hour before the alert fired. It's possible that the cluster was not very active during that time, in which case only a few failed calls are needed to trigger the alert. Maybe we should reconsider the alert logic to account for that? Maybe it's enough to adjust the threshold.

@kmajcher I think this might only happen in the automated tests, since the cluster is recently created. Does it make sense to complicate the expression if this is not happening in live clusters? Please sync with Debarati and Ahmad to see if they agree.

We had a short discussion on this bug with Debarati, Simone and Shirly, and the agreement is that it would be better to have a fix. Please sync with them on what the simplest fix would be.

A proper fix can be problematic; we can add a mitigation note in the runbook saying that this alert could be visible just after an upgrade.
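
To illustrate the low-traffic concern raised above, here is a minimal Python sketch. It assumes the alert compares failed REST calls to total REST calls over a one-hour window against a 5% threshold, as the alert summary states. The function names, the `min_requests` volume guard and the sample counts are hypothetical and are not taken from the actual KubeVirt alert rule.

```python
# Hypothetical sketch: these helpers do not reproduce the real KubeVirt
# alert expression; they only model "more than 5% of the rest calls failed
# in virt-handler for the last hour" from the alert summary.

THRESHOLD = 0.05  # 5% failure ratio, taken from the alert summary


def fires_ratio_only(failed: int, total: int) -> bool:
    """Fire when the failure ratio exceeds the threshold, regardless of volume."""
    if total == 0:
        return False
    return failed / total > THRESHOLD


def fires_with_min_volume(failed: int, total: int, min_requests: int = 1000) -> bool:
    """Same check, but only when the window saw at least `min_requests` calls.

    `min_requests` is a hypothetical mitigation, not a value used by KubeVirt.
    """
    if total < min_requests:
        return False
    return fires_ratio_only(failed, total)


if __name__ == "__main__":
    # A quiet, freshly created cluster: 4 failures out of 60 calls in the hour.
    print(fires_ratio_only(4, 60))           # True  (about 6.7% > 5%)
    print(fires_with_min_volume(4, 60))      # False (below the volume guard)
    # A busy cluster with the same failure ratio still fires under both checks.
    print(fires_with_min_volume(400, 6000))  # True
```

The trade-off is that a volume guard also hides genuine failures on idle clusters, which is one reason the runbook note was considered as a lighter alternative to changing the expression.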

Description of problem:
During the OCP upgrade from v4.12.5 to 4.12.6 (CNV: v4.12.5-50), the alert 'VirtHandlerRESTErrorsHigh' was observed in firing state.

Version-Release number of selected component (if applicable):

How reproducible:
1 of multiple attempts

Steps to Reproduce:
1. Upgrade OCP z-stream 4.12.5 > 4.12.6
2. Check the alerts fired during the CNV upgrade

Actual results:
logs: [{'labels': {'alertname': 'VirtHandlerRESTErrorsHigh', 'kubernetes_operator_component': 'kubevirt', 'kubernetes_operator_part_of': 'kubevirt', 'severity': 'warning'}, 'annotations': {'runbook_url': 'https://kubevirt.io/monitoring/runbooks/VirtHandlerRESTErrorsHigh', 'summary': 'More than 5% of the rest calls failed in virt-handler for the last hour'}, 'state': 'firing', 'activeAt': '2023-09-14T12:02:04.531593741Z', 'value': '6.1842357154408945e-02'}

Expected results:
No alerts should fire during the OCP upgrade process. We are trying to capture all the alerts that fire during upgrades and reduce the noise they generate.

Additional info:
must-gather log attached
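
The "Actual results" entry is a list of alert objects with `labels`, `state` and `value` keys. As a minimal Python sketch, assuming that shape and assuming `value` is a fraction rather than a percentage, an upgrade test could collect the firing alerts and compare the reported value with the 5% threshold as follows. The helper name `firing_alerts` is hypothetical, and the sample data is a trimmed copy of the alert in this report.

```python
# Illustrative helper (hypothetical name): collect firing alerts from a list
# shaped like the one captured in "Actual results".


def firing_alerts(alerts: list[dict]) -> dict[str, float]:
    """Map alert name -> numeric value for every alert in 'firing' state."""
    result = {}
    for alert in alerts:
        if alert.get("state") == "firing":
            name = alert["labels"]["alertname"]
            # The value is reported as a string, e.g. '6.1842357154408945e-02'.
            result[name] = float(alert["value"])
    return result


# Trimmed copy of the alert from this bug report.
captured = [
    {
        "labels": {"alertname": "VirtHandlerRESTErrorsHigh", "severity": "warning"},
        "state": "firing",
        "activeAt": "2023-09-14T12:02:04.531593741Z",
        "value": "6.1842357154408945e-02",
    }
]

fired = firing_alerts(captured)
ratio = fired["VirtHandlerRESTErrorsHigh"]
print(f"{ratio:.2%}")  # 6.18%
print(ratio > 0.05)    # True: just above the 5% threshold
```

Read this way, the reported failure ratio was only slightly above the threshold, which is consistent with the low-traffic explanation discussed in the comments.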