This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2238218 - VirtHandlerRESTErrorsHigh alert in firing state during ocp upgrade z stream 4.12.5 > 4.12.6
Summary: VirtHandlerRESTErrorsHigh alert in firing state during ocp upgrade z stream 4...
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 4.12.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.12.9
Assignee: João Vilaça
QA Contact: Ahmad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-09-10 12:24 UTC by Ahmad
Modified: 2023-12-05 13:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-05 13:42:52 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   CNV-32814 0 None None None 2023-12-05 13:42:51 UTC

Description Ahmad 2023-09-10 12:24:58 UTC
Description of problem:
During an OCP z-stream upgrade from v4.12.5 to 4.12.6 (CNV: v4.12.5-50), the alert 'VirtHandlerRESTErrorsHigh' was observed in the firing state.

Version-Release number of selected component (if applicable):


How reproducible: 1 of multiple attempts


Steps to Reproduce:
1. Upgrade OCP z-stream 4.12.5 > 4.12.6
2. Check the alerts fired during the CNV upgrade

Actual results:

logs:
 [{'labels': {'alertname': 'VirtHandlerRESTErrorsHigh', 'kubernetes_operator_component': 'kubevirt', 'kubernetes_operator_part_of': 'kubevirt', 'severity': 'warning'}, 'annotations': {'runbook_url': 'https://kubevirt.io/monitoring/runbooks/VirtHandlerRESTErrorsHigh', 'summary': 'More than 5% of the rest calls failed in virt-handler for the last hour'}, 'state': 'firing', 'activeAt': '2023-09-14T12:02:04.531593741Z', 'value': '6.1842357154408945e-02'}
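
For readers skimming the log entry above, here is a minimal sketch of how the reported value compares with the 5% figure in the alert summary. It assumes the 'value' field is the failed-call ratio that the alert expression evaluates (the log does not say so explicitly), and the dictionary is a trimmed copy of the entry, not the full payload. The helper name failure_ratio is made up for illustration.

```python
# Threshold taken from the alert summary ("More than 5% of the rest calls failed").
THRESHOLD = 0.05

# Trimmed copy of the alert entry from the log above.
alert = {
    "labels": {"alertname": "VirtHandlerRESTErrorsHigh", "severity": "warning"},
    "state": "firing",
    "value": "6.1842357154408945e-02",
}


def failure_ratio(entry):
    """Return the alert's 'value' field as a float (assumed to be a 0..1 ratio)."""
    return float(entry["value"])


ratio = failure_ratio(alert)
print(f"{alert['labels']['alertname']}: {ratio:.2%} failed (threshold {THRESHOLD:.0%})")
```

Under that assumption the value is roughly 6.2%, only slightly above the 5% threshold.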




Expected results:
No alerts should fire during the OCP upgrade process. We are trying to capture all the alerts that fire during upgrades and reduce the noise they generate.

Additional info:
must-gather log attached

Comment 2 Krzysztof Majcher 2023-09-12 12:44:52 UTC
Cannot reproduce at the moment. Will be reopened if needed.

Comment 5 Krzysztof Majcher 2023-09-26 12:44:06 UTC
While fixing this, we should check how many calls were made during the whole hour before the alert fired.
It's possible that the cluster was not very active during that time, in which case only a few failed calls are enough to trigger the alert.
Maybe we should reconsider the alert logic to account for that?
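
To illustrate the concern, here is a rough sketch of a ratio check with an optional minimum-volume guard. This is not the actual alert expression: the function should_fire, the min_calls parameter and the call counts are hypothetical, and only the 5% threshold comes from the alert summary.

```python
def should_fire(failed, total, threshold=0.05, min_calls=0):
    """Ratio-based alert check with an optional minimum-volume guard.

    min_calls is a hypothetical parameter, not part of the existing alert.
    """
    if total == 0 or total < min_calls:
        # Too little traffic to judge: stay silent.
        return False
    return failed / total > threshold


# Quiet cluster: 2 failures out of 30 calls is about 6.7%, so the plain ratio fires.
print(should_fire(failed=2, total=30))
# Same numbers with a guard of 100 calls per hour: the alert stays silent.
print(should_fire(failed=2, total=30, min_calls=100))
# Busy cluster: 700 failures out of 10000 calls (7%) still fires with the guard.
print(should_fire(failed=700, total=10000, min_calls=100))
```

The trade-off is that a guard like this would also hide real failures on a genuinely idle cluster, so the minimum would need to be chosen carefully.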

Comment 6 Krzysztof Majcher 2023-09-26 12:44:55 UTC
Maybe it's enough to adjust the threshold.

Comment 7 João Vilaça 2023-10-16 12:35:13 UTC
@kmajcher 

I think this might only happen in the automated tests, since the cluster was recently created.
Does it make sense to complicate the expression if this is not happening in live clusters?

Comment 8 Krzysztof Majcher 2023-10-17 09:11:01 UTC
Please sync with Debarati and Ahmad to check whether they agree with that.

Comment 9 Krzysztof Majcher 2023-10-17 12:51:37 UTC
We had a short discussion on this bug with Debarati, Simone and Shirly, and the agreement is that it would be better to have a fix.
Please sync with them on what the simplest fix would be.

Comment 10 Simone Tiraboschi 2023-10-31 13:54:44 UTC
Properly fixing this can be problematic; we can add a mitigation note in the runbook saying that this alert could be visible just after an upgrade.

